Thursday, December 11, 2014

Exclude Metadata of PDF Files from Crawling



I need to know whether it is possible to exclude PDF files from being crawled.


I found a post that suggests an exclusion rule like this:



http://www.contoso.com/*.pdf



It worked: search results no longer include links to PDF files. But this is not enough, since the Display, New, and Edit forms for the PDF files still get indexed.
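The rule catches the files but not the forms because a form URL addresses the list item by ID and does not end in `.pdf`. The sketch below illustrates this, assuming Python's `fnmatch` as a rough stand-in for SharePoint's own wildcard matching (it is not the same matcher) and using hypothetical URLs in a library named "Docs".

```python
from fnmatch import fnmatchcase

# The exclusion rule from the post. fnmatch is only an approximation of
# SharePoint's crawl-rule matching; "*" means "any run of characters" here.
RULE = "http://www.contoso.com/*.pdf"

def is_excluded(url: str, rule: str = RULE) -> bool:
    """Return True if the URL matches the wildcard rule (case-insensitive)."""
    return fnmatchcase(url.lower(), rule.lower())

# Hypothetical URLs: the file itself and two of its list forms.
urls = [
    "http://www.contoso.com/sites/hr/Docs/handbook.pdf",
    "http://www.contoso.com/sites/hr/Docs/Forms/DispForm.aspx?ID=5",
    "http://www.contoso.com/sites/hr/Docs/Forms/EditForm.aspx?ID=5",
]

for url in urls:
    # Only the .pdf file matches; the form URLs end in "?ID=5".
    print(is_excluded(url), url)
```

Only the first URL is excluded, which matches what the post describes: the PDF links disappear from results while the forms are still crawled.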



The main problem now is that a crawl takes more than 60 hours to finish, and 90% of that is PDF files that no one ever searches for. This is why a solution based on search scopes will not fix the problem here: scopes only filter results after the content has been crawled. We need to keep the PDF-related forms from being crawled in the first place.


I was thinking about changing all the forms for the PDF content type to "PdfFDispForm", "PdfEditForm", and "PdfNewForm". Then I could create exclusion rules such as:



http://ift.tt/1BdH0zn


http://ift.tt/1yTqfu5
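The actual rules above sit behind shortened links, so the pattern below is a hypothetical one, not the author's. It sketches why renaming helps: a single wildcard rule can then match every PDF form while leaving the standard forms of other content types alone. As before, `fnmatch` is only an approximation of SharePoint's matcher, and the URLs are made up.

```python
from fnmatch import fnmatchcase

# Hypothetical rule covering any renamed form whose file name starts with
# "Pdf" and ends with "Form.aspx" (display, edit, and new forms alike).
PDF_FORM_RULE = "http://www.contoso.com/*/Pdf*Form.aspx*"

def is_excluded(url: str, rule: str = PDF_FORM_RULE) -> bool:
    """Return True if the URL matches the wildcard rule (case-insensitive)."""
    return fnmatchcase(url.lower(), rule.lower())

# Hypothetical form URLs in a library named "Docs".
urls = {
    "pdf display form": "http://www.contoso.com/sites/hr/Docs/Forms/PdfDispForm.aspx?ID=5",
    "pdf edit form": "http://www.contoso.com/sites/hr/Docs/Forms/PdfEditForm.aspx?ID=5",
    "standard display form": "http://www.contoso.com/sites/hr/Docs/Forms/DispForm.aspx?ID=7",
}

for label, url in urls.items():
    print(f"{label}: excluded={is_excluded(url)}")
```

The catch is the one the post raises: the rule only helps where the PDF content type has actually been switched to the renamed forms, and that has to be done in every site collection and web.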



But this is a painful solution, since the site has many site collections and webs.


I would be very grateful if someone could suggest a neater solution.







