Remove Joomla PDFs from Google and Yahoo search results
Tuesday, 24 June 2008
As you probably know, Joomla has a built-in PDF generator. The problem with PDFs is that Google sometimes places them in its search results (SERPs) instead of the original Joomla HTML article. The PDFs apparently rank better than the HTML, probably because their keyword density is higher: they don't include the navigation and modules usually found on a Joomla HTML page. When visitors search Google and land on the PDF instead of the article, you may lose them, because they get no navigation menu, no site search, and so on. They just get annoyed waiting for Adobe Reader's browser plugin to load.

The solution is simple: edit your robots.txt (found in the site root) and add the lines below to prevent the PDFs from being crawled and included in Google's index.

Joomla 1.0.x, with or without SEF:
User-agent: Googlebot
Disallow: /index2.php?option=com_content&do_pdf=1*

Here are two more lines that block the Yahoo Slurp crawler from indexing Joomla-generated PDFs:
User-agent: Slurp
Disallow: /index2.php?option=com_content&do_pdf=1*

Joomla 1.5.x, with or without SEF:
We're also blocking indexing of the print version and the "mail to friend" window.
User-agent: Googlebot
Disallow: /index.php?view=article*&format=pdf
Disallow: /index.php?view=article*&print=1*
Disallow: /index.php?option=com_mailto*
Disallow: /component/mailto/*

User-agent: Slurp
Disallow: /index.php?view=article*&format=pdf
Disallow: /index.php?view=article*&print=1*
Disallow: /index.php?option=com_mailto*
Disallow: /component/mailto/*

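The Joomla 1.5.x rules are identical for both crawlers, so if you maintain robots.txt by script you could generate the two groups from a single list instead of copying them by hand. This is only a minimal sketch: the function name build_robots_groups is made up for illustration, and the output still has to be pasted into (or appended to) your real robots.txt.

```python
# Disallow patterns copied from the Joomla 1.5.x rules in this article.
RULES = [
    "/index.php?view=article*&format=pdf",
    "/index.php?view=article*&print=1*",
    "/index.php?option=com_mailto*",
    "/component/mailto/*",
]

# Crawlers named in this article; both accept wildcard patterns.
USER_AGENTS = ["Googlebot", "Slurp"]


def build_robots_groups(user_agents, rules):
    """Return robots.txt text with one User-agent group per crawler.

    Hypothetical helper: every group gets the same Disallow lines, and
    groups are separated by a blank line.
    """
    groups = []
    for agent in user_agents:
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {rule}" for rule in rules]
        groups.append("\n".join(lines))
    return "\n\n".join(groups) + "\n"


if __name__ == "__main__":
    print(build_robots_groups(USER_AGENTS, RULES), end="")
```

Keeping the patterns in one list means a new rule (or a new crawler) only has to be added in one place.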
If you use a third-party SEF extension, identify the part of the URL that appears only when a PDF or print version is requested, and enclose it in asterisks (*). The robots.txt line to add will look like this:
Disallow: /*/pdf_string/*
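
To get a feel for how such a wildcard pattern is applied to a URL, here is a rough local check. It assumes Google-style matching (the pattern is compared from the start of the path, * stands for any run of characters, and a trailing $ anchors the end of the URL). The function name rule_matches and the sample URLs are invented for illustration; your SEF extension's real URLs will look different.

```python
import re


def rule_matches(pattern: str, url_path: str) -> bool:
    """Check a URL path (including query string) against one Disallow pattern.

    Assumed semantics: match from the start of the path, '*' matches any
    sequence of characters, and a trailing '$' anchors the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape the literal pieces, then join them with '.*' in place of each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    return re.match(regex, url_path) is not None


if __name__ == "__main__":
    # Hypothetical URLs, not taken from a real site.
    samples = [
        ("/*/pdf_string/*", "/news/pdf_string/my-article.html"),
        ("/*/pdf_string/*", "/news/my-article.html"),
        ("/index.php?view=article*&format=pdf",
         "/index.php?view=article&id=5&format=pdf"),
    ]
    for pattern, path in samples:
        status = "blocked" if rule_matches(pattern, path) else "allowed"
        print(f"{status}: {path}  (rule: {pattern})")
```

This only tests a single Disallow pattern; it does not parse a whole robots.txt or handle Allow lines, so treat it as a quick sanity check and not as a replacement for the search engine's own tester.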
Google Webmaster Tools provides a robots.txt testing tool, so you can check your robots.txt against URLs to make sure your setup works as intended. Google and Yahoo allow wildcard matches in robots.txt, while other search engine robots may not. This technique will yield results once Google reindexes your site.
Resources: Google Webmaster Help Center, "I don't want to list every file that I want to block. Can I use pattern matching?"
Good article. I see a lot of advice to simply turn off PDF and printing icons to improve SEO. Using robots.txt is really the way to go. Also thanks for the links to the Google testing tool!
Last Updated (Thursday, 18 September 2008)