Remove Joomla PDFs from Google and Yahoo search results

Tuesday, 24 June 2008

As you may already know, Joomla has a built-in PDF generator. The problem with these PDFs is that Google sometimes lists them in search results instead of the original Joomla HTML article. The PDFs seem to be better optimized than the HTML, probably because their keyword density is higher and they don't include the navigation and modules usually found on a Joomla HTML page.

When visitors search Google and find the PDF instead of the article, you may lose them, because the PDF has no navigation menu, no site search, and so on. They just get annoyed waiting for Adobe's reader browser plugin to load.

The solution is simple: edit your robots.txt (found in the site root) and add these two lines to prevent the PDFs from being crawled and included in Google's index:

User-agent: Googlebot
Disallow: /index2.php?option=com_content&do_pdf=1*

Here are another two lines to block Yahoo's Slurp crawler from indexing Joomla-generated PDFs:

User-agent: Slurp
Disallow: /index2.php?option=com_content&do_pdf=1*

Google and Yahoo allow wildcard matches in robots.txt; other search engine robots may not.
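
To see which URLs a rule like the one above would catch, here is a minimal Python sketch of wildcard matching. It assumes Google-style semantics (rules match from the start of the URL, `*` stands for any run of characters, and a trailing `$` anchors the end); it is not the code any crawler actually runs. The function name `rule_matches` and the sample URLs, including the id values, are made up for illustration.

```python
import re


def rule_matches(pattern: str, url: str) -> bool:
    """Return True if a robots.txt path pattern matches the URL path and query.

    Assumed semantics: prefix match, '*' matches any run of characters,
    and a trailing '$' anchors the pattern to the end of the URL.
    """
    anchored = pattern.endswith("$")
    if anchored:
        pattern = pattern[:-1]
    # Escape every literal piece, then rejoin the pieces with '.*' for each '*'.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    if anchored:
        regex += "$"
    # re.match only anchors at the start, which gives the prefix behaviour.
    return re.match(regex, url) is not None


RULE = "/index2.php?option=com_content&do_pdf=1*"

# Hypothetical URLs for illustration only.
pdf_url = "/index2.php?option=com_content&do_pdf=1&id=42"
html_url = "/index.php?option=com_content&task=view&id=42"

print(rule_matches(RULE, pdf_url))   # True: the PDF link is blocked
print(rule_matches(RULE, html_url))  # False: the normal article stays crawlable
```

Under these assumptions, the trailing `*` changes nothing, because a rule already matches any URL that begins with it.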

This technique will yield results once Google reindexes your site.


Google Webmaster Help Center: "I don't want to list every file that I want to block. Can I use pattern matching?"

Yahoo robots.txt guide


Last Updated ( Wednesday, 25 June 2008 )


