Recently a tweet by Peter Martin made me look a bit deeper into the robots.txt file. It showed a screenshot from a presentation he attended, in which Google clearly stated the following:
Note: Since July 2014 the lines for /templates/ and /media/ have been removed from the default robots.txt, so this article mostly applies to sites installed before that date (thanks to Sander Potjer for the tip). However, you may have more resources that should be unblocked, especially if you use extensions that combine your resources (JCH-Optimize, JBetolo, Scriptmerge, etc.).
What does this mean for Joomla?
Before I realized that robots.txt had already been updated for new sites, I wanted to find out what this means for existing Joomla sites and see if I could improve my SEO. The tweet by Peter suggests that robots.txt should not block the /templates/ folder, as the default file did at the time (see the related checklist article that I wrote). I then did some reading on the subject: a recent blog post on Google's own Webmaster Central blog, an article on PRWEB.com and an article on Forecheck.com. Following the Webmaster blog's advice, I decided to check what one of my sites (this one) looked like in my Webmaster Tools account. If you follow along, you can check whether you have blocked resources yourself.
Fetch as Google
Go to your Webmaster dashboard >> Crawl >> Fetch as Google. This option allows you to check how the Google bot interprets your site on the web. I set the drop-down option to Mobile - Smartphone:
Then click Fetch and Render and let the Google bot do its work. After a few seconds, it showed the result:
Apparently I get a green flag, so it's not too bad, but when you click on the Partial text, it reveals how the bot actually thinks the site looks:
The next step was to see if I could improve this by allowing the bot to crawl the folder of my specific template. I guess you could simply open up the entire /templates/ folder, but why offer irrelevant stuff to Google? So I added one line to robots.txt, allowing the exact template I use:
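If you want to sanity-check such a rule before uploading the file, here is a minimal Python sketch using the standard library's robots.txt parser. The template name mytemplate and the file paths are placeholders I made up, so swap in your own. One caveat: Python's parser applies the first matching rule, while Google applies the most specific one, so I put the Allow line above the Disallow line to get the same answer from both.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules: "mytemplate" is a placeholder for your active template folder.
# The Allow line comes first because Python's parser uses the first matching rule.
RULES = """\
User-agent: *
Allow: /templates/mytemplate/
Disallow: /templates/
Disallow: /media/
"""


def is_allowed(rules: str, path: str, agent: str = "Googlebot") -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # The active template is reachable, other templates stay blocked.
    print(is_allowed(RULES, "/templates/mytemplate/css/template.css"))  # True
    print(is_allowed(RULES, "/templates/other/css/template.css"))       # False
```

This only tells you how the rules are parsed. Fetch and Render remains the real test of what Google sees.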
Then I ran Fetch and Render again to see if anything had changed. Unfortunately the status still showed as Partial, but the main issue was solved: the site looked a bit better and the structure now loaded fine:
For most sites the previous change should be enough, but in my case I wanted to see if I could get things perfect. The reasons why the site was still not at 100% were listed at the bottom of the Webmaster screen:
Apparently some extension-related resources were still blocked. Whether the Google bot really needs all of them I'm not quite sure, but for the sake of the exercise I decided to unblock two more locations (try to be as specific as possible):
Now when I fetch the site, the status shows: Complete! You can no longer click on the status for details, but you can be confident that everything is fine.
Well, after I initially wrote this blog post I learned that for new sites the /templates/ and /media/ folders are no longer blocked, but for older sites you should definitely check for those lines and remove them (or comment them out). Also use the Fetch and Render feature to check whether any other resources should be unblocked. This might just be one of those little SEO improvements that can help you beat your competition in Google!
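For older sites, a rough way to see the effect of commenting out those two lines is to run a list of resource paths against both versions of the file. The Python sketch below does that with the standard library parser. The robots.txt excerpt is abbreviated and the resource paths are examples I picked for illustration, so treat both as assumptions and replace them with the contents of your own file and the resources your pages actually load.

```python
from urllib.robotparser import RobotFileParser

# Abbreviated excerpt modelled on an older Joomla robots.txt (an assumption,
# not the complete default file).
OLD_RULES = """\
User-agent: *
Disallow: /administrator/
Disallow: /media/
Disallow: /templates/
Disallow: /tmp/
"""

# The same file with the /media/ and /templates/ lines commented out.
NEW_RULES = OLD_RULES.replace(
    "Disallow: /media/", "# Disallow: /media/"
).replace(
    "Disallow: /templates/", "# Disallow: /templates/"
)

# Illustrative resource paths; list the ones your own pages load instead.
RESOURCES = [
    "/templates/mytemplate/css/template.css",
    "/media/system/js/core.js",
    "/administrator/index.php",
]


def blocked(rules, paths, agent="Googlebot"):
    """Return the paths that `agent` may not fetch under the given rules."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, p)]


if __name__ == "__main__":
    print("before:", blocked(OLD_RULES, RESOURCES))
    print("after: ", blocked(NEW_RULES, RESOURCES))
```

With the two lines commented out, only the /administrator/ path should remain in the blocked list, which is what you want: Google can reach the CSS and JavaScript, but not the backend.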