Keep your Joomla site clean for Google
One way to keep your site healthy from an SEO perspective is to deliver only quality content to Google. Anything irrelevant dilutes the quality of your site, and I often see Joomla sites where this goes wrong. There are roughly three major reasons for this: really crappy content, poor articles with very little text, and content that is relevant and useful for users but should not be in the Google index.
You may think this does not apply to your site, but chances are that it does. Whatever the reason this content is there, review it and decide either to remove it or to tell Google not to include it in its index. Let's discuss this in more detail and look at how to fix it:
Which content is irrelevant?
I already mentioned a number of possible reasons for irrelevant content. In more detail, these are the three major ones:
- Really crappy content: content that should not be there at all. I see many sites where this is the case. Very often, this is the sample data that is installed with Joomla or your template (if you choose to install it). People often forget about it, or they think Google will not see it because no menu link points to it. Not true! The same goes for test articles and obsolete articles.
- Poor articles with very little text. Usually, you will know about these. Maybe you created a very short article purely for SEO purposes, thinking that having at least a URL with some content would help, or maybe there was some other reason. Google sees this as thin content, which could make you a victim of the Google Panda algorithm.
- Content that is relevant and useful for users, but that should not go in the Google index: login pages, create-an-account pages, but also pages like your Terms and Conditions, etc. These are usually not the most harmful ones, but filtering them out could still help.
The first two should really be avoided at all costs, while the third is not too bad but needs to be addressed too.
How to find out whether you have irrelevant content?
Even if you think your site is healthy enough, you may be surprised to find that Google has indexed pages you didn't even know you had. A possible reason is that Joomla creates URLs automatically, even when no menu link is set up for them. As an example, on every Joomla site there is a login page at the following URL: /index.php?option=com_users&view=login. And there are more.
Finding these URLs isn't too hard, and there are multiple methods for this:
- Ask Google: just type the following query in the search box, replacing example.com with your own domain: site:example.com
This shows all (or at least most) of the URLs Google knows for your site. Whatever other methods I use, I always use this one too.
- Use a crawler tool to check your site. You can use the Screaming Frog SEO Spider (a desktop tool with a free version) or an online tool.
Now tell me there isn't a URL here and there that you didn't know about...
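Once you have a list of URLs from either method, a small script can pick out the automatically generated Joomla URLs for you. The following is only a minimal sketch in Python: the function names, the UTILITY_COMPONENTS set and the sample URLs are my own assumptions for illustration. Only com_users comes from the login URL mentioned above, so extend the set for your own site.

```python
from urllib.parse import urlparse, parse_qs

# Assumption: components whose pages are useful to visitors but not to searchers.
# Only com_users is taken from the login URL in the article; add your own.
UTILITY_COMPONENTS = {"com_users"}


def joomla_component(url):
    """Return the `option` query parameter of a non-SEF Joomla URL, or None."""
    parts = urlparse(url)
    if not parts.path.endswith("index.php"):
        return None
    values = parse_qs(parts.query).get("option")
    return values[0] if values else None


def flag_urls(urls):
    """Split non-SEF URLs into (utility pages, other non-SEF URLs)."""
    utility, other = [], []
    for url in urls:
        component = joomla_component(url)
        if component is None:
            continue  # a normal SEF URL, nothing to flag
        if component in UTILITY_COMPONENTS:
            utility.append(url)
        else:
            other.append(url)
    return utility, other


# Hypothetical crawl export, for illustration only.
crawled = [
    "https://example.com/",
    "https://example.com/blog/keep-your-site-clean",
    "https://example.com/index.php?option=com_users&view=login",
    "https://example.com/index.php?option=com_content&view=article&id=7",
]

utility, other = flag_urls(crawled)
print("Utility pages to noindex:", utility)
print("Other non-SEF URLs to review:", other)
```

The second list is the interesting one: those are the URLs you probably did not know about and need to review by hand.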
Remove the crap or ask Google not to index it
Now that you know what should go, it's time to do something about it. Anything that is really crap, like sample content you forgot about: simply remove it, and don't forget to check for internal links to the removed content. After removal, Google will encounter 404 errors for these URLs for some time. For stuff like this, that is perfectly fine. A 404 is a valid status code, which simply means that the page is no longer there. Eventually, Google will update its index. If you think these items had some relevance though, you can always 301-redirect them to a valid page.
The relevant stuff in the third category (sign-in, register, terms & conditions, etc.): of course, do not remove it. However, these pages should not show up in the search results. People should see results for the great content you have, like that shiny laptop you are selling in your webshop. If they're really interested, they will be able to click through to your terms and conditions from within your site. The best option for these URLs is to set a noindex directive using the robots meta tag. This simply tells Google not to put the URL in its index.
Usually, you can set the tag in the Joomla article or menu item. This works for actual articles you created. The exceptions are the funny non-SEF URLs for the login page and the like. A possible way to deal with those is to create a (hidden) menu for these pages and then use the menu items to noindex them. And finally, make sure your sitemap only includes the relevant links.
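To verify that a page really carries the noindex directive, you can look at its HTML source for a robots meta tag. As a rough sketch, the Python snippet below does that check for you using only the standard library. The class and function names and the two sample pages are made up for illustration; in practice you would feed it the HTML you downloaded from your own site.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from every <meta name="robots"> tag on a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() != "robots":
            return
        content = attributes.get("content") or ""
        # "noindex, follow" becomes {"noindex", "follow"}
        self.directives.update(
            part.strip().lower() for part in content.split(",") if part.strip()
        )


def has_noindex(html):
    """Return True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


# Hypothetical pages, for illustration only.
terms_page = '<html><head><meta name="robots" content="noindex, follow"></head><body>Terms</body></html>'
product_page = '<html><head><meta name="robots" content="index, follow"></head><body>Laptop</body></html>'

print("Terms page noindexed:", has_noindex(terms_page))
print("Product page noindexed:", has_noindex(product_page))
```

Note that this only inspects the meta tag in the HTML; it does not look at HTTP headers or robots.txt.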
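The sitemap check can be automated as well. Here is a minimal Python sketch, assuming your sitemap follows the standard sitemaps.org XML format. The function names, the sample sitemap and the list of noindexed URLs are hypothetical and only there to show the idea.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(xml_text):
    """Return every <loc> URL listed in a sitemap document."""
    root = ET.fromstring(xml_text.strip())
    return [
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    ]


def unwanted_in_sitemap(xml_text, noindexed_urls):
    """Return the sitemap URLs that you decided to keep out of the index."""
    blocked = set(noindexed_urls)
    return [url for url in sitemap_urls(xml_text) if url in blocked]


# Hypothetical sitemap and noindex list, for illustration only.
sample_sitemap = """
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/</loc></url>
  <url><loc>https://example.com/terms-and-conditions</loc></url>
  <url><loc>https://example.com/index.php?option=com_users&amp;view=login</loc></url>
</urlset>
"""
noindexed = [
    "https://example.com/terms-and-conditions",
    "https://example.com/index.php?option=com_users&view=login",
]

for url in unwanted_in_sitemap(sample_sitemap, noindexed):
    print("Remove from sitemap:", url)
```

Anything this prints is a URL that you noindexed but still advertise to Google through your sitemap.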
You can also read this Moz.com blog post. It is the article that inspired me to write this post, and it gives you some more advanced tips too.