Like the www or non-www issue, any kind of duplicate content on your site can hurt your search engine rankings. Of course, you have to make sure that your content is unique and is not copied from somewhere else or reused in other parts of your site, but you also have to make sure that the same page cannot be reached through multiple URLs.
A lot of open source CMSs are prone to this, and Joomla is one of them. Even when you have SEF links turned on in your Joomla global configuration, the non-SEF URL still exists. This means 2 URLs with the same content, and often there are more. Duplicate URLs can exist for the following reasons:
Having pages reachable from multiple URLs could harm your rankings, so it's best to prevent this. There are many different ways to do so. Some can be used on their own, but you can also combine techniques to get rid of your duplicates completely:
One very common cause of duplicates is linking one article from multiple menu items. This happens a lot: sometimes an article that is reached from the main menu must also be reachable from a footer menu item. In this case, Joomla builds a URL for both menu items. Let's compare 2 examples:
Apart from small differences, such as the breadcrumb path or module assignments, these pages are identical and are a real duplicate content issue. This is partly due to the way Joomla works, but in many cases you can work around it:
With some creativity, you can sometimes even think of more solutions like this.
Anyone serious about SEO will sooner or later have to work with redirects: they are often needed to solve small issues, but sometimes you will have to apply them in bulk, say, after a site redesign or a move to another domain.
A 301 redirect tells anyone who requests such a URL: "This link has moved permanently (that is what the 301 stands for), please go here instead." As an example, if somebody goes to:
http://joomlaseo.com/index.php?option=com_content&Itemid=125&catid=15&id=18&lang=en&view=article
they are forwarded to:
http://joomlaseo.com/Checklist/avoid-duplicate-url-s
You can set up 301 redirects either in your .htaccess file or with an extension such as ReDJ, which is a very nice and simple extension for this. More on 301 redirects and how to set up .htaccess for them can be found in the article about re-routing old URLs.
There are other types of redirects, but these are only for specific cases. An example is a 302-redirect, which is a temporary redirect.
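As a rough sketch of the .htaccess route, the 301 redirect from the example above could look something like the block below. It assumes that mod_rewrite is enabled, that the site lives in the document root, and that these lines sit above Joomla's own SEF rewrite rules; the parameters checked are simply the ones from the example URL, so adjust them for your own pages.

```apache
RewriteEngine On
# Assumption: only these three parameters are needed to identify the article
RewriteCond %{QUERY_STRING} (^|&)option=com_content(&|$)
RewriteCond %{QUERY_STRING} (^|&)view=article(&|$)
RewriteCond %{QUERY_STRING} (^|&)id=18(&|$)
# The trailing "?" drops the old query string from the redirect target
RewriteRule ^index\.php$ /Checklist/avoid-duplicate-url-s? [R=301,L]
```

Because the conditions look at the query string rather than the path, other requests to index.php are left alone. As with any .htaccess change, try it on a test site first.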
Setting a canonical URL can be the way to tell Google that, even though there are multiple URLs for the same content, only one variant should be indexed. I always translate "canonical" as "preferred", which might make more sense to you. If you set the canonical correctly, every possible duplicate of a Joomla page carries a canonical link element in its head section, pointing to the preferred version. As an example, let's look at the page you are currently reading. It can be reached in 2 ways:
The first URL is currently rerouted, but if it weren't, configuring a canonical URL would tell Google that it is the same page as the SEF URL:
<link href="/Checklist/avoid-duplicate-url-s" rel="canonical"/>
Using this technique, you can keep duplicate URLs out of Google's index, even when they are still accessible.
Currently, in Joomla, a canonical tag is only applied to the non-preferred URLs. The actual preferred URL does not get a canonical pointing to itself. Frankly, I think there should also be a self-referencing canonical for those URLs. Besides other advantages, it would automatically solve issues with pages that have anchor links inside the page, like the tabs described earlier.
In Joomla, there is not much you can configure about canonicals. The only option you can set in Joomla is in the settings for the System - SEF plugin. It allows you to set a Site Domain. However, it is only useful if you make the same website available through multiple domains (parked domains).
If you need to set canonicals (and you know what you're doing), you should use an extension. SH404SEF is perfect for this, but may be a bit complicated for some. However, I can fully recommend PWT SEO for this. It allows you to set canonicals globally as self-referencing ones, but you can also set individual canonicals:
Using your Joomla .htaccess file, you can solve quite a few of your duplicate URL issues (provided URL rewriting is on). We already discussed how to reroute www and non-www URLs and how to create 301 redirects, but you can also use it to get rid of many other types of issues. Just one example: say your URLs are accessible both with and without a trailing slash, meaning /page1/ and /page1 have the same content. You can mass-redirect the version with the trailing slash to the version without it using just a short piece of code:
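RewriteEngine On
# Skip real files and directories, so only "virtual" Joomla URLs are redirected
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
# Redirect /page1/ to /page1 (use https:// here if your site runs on HTTPS)
RewriteRule ^(.+)/$ http://%{HTTP_HOST}/$1 [R=301,L]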
Again, test whether the trailing slash is indeed removed AND whether your site still works! Always be careful with .htaccess changes! Similar issues can arise from parameters, such as one that sets a font size, leading Google to think that 2 different pages exist:
When I need a solution for this, I often turn to the Stack Exchange forums, where there is a lot of useful information.
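For the parameter case, one possible approach is to redirect the URL with the parameter to the same URL without it. The block below is only a sketch: "fontsize" is a made-up parameter name for illustration, and it assumes mod_rewrite is enabled, the site lives in the document root, and the lines are placed above Joomla's own rewrite rules.

```apache
RewriteEngine On
# "fontsize" is a hypothetical parameter name, used here for illustration only
# Match only when it is the single entry in the query string
RewriteCond %{QUERY_STRING} ^fontsize=[^&]*$
# Redirect to the same path; the trailing "?" removes the query string
RewriteRule ^(.*)$ /$1? [R=301,L]
```

The condition is deliberately strict, so URLs that carry other parameters alongside it are not touched. Only use this for parameters that are not needed to build the page, and test carefully afterwards.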
You can set up your robots.txt file so that it tells search engines not to crawl any URL with a query string, i.e. a '?'; see the article about robots.txt for the code. This prevents issues with duplicate URLs caused not only by non-SEF URLs but also by real query strings, like these:
For smaller sites, preventing issues can easily be done by configuring .htaccess, robots.txt, and possibly a small extension for 301 redirects, but for larger sites, using a SEF extension is probably more efficient. It takes some time to learn how these extensions work, so start by trying one out on a site that is not that important. Used correctly, it will banish all duplicate URL issues from your site. Used incorrectly, however, it could have the opposite effect.
Some well-known SEF-extensions:
Check the extensions section of this site for information about these and others.
Using Google Search Console is an alternative way of getting rid of duplicate URLs. Preferably, you should use the techniques discussed above to prevent issues from showing up in your Search Console account, and even if they do, first go back and review your setup. However, sometimes you may not be able to prevent duplicates from showing up.
Please note: don't panic when you see issues like this as warnings in Search Console (formerly Webmaster Tools). Especially with new sites, Google often encounters these issues, but usually, particularly with parameters, it learns that this is not a separate page, and the warnings disappear after a few weeks. Deal with the remaining issues, but remember that this is an advanced topic. For more information, read our article on this subject.
If all else fails and there's no way around it, you can even ask Google to remove URLs from its index: https://support.google.com/webmasters/answer/1663419. Frankly, though, I have never used this method.
Joomlaseo.com is fully built and written by Simon Kloostra, Joomla SEO Specialist and Webdesigner from the Netherlands. I have also published the Joomla 3 SEO & Performance SEO book. Next to that, I sometimes blog for companies like OStraining, TemplateMonster, SEMrush and others. I have also published a few articles in the monthly Joomla Community Magazine.