Ever heard of orphan pages? These are pages that sit outside the navigation paths of your site but that can still be reached by thorough crawlers (like Google). Often you may not even know that they are there, but they can seriously harm your SEO efforts. Make sure you are aware of this risk.
First of all: having orphan pages in your site is not necessarily bad in itself, as long as you are aware of them and mitigate the risks in your workflow. You may think this article won't affect you, but I'm pretty sure it does. Just let me give you some examples, and then look at your site again. Almost every site has at least a few of these pages, and that is usually not much of a worry. However, some sites have loads of them, and that is when you may run into issues.
Examples of Orphan pages
There are plenty of situations where you have orphan pages:
- If you have articles or categories published in the backend, but you do not provide a menu-item to show them, they will still have a URL. Google will sooner or later find them and index them. However, if an article is not worth showcasing in the menu structure, it apparently is not very important to you. So why would it be of interest to Google? Only publish valuable content and always make the content reachable, either through an individual article menu link, or in a blog- or list-overview.
- Many sites use articles purely to publish them in module positions. This can be done using extensions like Articles Anywhere by Regular Labs or other extensions / tricks. As an example, many site owners place an article in a module position for end-users who only have access to articles, so that they can avoid the complicated module-administration. These users can then easily change the module's content by changing the article. But again, even though the article is "published" in a module position, it also has a URL of its own. Again a URL that may be indexed by Google. I see many, many sites that have this issue.
- Similarly, you may have an article-slider, showing articles from a category. Same issue here.
- Joomla will often generate a number of pages due to the nature of the CMS. There will always be a frontend login-page, a page to reset your password, a search page, etcetera. Even if you did not set these up specifically, they will simply be there.
- Badly coded image-sliders may create individual URLs for every image in the slider. These individual URLs may again be indexed. If you notice this, simply look for a better slider. While the other issues in this list can be solved, there is hardly a solution for these sliders.
The big problem with all these situations is that the set of URLs Google has indexed for your site becomes heavily polluted, either with duplicated blocks of content or with pages that have hardly any real content (so-called thin-content pages).
How to detect Orphan pages
Basically you need to ask Google. You can do so in two ways. If your site is not too big, just go to Google.com and type in the following query: site:example.com. This will bring up a list of the URLs that Google knows for your site. The list may not always be 100% complete, but it will do for our purpose.
At first glance the results will usually look perfectly fine, because Google will probably start the list with the important pages. However, scroll down and look out for obscure stuff. A result that you do not recognize should make you suspicious.
In one case I looked at, such a result turned out to be an HTML page generated by a slider extension. The site contained hundreds of them, while there were only a few dozen "real" pages. Just check the same for your site and compare the search results with what you would expect to see.
For smaller sites, this method works fine, but for larger sites, it helps to use a tool. Personally, I use the Website Auditor by SEO Powersuite (free to use with some limitations). It crawls your entire site, and in the advanced settings you can also have it check the Google index to see if it contains pages that are outside the navigation. Once it has finished, check the list of URLs and look at the column "Orphan pages".
You can also easily export the list for further investigation.
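If you export such a list, the comparison itself is just a set difference: everything Google knows minus everything your navigation leads to. Below is a minimal sketch in Python, assuming you already have two plain lists of URLs (one from the index, one from a crawl of your menus). The function names normalize and find_orphans are my own and are not part of any tool mentioned here.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path (+ query) so trivial variants compare equal.

    Assumption: http/https and a trailing slash point to the same page.
    """
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def find_orphans(indexed_urls, navigable_urls):
    """Return the indexed URLs that no navigation path leads to, sorted."""
    reachable = {normalize(u) for u in navigable_urls}
    return sorted(u for u in set(indexed_urls) if normalize(u) not in reachable)


if __name__ == "__main__":
    # Hypothetical example data, not taken from a real site.
    indexed = [
        "https://example.com/",
        "https://example.com/blog/",
        "https://example.com/slider/img-1.html",
    ]
    navigable = ["https://example.com", "https://example.com/blog"]
    for url in find_orphans(indexed, navigable):
        print(url)
```

Whatever ends up in the result is worth a manual look. Not every hit is a problem, but each one is a URL you did not link to yourself.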
What to do next?
If you found any orphan pages, what should you do? It depends on the nature of the pages you found. The image-slider issue is a real headache and requires cumbersome work, like 301-redirecting the useless pages to the page where the slider sits. Better still would be to ditch extensions like this altogether.
However, if you found pages that you deliberately set up to build pages from modules or similar, you can quite easily solve this without any damage. As an example of valid use, take a look at the documentation for the PWT extensions, like PWT SEO: extensions.perfectwebteam.com/pwt-seo/documentation.
This looks like one page / article, but actually we created an article for every section. The content-table on the right is used for in-page navigation using anchor-links. So this page is built up from 20 or so articles that all have a separate URL. Google only indexed the main documentation page though. The solution: give each article used for this page a Robots setting of Noindex, Nofollow or Noindex, Follow (Publishing-tab).
You see, the solution is really easy. You can simply keep building your sites the way you did, as long as you are aware of what can happen. Also, make sure that your sitemap excludes pages with a Noindex tag. Otherwise you are giving Google conflicting instructions. Most situations with orphan pages are caused by this type of set-up and can easily be fixed.
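To check for that kind of conflict, you can look for the robots meta tag in the HTML of every page listed in your sitemap. Here is a minimal sketch in Python using only the standard library. It assumes you have already downloaded the HTML of each page into a dictionary; the names RobotsMetaParser, is_noindex and sitemap_conflicts are hypothetical helpers of my own, not part of Joomla or of any sitemap extension.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of any <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for part in attr.get("content", "").split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def is_noindex(html: str) -> bool:
    """True if the page asks search engines not to index it."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def sitemap_conflicts(sitemap_urls, html_by_url):
    """Return sitemap URLs whose page carries a noindex directive."""
    return [u for u in sitemap_urls if is_noindex(html_by_url.get(u, ""))]


if __name__ == "__main__":
    # Hypothetical pages; in practice you would fetch the HTML yourself.
    pages = {
        "/docs": '<head><meta name="robots" content="index, follow"></head>',
        "/docs/section-1": '<head><meta name="robots" content="noindex, follow"></head>',
    }
    print(sitemap_conflicts(list(pages), pages))
```

Any URL this returns is both in the sitemap and marked Noindex, so it should be removed from one of the two. Note that this sketch only looks at the meta tag and ignores any robots instructions sent as HTTP headers.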
There are always other situations that require alternative solutions. One of the more frequent examples: even if you have not configured the front-end sign-in page, it will be there. Just type in http://example.com/index.php?option=com_users&view=login for your site. Often this page is indexed by Google. To remove it from the index, create a menu-item of type Login and set it to Noindex.
You see, there are a lot of possible issues, but most can be solved quite easily. So, check your site and see if you can improve your SEO in a few simple steps!
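If you have a list of indexed URLs, you can pick out this kind of system page by looking at the option parameter in the query string. The sketch below is a rough Python illustration, assuming non-SEF URLs of the form shown above. The helper names are my own, and it will not recognize URLs that Joomla has rewritten into search-engine-friendly paths.

```python
from urllib.parse import parse_qs, urlsplit


def is_component_url(url: str, component: str = "com_users") -> bool:
    """True if the URL's query string contains option=<component>."""
    query = parse_qs(urlsplit(url).query)
    return component in query.get("option", [])


def system_pages(indexed_urls, component: str = "com_users"):
    """Filter a list of indexed URLs down to those of one component."""
    return [u for u in indexed_urls if is_component_url(u, component)]


if __name__ == "__main__":
    # Hypothetical list of indexed URLs.
    urls = [
        "http://example.com/index.php?option=com_users&view=login",
        "http://example.com/about-us",
    ]
    print(system_pages(urls))
```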