Watch out for Orphan pages in your site
Ever heard of orphan pages? These are pages that sit outside the navigation of your site but can still be reached by thorough crawlers such as Google's. Often you may not even know that they are there, but they can seriously harm your SEO efforts. Make sure you are aware of this risk.
First of all: having orphan pages on your site is not necessarily bad in itself, as long as you are aware of the risks and mitigate them in your workflow. You may think this article doesn't apply to you, but I'm pretty sure it does. Just let me give you some examples, and then look at your site again... Almost every site has at least a few of these pages, and that is usually not much of a worry. However, some sites have loads of them, and then you may have issues.
Examples of Orphan pages
There are plenty of situations where you have orphan pages:
- If you have articles or categories published in the backend but do not provide a menu item to show them, they will still have a URL. Google will sooner or later find and index them. However, if an article is not worth showcasing in the menu structure, it apparently is not very important to you. So why would it be of interest to Google? Only publish valuable content and always make the content reachable, either through an individual article menu link or in a blog or list overview.
- Some sites use articles purely to publish them in module positions. This can be done using extensions like Articles Anywhere by Regular Labs or other extensions/tricks. As an example, many site owners place an article in a module position for end users who only have access to articles, sparing them the complicated module administration. Those users can then easily change the module's content by changing the article. But even though the article is "published" in a module position, it also has a URL of its own. Again, a URL that may be indexed by Google. I see many, many sites that have this issue.
- Similarly, you may have an article-slider, showing articles from a category. Same issue here.
- Joomla will often generate a number of pages due to the nature of the CMS. There will always be a frontend login page, a page to reset your password, a search page, etcetera. Even if you did not set these up specifically, they will simply be there.
- Badly coded image sliders may create individual URLs for every image in the slider. These individual URLs may again be indexed. If you notice this, simply look for a better slider. While the other issues in this list can be solved, for these sliders, there is hardly a solution.
The big problem with all these situations is that the URLs Google has indexed for your site are heavily polluted with either duplicated blocks of content or pages with hardly any real content (so-called thin-content pages).
How to detect Orphan pages
Basically, you need to ask Google... You can do so in two ways. If your site is not too big, just go to Google.com and type in the following query: site:example.com. This will bring up a list of the URLs that Google knows for your site. The list may not always be 100% complete, but it will do for our purpose:
This looks perfectly fine; Google will probably start the list with the important pages. However, as you scroll down, look out for obscure stuff. A result like the one below would make me suspicious:
In this case, it turned out to be an HTML page generated by a slider extension. The site contains hundreds of them, while there are only a few dozen "real" pages. Just check the same for your site and compare the search results with what you would expect to see.
For smaller sites, this method works fine, but for larger sites, it helps to use a tool. Personally, I use the Website Auditor by SEO PowerSuite (free to use with some limitations). It crawls your entire site, but in the advanced settings, you can also have it check the Google index to see if it contains pages that are outside the navigation. Once it has finished, check the list of URLs, looking at the column "Orphan pages":
You can also easily export the list for further investigation.
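If you would rather compare the lists yourself, the idea behind the detection is simple: take the URLs Google has indexed and subtract the URLs that can be reached through your navigation. Below is a minimal Python sketch of that comparison. The function names and the sample URLs are made up for illustration, and it assumes you already have both lists (for example from an export or from your sitemap). Note that it ignores the scheme, so http and https variants of the same page are treated as equal.

```python
from urllib.parse import urlsplit


def normalize(url: str) -> str:
    """Reduce a URL to host + path (+ query) so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    host = parts.netloc.lower()
    path = parts.path.rstrip("/") or "/"
    query = f"?{parts.query}" if parts.query else ""
    return f"{host}{path}{query}"


def find_orphans(indexed_urls, navigation_urls):
    """Return indexed URLs that are not reachable through the site's navigation."""
    reachable = {normalize(u) for u in navigation_urls}
    return sorted(u for u in set(indexed_urls) if normalize(u) not in reachable)


# Hypothetical sample data: in practice both lists would come from exports.
indexed = [
    "https://example.com/",
    "https://example.com/blog",
    "https://example.com/index.php?option=com_users&view=login",
]
navigation = ["https://example.com", "https://example.com/blog/"]

for url in find_orphans(indexed, navigation):
    print(url)
```

Whatever is left over is a candidate orphan page. It still needs a human look, because some of these URLs may be perfectly fine to keep.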
What to do next?
If you found any orphan pages, what should you do? It depends on the nature of the pages you found. The pages generated by the image slider are a real headache and require cumbersome work, like 301 redirecting each useless page to the page that contains the slider. Better still would be ditching extensions like this altogether...
However, if you found pages that you deliberately set up to build up pages from modules or similar, you can quite easily solve this without any damage. As an example of valid use, take a look at the documentation for the PWT extensions, like PWT SEO: extensions.perfectwebteam.com/pwt-seo/documentation:
This looks like one page/article, but actually, we created an article for every section. The table of contents on the right is used for in-page navigation using anchor links. So this page is built up using 20 or so articles, which all have a separate URL. Google only indexed the main documentation page though. The solution: give each article used for this page a Robots setting of Noindex, Nofollow or Noindex, Follow (Publishing tab):
You see, the solution is really easy. You can simply keep building your sites the way you did, as long as you are aware of what can happen. Also, make sure that your sitemap excludes pages with a noindex tag. Otherwise, you are giving Google conflicting instructions. Most situations with orphan pages are caused by this type of setup and can easily be fixed.
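To check for that sitemap conflict, you could compare the URLs in your sitemap with the robots meta tag of each page. The following Python sketch shows one way to do this, under a few assumptions: the page HTML has already been downloaded into a dictionary, the noindex instruction is given through a meta tag named "robots" (not through an HTTP header), and all names and sample pages are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives of every <meta name="robots"> tag in a page."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() == "robots":
            for item in attr.get("content", "").split(","):
                self.directives.add(item.strip().lower())


def is_noindex(html: str) -> bool:
    """True if the page's robots meta tag contains a noindex directive."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" in parser.directives


def sitemap_conflicts(sitemap_urls, pages):
    """Return sitemap URLs whose HTML (looked up in `pages`) is set to noindex."""
    return [u for u in sitemap_urls if u in pages and is_noindex(pages[u])]


# Hypothetical sample: URL -> HTML source, as if fetched beforehand.
pages = {
    "https://example.com/docs": '<head><meta name="robots" content="index, follow"></head>',
    "https://example.com/docs/section-1": '<head><meta name="robots" content="noindex, follow" /></head>',
}
sitemap = list(pages)

for url in sitemap_conflicts(sitemap, pages):
    print("In sitemap but noindex:", url)
```

Every URL this reports should either be removed from the sitemap or have its noindex setting reconsidered.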
There are always other situations that require alternative solutions. One frequent example: even if you have not configured the front-end sign-in page, it will be there. Just type in https://example.com/index.php?option=com_users&view=login for your site. Often this page is indexed by Google. To remove it from the index, create a menu item of type Login and set its Robots setting to Noindex.
You see, there are a lot of possible issues, but most can be solved quite easily. So, check your site and see if you can improve your SEO in a few simple steps!