Orphan Pages
1. Contradictory definitions of orphan pages
Alas, there is no universally accepted definition of the term “orphan pages” in the technical SEO world. Instead, there are all sorts of definitions, which fall more or less into three categories.
First, we have people who call "orphan pages" any pages which can't be found by crawling links from the home page.
An orphan page is a page that cannot be found by crawling the internal links of a website from the start page.
That is a good definition, even though we prefer a different one. No problems here.
Orphan pages are URLs that cannot be found on your website during a crawl.
This is less clear, since a crawl may use some other source of URLs instead of or in addition to the links from the home page - for example, an XML sitemap or even just a text file with URLs.
Second, there are folks calling "orphan pages" any pages having no incoming internal links.
Orphan pages are pages that aren't linked to anywhere on your site.
Dubious wording. One may have to re-read this several times to understand what they were trying to say here.
An orphan page is a webpage with no internal links directing users or search engines to it, meaning it has no “parent” page.
Orphaned pages are essentially any page on your site that doesn't have a link from ANY other page on your site.
Third, there are foggy and/or contradictory definitions given in the technical SEO articles apparently written by professional copywriters.
Orphan pages are pages on your website that are not linked internally to any of your other pages.
So if a page has no incoming internal links but outgoing internal links are present, it's suddenly not orphaned?
Orphan pages are indexable pages that have no internal links.
What internal links? Incoming? Outgoing? Both? And what if the page is not indexable? Why wasn't indexability mentioned in the previous definition then?
Orphan pages are webpages that don’t have links from anywhere else on your site.
That would actually be fine, if not for the contradictory illustration right next to it:
We see that some pages indeed do have links from somewhere else on the site, but are still called orphan pages.
An orphan page is a page of a website which does not point to any link from another site.
Since this currently occupies 5th place in the SERP for the "orphan pages" keyword, we can safely conclude that one man's incomprehensible text is another man's gem of an article. Just look at this:
It's... it's flawless!
2. Our definition of orphan pages
We stick with the “an orphan page is a page with no incoming internal links” definition, since it is simple and lets us separate orphan pages from orphan clusters.
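To make that definition concrete, here is a minimal Python sketch. It assumes you already have a set of known URLs and a list of (source, target) internal link pairs from some crawl export; the function name, the sample paths, and the choice to ignore links from a page to itself are our assumptions for illustration, not part of any particular tool.

```python
from collections import defaultdict


def find_orphan_pages(known_urls, internal_links):
    """Return the known URLs that no other page on the site links to."""
    incoming = defaultdict(set)
    for source, target in internal_links:
        # Assumption: a link from a page to itself is not an incoming link.
        if source != target:
            incoming[target].add(source)
    return {url for url in known_urls if not incoming.get(url)}


# Hypothetical sample data: paths on a single site.
known_urls = {"/", "/about", "/blog", "/old-promo"}
internal_links = [
    ("/", "/about"),
    ("/", "/blog"),
    ("/blog", "/"),
    ("/old-promo", "/"),  # an outgoing link only
]

orphan_pages = find_orphan_pages(known_urls, internal_links)
print(orphan_pages)
```

Note that "/old-promo" is reported even though it links out to the home page: under this definition only incoming internal links count.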
3. Orphan clusters
Keep in mind that there might be pages that have some incoming internal links and yet are unreachable from the website's home page. These form orphan clusters - groups of pages interlinked with each other, but not accessible from the home page (or from other crawl start points that do not belong to the same cluster).
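The difference between the two cases can be sketched as a reachability check. The Python below is only an illustration: it walks the link graph breadth-first from an assumed start URL, then splits the unreachable pages into those with no incoming links (orphan pages) and those with at least one (members of orphan clusters). The function name and sample paths are hypothetical.

```python
from collections import defaultdict, deque


def split_unreachable_pages(known_urls, internal_links, start_url="/"):
    """Return (orphan_pages, cluster_members) among pages unreachable from start_url."""
    graph = defaultdict(set)
    linked_to = set()
    for source, target in internal_links:
        graph[source].add(target)
        if source != target:  # assumption: self-links are not incoming links
            linked_to.add(target)

    # Breadth-first walk over internal links, starting at the home page.
    reachable = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)

    unreachable = set(known_urls) - reachable
    orphan_pages = {url for url in unreachable if url not in linked_to}
    cluster_members = unreachable - orphan_pages
    return orphan_pages, cluster_members


# Hypothetical sample data.
known_urls = {"/", "/about", "/landing-a", "/landing-b", "/old-promo"}
internal_links = [
    ("/", "/about"),
    ("/landing-a", "/landing-b"),
    ("/landing-b", "/landing-a"),
    ("/old-promo", "/"),
]

orphan_pages, cluster_members = split_unreachable_pages(known_urls, internal_links)
print(orphan_pages, cluster_members)
```

Here "/landing-a" and "/landing-b" link to each other, so neither is an orphan page, yet a crawl that starts at "/" never reaches them.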
4. How to find orphan pages and clusters
A crawler tool has to rely on some source of URLs other than the links found on the website itself, since orphan pages and clusters are, by definition, unreachable by crawling from the home page. Some companies that are supposed to be experts, like Botify and FandangoSEO, insist on using a log analyzer to find orphan pages. Don’t do that: if search engines know nothing about your orphaned pages, or are just unwilling to crawl them for whatever reason, no crawler hits for those pages will be present in your web server logs. Instead, stick to these principles of grouping your pages:
- Put the pages you want to be indexed and appear in SERPs into your XML sitemap(s).
- Keep the pages you want to be indexed but not appear in SERPs out of the XML sitemap(s). These might include pages that exist to improve crawlability but are of little to no value to users, or pages that do provide value but are not meant to be found via search engines.
- Prevent pages you don’t want indexed from being indexed using meta robots noindex or the X-Robots-Tag: noindex HTTP header.
This way you can feed your XML sitemap(s) to a crawler, and find orphan pages (and clusters) where it matters the most. You won't have to rely on search engines knowing about those pages, or find a way to upload your web server logs to a crawler tool.
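As a rough sketch of the sitemap-based approach, the Python below reads the URLs from a sitemap document and subtracts the set of URLs a link-following crawl actually reached. The function names, the example.com URLs, and the crawled set are assumptions for illustration. A real setup would also need to normalize URLs (trailing slashes, scheme, case) and handle sitemap index files, which this sketch does not do.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the sitemaps.org protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def urls_from_sitemap(sitemap_xml):
    """Extract the <loc> values from a <urlset> sitemap document."""
    root = ET.fromstring(sitemap_xml)
    return {
        loc.text.strip()
        for loc in root.findall("sm:url/sm:loc", SITEMAP_NS)
        if loc.text
    }


def sitemap_urls_missed_by_crawl(sitemap_xml, crawled_urls):
    """Sitemap URLs that a crawl from the home page never reached."""
    return urls_from_sitemap(sitemap_xml) - set(crawled_urls)


# Hypothetical sitemap and crawl result.
sample_sitemap = (
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
    "<url><loc>https://example.com/</loc></url>"
    "<url><loc>https://example.com/pricing</loc></url>"
    "<url><loc>https://example.com/old-promo</loc></url>"
    "</urlset>"
)
crawled_urls = {"https://example.com/", "https://example.com/pricing"}

missed = sitemap_urls_missed_by_crawl(sample_sitemap, crawled_urls)
print(missed)
```

The resulting set mixes orphan pages and members of orphan clusters; telling them apart still requires checking whether each URL has any incoming internal links.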