Online Marketing Lesson 31: The PageRank Algorithm

The PageRank was the breakthrough that enabled Google to dominate the search engine industry.

The idea was simple yet revolutionary: to analyze the relationships between all the web pages and how they were linked together, with the purpose of determining the relative importance of each page.

The PageRank literally changed the Web, and it is, therefore, important for you to understand how it works. In this lesson, we will explain the PageRank concept, illustrate how the calculations are done and dismiss some common misconceptions about it.

Understanding the PageRank

Larry Page started developing the PageRank in 1995 (hence the name Page-Rank), while he was still at Stanford University. After some time Sergey Brin joined him, and within two years they launched the first prototype of the Google search engine.

So what is the PageRank? It is an algorithm, that is, a set of instructions designed to solve a particular problem or perform a particular task. The PageRank algorithm was designed to estimate the relative importance of web pages.

This was a necessary step to produce good search results.

Before the PageRank, we already had search robots that were able to crawl the body of web pages, meaning that it was possible to index all the pages that contained a specific keyword. In other words, the search engines could already gauge the relevance of web pages to certain search queries.

If a user searched for that keyword, however, in what order would the search engine serve all the thousands of pages that contained it?

The method that the first search engines used was based on keyword density. If the user searched for “money,” the search engine would put at the top of the results page the web page that used the word “money” most frequently in its body, followed by the page that used it second most frequently, and so on.
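To make the idea concrete, here is a minimal Python sketch of ranking by keyword density. The function names, the sample pages and the way density is computed (keyword occurrences divided by total words) are our own assumptions for illustration, not the code of any real search engine.

```python
def keyword_density(text: str, keyword: str) -> float:
    """Share of the words in the text that match the keyword (illustrative definition)."""
    words = [w.strip(".,!?;:\"'").lower() for w in text.split()]
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)


def rank_by_keyword_density(pages: dict[str, str], keyword: str) -> list[str]:
    """Return the names of the pages containing the keyword, densest first."""
    densities = {name: keyword_density(body, keyword) for name, body in pages.items()}
    matching = [name for name, density in densities.items() if density > 0]
    return sorted(matching, key=lambda name: densities[name], reverse=True)


# Hypothetical pages, invented for this example
pages = {
    "page1": "Money tips: save money, invest money.",
    "page2": "How to make money online.",
    "page3": "A guide to gardening.",
}

print(rank_by_keyword_density(pages, "money"))  # ['page1', 'page2']
```

Notice that a page can climb to the top of this ranking simply by repeating the keyword, regardless of how trustworthy it is.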

This method was obviously very crude, and it produced mediocre results. Being able to gauge relevance alone was not enough. To produce good search results, a search engine would need to gauge both relevance and importance (as a measure of authority).

This is where the PageRank came in.

The basic idea behind the algorithm is that a link (also called a hyperlink or backlink) from page A to page B can be seen as a vote of trust from page A to page B. The higher the number of links (weighted by their value) to a page, the higher the probability that the page is important among the set of pages being analyzed.

Notice that we said “weighted by their value” because the PageRank algorithm is not linear but rather recursive. In other words, not all links are equal. A link from a page that has many links pointing to it will have more value than a link from a page that has few links pointing to it.

Similarly, a link from a page that has 100 outgoing links inside it will probably have less value than a link from a page that only has 2 outgoing links.

Once Google started using the PageRank, therefore, it was able to see the relative importance of all the pages on the Web. When a user searched for “money,” for example, Google would now be able to gauge both the relevance and the importance of the pages. It would end up serving pages that both contained the keyword “money” prominently and frequently in their bodies, and that received many links from other pages.

The results were far superior to what other search engines could produce.

The Equation

Here is the original equation, as published in the Stanford research paper titled The Anatomy of a Large-Scale Hypertextual Web Search Engine:

PR(A) = (1-d) + d(PR(t1)/C(t1) + … + PR(tn)/C(tn))

Now let’s explain the equation:

PR(A) > This means the PageRank of page A

d > d is the damping factor, which is used to make sure that the PageRank of pages will not keep growing forever. This is necessary due to the recursive nature of the algorithm. Without a damping factor, if page A links to page B and page B links to Page A, their PageRank would need to be calculated with infinite iterations. Once page A links to page B, the PageRank of page B will increase. That in turn will increase the value of the link from page B to page A, so the PageRank of page A will also increase. But that will again increase the value of the link from page A to page B, so the PageRank of page B increases again. So on and so forth, in a never-ending cycle. With the damping factor, however, at each iteration the value of the link is smaller and smaller, and the PageRank of the two pages will converge to a finite number.

t1 … tn > The interval between t1 and tn represents all the pages linking to page A. You could say that t1 is the first page linking to page A, and tn is the last page linking to page A, and between them you have all the other pages linking to page A. If there are 5 pages linking to page A, for example, you would have t1, t2, t3, t4 and t5 in the equation.

PR(t1)/C(t1) > PR(t1) is the PageRank of page t1, while C(t1) is the number of outgoing links on page t1. If Page t1 has a PageRank of 5 and 10 outgoing links, therefore, each of those links would have a value of 0.5.
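The two-page case described under the damping factor can be tried out with a short Python sketch. We assume that page A and page B link only to each other (so each has exactly one outgoing link), that d is 0.85, and that the starting values and the number of iterations are arbitrary choices made for this illustration. The function name is hypothetical.

```python
def iterate_two_pages(start: float, d: float = 0.85, iterations: int = 200) -> tuple[float, float]:
    """Repeatedly apply PR = (1 - d) + d * PR(other) / 1 to two pages that link to each other."""
    pr_a = pr_b = start
    for _ in range(iterations):
        # Both pages are updated from the values of the previous iteration
        pr_a, pr_b = (1 - d) + d * pr_b, (1 - d) + d * pr_a
    return pr_a, pr_b


low_start = iterate_two_pages(start=0.0)
high_start = iterate_two_pages(start=10.0)

print(round(low_start[0], 6), round(high_start[0], 6))  # 1.0 1.0
```

Whether we start from 0 or from 10, the two pages settle on the same finite value, because each iteration passes on only 85% of the previous difference.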

No one knows the exact damping factor that is used by Google, but most people agree that it is around 0.85, which was the original value proposed in the Stanford research paper.

A simpler way of writing that formula, therefore, would be:

PR(A) = 0.15 + 0.85 (sum of the weighted values of all the backlinks of page A)

The weighted value of each backlink, as we explained before, is the PageRank of the page where the link is coming from divided by the number of outgoing links on that page.
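For readers who prefer code, the formula can be written as a small Python function. This is a sketch of a single application of the equation, assuming that we already know the PageRank and the number of outgoing links of every page linking to page A. The function name and the way the backlinks are passed in are our own choices.

```python
def page_rank(backlinks: list[tuple[float, int]], d: float = 0.85) -> float:
    """Apply PR(A) = (1 - d) + d * (PR(t1)/C(t1) + ... + PR(tn)/C(tn)) once.

    Each backlink is a pair: (PageRank of the linking page, outgoing links on that page).
    """
    weighted_sum = sum(pr / outgoing_links for pr, outgoing_links in backlinks)
    return (1 - d) + d * weighted_sum


# A single backlink from a page with a PageRank of 5 and 10 outgoing links
# contributes 5 / 10 = 0.5 before damping.
print(round(page_rank([(5, 10)]), 3))  # 0.575
```

A page with no backlinks at all gets 0.15, which is the (1-d) term on its own.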

Applying The Numbers

Let’s use a numerical example to illustrate the formula. Suppose that we want to calculate the PageRank of page A, which is linked from page B, page C and page D. Our equation would look like this:

PR(A) = 0.15 + 0.85 ( weighted value of B link + weighted value of C link + weighted value of D link)

We know that the weighted value of the link coming from a certain page is the PageRank of that page divided by the number of outgoing links on that page. So our formula can be seen as:

PR(A) = 0.15 + 0.85 ( PageRank of B / Outgoing links on B + PageRank of C / Outgoing links on C + PageRank of D / Outgoing links on D)

We now need to know the PageRank of each of the pages and the number of outgoing links on them. Let’s assume that:

  • Page B has a PageRank of 5 and 1 outgoing link
  • Page C has a PageRank of 8 and 5 outgoing links
  • Page D has a PageRank of 3 and 3 outgoing links

Applying the numbers to the formula we get:

PR(A) = 0.15 + 0.85 ( 5 / 1 + 8 / 5 + 3 / 3)

Which is equal to:

PR(A) = 0.15 + 0.85 ( 5 + 1.6 + 1)

Which is equal to:

PR(A) = 0.15 + 0.85 (7.6)

And finally:

PR(A) = 6.61
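The same steps can be checked with a few lines of Python. The variable names are ours; the numbers are the ones assumed above for pages B, C and D, with a damping factor of 0.85.

```python
d = 0.85

# page: (PageRank, number of outgoing links)
backlinks = {"B": (5, 1), "C": (8, 5), "D": (3, 3)}

# Weighted value of each link: PageRank divided by outgoing links
link_values = {page: pr / outgoing for page, (pr, outgoing) in backlinks.items()}
total = sum(link_values.values())
pr_a = (1 - d) + d * total

print(link_values)
print(round(total, 2))  # 7.6
print(round(pr_a, 2))   # 6.61
```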

Nominal vs. Real PageRank

Google has a toolbar that users can install on their browsers to see the PageRank of the pages that they are visiting. That PageRank is called “nominal” or “toolbar” PageRank, and it is used by Google only to give an indication of the importance of the page in question.

The nominal PageRank is a whole number between 0 and 10, where 0 is the lowest PageRank and 10 the highest. Google.com, for instance, has a nominal PageRank of 10, while a new website would start at 0 (or “unranked”).

The real PageRank, however, is believed to work as a floating-point system, which basically supports a wider range of values. Here is a comment that Matt Cutts, Google’s head of Web Spam, wrote about the PageRank:

It’s more accurate to think of it as a floating-point number. Certainly our internal PageRank computations have many more degrees of resolution than the 0-10 values shown in the toolbar.

The nominal PageRank is updated once every three months, more or less, while the real PageRank is recalculated continuously as the Google bots crawl the web and find new web pages and new backlinks.

A Logarithmic Scale

Another important point to keep in mind is that the PageRank uses a logarithmic scale. That is, its measurement is done with a logarithm, and not with the absolute value of the PageRank itself.

A logarithm of a given number is the exponent that we must use for a certain base to reach that number. For example, the logarithm of 4 to the base 2 is 2 (because 2 to the power of 2 equals 4). The logarithm of 8 to the base 2 is 3 (because 2 to the power of 3 equals 8). The logarithm of 16 to the base 2 is 4 (because 2 to the power of 4 is 16). As you can see, the absolute value that we are using grows much faster than the logarithm. The logarithm of 256 to the base 2, for instance, is 8. While the absolute value jumped from 4 to 256, the logarithm only jumped from 2 to 8.
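These base 2 logarithms are easy to verify with Python's standard math module. The snippet below is only a check of the examples above.

```python
import math

values = (4, 8, 16, 256)
logs = {value: math.log2(value) for value in values}

for value, log in logs.items():
    print(value, log)
# 4 2.0
# 8 3.0
# 16 4.0
# 256 8.0
```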

Logarithmic scales, therefore, are very useful for measuring data that has a very large range of values, which is the case with the PageRank.

The simple meaning of this is that each PageRank level is exponentially harder to achieve.

We can use the numbers above to illustrate the point. If we assume that each link has the same value and that the PageRank uses a base 2 on its logarithmic scale (which is not the case, but it makes the demonstration easier to understand), we conclude that a web page would need:

  • 2 backlinks to achieve a PageRank 1,
  • 4 backlinks to achieve a PageRank 2,
  • 8 backlinks to achieve a PageRank 3,
  • 16 backlinks to achieve a PageRank 4,
  • 32 backlinks to achieve a PageRank 5,
  • 64 backlinks to achieve a PageRank 6,
  • 128 backlinks to achieve a PageRank 7,
  • 256 backlinks to achieve a PageRank 8,
  • 512 backlinks to achieve a PageRank 9, and
  • 1024 backlinks to achieve a PageRank 10.

In other words, you would need to gain only 4 new backlinks to move from a PR2 to a PR3, but if you wanted to move from a PR3 to a PR4 you would need to gain 8 new backlinks. If you wanted to move from a PR7 to a PR8 you would need to gain 128 new backlinks.
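Going the other way, we can sketch in Python which level a page would reach for a given number of backlinks. This follows the same simplifying assumptions as the list above (every link has the same value and the scale uses base 2, which is not what Google actually uses), and the function name is hypothetical.

```python
def toolbar_level(backlinks: int, base: int = 2, max_level: int = 10) -> int:
    """Highest level whose threshold (base ** level) the backlink count has reached."""
    level = 0
    threshold = base
    while level < max_level and backlinks >= threshold:
        level += 1
        threshold *= base
    return level


print(toolbar_level(4))     # 2
print(toolbar_level(7))     # 2 (8 backlinks are needed for level 3)
print(toolbar_level(8))     # 3
print(toolbar_level(5000))  # 10 (the scale is capped at 10)
```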

If we used a larger base, the exponential increase would be even more significant. For example, if we used a base 8, a web page would need:

  • 8 backlinks to achieve a PageRank 1,
  • 64 backlinks to achieve a PageRank 2,
  • 512 backlinks to achieve a PageRank 3,
  • 4,096 backlinks to achieve a PageRank 4,
  • 32,768 backlinks to achieve a PageRank 5,
  • 262,144 backlinks to achieve a PageRank 6,
  • 2,097,152 backlinks to achieve a PageRank 7,
  • 16,777,216 backlinks to achieve a PageRank 8,
  • 134,217,728 backlinks to achieve a PageRank 9, and
  • 1,073,741,824 backlinks to achieve a PageRank 10.

Achieving a PR1 here is not that difficult, and would require just 8 backlinks. If you wanted to move to a PR3, however, you would need 504 new backlinks. If then you wanted to move to a PR4, you would need 3,584 new backlinks. Reaching a PR7 would require over 2 million backlinks.
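The numbers for both bases can be reproduced with a short Python sketch. As before, it assumes that every backlink has the same value and that a level is reached at exactly base to the power of that level; the function names are our own.

```python
def backlinks_needed(level: int, base: int) -> int:
    """Backlinks required to reach a level on a scale with the given base."""
    return base ** level


def extra_backlinks(from_level: int, to_level: int, base: int) -> int:
    """New backlinks required to move from one level to a higher one."""
    return backlinks_needed(to_level, base) - backlinks_needed(from_level, base)


print(extra_backlinks(2, 3, base=2))  # 4
print(extra_backlinks(7, 8, base=2))  # 128
print(extra_backlinks(1, 3, base=8))  # 504
print(extra_backlinks(3, 4, base=8))  # 3584
print(backlinks_needed(7, base=8))    # 2097152
```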

No one knows for sure what logarithmic scale is used by Google, and not all links are treated equally as we saw before. The basic principle, though, is the one illustrated above.

Common Misconceptions

The Internet is filled with misconceptions about the PageRank, so we will clarify the most common ones below.

1. PageRank Is Not Equal to Search Ranking

Having a high PageRank does not guarantee that the page will rank high on the search results page. This is because Google uses over 200 factors to determine the search rankings, and the PageRank, despite being important, is only one of them.

2. Content Doesn’t Affect PageRank

The content of a website will not directly affect its PageRank. In other words, having quality or frequently updated content will not improve the PageRank of a page. Indirectly it might affect it, because quality content tends to attract backlinks, but that is a separate issue.

3. Backlink Age and Relevancy Don’t Affect PageRank

The age and relevancy of backlinks do not affect the PageRank. They are among the 200 factors considered by Google when determining search rankings, however, and that is why many people get confused.

4. There Are No “Special” Links

Many people believe that .edu or .gov backlinks carry a PageRank bonus. Similarly, they believe that a link from a respected directory like DMOZ or Yahoo! will improve their PageRank automatically. This is not true. Those sources of links might help with the search rankings, but as far as PageRank is concerned they will be treated like any other link under the algorithm.

Action Points

  1. Make sure to understand why search engines need to gauge both relevancy and importance of the pages that they will serve in the search results.
  2. Read or scan through the original “The Anatomy of a Large-Scale Hypertextual Web Search Engine” research paper.
  3. Review the PageRank equation and understand how it works.
  4. Install the Google Toolbar or the Search Status extension on your browser, so that you will be able to see the nominal PageRank of the pages that you visit.
