Friday, 20 May 2011

Google PageRank, Local Rank and Hilltop Algorithms

When evaluating websites, crawler-based search engines consider many factors they can find on your pages and about your pages. For Google, the most important are PageRank and links. Let's look more closely at the algorithms Google applies to rank Web pages.

Google PageRank

Google PageRank (referred to below as PR) is a system for ranking Web pages used by the Google search engine. It was developed by Google founders Larry Page and Sergey Brin while they were students at Stanford University. PageRank ("PageRank" written as one word is a trademark that belongs to Google) is the heart of Google's algorithm and makes it the most complex of all the search engines.

PageRank uses the Internet's link structure as an indication of each Web page's relevancy value. Sites considered high quality by Google receive a higher PageRank and, as a consequence, a higher ranking in Google results (the interdependence between PageRank and site rankings in the search results is discussed later in this lesson). Further, since Google is currently the world's most popular search engine, the ranking a site receives in its search results has a significant impact on the volume of visitor traffic for that site.

You can view an approximation of the PageRank value currently assigned to each of your pages by Google if you download and install Google's toolbar for Microsoft Internet Explorer (alternatives also exist for other popular browsers). The Google toolbar displays PageRank on a 0 to 10 scale; however, a page's true PageRank has many contributing factors and is known only to Google.

For each of your pages PageRank may be different, and the PageRanks of all the pages of your site participate in the calculation of PageRank for your domain.

For each of your pages, the PR value is almost completely dependent upon links pointing to your site, reduced, to some degree, by the total number of links to other sites on the linking page. Thus, a link to your site will have the highest amount of impact on your PR if the page linking to yours has a high PR itself and the total number of links on that page is low, ideally, just the one link to your site.

The actual formula (well, an approximate one, according to Google's official papers) for PR is as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where pages T1...Tn all point to page A. The parameter d is a damping factor which can be set between 0 and 1. Google usually sets d to 0.85. C(T) is defined as the number of links going out of page T.
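To make the formula concrete, here is a minimal sketch that applies it to a single page. The function name and the sample inbound links are made up for illustration; only the formula itself and d = 0.85 come from the text above.

```python
def pagerank_of(inbound, d=0.85):
    """Apply PR(A) = (1-d) + d * sum(PR(T)/C(T)) to one page A.

    `inbound` is a list of (PR(T), C(T)) pairs, one per page T linking to A.
    """
    return (1 - d) + d * sum(pr / out_links for pr, out_links in inbound)


# Hypothetical inbound links: a PR 4 page with 2 outbound links
# and a PR 1 page with 10 outbound links.
example = pagerank_of([(4.0, 2), (1.0, 10)])
print(round(example, 3))

# A page with no inbound links keeps only the (1-d) term.
print(round(pagerank_of([]), 3))
```

Note how the first link contributes 4/2 = 2 to the sum while the second contributes only 1/10 = 0.1: both the PR of the linking page and the number of links on it matter.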

Thus, a site with a high PR but a large number of outbound links can nullify its own impact on your PR. To increase your PageRank, get as many links as possible to your site from pages with a high PR and a low number of total links. Alternatively, obtain as many links pointing to your site as you can, no matter what their PageRank is, as long as they are ranked. Which variant gets the most out of the PR formula depends on the specific case.

Those of you interested in the mathematical aspect will see that the formula is cyclic: the PR of each page depends on the PR of the pages pointing to it. But we won't know what PR those pages have until the pages pointing to them have their PR calculated and so on. Google resolves this by implementing an iterative algorithm which starts without knowing the real PR for each page and assuming it to be 1. Then the algorithm runs as many times as needed and on each run it gets closer to the estimate of the final value.

Each time the calculation runs, the value of PageRank for each page participating in the calculation changes. When these changes become insignificant or stop after a certain number of iterations, the algorithm assumes it now has the final PageRank values for each page.
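The iteration described above can be sketched in a few lines. This is only an illustration of the published formula on a tiny hypothetical link graph, not Google's actual implementation; the function name, the tolerance and the iteration cap are assumptions.

```python
def pagerank(links, d=0.85, tol=1e-6, max_iter=100):
    """Iterate PR(A) = (1-d) + d * sum(PR(T)/C(T)) until the values settle.

    `links` maps each page to the list of pages it links to
    (each target listed once).
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    pr = {p: 1.0 for p in pages}  # start by assuming a PR of 1 everywhere
    for _ in range(max_iter):
        new_pr = {}
        for page in pages:
            incoming = sum(
                pr[src] / len(targets)
                for src, targets in links.items()
                if page in targets
            )
            new_pr[page] = (1 - d) + d * incoming
        change = max(abs(new_pr[p] - pr[p]) for p in pages)
        pr = new_pr
        if change < tol:  # changes have become insignificant
            break
    return pr


# Hypothetical three-page Web: A links to B and C, B links to C, C links to A.
ranks = pagerank({"A": ["B", "C"], "B": ["C"], "C": ["A"]})
for page in sorted(ranks):
    print(page, round(ranks[page], 3))
```

In this example page C ends up with the highest PR because it receives links from both other pages, and the three values add up to 3, an average of 1 per page.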

Real PageRanks range from 0.15 (for pages that have no inbound links at all) up to a very large number. The actual value changes every time Google re-indexes and adds new pages to its database. Most experts agree that the relationship between toolbar PR and real PR is based on a logarithmic scale. Here's what that means if we assume that the base of the logarithm is, for instance, 10:

Toolbar PageRank (log base 10)    Real PageRank
0                                 0.15 – 10
1                                 100 – 1,000
2                                 1,000 – 10,000
3                                 10,000 – 100,000
4                                 100,000 – 1,000,000
5                                 1,000,000 – 10,000,000
6                                 10,000,000 – 100,000,000
7                                 100,000,000 – 1,000,000,000
8                                 1,000,000,000 – 10,000,000,000
9                                 10,000,000,000 – 100,000,000,000
10                                100,000,000,000 – 1,000,000,000,000

Although there is no evidence that the logarithm is based on 10, the main point is that it becomes harder and harder to move up the toolbar scale, because the gaps to overcome become larger with each step. This means that for new websites, "toolbar" PR values between 1 and 3 may be relatively easy to acquire, but getting to 4 requires considerably more effort, and pushing up to 5 is harder still.

As you may have figured out from the formula above, every page has at least a PR of 0.15 even if it doesn’t have any inbound links pointing to it. But this may only be in theory – there are rumors that Google applies a post-spidering phase whereby any pages that have no incoming links at all are completely deleted from the index.

Local Rank

Local Rank is an algorithm similar to PR, developed by Krishna Bharat of the Hilltop project. Google applied for a patent in 2001 and received it in early 2003. In short, this algorithm re-ranks the results returned for a user's query by looking at the interconnectivity between those results. This means that after a search is done, the PR algorithm is run among the result pages only, and the pages that have the most links from other pages in that set rank highest.

Essentially, it's a way of making sure that links are relevant and ranking sites accordingly. Please note that this algorithm does not count links from your own site – or, to be more exact, links from the same IP address.
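As a rough illustration of the idea (not the patented algorithm itself), the sketch below re-orders a result set by counting links that come from other pages in the same set. The function name and URLs are hypothetical, and comparing host names is only a stand-in for the same-IP rule mentioned above.

```python
from urllib.parse import urlparse


def local_rerank(results, links):
    """Re-order `results` by the number of links each page receives
    from other pages in the same result set.

    `results` is the initial ordered list of URLs; `links` maps a URL to
    the URLs it links to. Links between pages on the same host are ignored.
    """
    result_set = set(results)
    votes = {url: 0 for url in results}
    for src in results:
        for dst in set(links.get(src, [])):
            same_host = urlparse(src).netloc == urlparse(dst).netloc
            if dst in result_set and not same_host:
                votes[dst] += 1
    # sorted() is stable, so pages with equal votes keep their original order.
    return sorted(results, key=lambda url: -votes[url])


results = [
    "http://a.example/x",
    "http://b.example/y",
    "http://c.example/z",
    "http://a.example/w",
]
links = {
    "http://a.example/x": ["http://c.example/z", "http://a.example/w"],
    "http://b.example/y": ["http://c.example/z"],
    "http://a.example/w": ["http://a.example/x"],
}
reranked = local_rerank(results, links)
print(reranked)
```

Here the page on c.example moves to the top because two other results link to it, while the links between the two a.example pages count for nothing.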

Assuming that it is used by Google, make sure that you first get links pointing to you from other pages that rank well (or rank at all) for the keyword that you are targeting. Directories such as Yahoo! and DMOZ would be a good place to start – they tend to rank well for a wide range of keywords. Also, keep in mind that this is about pages, not sites. The links need to be from the pages that rank well – not other pages on sites that rank well.

Hilltop

Hilltop is a patented algorithm created in 1999 by Krishna Bharat and George A. Mihaila of the University of Toronto. The algorithm is used to find documents relevant to a particular keyword topic. Hilltop operates on a special index of "expert documents".

Basically, it looks at the relationship between "Expert" and "Authority" pages. An "Expert" is a page about a specific topic that links to many non-affiliated pages on that topic. Pages are defined as non-affiliated if they are authored by non-affiliated organizations. An "Authority" is a page that has links pointing to it from "Expert" pages. So, if your website has backlinks from many of the best expert pages, it will be an "Authority".
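The Expert/Authority relationship can be pictured with a very simplified sketch. This is not the published Hilltop scoring: the function, the page names and the organization labels are all hypothetical, and the only rule it applies is that a page gains authority from experts that are not affiliated with it or with each other.

```python
def authority_scores(expert_links, affiliation):
    """Count, for each target page, how many distinct non-affiliated
    organizations have an expert page linking to it.

    `expert_links` maps an expert page to the pages it links to;
    `affiliation` maps a page to a label for its organization
    (a page with no label is treated as its own organization).
    """
    orgs_per_target = {}
    for expert, targets in expert_links.items():
        expert_org = affiliation.get(expert, expert)
        for target in targets:
            if affiliation.get(target, target) == expert_org:
                continue  # a link to an affiliated page does not count
            orgs_per_target.setdefault(target, set()).add(expert_org)
    # Experts from the same organization count only once per target.
    return {t: len(orgs) for t, orgs in orgs_per_target.items()}


expert_links = {
    "dir1": ["shop", "blog"],
    "dir1-mirror": ["shop"],
    "dir2": ["shop"],
    "shop-links": ["shop"],
}
affiliation = {
    "dir1": "org1",
    "dir1-mirror": "org1",
    "dir2": "org2",
    "shop-links": "shop-co",
    "shop": "shop-co",
    "blog": "blog-co",
}
scores = authority_scores(expert_links, affiliation)
print(scores)
```

In this example "shop" is backed by two independent organizations, because the mirror of dir1 and the shop's own links page add nothing, while "blog" is backed by one.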

In theory, Google finds "Expert" pages and then the pages that they link to would rank well. Pages on sites like Yahoo!, DMOZ, college sites and library sites can be considered experts.

Google acquired the algorithm in February 2003.

Site Structure and PageRank

PageRank can be transmitted from page to page via links across different pages of your site as well as across all the sites in the Web. Knowing this, it’s possible to organize your link system in such a way that your content-rich pages receive and retain the highest PageRank.

The pages of your site receive PageRank from outside through inbound links. If you've got many inbound links to different pages of your site, it means PageRank enters your site at many points.

Such "PageRank entry points" can pass PageRank further on to other pages of your site.

The idea to keep in mind is that the amount of PageRank a page of your site can give to another page depends on how many links the first (linking) page itself contains. This page only has a certain amount of PageRank, which is distributed over all the pages it links to.
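Going by the PR formula given earlier, each link on a page passes on d * PR / C, where C is the number of links on that page. A tiny sketch with a hypothetical page of PR 2.0 shows how quickly the share per link shrinks:

```python
def pr_passed_per_link(page_pr, links_on_page, d=0.85):
    """Share of a page's PR that each of its links passes on: d * PR / C."""
    return d * page_pr / links_on_page


# The same hypothetical page (PR 2.0) with 1, 5 and 20 links on it.
for count in (1, 5, 20):
    print(count, round(pr_passed_per_link(2.0, count), 3))
```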

The best way to obtain a good PR on all of your pages is to have a well thought-out linking structure for your site.

What this means is that every page on your site should have multiple links coming into it from your other pages. PR is passed on from page to page, so the higher the PR of a page, the more it has to pass on. Pages with a low number of links on them pass relatively more PR per link. On your own site, however, you usually want all of your pages to benefit. Also, PR is passed back and forth between all of your pages, which means that your home page gets an additional boost because, generally, every page on your site links to your home page.

Let's look at the prototypes of site linking schemes that may be beneficial in terms of PR distribution.

1. Simple hierarchy.

[Figure: Simple hierarchy]

The boxes denote separate pages, and the figures in them denote the PR value calculated with the help of a simple algorithm that takes into consideration only these pages. With a site structure like this, it's pretty easy to get a high PR for your home page; however, this is an ideal situation which is difficult to recreate in real life: you will want more cross-linking than just links from all your pages to the home page.

2. Linking to external pages that return backlinks

[Figure: Linking to external pages that return backlinks]

This just means creating a link directory page on your site and benefiting a bit from link exchange with the external pages. Link exchanges are dealt with in the next lesson.

3. Site with inbound and outbound links

[Figure: Site with inbound and outbound links]

This is very similar to the first scheme; however, here an external site (Site A) passes its PR to your home page, which then distributes it to the child pages. You can see that the PR of both the home page and the child pages has increased significantly. No matter how many pages you have in your site, your average PR will always be 1.0 at best. But a hierarchical layout can strongly concentrate votes, and therefore PR, in the home page.

So here are some main conclusions you should keep in mind when optimizing the link structure of your site for better PR distribution.
  • If a particular page is very important – use a hierarchical structure with the important page at the "top".
  • When a group of pages may contain outward links – increase the number of internal links to retain as much PR as possible.
  • When a group of pages does not contain outward links – the number of internal links in the site has no effect on the site's average PR. You might as well use a link structure that gives the user the best navigational experience.
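These conclusions can be checked on a toy site using the PR formula from earlier in the lesson. The sketch below is only an illustration: the four-page site, the page names and the fixed number of iterations are assumptions, and links that point outside the site are counted in C(T) but carry their share of PR away.

```python
def site_pagerank(links, pages, d=0.85, iterations=100):
    """Run the PR formula over `pages` only.

    Links to anything outside `pages` still count toward C(T),
    so the PR they carry leaves the site.
    """
    pr = {p: 1.0 for p in pages}
    for _ in range(iterations):
        pr = {
            p: (1 - d) + d * sum(
                pr[src] / len(links[src])
                for src in pages
                if p in links.get(src, [])
            )
            for p in pages
        }
    return pr


def average(pr):
    return sum(pr.values()) / len(pr)


pages = ["home", "a", "b", "c"]

# Simple hierarchy: home links to every child, every child links back.
closed = site_pagerank(
    {"home": ["a", "b", "c"], "a": ["home"], "b": ["home"], "c": ["home"]},
    pages,
)

# Same site, but page c also links to an external page.
leaky = site_pagerank(
    {"home": ["a", "b", "c"], "a": ["home"], "b": ["home"],
     "c": ["home", "external"]},
    pages,
)

# Page c keeps its external link but gains extra internal links.
more_internal = site_pagerank(
    {"home": ["a", "b", "c"], "a": ["home"], "b": ["home"],
     "c": ["home", "a", "b", "external"]},
    pages,
)

for name, pr in [("closed", closed), ("leaky", leaky),
                 ("more internal links", more_internal)]:
    print(name, "home:", round(pr["home"], 2), "average:", round(average(pr), 2))
```

In the closed hierarchy the home page collects the most PR and the site average stays at 1.0. Adding an outward link on page c lowers the average, and adding more internal links on that page recovers part of the loss.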

How your PageRank influences your rankings

While the exact algorithm of each search engine is a closely guarded secret, search engine analysts believe that search engine results (rankings) are some form of product of page relevance (which is determined by a combination of "on-page" and "off-page" factors) and PageRank. Simply put, the formula would look something like this:

Ranking = [Page Relevance] * [PageRank]
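As a toy illustration of this simplified formula, the sketch below orders a few pages by the product of the two values. The URLs, the relevance scores and the PR values are invented, and real search engines certainly use something more elaborate.

```python
def rank_results(pages):
    """Order pages by relevance * PageRank, highest first.

    `pages` maps a URL to a (relevance, pagerank) pair.
    """
    return sorted(
        pages,
        key=lambda url: pages[url][0] * pages[url][1],
        reverse=True,
    )


# Hypothetical scores: page b is less relevant than page a,
# but its higher PR lifts it to the top.
candidates = {
    "http://a.example/": (0.9, 1.0),
    "http://b.example/": (0.5, 3.0),
    "http://c.example/": (0.2, 2.0),
}
ordered = rank_results(candidates)
print(ordered)
```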

The PR logic makes sense, since the algorithm seems invulnerable to spammers. Google's search results have demonstrated high relevance, and this is one of the main reasons for Google's resounding success. Most other major search engines have adopted this logic in their own algorithms in some form or other, varying the importance they assign to this value when ranking sites in their search engine result pages.

What you should remember from this lesson:

  1. PageRank was developed by Google to estimate the absolute (keyword-independent) importance of every page in its index. When Google pulls out the results in response to a Web surfer's query, it does something similar to multiplying the relevance of each page by the PR value. So, PageRank is really worth fighting for.
  2. PageRank depends on how many pages out there link to yours (the more, the better) and how many other links these pages contain (the fewer, the better).
  3. You may try to optimize the link structure of your site for better PageRank distribution. Most simply, you should create a site map, get many cross-links between your pages and organize a hierarchical link structure with the most important pages at the top.
