
Friday 3 February 2012

Is PageRank Important?

Just Enough Knowledge to be Dangerous

One of the bigger problems with learning in the field of SEO is that there are a lot of people who have a nugget of information and spread it far and wide without the context needed to evaluate the potential risks and rewards of any given strategy. So new SEOs end up thinking topic x is the most important, then topic y, then topic z, and then someone debunks one of those. Many false facts get taken as truths when people with a nugget of information (that they found from some source) spread it as fact.

Accurate Answers Need Context

As the structure of the web changes and search engine relevancy algorithms change, so must the field of SEO. This means that the right answer to a question can change frequently, and information from many years ago may no longer be correct. Does PageRank matter? When I first got into SEO it was crucially important, but over the years other pieces of the relevancy algorithms (like domain age, domain name, domain trust, domain extension, link anchor text, searcher location, search query chains, word relationships, search personalization, other user data, result re-ranking based on local inter-connectivity, input from 10,000+ remote quality raters, and even a wide array of penalties & filters) have been layered over the top of the core relevancy algorithm.

If that sounds like a lot, it is because it is!

Yes, PageRank is important for driving indexing, but for rankings it is nowhere near as important as it once was. SEO has become a much more refined art. In an October 2009 interview, Google's Amit Singhal stated:

No one should feel, if I dismantle the current search system, someone will get upset. That’s the wrong environment. When I came, I dismantled [Google cofounders] Larry and Sergey’s whole ranking system. That was the whole idea. I just said, That’s how I think it should be done, and Sergey said, Great!

Great SEO Service is Interactive

Search keeps innovating - as it must. Each layer of innovation creates new challenges and new opportunities.

Not only does SEO strategy change over time, but it also varies from site to site. A large corporate site has a different set of strengths and weaknesses than a small local business website. The best SEO advice must incorporate all of the following:
  • where you are
  • where you want to be
  • the resources you have to bridge the gap between the above 2 (domain names, brand, social relations, public relations, capital, etc.)
  • what the competition is doing
  • your strengths and weaknesses relative to your market

That is why having an interactive SEO Community is so important. It allows us to look for competitive strengths and weaknesses, and offer useful tips that fit your market, your website, and your business.

Even Search Engineers Don't Know All the Search Algorithms

The algorithms are so complex that sometimes even leading search engineers working for Google are uncertain of what is going on. Search engineers can't know every bit of code because Google has made over 450 algorithm changes in a single year.

When I first wrote about a new algorithmic anomaly that I (and others) saw, I got flamed with some pretty nasty words on public SEO sites...a few of which are highlighted below:

[Image: SEO Company]

The above people were:

  • confident
  • rude
  • wrong

And that is part of the reason I stopped sharing as much research publicly. Sharing publicly meant...

  • spending long hours of research and writing (for free)
  • creating more competition for myself (from the people who listen to my tips and advice)
  • watching my brand get dragged through the mud by people who didn't have the experience or capacity needed to understand and evaluate what I was writing about (but who had enough time to hang out in a free forum and blast me).

Whereas if we share that sort of information in our exclusive member forums we...

  • help our customers
  • get to share information and learn from each other's experiences
  • don't get blasted by the trolls hanging out on the public forums

Google's Matt Cutts Confirmed I Was Right

In early 2008 Google's Matt Cutts (one of the top 4 search engineers working at Google) wrote about the above issue, an issue he did not know existed (even AFTER he was alerted to it).

[Image: Matt Cutts on the Position 6 issue]

But take notice that Matt would not confirm the issue until he claimed it had been corrected. So if you wanted to research that issue to better learn the relevancy algorithms, it was already gone.

SEO professionals either captured the opportunity early or missed it. And, if they waited for the official word from Google, they missed it.

Algorithmic anomalies & new penalties are often written off by the industry & then months or years later the truth is revealed.

Back to PageRank

So PageRank...is it important? Yes, primarily for

  • determining the original source of content when duplicates of a page exist
  • selecting the initial set of results (before re-ranking them based on other factors)
  • establishing the crawl priority and crawl depth of a site

But when determining which site ranks better than the next, link diversity is typically far more important than raw PageRank. And even though PageRank is supposed to be query-independent, Google warps its view of the web graph where necessary to improve relevancy, like when localizing search results:

Q: Anything you’ve focused on more recently than freshness?

A: Localization. We were not local enough in multiple countries, especially in countries where there are multiple languages or in countries whose language is the same as the majority country.

So in Austria, where they speak German, they were getting many more German results because the German Web is bigger, the German linkage is bigger. Or in the U.K., they were getting American results, or in India or New Zealand. So we built a team around it and we have made great strides in localization. And we have had a lot of success internationally.

The above quote shows how they look at far more than PageRank and links when trying to determine relevancy.

3 Common SEO Approaches

There are 3 basic ways to approach search engine optimization:

  • a mechanical strategy, where you try to outsmart the search engines and stay ahead of their relevancy algorithms
  • a marketing-based approach, where you try to meet ranking criteria by creating the types of content that other people value and making sure you market it aggressively
  • a hybrid approach, where you take the easy mechanical wins and study general algorithmic shifts...but primarily drive your decisions based on fundamental marketing principles

Comparing the 3 Strategies

For most people the first approach is simply too complex, risky, and uncertain to be worth the effort. Not only do the sites get burned to the ground, but it happens over and over again, so it is quite hard to build momentum and a real business that keeps growing. In fact, most of the top "black hat" SEOs have "white hat" sites that help provide stable income in case anything happens to their riskier sites. Some people are great at coming up with clever hacks, but most people would be better off focusing on building their business using more traditional means.

If search engineers have access to the source code and still don't know everything then how can people outside the company know everything? They can't. Which is why we take a hybrid approach to SEO.

The approach we teach is the hybrid approach - a marketing-based strategy with some easy mechanical wins mixed in. Our customers take some of these easy wins to help differentiate their strategy from uninformed competitors, and then use marketing principles to build off of early success.

The Paradox of SEO

In using a marketing based approach you build up many signals of trust and many rankings as a side effect of doing traditional marketing. If people are talking about you and like your products then you are probably going to get some free high-quality links. And this leads us to the paradox of SEO: "the less reliant your site is on Google the more Google will want to rely on your site."

If you want good information to find out what is working and what is not, you can use our site search to find answers to most common SEO questions, and know you are getting answers from a trustworthy source. The information we give away is of higher value than what most people sell.

Saturday 21 May 2011

Google PageRank, Local Rank and Hilltop Algorithms

When evaluating websites, crawler-based search engines consider many factors, both on your pages and about your pages. The most important for Google are PageRank and links. Let's look closer at the algorithms Google applies to rank Web pages.

Google PageRank

Google PageRank (further referred to as PR) is a system for ranking Web pages used by the Google search engine. It was developed by Google founders Larry Page and Sergey Brin while they were students at Stanford University. PageRank (written as one word, it is a trademark belonging to Google) is the heart of Google's algorithm and is what makes it the most complex of all the search engines.

PageRank uses the Internet's link structure as an indication of each Web page's relevancy value. Sites considered high quality by Google receive a higher PageRank and, as a consequence, a higher ranking in Google's results (the interdependence between PageRank and site rankings in the search results is discussed later in this lesson). Further, since Google is currently the world's most popular search engine, the ranking a site receives in its search results has a significant impact on the volume of visitor traffic for that site.

You can view an approximation of the PageRank value currently assigned to each of your pages by Google if you download and install Google's toolbar for Microsoft Internet Explorer (alternatives also exist for other popular browsers). The Google toolbar displays PageRank on a 0 to 10 scale; however, a page's true PageRank has many contributing factors and is known only to Google.

For each of your pages PageRank may be different, and the PageRanks of all the pages of your site participate in the calculation of PageRank for your domain.

For each of your pages, the PR value is almost completely dependent upon links pointing to your site, reduced, to some degree, by the total number of links to other sites on the linking page. Thus, a link to your site will have the highest amount of impact on your PR if the page linking to yours has a high PR itself and the total number of links on that page is low, ideally, just the one link to your site.

The actual formula (well, an approximate one, according to Google's official papers) for PR is as follows:

PR(A) = (1-d) + d (PR(T1)/C(T1) + ... + PR(Tn)/C(Tn))

where pages T1...Tn all point to page A. The parameter d is a damping factor which can be set between 0 and 1. Google usually sets d to 0.85. C(T) is defined as the number of links going out of page T.

Thus, a site with a high PR but a large number of outbound links can nullify its own impact on your PR. To increase your PageRank, get as many links to your site as you can from pages with a high PR and a low number of total links. Alternatively, obtain as many links pointing to your site as you can, no matter what their PageRank is, as long as those pages are ranked at all. Which variant will get the best out of the PR formula depends on each specific case.

Those of you interested in the mathematical aspect will see that the formula is cyclic: the PR of each page depends on the PR of the pages pointing to it. But we won't know what PR those pages have until the pages pointing to them have their PR calculated and so on. Google resolves this by implementing an iterative algorithm which starts without knowing the real PR for each page and assuming it to be 1. Then the algorithm runs as many times as needed and on each run it gets closer to the estimate of the final value.

Each time the calculation runs, the value of PageRank for each page participating in the calculation changes. When these changes become insignificant or stop after a certain number of iterations, the algorithm assumes it now has the final PageRank values for each page.
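
For those who like to see the idea in code, below is a minimal Python sketch of that iterative calculation, using the same formula as above with d = 0.85. The toy link graph and page names are purely illustrative, and the real system obviously runs over billions of pages with refinements Google has never published.

    def pagerank(links, d=0.85, iterations=50, tolerance=1e-6):
        """links maps each page to the list of pages it links out to."""
        pages = set(links) | {t for targets in links.values() for t in targets}
        pr = {page: 1.0 for page in pages}  # start by assuming every PR is 1
        for _ in range(iterations):
            new_pr = {}
            for page in pages:
                incoming = sum(pr[src] / len(targets)
                               for src, targets in links.items() if page in targets)
                new_pr[page] = (1 - d) + d * incoming
            converged = all(abs(new_pr[p] - pr[p]) < tolerance for p in pages)
            pr = new_pr
            if converged:  # the changes became insignificant, so stop iterating
                break
        return pr

    # Toy site: the home page links to two child pages, which link back to it.
    links = {"home": ["about", "products"], "about": ["home"], "products": ["home"]}
    print(pagerank(links))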

Real PageRank values range from 0.15 (for pages that have no inbound links at all) up to a very large number. The actual value changes every time Google re-indexes and adds new pages to its database. Most experts agree that the relationship between toolbar PR and real PR follows a logarithmic scale. Here is what that means if we assume the base of the logarithm is, for instance, 10:

Toolbar PageRank (log base 10)      Real PageRank
0                                   0.15 – 10
1                                   100 – 1,000
2                                   1,000 – 10,000
3                                   10,000 – 100,000
4                                   100,000 – 1,000,000
5                                   1,000,000 – 10,000,000
6                                   10,000,000 – 100,000,000
7                                   100,000,000 – 1,000,000,000
8                                   1,000,000,000 – 10,000,000,000
9                                   10,000,000,000 – 100,000,000,000
10                                  100,000,000,000 – 1,000,000,000,000

Although there is no evidence that the logarithm actually uses base 10, the main point is that it becomes harder and harder to move up the toolbar scale, because the gaps to overcome become larger with each step. This means that for new websites, "toolbar" PR values between 1 and 3 may be relatively easy to acquire, but getting to 4 requires considerably more effort, and pushing up to 5 is harder still.
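
If you want to experiment with that idea, here is a rough Python sketch of the assumed base-10 relationship. The thresholds simply mirror the table above; the actual scale is known only to Google.

    import math

    def toolbar_pr(real_pr):
        """Map an assumed 'real' PageRank value onto the 0-10 toolbar scale."""
        if real_pr < 1:
            return 0
        # each extra toolbar point requires roughly ten times more real PR
        return max(0, min(10, int(math.log10(real_pr)) - 1))

    for value in (0.15, 5, 5_000, 5_000_000, 5_000_000_000):
        print(f"real PR {value:>15,} -> toolbar PR {toolbar_pr(value)}")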

As you may have figured out from the formula above, every page has at least a PR of 0.15 even if it doesn’t have any inbound links pointing to it. But this may only be in theory – there are rumors that Google applies a post-spidering phase whereby any pages that have no incoming links at all are completely deleted from the index.

Local Rank

Local Rank is an algorithm similar to PR, developed by Krishna Bharat of the Hilltop project. Google applied for a patent on it in 2001 and received it in early 2003. To sum it up, this algorithm re-ranks the results returned for a given user's query by looking at the inter-connectivity between those results. This means that after a search is done, the PR algorithm is run among the result pages only, and the pages that have the most links from other pages in that set will rank highest.

Essentially, it's a way of making sure that links are relevant and ranking sites accordingly. Please note that this algorithm does not count links from your own site – or, to be more exact, links from the same IP address.
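
Here is a simplified Python sketch of that re-ranking idea. It just counts the links each result receives from the other pages in the same result set (skipping same-host links as a stand-in for the same-IP rule); the patented algorithm actually re-runs a PageRank-style calculation over that subset, and all of the names and data below are illustrative.

    def local_rerank(results, outlinks, host_of):
        """Re-order a result set by links received from other pages in the set."""
        scores = {page: 0 for page in results}
        for source in results:
            for target in outlinks.get(source, ()):
                if target in scores and target != source \
                        and host_of[source] != host_of[target]:
                    scores[target] += 1
        # pages most linked-to from within the result set rise to the top
        return sorted(results, key=lambda page: scores[page], reverse=True)

    results = ["a.com/p1", "b.com/p2", "c.com/p3"]
    outlinks = {"a.com/p1": ["b.com/p2"], "c.com/p3": ["b.com/p2", "a.com/p1"]}
    host_of = {"a.com/p1": "a.com", "b.com/p2": "b.com", "c.com/p3": "c.com"}
    print(local_rerank(results, outlinks, host_of))  # b.com/p2 moves to the top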

Assuming that it is used by Google, make sure that you first get links pointing to you from other pages that rank well (or rank at all) for the keyword that you are targeting. Directories such as Yahoo! and DMOZ would be a good place to start – they tend to rank well for a wide range of keywords. Also, keep in mind that this is about pages, not sites. The links need to be from the pages that rank well – not other pages on sites that rank well.

Hilltop

Hilltop is a patented algorithm that was created in 1999 by Krishna Bharat and George A. Mihaila of the University of Toronto. The algorithm is used to find documents relevant to a particular keyword topic. Hilltop operates on a special index of "expert documents".

Basically, it looks at the relationship between "Expert" and "Authority" pages. An "Expert" is a page about a specific topic that links to many relevant, non-affiliated documents on that topic. An "Authority" is a page that has links pointing to it from "Expert" pages. Pages are defined as non-affiliated if they are authored by authors from non-affiliated organizations. So, if your website has backlinks from many of the best expert pages, it will be an "Authority".

In theory, Google finds "Expert" pages and then the pages that they link to would rank well. Pages on sites like Yahoo!, DMOZ, college sites and library sites can be considered experts.
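
As a rough illustration (not the actual patented algorithm), here is a Python sketch of that Expert/Authority counting idea. The expert pages, their links, and the affiliation labels are made up for the example; the real Hilltop algorithm also performs topic segmentation and much stricter affiliation checks.

    def authority_scores(expert_outlinks, affiliation):
        """Count links that each page receives from non-affiliated expert pages."""
        scores = {}
        for expert, targets in expert_outlinks.items():
            for target in targets:
                # only links between non-affiliated organizations count
                if affiliation.get(expert) != affiliation.get(target):
                    scores[target] = scores.get(target, 0) + 1
        return scores  # pages linked from many experts behave like "Authorities"

    experts = {
        "dmoz.org/widgets": ["acme.com/widgets", "widgetworld.com"],
        "library.edu/widgets": ["acme.com/widgets"],
    }
    affiliation = {
        "dmoz.org/widgets": "dmoz", "library.edu/widgets": "library",
        "acme.com/widgets": "acme", "widgetworld.com": "widgetworld",
    }
    print(authority_scores(experts, affiliation))  # acme.com/widgets scores highest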

Google acquired the algorithm in February 2003.

Site Structure and PageRank

PageRank can be transmitted from page to page via links across different pages of your site as well as across all the sites in the Web. Knowing this, it’s possible to organize your link system in such a way that your content-rich pages receive and retain the highest PageRank.

The pages of your site receive PageRank from outside through inbound links. If you've got many inbound links to different pages of your site, it means PageRank enters your site at many points.

Such "PageRank entry points" can pass PageRank further on to other pages of your site.

The idea that you should keep in mind is that the amount of PageRank a page of your site is able to give to another page depends on how many links the first (linking) page itself contains. A page has only a certain amount of PageRank, which is distributed over the other pages it links to.

The best way to obtain a good PR on all of your pages is to have a well thought-out linking structure for your site.

What this means is that every page on your site should have multiple links from your other pages coming into it. Since PR is passed on from page to page, the higher the PR a page has, the more it has to pass on. Pages with a low number of links on them will pass relatively more PR per link. However, on your own site, you usually want all of your pages to benefit. Also, PR is passed back and forth between all of your pages; this means that your home page gets an additional boost because, generally, every page on your site links to your home page.

Let's look at the prototypes of site linking schemes that may be beneficial in terms of PR distribution.

1. Simple hierarchy.

[Diagram: simple hierarchy]

The boxes denote separate pages, and the figures in them denote the PR value calculated with the help of a simple algorithm that takes into consideration only these pages. With a site structure like this, it's pretty easy to get a high PR for your home page; however, this is an ideal situation that is difficult to recreate in real life: you will want more cross-linking than just links from all your pages to the home page.
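
To see the concentration effect in numbers, here is a minimal Python sketch that runs the same iterative formula from earlier over an illustrative hierarchy: a home page linking down to three child pages, each of which links only back up.

    def pagerank(links, d=0.85, iterations=50):
        pages = set(links) | {t for ts in links.values() for t in ts}
        pr = {p: 1.0 for p in pages}
        for _ in range(iterations):
            pr = {p: (1 - d) + d * sum(pr[s] / len(ts)
                                       for s, ts in links.items() if p in ts)
                  for p in pages}
        return pr

    hierarchy = {
        "home": ["page-a", "page-b", "page-c"],
        "page-a": ["home"], "page-b": ["home"], "page-c": ["home"],
    }
    for page, score in sorted(pagerank(hierarchy).items(), key=lambda kv: -kv[1]):
        print(f"{page:7} {score:.2f}")  # the home page ends up with the highest PR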

2. Linking to external pages that return backlinks

[Diagram: linking to external pages that return backlinks]

This just means creating a link directory page on your site and benefiting a bit from link exchanges with external pages. Link exchanges are dealt with in the next lesson.

3. Site with inbound and outbound links

[Diagram: site with inbound and outbound links]

This is very similar to the first scheme; however, here an external site (Site A) passes its PR to your home page, which then distributes it to the child pages. You can see that both the home page's PR and that of the child pages have significantly increased. No matter how many pages your site has, your average PR will always be 1.0 at best, but a hierarchical layout can strongly concentrate votes, and therefore PR, into the home page.

So here are some main conclusions you should keep in mind when optimizing the link structure of your site for better PR distribution.
  • If a particular page is very important – use a hierarchical structure with the important page at the "top".
  • When a group of pages may contain outward links – increase the number of internal links to retain as much PR as possible.
  • When a group of pages do not contain outward links – the number of internal links in the site has no effect on the site's average PR. You might as well use a link structure that gives the user the best navigational experience.

How your PageRank influences your rankings

While the exact algorithm of each search engine is a closely guarded secret, search engine analysts believe that search engine rankings are some form of multiplication of page relevance (which is determined from a combination of your "on-page" and "off-page" factors) and PageRank. Simply put, the formula would look something like:

Ranking = [Page Relevance] * [PageRank]

The PR logic makes sense since the algorithm seems relatively resistant to spammers. Google's search results have demonstrated high relevance, and this is one of the main reasons for the engine's resounding success. Most other major search engines have adopted this logic in their own algorithms in some form or other, varying the importance they assign to this value when ranking sites in their search engine result pages.
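
As a toy illustration of that simplified model, with entirely made-up pages and scores, sorting candidates by relevance multiplied by PageRank looks like this in Python:

    candidates = {
        "example.com/guide": {"relevance": 0.9, "pagerank": 2.0},
        "example.org/forum": {"relevance": 0.6, "pagerank": 5.0},
        "example.net/stub":  {"relevance": 0.4, "pagerank": 1.0},
    }
    ranked = sorted(candidates,
                    key=lambda p: candidates[p]["relevance"] * candidates[p]["pagerank"],
                    reverse=True)
    print(ranked)  # the forum page outranks the more relevant guide page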

What you should remember from this lesson:

  1. PageRank was developed by Google to estimate the absolute (keyword-independent) importance of every page in its index. When Google pulls out the results in response to a Web surfer's query, it does something similar to multiplying the relevance of each page by the PR value. So, PageRank is really worth fighting for.
  2. PageRank depends on how many pages out there link to yours (the more, the better) and how many other links these pages contain (the less, the better).
  3. You may try to optimize the link structure of your site for better PageRank distribution. Most simply, you should create a site map, get many cross-links between your pages, and organize a hierarchical link structure with the most important pages at the top.