How Often Does The Google Dance Happen?
The name "Google Dance" is often used to describe the period when a
major update of the Google search index is being rolled out. These
major index updates occur on average every 36 days, or about 10 times
per year. An update is easy to identify by significant changes in
search results and by a refresh of Google's cache of all indexed
pages. These changes can be evident from one minute to the next, but
the update does not proceed like the flip of a switch from one index
to another. In fact, it takes several days to complete the update of
the index.
Because Google, like every other search engine, depends on its users
trusting it to deliver authoritative, reliable results 24 hours a day,
seven days a week, updates pose a serious problem. Google can't 'shut
down for maintenance' and cannot afford to go offline for even one
minute. Hence, we have the Dance. Every search engine goes through it,
some more often and some less often than Google. It is only because of
Google's reach that we pay more attention to its rebuild than to any
other engine's.
During this period, the index is constantly in flux and search results
can vary wildly, because it is also during the Dance that Google makes
any algorithm adjustments live and updates the PageRank and backlinks
for each web site it has indexed.
Do Search Results Only Change During The Google Dance?
No. In fact, during any month there will be minor changes in rankings.
This is because Google's bot, or spider, is always running and finding
new material. It also happens when the bot detects that a web site no
longer exists and needs to be deleted from the index. During the
Dance, the Googlebot revisits every site and works out how many sites
link to it, how many sites it links out to, and how valuable these
links are.
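The link counting described above can be illustrated with a much simplified PageRank calculation. This is only a sketch under stated assumptions: the function name simple_pagerank, the damping value of 0.85, the fixed iteration count and the tiny example link graph are all illustrative choices, not Google's actual implementation.

```python
def simple_pagerank(links, damping=0.85, iterations=50):
    """Estimate page importance from a link graph.

    links maps each page to the list of pages it links out to.
    The damping factor and iteration count are illustrative assumptions.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start with every page equally important.
    rank = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page keeps a small baseline share of importance.
        new_rank = {page: (1.0 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page splits its importance among the pages it links to,
                # so a link from a page with few outgoing links is worth more.
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A page with no outgoing links spreads its share evenly.
                share = damping * rank[page] / n
                for other in pages:
                    new_rank[other] += share
        rank = new_rank
    return rank


if __name__ == "__main__":
    # Hypothetical three-page web: a and b both link to c, c links back to a.
    example = {"a": ["c"], "b": ["c"], "c": ["a"]}
    for page, score in sorted(simple_pagerank(example).items()):
        print(page, round(score, 3))
```

In this toy graph, page c collects links from two pages and ends up with the highest score, while page b, which nobody links to, ends up with the lowest.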
Because Google is constantly crawling and updating selected pages, its
search results will vary slightly over the course of the month.
However, it is only during the Google Dance that these results can
swing wildly. You also need to consider that Google has 8 data
centers, sharing more than 10,000 servers. Somehow, the updates to the
index that occur during the month, outside of the Google Dance, have
to be propagated to all of them. It's a constant process for Google
and every other search engine. These ongoing, incremental updates only
affect parts of the index at any one time.
Checking the Google Dance
You may know that Google has 8 main www servers online, which are as
follows:
· www-ex.google.com - (where you land when you type www.google.com)
· www-sj.google.com - (can also be accessed at www2.google.com)
· www-va.google.com - (can also be accessed at www3.google.com)
· www-dc.google.com
· www-ab.google.com
· www-in.google.com
· www-zu.google.com
· www-cw.google.com
During the Google Dance, you can check the 8 Google servers, and they
will sometimes display wildly differing results. The servers are thus
said to be "dancing", hence the name "Google Dance".
The easiest way to check if the Google Dance is happening is to go to
www.google.com and do a search. Look at the blue bar at the top of the
page. It will read something like "Results 1 - 10 of about 626,000.
Search took 0.48 seconds". Now run the same search on www2.google.com
and www3.google.com. If you see a different number of total pages for
the same search, then the Google Dance is on. You can also check all
the servers listed above. www2 is really www-sj, and www3 is www-va.
We have found that all the others need their full
www-extension.google.com hostname in the URL if you want to test them
properly. Once the numbers and the order of results on all 8 www's are
the same, you know the Dance is over.
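The manual check described above boils down to comparing the result count and the result order across the 8 servers. A minimal sketch of that comparison follows. The Snapshot type and the function names are hypothetical, and the sketch deliberately leaves out how the snapshots are collected (the listed hostnames may no longer respond), so it only shows the comparison logic.

```python
from dataclasses import dataclass

# Two-letter codes taken from the server list in this article.
DATA_CENTER_CODES = ["ex", "sj", "va", "dc", "ab", "in", "zu", "cw"]


def server_hostnames(codes=DATA_CENTER_CODES):
    """Build the full www-extension.google.com hostnames from the codes."""
    return [f"www-{code}.google.com" for code in codes]


@dataclass(frozen=True)
class Snapshot:
    """What one server returned for a query (hypothetical structure)."""
    total_results: int  # the "of about 626,000" figure
    top_urls: tuple     # result URLs in the order they were shown


def dance_is_on(snapshots):
    """Return True if any server disagrees with the others.

    snapshots maps a hostname to the Snapshot recorded for the same query.
    Servers disagree when either the total or the result order differs.
    """
    values = list(snapshots.values())
    if not values:
        return False
    first = values[0]
    return any(snapshot != first for snapshot in values[1:])


if __name__ == "__main__":
    hosts = server_hostnames()
    # Made-up readings: every server agrees except www-va.
    readings = {h: Snapshot(626000, ("a.example", "b.example")) for h in hosts}
    readings["www-va.google.com"] = Snapshot(611000, ("b.example", "a.example"))
    print("Dance is on" if dance_is_on(readings) else "Dance is over")
```

Comparing the order of results as well as the totals matches the rule given above: the Dance is only over once both agree on all 8 servers.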
Importance of The Google Dance
For most people, this event in and of itself is not important.
However, for anyone in the search engine optimization industry it is a
period of note. First off, we always get lots of calls from clients
during the Dance. Pages get temporarily dropped, sometimes for a day,
and people panic. Then, when the pages are re-added, they are better
placed than before, and things calm down. It's interesting to see how
overwhelmingly important this one engine is.
The Technical Background of the Google Dance
The Google search engine pulls its results from more than 10,000
servers. This means that when you type a question or query into
Google, that request is handled by one of those computers. Whichever
server gets the query has to have an answer for you within a fraction
of a second. Imagine putting all the books in the Library of Congress
on the floor of an airplane hangar, asking for 'sun tzu art of war',
and expecting to find the correct result in the blink of an eye.
Impossible to imagine, isn't it? Yet we ask the search engines to do
this for us every day.
Google uses Linux servers. When the rebuild happens, all 10,000 of
these servers are updated. Naturally, there will always be some
variation from one index to the next, simply because new sites are
always being added and content changes are being made that affect the
placement of some websites. But during the Google Dance, these
variations are dramatic. One server after another is updated with
portions of the new index until, eventually, they are all running a
completely new index database.
Google Dance and DNS
Not only is Google's index spread over more than 10,000 servers, but
these servers are also in eight different data centers. These data
centers are mainly located in the U.S.
Google uses multiple data centers to get results to the end user
faster. If you access a data center that is physically close to you,
then in theory your connection needs to make fewer hops, or navigate
fewer intersections, to get to the data center and back. Each data
center has its own IP address (numerical address on the internet), and
the Domain Name System (DNS) manages the way these IP addresses are
accessed. The system instantly routes your request to the nearest or
least congested data center. The request is then routed within that
data center facility to an idle server. In this way, Google uses a
two-step form of load balancing: DNS tables first, then internal
traffic management. As a result, the distance for data transmissions
is reduced and the speed of response improved.
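The two-step routing described above can be sketched in a few lines. Everything here is an illustrative assumption rather than Google's actual policy: the DataCenter and Server types, the hop counts, the congestion scores and the 0.8 congestion threshold are all made up for the example. The sketch picks the nearest data center unless it is too congested, then picks the least busy server inside it.

```python
from dataclasses import dataclass, field


@dataclass
class Server:
    name: str
    active_requests: int = 0  # 0 means the server is idle


@dataclass
class DataCenter:
    code: str
    distance_hops: int  # assumed network distance from the user
    congestion: float   # assumed load score, 0.0 (idle) to 1.0 (saturated)
    servers: list = field(default_factory=list)


def pick_data_center(centers, max_congestion=0.8):
    """Step 1 (DNS level): nearest data center that is not overloaded.

    If every data center is above the threshold, fall back to the least
    congested one. The threshold value is an assumption for this sketch.
    """
    usable = [c for c in centers if c.congestion <= max_congestion]
    if usable:
        return min(usable, key=lambda c: c.distance_hops)
    return min(centers, key=lambda c: c.congestion)


def pick_server(center):
    """Step 2 (inside the data center): the least busy server."""
    return min(center.servers, key=lambda s: s.active_requests)


def route(centers):
    """Route one request and return (data center code, server name)."""
    center = pick_data_center(centers)
    server = pick_server(center)
    server.active_requests += 1
    return center.code, server.name


if __name__ == "__main__":
    centers = [
        DataCenter("sj", distance_hops=3, congestion=0.95,
                   servers=[Server("sj-1")]),
        DataCenter("va", distance_hops=8, congestion=0.20,
                   servers=[Server("va-1", 4), Server("va-2", 0)]),
    ]
    # The nearer data center is congested, so the request goes elsewhere.
    print(route(centers))
```

Keeping the two choices in separate functions mirrors the split described in the text: the first decision is made before the request reaches a data center, the second inside it.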
During the Google Dance period, the servers in all the data centers
cannot all receive the new index at the same time. In fact, only
portions of the new index can be transferred to each data center at
one time, and each portion is transferred to the data centers one
after the other. Different portions are uploaded to each server farm
at different times, which also affects results. When a user queries
Google during the Google Dance, they may get results from a data
center that still has all or part of the old index in place one
minute, and then results from a data center that has the new data a
few minutes later. From the user's perspective, the change took place
within seconds.
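The staggered transfer described above can be simulated with a small sketch. The schedule used here (each portion goes to every data center in turn before the next portion starts) is an assumption made for illustration, as are the function names and the "old" / "mixed" / "new" labels. The real transfer order is not given in this article.

```python
def rollout_schedule(data_centers, portions):
    """Yield (step, data_center, portion), one transfer per step.

    Assumed order: a portion is sent to each data center in turn,
    then the next portion follows.
    """
    step = 0
    for portion in portions:
        for dc in data_centers:
            step += 1
            yield step, dc, portion


def index_state_after(data_centers, portions, steps_done):
    """Return which new-index portions each data center holds so far."""
    state = {dc: set() for dc in data_centers}
    for step, dc, portion in rollout_schedule(data_centers, portions):
        if step > steps_done:
            break
        state[dc].add(portion)
    return state


def index_version(state, dc, portions):
    """Describe what a user querying this data center would see."""
    updated = state[dc]
    if not updated:
        return "old"
    if len(updated) == len(portions):
        return "new"
    return "mixed"


if __name__ == "__main__":
    data_centers = ["sj", "va", "dc"]          # codes from the server list
    portions = ["part-1", "part-2", "part-3"]  # hypothetical index portions
    total_steps = len(data_centers) * len(portions)
    for steps_done in range(total_steps + 1):
        state = index_state_after(data_centers, portions, steps_done)
        view = {dc: index_version(state, dc, portions) for dc in data_centers}
        print(steps_done, view)
```

At most steps of the run, different data centers report different index versions, which is why two identical queries a few minutes apart can return very different results.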
Building up a completely new index every month or so can cause quite a
bit of trouble. After all, the search engines have to spider and index
billions of documents and then compile the resulting data into one
cohesive unit. That's no small feat.
Outside of the Dance, there may also be minor fluctuations in search
results. This is because the indexes at the various data centers can
never be identical to each other. New sites are constantly being
added, old ones deleted, and so on. It is estimated that over 8
million new web pages are created every day. Some of them are added to
the search engines and thus affect search results.
Now, if you want Google's definition of the Google Dance, visit their
page about the Google Dance.
Looks like fun, I'd go!
(After an article written by Richard Zwicky)