Google is among the world’s most popular search engines. Millions of internet users visit the site every day to perform research or to look up and discover new websites.
But how does Google “know” about all those websites and web pages? Thousands of new internet sites and pages are developed every day, and even more are altered, updated and redesigned. As a result, many web page creators and internet surfers wonder how the Google search engine stays in the know.
Simply stated, Google’s spider is the key to the site’s efficacy as a search engine.
What Is a “Spider” or “Crawler”?
Search engines such as Google develop software programs that are designed to “crawl” the millions of websites and web pages that comprise the internet. These programs are known as “spiders” or “crawlers.” Googlebot is the term used to refer to Google’s spider program.
According to software developer and internet enthusiast Alan Sparks, the term “spider” arose like this: “The internet is also called ‘the web’ and software programs like Googlebot navigate the web pages and websites that comprise the web. What navigates and walks around on a web? A spider – that’s how that term arose. And spiders don’t really walk per se – they crawl, hence the term ‘crawlers.’”
How Does the Googlebot Spider Work?
Googlebot never sleeps. It constantly crawls new sites and revisits existing ones, updating Google’s “memory,” or index, as it goes.
Sparks explained how it all works. When a new website is created, the webmaster submits the website address to Google and other search engines like Yahoo! and MSN. The website is added to a list of new sites that Googlebot will visit. It can take anywhere from several hours to several weeks for the spider to pay its initial visit to a newly created website.
When the spider reaches the website, it automatically navigates through the site, scanning keywords and meta tags, following the links it finds on each page, and examining the site’s other components. As Googlebot visits and “crawls” through the website, the software essentially forms a snapshot of the website and all its individual web pages. That snapshot or “memory” of the website and its individual web pages is cached or “filed.”
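The crawl described above can be illustrated with a minimal sketch. It is not Googlebot’s actual code; it assumes a tiny, made-up website held in memory (the SITE dictionary and its pages are hypothetical) so that it runs without a network connection. The sketch starts at the home page, follows each link it finds, and stores a text snapshot of every page it reaches.

```python
from collections import deque
from html.parser import HTMLParser

# Hypothetical in-memory "website": path -> HTML.
# A real crawler would fetch each page over HTTP instead.
SITE = {
    "/": '<h1>Home</h1><a href="/about">About</a> <a href="/blog">Blog</a>',
    "/about": '<h1>About</h1><p>We sell garden tools.</p><a href="/">Home</a>',
    "/blog": '<h1>Blog</h1><p>Pruning tips for spring.</p><a href="/about">About</a>',
}


class LinkAndTextParser(HTMLParser):
    """Collects link targets and visible text from a single page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # Record the href of every <a> tag so the crawler can follow it.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

    def handle_data(self, data):
        # Keep non-empty text as part of the page snapshot.
        if data.strip():
            self.text.append(data.strip())


def crawl(site, start="/"):
    """Breadth-first crawl from `start`; returns {path: snapshot text}."""
    snapshots = {}
    queue = deque([start])
    while queue:
        path = queue.popleft()
        # Skip pages already visited and links that lead nowhere.
        if path in snapshots or path not in site:
            continue
        parser = LinkAndTextParser()
        parser.feed(site[path])
        parser.close()
        snapshots[path] = " ".join(parser.text)
        queue.extend(parser.links)
    return snapshots


snapshots = crawl(SITE)
for path, text in snapshots.items():
    print(path, "->", text)
```

Tracking visited pages matters because the pages link back to one another; without that check the crawler would loop forever.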
The cached information is then added to Google’s memory banks, also known as the index. The index is Google’s “memory,” and when a visitor types in a search term, Google searches its memory for websites and web pages that fit the bill.
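One common way to build this kind of “memory” is an inverted index, which maps each word to the pages that contain it. The sketch below assumes that structure for illustration only; the article does not say how Google’s index is actually organized, and the SNAPSHOTS data is made up.

```python
import re
from collections import defaultdict

# Hypothetical page snapshots, as a crawler might have stored them.
SNAPSHOTS = {
    "/": "Home About Blog",
    "/about": "About We sell garden tools.",
    "/blog": "Blog Pruning tips for spring garden beds.",
}


def tokenize(text):
    """Lowercase the text and split it into words and numbers."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(snapshots):
    """Map each word to the set of pages whose snapshot contains it."""
    index = defaultdict(set)
    for page, text in snapshots.items():
        for word in tokenize(text):
            index[word].add(page)
    return index


def search(index, query):
    """Return the pages that contain every word in the query."""
    words = tokenize(query)
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


index = build_index(SNAPSHOTS)
print(search(index, "garden"))        # ['/about', '/blog']
print(search(index, "garden tools"))  # ['/about']
```

The search never touches the live website. It consults only the stored index, which is why results can lag behind the site itself.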
At various intervals, Googlebot will revisit the websites in its index. The spider software will “crawl” the various components of the website again, forming a new snapshot. This new snapshot is then added to the index, thereby keeping Google’s memory very close to current.
How Does Googlebot Affect Website Developers and Website Visitors?
The best websites on the internet are dynamic and ever-changing. But there is a delay between the time when the webmaster changes the website and when the new content appears in search results. Simply stated, it takes time for Google to “learn” about the changes on a web page, and it’s the Google spider that goes out and checks pages, searching for new content and updates.
When a visitor conducts a search on Google, the search results reflect the information that was available during Googlebot’s last crawl of each site.
In practice, a webmaster must keep in mind that any change to the content, layout, links or other components of a website will remain unknown to Google until Googlebot revisits the site. For example, if the site is crawled every hour on the hour, and it’s visited by Googlebot at 1:00 p.m., any changes made after that time will not be evident to Google until the spider’s next crawl of the site at 2:00 p.m.
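The hourly example can be worked through as a short calculation. The sketch below assumes a fixed crawl interval, as in the example; real crawl schedules vary from site to site, and the function name and the calendar date are made up for illustration.

```python
from datetime import datetime, timedelta


def next_crawl_after(change_time, first_crawl, interval):
    """Return the first scheduled crawl at or after `change_time`.

    Assumes crawls happen at first_crawl, first_crawl + interval, and so on.
    """
    if change_time <= first_crawl:
        return first_crawl
    elapsed = change_time - first_crawl
    # Ceiling division: the number of whole intervals needed to reach
    # or pass the moment of the change.
    intervals = -(-elapsed // interval)
    return first_crawl + intervals * interval


# Hypothetical schedule: crawled every hour on the hour, starting at 1:00 p.m.
first_crawl = datetime(2024, 1, 1, 13, 0)
interval = timedelta(hours=1)

# The webmaster edits a page at 1:20 p.m.
change = datetime(2024, 1, 1, 13, 20)
picked_up = next_crawl_after(change, first_crawl, interval)

print(picked_up.strftime("%H:%M"))  # 14:00
print(picked_up - change)           # 0:40:00
```

In this scenario the edit made at 1:20 p.m. goes unseen for 40 minutes, until the 2:00 p.m. crawl.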
Notably, Google visitors have the option to view the cached page when looking at the websites and web pages in the search results. This option allows for faster loading, but this cached version of the page reflects what the page looked like when Googlebot last crawled the site, so any changes that have been made since then will not appear unless visitors click on the website’s link to actually visit the site.
Related Reading
Readers who enjoyed this article may also enjoy other Suite101 articles, including What is Twitter?, How Do I Create a Website and What is an RSS Feed?