Webmaster Papers








60 Day Sandbox for Google & AskJeeves; MSN Indexes Quickest, Yahoo Next


Search engine listing delays, which have come to be called the Google Sandbox effect, turn out to be real in practice, in one form or another, at each of the four top-tier search engines. MSN, it seems, has the shortest indexing delay, at 30 days. This article is the second in a series following the spiders through a brand-new web site, beginning on May 11, 2005, when the site first went live under a newly purchased domain name.

First Case Study Article

Previously we looked at the first 35 days and detailed the crawling behavior of Googlebot, Teoma, MSNbot and Slurp as they traversed the pages of this new site. We discovered that each spider displays a distinctly different crawling frequency and similarly differing indexing patterns.

For reference, about 15 to 20 new pages are added to the site daily, and each is linked from the home page for one day. The site structure is non-traditional: there are no categories, and the linking structure is tied to author pages that list each author's articles, plus a "related articles" index that varies by linking to relevant pages with similar content.

So let's review where each spider stands, look at the pages crawled, and compare the pages indexed by each engine.

The AskJeeves spider, Teoma, has crawled most of the pages on the site, yet at this writing, 60 days later, it has indexed none of them. This is clearly a site-aging delay modeled on Google's Sandbox behavior. The Teoma spider from Ask.com has crawled more pages on this site than any other engine over the 60-day period, but it now appears to have tired of crawling: it has not returned since July 13, its first break in 60 days.

In the first two days, Googlebot gobbled up 250 pages and didn't return until 60 days later, and it has not indexed a single page in the 60 days since that initial crawl. But Googlebot has shown renewed interest in crawling the site since the first article in this case study was published on several high-traffic sites. It now looks at a few pages each day, so far no more than about 20, at a decidedly lackluster pace: a true "crawl" that would keep it occupied for years if it continued that slowly.
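To see why roughly 20 pages a day adds up to years of crawling, here is a rough back-of-the-envelope sketch. It assumes the 4,905 pages MSN has indexed as a stand-in for the site's size (a lower bound, since the site keeps growing), the 15 to 20 new pages added daily, and a Googlebot pace of 20 pages per day; it ignores the 250 pages fetched in the initial crawl. The function name `days_to_catch_up` is hypothetical.

```python
import math
from typing import Optional


def days_to_catch_up(backlog: int, crawled_per_day: int, added_per_day: int) -> Optional[int]:
    """Days a crawler needs to clear a backlog while new pages keep arriving.

    Returns None when the crawler reads no more pages per day than the site
    adds, because the backlog then never shrinks.
    """
    net_progress = crawled_per_day - added_per_day
    if net_progress <= 0:
        return None
    return math.ceil(backlog / net_progress)


# Assumption: MSN's 4,905 indexed pages used as a rough lower bound on site size.
backlog = 4905
for added in (15, 20):
    days = days_to_catch_up(backlog, crawled_per_day=20, added_per_day=added)
    if days is None:
        print(f"{added} new pages/day: Googlebot never catches up")
    else:
        print(f"{added} new pages/day: about {days} days ({days / 365:.1f} years)")
```

With 15 new pages a day the sketch prints about 981 days (2.7 years); at 20 new pages a day the backlog never shrinks at all.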

MSNbot crawled timidly for the first 45 days, looking over 30 to 50 pages daily, but only after it found a robots.txt file. We had neglected to post that file for the first week, then bobbled the ball when we changed the site structure and failed to implement robots.txt in the new subdomains until day 25, and MSNbot then didn't return until day 30. If little else has been learned about initial crawls and indexing, we have seen that MSNbot relies heavily on the robots.txt file and that implementing it properly will speed crawling.
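Crawlers request robots.txt only from the root of each host, so every subdomain counts as a separate host that needs its own file; that is the gap that left the new subdomains here without one until day 25. The sketch below uses only Python's standard `urllib.parse` to list which robots.txt URLs are still missing for a set of pages. The host names and the helper names `robots_url` and `missing_robots` are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(page_url: str) -> str:
    """Return the robots.txt URL that governs page_url (one file per host)."""
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def missing_robots(page_urls, hosts_with_robots):
    """Return the robots.txt URLs needed for page_urls that are not yet posted."""
    needed = {robots_url(url) for url in page_urls}
    posted = {robots_url(host) for host in hosts_with_robots}
    return sorted(needed - posted)


# Hypothetical example: the main host has robots.txt, a new subdomain does not.
pages = [
    "http://www.example.com/index.html",
    "http://articles.example.com/2005/05/new-article.html",
]
print(missing_robots(pages, ["http://www.example.com/"]))
# ['http://articles.example.com/robots.txt']
```

Running a check like this whenever a subdomain is added would catch the kind of omission described above before the crawlers do.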

MSNbot is now crawling enthusiastically, at anywhere from 200 to 800 pages daily. In fact, we had to add a "crawl-delay" directive to the robots.txt file after MSNbot began hitting 6 pages per second last week. The MSN index now shows 4,905 pages, 60 days into this experiment, and its cached copies change weekly. MSNbot apparently likes how we changed the page structure to include a new feature that links to questions from several other article pages.
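A crawl-delay entry for MSNbot might look like the robots.txt in the sketch below, checked here with Python's standard `urllib.robotparser` (its `crawl_delay()` method needs Python 3.6 or later). The 10-second value is an illustrative assumption, not the setting used on this site, and the arithmetic assumes the crawler treats Crawl-delay as a minimum number of seconds between requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the 10-second delay is illustrative only.
ROBOTS_TXT = """\
User-agent: msnbot
Crawl-delay: 10

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

SECONDS_PER_DAY = 24 * 60 * 60
delay = parser.crawl_delay("msnbot")      # 10 for msnbot; None if no delay applies
daily_cap = SECONDS_PER_DAY // delay      # most requests per day if the delay is honored
peak_rate = 6 * SECONDS_PER_DAY           # 6 pages per second sustained for a day

print(f"msnbot Crawl-delay: {delay} s -> at most {daily_cap} requests/day")
print(f"6 pages/s sustained for a day: {peak_rate} requests")
```

At 10 seconds per request the cap is 8,640 requests a day, still well above the 200 to 800 pages MSNbot has been fetching, whereas 6 pages per second sustained for a full day would be 518,400 requests.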

Slurp alternates strangely between inactive and hyperactive periods. The Yahoo crawler will look at 40 pages one day and 4,000 the next, then look only at the home page for a few days, jump back in for 3,000 pages the next day, and go back to reviewing only robots.txt for two days. Consistency is not a curse suffered by Slurp. Yahoo now shows 6 pages in its index, one of them an error page and another an "Index of" directory listing, because we have not posted a home page to several subdomains. Yet Slurp has easily crawled 15,000 pages to date.
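Daily swings like Slurp's show up when the crawler requests in a server log are grouped by day and user-agent. The sketch below assumes Apache's "combined" log format and identifies spiders by simple user-agent substrings; the `SPIDERS` tokens and the sample lines are illustrative assumptions, not entries from this site's logs. It counts requests, not unique pages.

```python
import re
from collections import Counter, defaultdict

# Apache "combined" format: the date is inside [...] and the user-agent is
# the last quoted field on the line.
LOG_LINE = re.compile(
    r'\[(?P<day>[^:\]]+):[^\]]*\] "[^"]*" \d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)

# Assumed user-agent substrings for each spider; adjust to match your logs.
SPIDERS = {"Googlebot": "googlebot", "Teoma": "teoma", "MSNbot": "msnbot", "Slurp": "slurp"}


def daily_spider_hits(lines):
    """Return {day: Counter({spider: requests})} for the spiders in SPIDERS."""
    hits = defaultdict(Counter)
    for line in lines:
        match = LOG_LINE.search(line)
        if not match:
            continue  # skip lines that are not in combined format
        agent = match.group("agent").lower()
        for name, token in SPIDERS.items():
            if token in agent:
                hits[match.group("day")][name] += 1
    return hits


# Hypothetical log lines (documentation IP addresses, made-up requests).
sample = [
    '192.0.2.10 - - [12/Jul/2005:03:14:15 -0700] "GET /robots.txt HTTP/1.0" 200 120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.10 - - [12/Jul/2005:03:14:20 -0700] "GET / HTTP/1.0" 200 5120 "-" '
    '"Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.20 - - [13/Jul/2005:08:00:01 -0700] "GET /robots.txt HTTP/1.0" 200 120 "-" '
    '"msnbot/1.0 (+http://search.msn.com/msnbot.htm)"',
    '192.0.2.30 - - [13/Jul/2005:09:30:00 -0700] "GET / HTTP/1.1" 200 5120 "-" '
    '"Mozilla/5.0 (Windows; U; Windows NT 5.1) Gecko/20050511 Firefox/1.0.4"',
]

for day, counts in daily_spider_hits(sample).items():
    print(day, dict(counts))
```

For the sample lines this prints two Slurp requests on 12/Jul/2005 and one MSNbot request on 13/Jul/2005; the browser request is ignored.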

Lessons learned in the first 60 days on a new site follow:

1) Google crawls 250 pages on first discovering links to the site. It then doesn't return until it finds more links, and crawls slowly when it does. Google has failed to index the new domain for 60 days.

2) Yahoo looks for error pages, and once it finds bad links it will crawl them ceaselessly until you tell it to stop. It then won't crawl at all for weeks, until it starts crawling heavily one day and lightly the next in random fashion.

3) MSNbot requires a robots.txt file, and once it decides it likes your site it may crawl too fast, requiring "crawl-delay" instructions in that robots.txt file. Implement the file immediately.

4) Bad bots can strain resources by hitting too many pages too quickly until you tell them to stay out. We banned 3 bots outright after they slammed our servers for a day or two: "aipbot" crawled first, then "BecomeBot" came along, and then "Pbot" from Picsearch.com crawled heavily looking for image files we don't have. Bad bots, stay out. It's best to implement robots.txt exclusions for all but the top engines if their crawlers strain your server resources. We considered excluding the Chinese search engine Baidu.com when it began crawling heavily early on. We don't expect much traffic from China, but why exclude one billion people, especially since Google is rumored to be considering a purchase of Baidu.com as an entry to the Chinese market?
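Exclusions like the ones in item 4 can be written as a single robots.txt record naming all three bots. The sketch below uses the bot names as reported above (assumed to match the user-agent tokens those crawlers announce) and checks the result with Python's standard `urllib.robotparser`; the page URL is hypothetical. Robots.txt is honored voluntarily, so a bot that ignores it has to be blocked at the server instead.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one record shuts out the three bots named above,
# and the catch-all record's empty Disallow leaves everyone else unrestricted.
ROBOTS_TXT = """\
User-agent: aipbot
User-agent: BecomeBot
User-agent: Pbot
Disallow: /

User-agent: *
Disallow:
"""

PAGE = "http://www.example.com/articles/page.html"  # hypothetical URL

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

for agent in ("aipbot", "BecomeBot", "Pbot", "Googlebot", "msnbot", "Slurp"):
    allowed = parser.can_fetch(agent, PAGE)
    print(f"{agent:10} {'allowed' if allowed else 'blocked'}")
```

Running it reports the three named bots as blocked while Googlebot, msnbot and Slurp remain allowed.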

The bottom line is that all engines seem to delay indexing of new domain names for at least thirty days. Google has so far delayed indexing THIS new domain for 60 days since first crawling it. AskJeeves has crawled thousands of pages while indexing none of them. MSN indexes faster than all the other engines but requires a robots.txt file. Yahoo's Slurp has crawled on and off for 60 days, but has indexed only six of the 15,000 or more pages crawled to date.

We seem to have established that there is a clear indexing delay, but whether this site specifically is "Sandboxed", and whether the delays apply universally, is less clear. Many webmasters claim their sites were fully indexed within 30 days of first posting a new domain. We'd love to see others track spiders through new sites after launch and document their results publicly, so that indexing and crawling behavior can be verified.

© Copyright July 18, 2005 Mike Banks Valentine

Mike Banks Valentine is a search engine optimization specialist who operates WebSite101 eCommerce Tutorial and will continue reporting on this case study chronicling the search indexing of Publish101 Article Resource.

