How Much Time Does It Take Googlebot To Spider 29.5 Million URLs?

September 12th, 2007




How long does it take Google to spider 29.5 million pages? That is the question posed in a thread at Webmaster World.

dberube posts:

"We launched a new site about 2 weeks ago that has about 29.5 million unique, search engine friendly, URLs. Since we launched Googlebot has visited us 22 times and has hit just over 2,000 files. My question is, how long do you think it would take Google to spider the entire site and get them into their index?"
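For a sense of scale, here is a rough back-of-the-envelope sketch of what the numbers in that post imply if Googlebot never sped up. It assumes a constant crawl rate, exactly 2,000 files fetched in exactly 14 days (the post only says "just over 2,000" and "about 2 weeks"), and one fetch per URL. The function and variable names are made up for illustration.

```python
# Figures taken from the forum post; both are approximations there
# ("about 2 weeks", "just over 2,000 files"), so treat the result as rough.
TOTAL_URLS = 29_500_000
URLS_CRAWLED = 2_000
DAYS_ELAPSED = 14


def days_to_crawl(total_urls: int, urls_per_day: float) -> float:
    """Days needed to fetch every URL once at a constant daily rate."""
    return total_urls / urls_per_day


observed_rate = URLS_CRAWLED / DAYS_ELAPSED  # about 143 URLs per day
days_needed = days_to_crawl(TOTAL_URLS, observed_rate)
years_needed = days_needed / 365  # assumption: 365-day years

print(f"{observed_rate:.0f} URLs/day -> {days_needed:,.0f} days "
      f"(~{years_needed:.0f} years)")
```

Under those assumptions the straight-line estimate is roughly 206,500 days, which is more than five centuries. That is only an extrapolation of the first two weeks; it says nothing about how the crawl rate might change as the site ages.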

Some of the answers are:

Pageoneresults writes, "PageRank™ is the determining factor when working with that volume of pages. Your best bet is to release groups of pages for indexing as you garner more PR. Block all but the most important ones right now. Without the PR, you're going to have 29.49 million pages in the Supplemental index at the end of those 56.52 years."

mbennie writes,

"I have been helping a friend with a new site that has 6.8 million uri's of reasonably original content. The site went live 3 weeks ago. G-Bot started grabbing about 1K pages/day after 10 days. It kept on that schedule until a few days ago when it increased to 30K pages/day. I expect that speed will increase once again before too long. The fact is that G-Bot could crawl the entire site in less than a day if it wanted to. G-Bot is very conscious of whether or not it is going to crash a server and from what I have seen it won't take more than about 150 pages/second at peak – but right now its throttled down to 1 page/2 seconds. I suspect it also grabs a small set of pages and checks them for spam/duplicate content before deciding to crawl a site more aggressively."
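The rates mbennie mentions can be compared with a little arithmetic. The sketch below converts each one to pages per day and estimates how long a full pass over the 6.8 million URIs would take at that rate. It assumes a constant rate and one fetch per URI; the names are illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SITE_URIS = 6_800_000  # size of the site described in the post

# Each crawl rate from the post, expressed as pages per day.
rates_per_day = {
    "1K pages/day": 1_000,
    "30K pages/day": 30_000,
    "1 page every 2 seconds": 0.5 * SECONDS_PER_DAY,
    "150 pages/second (peak)": 150 * SECONDS_PER_DAY,
}


def full_crawl_days(total_uris: int, pages_per_day: float) -> float:
    """Days for one complete pass over the site at a constant rate."""
    return total_uris / pages_per_day


for label, rate in rates_per_day.items():
    days = full_crawl_days(SITE_URIS, rate)
    print(f"{label:<26} {days:>10,.2f} days")
```

At 150 pages/second the full pass works out to about half a day, which matches the claim that Googlebot could crawl the site in less than a day if it wanted to. At 1K pages/day the same pass would take 6,800 days, and one page every 2 seconds is 43,200 pages/day, somewhat above the 30K/day figure.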

Discussion continued at Webmaster World.
