Tips To Optimize Crawling and Indexing

August 11th, 2009




Website architecture, crawling and indexing, and even ranking all revolve around one central question: how easy is it for search engines to crawl your site? The Google Webmaster Central Blog has discussed this topic many times, and it has returned to it once more with a presentation and some key points to consider.

New content is created and uploaded to the Internet all the time. With limited resources, Googlebot can find and crawl only a fraction of the effectively unlimited content available online, and only a portion of the crawled content is then indexed by Google.

Then come the URLs. URLs are the bridges between a website and a search engine's crawler: crawlers must be able to find and crawl the URLs in order to reach a site's content. If a site's URLs are complicated or redundant, crawlers spend time tracing and retracing their steps. If the URLs are organized and each leads directly to distinct content, crawlers can spend their time accessing that content rather than crawling through empty pages or crawling the same content over and over via different URLs.

The presentation gives some examples of what not to do in this regard. Below are some recommendations on URL crawling that can help crawlers find your website's content faster:

  • Remove user-specific details from URLs: Consider moving URL parameters such as session IDs or sort order out of the URL and into a cookie. Storing this information in a cookie and 301 redirecting to a “clean” URL retains the information while reducing the number of URLs that point to the same content.
  • Rein in infinite spaces: If a website has a calendar that links to an infinite number of past or future dates (each with its own unique URL), or paginated data that returns a status code of 200 when &page=3563 is appended to the URL even though there aren't that many pages of data, the site may contain an infinite crawl space. In that case, crawlers could be wasting their bandwidth trying to crawl it all. The Webmaster Central post links to tips on reining in infinite crawl spaces.
  • Disallow actions Googlebot can't perform: Using your robots.txt file, you can disallow crawling of login pages, contact forms, shopping carts, and other pages whose sole functionality is something a crawler can't perform. (Crawlers are notoriously cheap and shy, so they don't usually "Add to cart" or "Contact us.") This lets crawlers spend more of their time crawling content.
  • One URL, one set of content: Ideally, each URL leads to a unique piece of content, and each piece of content is reachable via only one URL. This one-to-one pairing between URL and content helps streamline a site for effective crawling and indexing. If your CMS or current site setup makes this difficult, you can use the rel=canonical link element to indicate the preferred URL for a particular piece of content.
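
As a rough illustration of the "Remove user-specific details from URLs" recommendation, here is a minimal Python sketch that strips user-specific query parameters and reports when a 301 redirect to the clean URL would be needed. The parameter names (sessionid, sid, sort) and the helper names are assumptions made for this example, not something prescribed by Google; a real site would also need to store the removed values in a cookie before redirecting.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; replace with the ones your site actually uses.
USER_SPECIFIC_PARAMS = {"sessionid", "sid", "sort"}


def clean_url(url):
    """Return the URL with user-specific query parameters removed."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in USER_SPECIFIC_PARAMS
    ]
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def redirect_for(url):
    """Return (301, clean URL) if the request should be redirected, else None."""
    target = clean_url(url)
    return (301, target) if target != url else None


print(redirect_for("http://example.com/shoes?sessionid=abc123&color=red&sort=price"))
print(redirect_for("http://example.com/shoes?color=red"))
```

The first call reports a redirect to the version of the URL that keeps only the color parameter; the second returns None because the URL is already clean.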
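
For the "Rein in infinite spaces" recommendation, one possible fix for the pagination case is to answer out-of-range page numbers with an error status instead of 200. The sketch below assumes a 404 response and a hypothetical page size of 10 items; the function name and defaults are illustrative only.

```python
import math


def page_status(page, total_items, per_page=10):
    """Return the HTTP status a paginated listing could send for `page`.

    Pages are numbered from 1. An empty listing still has one (empty) page.
    Returning 404 for out-of-range pages is an assumption of this sketch.
    """
    last_page = max(1, math.ceil(total_items / per_page))
    return 200 if 1 <= page <= last_page else 404


# With 25 items and 10 per page there are only 3 real pages.
print(page_status(3, 25))
print(page_status(3563, 25))
```

With this check in place, a request for &page=3563 on a three-page listing no longer looks like valid content to a crawler.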
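
The "Disallow actions Googlebot can't perform" recommendation can be checked before deployment. The sketch below embeds an example robots.txt and uses Python's standard urllib.robotparser module to confirm which URLs a crawler would be allowed to fetch. The paths (/login, /cart, /contact) and the example.com URLs are assumptions; use the paths your own site exposes.

```python
from urllib.robotparser import RobotFileParser

# Example robots.txt content; the disallowed paths are hypothetical.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /cart
Disallow: /contact
"""


def build_parser(robots_txt):
    """Parse robots.txt text and return a ready-to-query parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)

for url in (
    "http://example.com/cart/add?item=1",
    "http://example.com/login",
    "http://example.com/articles/crawling-tips",
):
    allowed = parser.can_fetch("Googlebot", url)
    print(url, "allowed" if allowed else "disallowed")
```

Running a check like this against a list of important content URLs is a cheap way to make sure a new Disallow rule does not block pages you want crawled.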

For more information on optimizing a site for crawling and indexing, visit the Webmaster Help Forum.



 


Comments

4 Responses to “Tips To Optimize Crawling and Indexing”

  1. agnes Says:

    Thanks for providing the very great tips. Great information. Really nice.

  2. Donna G. Fraley - pass drug test Says:

    Thanks for those tips. I was not aware of the robots.txt tip. That seems to be very helpful. Thanks again.

  3. seo company San Diego Says:

    Very nice write up and very well explained. Removing user-specific details from URLs is very effective and very helpful. BTW, nice slide. Thanks for sharing this information. More power.

    Richard

  4. lee Says:

    Hi, I have been waiting for Google and Bing to index my blog http://www.diyanswerdirect.com/blog for two weeks now and am getting very frustrated with them. I am also still waiting for them to update my site http://www.diyanswerdirect.com. My question is: when do they update, and how can I get them to crawl or spider my site?
