Tips To Optimize Crawling and Indexing
August 11th, 2009
Website architecture, crawling and indexing, and even ranking all come back to one central question: how easy is it for search engines to crawl your site? The Google Webmaster Central Blog has discussed this topic many times, and it has now returned to it with a presentation and some key points to consider.
New content is created and uploaded to the Internet all the time. With limited resources, however, Googlebot can find and crawl only a fraction of the practically infinite content available online, and only a portion of the crawled content is then indexed by Google.
Then come the URLs. URLs are the bridges between a website and a search engine's crawler: crawlers must be able to find and crawl the URLs in order to reach a site's content. If a site's URLs are complicated or overly long, crawlers spend time tracing and retracing their steps. If the URLs are organized and lead directly to distinct content, crawlers can spend their time accessing that content rather than crawling through empty pages or crawling the same content over and over via different URLs.
The slides in Google's presentation give some examples of what not to do in this regard. Below are some recommendations on URL crawling that can help crawlers find your website's content faster:
- Remove user-specific details from URLs: URL parameters such as session IDs or sort order can be removed from the URL and put into a cookie. Putting this information in a cookie and 301 redirecting to a “clean” URL lets you retain the information while reducing the number of URLs pointing to the same content.
- Rein in infinite spaces: If a website has a calendar that links to an infinite number of past or future dates (each with its own unique URL), or paginated data that returns a status code of 200 when &page=3563 is added to the URL even though there aren't that many pages of data, it may have an infinite crawl space. In that case, crawlers could be wasting their bandwidth trying to crawl it all. Google's tips on the subject can help you rein in infinite crawl spaces.
- Disallow actions Googlebot can't perform: Using the robots.txt file, you can disallow crawling of login pages, contact forms, shopping carts and other pages whose sole functionality is something that a crawler can't perform. (Crawlers are notoriously cheap and shy, so they don't usually "Add to cart" or "Contact us.") This lets crawlers spend more of their time crawling content.
- One URL, one set of content: Ideally, each URL leads to a unique piece of content, and each piece of content is accessible via only one URL. This one-to-one pairing between URL and content helps streamline a site for effective crawling and indexing. If your CMS or current site setup makes this difficult, you can use the rel=canonical link element to indicate the preferred URL for a particular piece of content.
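The first and last recommendations above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the parameter names in USER_SPECIFIC_PARAMS, the example.com URLs, and the helper names are hypothetical, and storing the stripped values in a cookie is left to whatever web framework the site uses.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names; a real site would list its own.
USER_SPECIFIC_PARAMS = {"sessionid", "sid", "sort"}


def clean_url(url):
    """Return the URL with user-specific query parameters removed."""
    parts = urlsplit(url)
    params = parse_qsl(parts.query, keep_blank_values=True)
    kept = [(k, v) for k, v in params if k.lower() not in USER_SPECIFIC_PARAMS]
    if len(kept) == len(params):
        return url  # nothing to strip, so leave the URL untouched
    return urlunsplit(
        (parts.scheme, parts.netloc, parts.path, urlencode(kept), parts.fragment)
    )


def redirect_target(url):
    """Return the clean URL to 301-redirect to, or None if no redirect is needed."""
    cleaned = clean_url(url)
    return None if cleaned == url else cleaned


def canonical_link_tag(url):
    """Build a rel=canonical link element pointing at the clean URL."""
    return '<link rel="canonical" href="%s">' % escape(clean_url(url), quote=True)


print(redirect_target("http://www.example.com/shoes?sessionid=abc123&color=red&sort=price"))
print(canonical_link_tag("http://www.example.com/shoes?sid=9&color=red"))
```

A request handler would call redirect_target() first and issue the 301 only when it returns a URL; canonical_link_tag() is the fallback for setups where redirecting is not practical.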
For more information on optimizing a site for crawling and indexing, visit the Webmaster Help Forum.
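As a rough illustration of the robots.txt recommendation above, the sketch below defines a small rule set and checks it with Python's standard urllib.robotparser module before it would be deployed. The paths (/login, /cart, /contact) and the helper names are assumptions made for the example, not rules taken from any real site.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rules for illustration; substitute your site's own paths.
ROBOTS_TXT = """\
User-agent: *
Disallow: /login
Disallow: /cart
Disallow: /contact
"""


def load_rules(robots_txt):
    """Parse robots.txt text into a RobotFileParser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def is_crawlable(parser, path, user_agent="Googlebot"):
    """Return True if the rules allow the given user agent to fetch the path."""
    return parser.can_fetch(user_agent, path)


rules = load_rules(ROBOTS_TXT)
for path in ("/login", "/cart/add", "/articles/crawling-tips"):
    print(path, is_crawlable(rules, path))
```

Disallow rules are prefix matches, so it is worth running a list of real content URLs through a check like this to confirm that none of them is blocked by accident.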
4 Responses to “Tips To Optimize Crawling and Indexing”
August 12th, 2009 at 09:34
Thanks for providing these great tips. Great information. Really nice.
August 14th, 2009 at 19:12
Thanks for those tips. I was not aware of the robots.txt tip. That seems to be very helpful. Thanks again.
August 17th, 2009 at 04:41
Very nice write up and very well explained. Removing user-specific details from URLs is very effective and very helpful. BTW, nice slide. Thanks for sharing this information. More power.
Richard
September 11th, 2009 at 21:27
Hi, I have been waiting for Google and Bing to index my blog http://www.diyanswerdirect.com/blog for two weeks now and am getting very frustrated with them. I am also still waiting for them to update my site http://www.diyanswerdirect.com. My question is: when do they update, and how can I get them to crawl or spider my site?