OCR For Indexing Scanned Articles - Google!
October 31st, 2008
Until now, a PDF had to be text-based rather than image-based for Google to index it. Otherwise, Googlebot could not recognize the content.
According to an official announcement from Google, that is no longer the case.
Google says: “This Optical Character Recognition (OCR) technology lets us convert a picture (of a thousand words) into a thousand words — words that can be searched and indexed, so that these valuable documents are more easily found.
While we've indexed documents saved as PDFs for some time now, scanned documents are a lot more difficult for a computer to read. Scanning is the reverse of printing. Printing turns digital words into text on paper, while scanning makes a digital picture of the physical paper (and text) so you can store and view it on a computer. The scanned picture of the text is not quite the same as the original digital words, however — it is a picture of the printed words. Often you can see telltale signs: the ring of a coffee cup, ink smudges, or even fold creases in the pages.”
“To see our new system at work, click on these search queries. Note the document excerpt in the search results, along with the full text presented after the 'View as HTML' link:
[repairing aluminum wiring]
[spin lock performance]
[Mumps and Severe Neutropenia]
[Steady success in a volatile world]” -Google
After clicking on the first example, I found that the first result is a Consumer Product Safety Commission PDF. The document had quite evidently been scanned as an image.
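To see why a scanned page has to become text before it can show up in search results, here is a minimal Python sketch of a toy word index. It is only an illustration, not Google's actual pipeline: the function names (`tokenize`, `build_index`, `search`) and the file names are hypothetical, and the OCR step is assumed to have already happened, represented simply by a string of recognized text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Lowercase the text and split it into alphanumeric words."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Map each word to the set of document names that contain it.

    `documents` maps a document name to its extracted text, or to None
    when no text could be extracted (an image-only scan without OCR).
    """
    index = defaultdict(set)
    for name, text in documents.items():
        if text is None:
            continue  # nothing to index: the page is only a picture
        for word in tokenize(text):
            index[word].add(name)
    return index


def search(index, query):
    """Return the sorted names of documents containing every query word."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(w, set()) for w in words))
    return sorted(matches)


# Hypothetical documents: the scan has no text until an OCR step supplies it.
docs = {
    "text_based.pdf": "Spin lock performance on multiprocessors",
    "scanned.pdf": None,
}
before = search(build_index(docs), "aluminum wiring")

# Assumption: an OCR engine produced this text from the scanned image.
docs["scanned.pdf"] = "Repairing aluminum wiring"
after = search(build_index(docs), "aluminum wiring")

print(before)
print(after)
```

Before the recognized text is supplied, the scanned file contributes no words to the index and the query finds nothing. Once the text is there, the same query returns the scanned document.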