New Google Process for Detecting Near Duplicate Content!

February 28th, 2008




For many webmasters, duplicate content has always been a persistent issue. Google has already obtained a patent on duplicate content detection. Now it has gone a step further with another patent application, developed by Monika H. Henzinger.

This new Google patent application explores how duplicate and near-duplicate content might be detected at different web addresses. It also draws on several existing methods for detecting near-duplicate content. With the growing popularity of blogs and RSS syndication, the amount of duplicated content has multiplied many times over. As a result, search engine companies, including Google, are doing their best to cut down on duplicate copies.

The patent application cites a number of documents on the Web that explore the topic of duplicate and near-duplicate content, including a process developed by Moses Charikar, a Princeton professor who is listed as the inventor of a Google patent granted early last year. That patent, Methods and apparatus for estimating similarity, discusses ways to detect similar content on the Web.
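To give a feel for what "estimating similarity" can look like in practice, here is a minimal Python sketch of a fingerprinting idea commonly associated with Charikar's work: each page is reduced to a short bit string, and pages whose bit strings differ in only a few positions are treated as likely near duplicates. This is an illustration only, not the method claimed in either patent. The function names (`shingles`, `fingerprint`, `hamming_distance`), the three-word shingle size, the 64-bit width and the use of SHA-256 are all assumptions made for this example.

```python
import hashlib
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"\w+", text.lower())
    if len(words) < k:
        # Short texts become a single shingle; empty texts have none.
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text, bits=64, k=3):
    """Build a fingerprint in which every shingle votes +1 or -1 on each bit.

    Assumption for this sketch: bits is a multiple of 8 and at most 256,
    because the per-shingle hash is taken from a SHA-256 digest.
    """
    counts = [0] * bits
    for shingle in shingles(text, k):
        digest = hashlib.sha256(shingle.encode("utf-8")).digest()
        h = int.from_bytes(digest[:bits // 8], "big")
        for i in range(bits):
            counts[i] += 1 if (h >> i) & 1 else -1
    # A bit is set in the fingerprint when the votes for it are positive.
    return sum(1 << i for i in range(bits) if counts[i] > 0)


def hamming_distance(a, b):
    """Count the bit positions in which two fingerprints differ."""
    return bin(a ^ b).count("1")
```

To compare two pages, compute `fingerprint()` for each and pass the results to `hamming_distance()`. A small distance suggests the pages share most of their wording. How small is "small enough" is a threshold you would have to tune yourself; the source gives no value for it.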

Dr. Henzinger tests and explores the approaches described in those documents. While the tests showed differences in how effective the approaches were, the conclusion in the patent application was that “neither of the algorithms worked well for finding near-duplicate pairs on the same Website, though both achieved high precision for near-duplicate pairs on different Websites.”

The patent application goes on to conclude that, “These near-duplicate detection techniques performed well, particularly when analyzing Web pages from the same Website. These techniques did so without sacrificing much in the number of returned correct pairs.”

We cannot say that this new patent will put a full stop to duplicate and near-duplicate content, but things are going to get better. As they say, “Something is better than nothing.”
