By John Davi
October 15, 2013

Diffbot, inventors of computer vision technology that sees web pages like humans do, today announced the release of its Product API, which automatically identifies and extracts product data from any shopping web page.

Diffbot also announced updates to its Crawlbot spidering service, which can accurately determine which pages on a shopping site are product pages. Diffbot now offers a turnkey solution for retrieving the entire catalog from any e-commerce site -- without need of a published API or any action on the part of the retailer.

Developed over the course of two years, the Product API’s pioneering algorithm is built on Diffbot’s core vision technology, which has accurately extracted structured data from billions of web pages. The API advances Diffbot’s machine learning, natural-language processing and computer vision systems to identify and structure information regardless of a site’s design, layout, markup or even its (human) language.

The Product API automatically returns data such as price, discount/savings, shipping cost, product description, images, SKU and manufacturer's product number. Developers can immediately use product data from any e-commerce site in their web or mobile applications.
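
As a rough illustration of what using such data might look like, here is a minimal Python sketch. It is not taken from Diffbot's documentation: the endpoint URL, the `token` and `url` query parameters, and the response field names (`title`, `offerPrice`, `sku`, `shipping`) are all assumptions made for the example, and the sample response is invented so the code runs without a network call.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; check Diffbot's documentation for the actual URL.
PRODUCT_ENDPOINT = "https://api.diffbot.com/v3/product"


def build_request_url(token, page_url):
    """Return a GET URL asking for extraction of one product page.

    The parameter names "token" and "url" are assumptions.
    """
    return PRODUCT_ENDPOINT + "?" + urlencode({"token": token, "url": page_url})


def summarize_product(response_text):
    """Pick a few fields out of a JSON response body.

    The flat response shape and field names here are hypothetical.
    Missing fields come back as None.
    """
    data = json.loads(response_text)
    return {key: data.get(key) for key in ("title", "offerPrice", "sku")}


# Invented sample response standing in for a real API reply.
sample_response = json.dumps({
    "title": "Example Widget",
    "offerPrice": "$19.99",
    "sku": "W-100",
    "shipping": "$0.00",
})

request_url = build_request_url("YOUR_TOKEN", "http://shop.example.com/widget?id=1")
summary = summarize_product(sample_response)
print(request_url)
print(summary)
```

The page address is percent-encoded before it is placed in the query string, so a product URL that carries its own query parameters still reaches the service intact.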

The Product API will enable developers to rapidly build applications that can:

"E-commerce is one of the most popular activities on the web. With 28% of US internet users shopping on a daily basis, we figured we should teach our robot how to understand products," said Mike Tung, CEO of Diffbot.[1] "The Product API represents our latest advances in pushing the capabilities of automated page extraction. We are one step closer to the imminent goal of making the entire web machine-readable."

Last year, Diffbot conducted a study that found that 8% of links shared on Twitter point to product pages -- a total of more than eight million product links per day.[2] [3] [4] Just as with news articles, consumers and businesses alike need intelligent automation to help sift through the vast quantities of products offered and shared online.

The Product API joins Diffbot's previous computer vision APIs, including the Frontpage API (for extracting content from home pages), the Article API (for extracting news article and blog post content), the Image API, and its Page Classifier API, which automatically determines the type of page of any web link.