The Web’s All-Seeing Eye: This Startup Is Getting In On Google’s Game By Searching A Trillion Facts – Forbes


Cisco wanted to give users of its conferencing system a profile of everyone who signed on to a call: where they used to work, say, or whether they had turned up in a news article. It’s using a Mountain View, California, startup called Diffbot to make that happen.

Diffbot scours the Web and serves up its findings in customized bites for companies. For Cisco, it scrapes articles for mentions of conference call participants. For a sneaker company, it culls consumer reviews and discussion threads. For a business software firm, it finds prospective clients. Gathering this kind of data is usually slow, and the results can be incomplete. Diffbot claims it scrapes nearly all of the public Web and can produce search results in less than a second.

“It’s not possible for humans to learn everything,” says Diffbot founder Mike Tung. “So we have to design a system that can.”

Businesses in every industry are beefing up their data science teams and using artificial intelligence to model product demand, analyze competitive threats and source new customers. But this AI-powered analysis is only as good as the data that goes into it, and that’s where Diffbot comes in: It promises to provide better data, faster.

Tung, 36, says that after nearly ten years of false starts and technical breakthroughs, Diffbot has created an index that’s analyzed more than 90% of the public Web, tracking 1 trillion facts and counting.

That’s a lot, even compared to Google. The search engine’s knowledge graph, the source of all the answers that appear at the top of results when users ask specific questions, only had 70 billion facts the last time the company revealed the number in late 2016.

Like Google, Diffbot constantly crawls billions of Web pages, but instead of using that index to give people the best links to information, it provides businesses with data that they can then plug into their own analytics tools.

With $12 million in funding, Diffbot has nabbed big-name customers like Salesforce, eBay, Snapchat and Intel, and made a profit on last year’s nearly $5 million in revenue. Tung expects sales to double as more companies seek out large-scale datasets.

“Diffbot is like the secret sauce for a lot of companies,” he says.

Google vs. Diffbot

Tung was information-obsessed from an early age. He was born in Taiwan, but his family came to the U.S. for his father to attend college. When they were living in Marlborough, Massachusetts, his mother would take him to the library and he would methodically move from shelf to shelf. He didn’t choose books based on his interests: He wanted to read every single one.

“I was very comprehensive—at least until I got to a bigger library,” he quips.

Growing up, he reprogrammed video games on a Microsoft QuickBASIC compiler to score extra lives and wrote a computer model predicting stock moves, dabbling in day trading while attending high schools in Pennsylvania and Georgia. His tech chops impressed Microsoft during an internship, and it hired him to work on the Windows Vista team right after he graduated from high school. With a year of work experience under his belt, he bounced back to school for a computer science degree at the University of California, Berkeley, and then headed to Stanford for his Master’s in artificial intelligence and started to pursue a Ph.D. Throughout, he remained fixed on the idea of organizing human knowledge.

Just as it was impossible for him to read and synthesize all the books in the bigger libraries when he was a kid, sifting the public Web for information ran into limits. A big one: The Web was developed by humans to be understood by humans. Tung wanted to find a way to take the mess of data scattered across the public Web and organize it in a structure that computer systems could read.

He envisioned a search engine that would only deliver concrete answers. Instead of spitting out links where people could find information themselves, he wanted every search to automatically surface either an exact piece of information or a massive dataset for analysis.

So while working as a patent agent and doing search-related projects for eBay and Yahoo, Tung began to mold this vision into an actual company. After a few failed iterations, Diffbot finally gained traction with a tool that could retrieve and sort news articles. AOL signed on as a customer. With several hundred thousand dollars starting to flow in, Tung dropped his side hustles (and his Ph.D. research) to focus on Diffbot full-time in 2012.

A breakthrough came later that year when he scored a meeting with billionaire Andy Bechtolsheim, the Sun Microsystems cofounder who made one of the first investments in Google. His pitch convinced Bechtolsheim to write a check for $100,000, the same amount he originally invested in Google founders Larry Page and Sergey Brin in 1998. Later that day, Bechtolsheim emailed to say that he wanted to double his investment.

Bechtolsheim’s check kicked off $2 million in angel investments and an eventual $10 million Series A led by Felicis Ventures and Chinese internet giant Tencent. The company says it’s currently valued at over $100 million.

“This is an incredibly hard problem, and Mike and his team of AI researchers have done a great job to deliver,” Bechtolsheim says today.

Diffbot now operates out of a cluttered office near the Mountain View Caltrain station, though it will soon be moving to a bigger space in Menlo Park, California. More than 20 of the team’s 30 employees are artificial intelligence researchers and engineers; Tung plans to double Diffbot’s workforce over the next 18 months.

Customers pay Diffbot a monthly fee, with different tiers depending on how much they use the service, ranging from $299 per month to custom pricing for large enterprises. Cisco, for example, uses Diffbot to pull information from news articles about participants in its WebEx conferencing system. Privacy-focused search engine DuckDuckGo partners with Diffbot to enhance its queries. Diffbot says Amazon is using it to find prospective customers for its cloud-computing business. The company recently inked its biggest deal yet: a seven-figure annual agreement with a government contractor.

While business has grown, so has the competition. The information economy has produced a host of other startups that gather and clean huge amounts of data. Companies like Import.io and WebHose have their own methods for scraping data from across the Web, and the space has already had a few exits, too: Palantir snapped up Kimono Labs and IBM bought AlchemyAPI.

Svetlana Sicular, a Gartner analyst who covers data management and AI, says that the breadth of Diffbot’s database may set it apart.

“I think Diffbot will be growing in importance,” she says, “because they figured out how to sort the entire Web.”


