Highlights:

  • Galileo rapidly identifies incorrect or low-quality unstructured data (mislabeled samples, class imbalance, data drift, etc.) and provides actions and integrations to correct it, all inside a single platform.
  • Galileo also announced that it has raised USD 18 million in Series A funding, bringing its total funding to USD 23.1 million.

Galileo, which describes itself as the first Machine Learning (ML) data intelligence company for unstructured data, has introduced Galileo Community Edition. This free version of its platform helps data scientists working on Natural Language Processing (NLP) build high-performing ML models from better-quality training data.

Today, more than 80% of the world’s data is unstructured (text, images, speech, etc.). According to Galileo, before its launch six months ago no tool on the market could debug and fix unstructured data during the ML workflow. Data scientists instead spent most of their time debugging data in Excel spreadsheets and Python scripts, so getting high-quality models into production could take months.

Vikram Chatterji, co-founder and CEO of Galileo, said, “While data powers ML, debugging unstructured data is incredibly manual and time-intensive. My co-founders, Atindriyo Sanyal and Yash Sheth, and I noticed a complete absence of data-focused tooling for unstructured data ML while at Apple, Google, and Uber AI. We repeatedly heard the same from data science teams across the globe. This is why we started Galileo – to build ML unstructured data tooling. Today we are making Galileo available for free through the Galileo Community Edition for any data scientist to sign up and get the superpowers to fix their ML data instantly.”

Galileo rapidly identifies incorrect or low-quality unstructured data (mislabeled samples, class imbalance, data drift, etc.) and provides actions and integrations to correct it, all inside a single platform. The company says this cuts the time data scientists need to curate a high-quality training dataset from weeks to minutes, because data errors are removed and the highest-value production data is surfaced.

With Galileo Community Edition, anyone can sign up for free and add a few lines of code either while training a model on labeled data or during an inference run on unlabeled data. They can then use the Galileo UI to inspect, find, and fix data problems, or to choose the most valuable data to label next.

Galileo’s Demo Hour

Galileo’s online event begins on November 15 at 10 a.m. PT. It opens with a fireside chat with Anthony Goldbloom (founder of Kaggle), followed by lightning talks from customers on how they quickly debug unstructured data and build stronger ML models, and a live demonstration of Galileo Community Edition.

Galileo also announced that it has raised USD 18 million in Series A funding, bringing its total funding to USD 23.1 million. The round was led by Battery Ventures, with participation from existing investor The Factory, new investors Walden Catalyst and FPV Ventures, and industry heavyweights Anthony Goldbloom, Pegah Ebrahimi (former COO at Morgan Stanley), and Wesley Chan (former general partner at Google Ventures). Galileo plans to use the new funding to extend its platform to new data modalities such as Computer Vision (CV) and to expand its engineering and sales teams.

Dharmesh Thakker, a general partner at Battery Ventures and Galileo board member, said, “It’s no secret that the ML training and data quality problems are ballooning along with the rise in ML adoption. The Galileo team has been laser-focused on this problem and has taken a unique approach to providing quick time-to-value with a category-defining product. Going forward, ML data intelligence will be table stakes for ML teams, and we feel Galileo is extremely well positioned to capitalize on this trend.”

Lip-Bu Tan, founding managing partner of Walden Catalyst and Galileo board member, said, “At Walden Catalyst, we’ve observed an exponential adoption of ML with unstructured data in enterprises as models get commoditized and ML accuracy is now increasingly dependent on the quality of the data the models are fed. At Apple, Google, and Uber AI, the founders of Galileo faced the challenges of not having any solutions while working with unstructured data to find and fix ML data errors fast. They tackle this fundamental problem head-on with a first-to-market solution. This is a huge and critical problem in a rapidly growing enterprise market, and we are excited to back them.” Tan also sits on Intel’s board and has seen 130 companies he invested in go public.