Data Warehouses += Unstructured Data
Much of the world’s net new data is unstructured. More than 500 hours of video content is uploaded to YouTube every minute. An estimated 100k tracks are added to Spotify every day. More than 2 billion documents and emails are created daily in Microsoft 365.
Unstructured data is intended for and easily consumed by humans. However, business intelligence (BI) tools, like data warehouses, often require structured tabular information and expose a programming language interface to allow users to gain insights from this data. Foundation models now present an opportunity to understand and impose a schema on text, image, audio, and video content. This means that businesses can use BI tools across a broader surface area of information. Earlier this year, we met the founders of Roe AI, Richard and Jason, and were excited by their vision to build a data warehouse for unstructured data analysis.
Traditionally, enterprises had to invest in building custom OCR pipelines (via software like Tesseract) to extract information from images or videos. They also had to build custom machine learning classifiers to transform that data to match their business logic. As foundation models (InstructBLIP, LLaVA, GPT-4, Gemini) continue to improve, they present the chance to consolidate this extract-transform software stack into a simple natural language prompt. Every organization now has the opportunity to perform unstructured data analysis that previously required a team of machine learning engineers and scientists.
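To make the idea concrete, here is a minimal sketch of how a schema might be turned into a natural language prompt and the model's reply checked against that schema. It is our own illustration and not Roe's API: the `SCHEMA` fields, `build_prompt`, `extract_row`, and the `fake_model` stand-in are all hypothetical names, and a real deployment would swap `fake_model` for a call to an actual foundation model.

```python
import json
from typing import Callable

# Hypothetical target schema: column name -> description given to the model.
SCHEMA = {
    "vendor": "name of the company that issued the invoice",
    "total_usd": "invoice total in US dollars, as a number",
    "category": "one of: software, hardware, services",
}


def build_prompt(document_text: str, schema: dict[str, str]) -> str:
    """Turn a schema into a natural language extraction prompt."""
    fields = "\n".join(f"- {name}: {desc}" for name, desc in schema.items())
    return (
        "Extract the following fields from the document and reply with "
        "a single JSON object containing exactly these keys:\n"
        f"{fields}\n\nDocument:\n{document_text}"
    )


def extract_row(
    document_text: str,
    schema: dict[str, str],
    call_model: Callable[[str], str],
) -> dict:
    """Ask the model for one structured row and check it against the schema."""
    reply = call_model(build_prompt(document_text, schema))
    row = json.loads(reply)
    missing = set(schema) - set(row)
    if missing:
        raise ValueError(f"model reply is missing fields: {sorted(missing)}")
    # Keep only the schema's columns, in schema order.
    return {name: row[name] for name in schema}


def fake_model(prompt: str) -> str:
    """Stand-in for a real foundation model call; returns a canned reply."""
    return json.dumps(
        {
            "vendor": "Acme Corp",
            "total_usd": 1250.0,
            "category": "software",
            "notes": "extra keys are dropped",
        }
    )


if __name__ == "__main__":
    text = "Invoice from Acme Corp. Total due: $1,250.00"
    print(extract_row(text, SCHEMA, fake_model))
```

Each returned row has the same columns, so the rows can be loaded into an ordinary table and queried with existing BI tools. Passing the model in as a plain callable keeps the sketch independent of any particular provider.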
There is early evidence that unstructured data processing represents a massive workload shift across data teams. Consumption of unstructured data is up 17x YoY at Snowflake, and the amount of data in these warehouses is growing 304% YoY at Databricks. Both the type and amount of usable data within enterprises are growing, as is the reflex to use it. Roe is rethinking the full stack of software that enables this workload shift, from data storage to APIs.
Data-driven decision-making allows enterprises to listen to and act upon the voice of their customers. Roe is enabling businesses to widen their aperture. We continue to be excited by their offering and encourage you to try it out.
We are pleased to announce that we have led Roe’s seed funding round with participation from Ardent Ventures, Y Combinator, Orange Collective, key executives from Snowflake, and data leaders like Gu Xie from Group 1001 and Daniel Svonava from Superlinked. Roe’s platform is already being used by Arc, Revere, and Crossmint, and is in pilot with Fortune 2000 companies.