How Databricks CEO And Cofounder Ali Ghodsi Bet Big On The Cloud To Build A $28B Company
Databricks is one of the most anticipated upcoming IPOs. Founded in 2013, Databricks is an AI-enabled, open source data analytics platform company that, as co-founder Ali Ghodsi describes it, “takes massive, massive amounts of enterprise data and does machine learning and data science on top of it to predict things.” Databricks must be onto something, since more than 5,000 companies use the company’s open source-driven lakehouse architecture to process, engineer, and analyze their unstructured and semi-structured data. Databricks’ success is evident from its most recent $1.6 billion financing, which valued the company at more than $38 billion. I recently sat down with Ali to discuss how he turned an open source project he helped start as a researcher at UC Berkeley, where he still serves as an adjunct professor in the computer science department, into a multibillion-dollar company, and what lessons entrepreneurs can learn from his journey. Here are some excerpts from our conversation.
Glenn：Tell us a little bit about how Databricks is helping companies analyze their data.
Ali：There are really an infinite number of ways to use Databricks, and we’ve been blown away by all the cool things our customers are doing. For example, Regeneron uses our ML algorithms to detect the gene in DNA that's responsible for chronic liver disease, and then they were able to develop a drug that targeted that particular gene. Or a company like Comcast uses Databricks to make their voice-activated remote controls work. When you talk to the remote control, that voice data goes into the cloud for Databricks to process using machine learning, and it figures out what you said and directs the TV to the right channel. And during the pandemic, hospitals used Databricks to get a real-time picture of how full their ERs were so they could redirect patients in ambulances to different hospitals that had space. Financial services firms are analyzing satellite data to make predictions about which global sectors and companies to invest in. Shell uses Databricks to monitor sensor data from 200 million valves to predict if any are going to break, so they can replace them ahead of time to keep systems running, save money, and ensure employees stay safe.
Glenn：You helped build the foundational open source code for Databricks as a visiting scholar at UC Berkeley. Tell us about the journey from hacker to founder.
Ali：There is an incredible computer science professor at Berkeley, Dave Patterson, who just opened up labs and office space to students and said let’s brainstorm and collaborate. We had computer scientists, engineers, mathematicians, and ML experts, all just working together to see what we could create, and out of that came Apache Spark. The earliest version was built to make it faster to load huge datasets into memory. Spark forms the foundation of much of what we’ve built at Databricks. When I co-founded Databricks in 2013, I was deeply involved in product creation and engineering because I had been working on the core technologies since 2009 at Berkeley.
Glenn：Did you plan for the hypergrowth Databricks has experienced?
Ali：Since the early days, we built a plan to grow fast with clear goals for how big we wanted the company to be. But our early vision was to one day sell the company for $100 million or $200 million, so clearly we underestimated the potential. What we did right is that we bet on a trend that lots of people said would never really take off, or at least wouldn’t for a few more decades. Our bet was that the cloud would house all data and that companies wouldn’t need an on-prem solution. Everyone told us we were crazy and needed an on-prem solution since companies had invested billions in data centers. One potential customer even offered us $20 million to build an on-prem version of our software. That was hard to turn down, but we remained steadfast in cloud-only, and a lot of those early doubters now use our SaaS product. Being bold is the only way to create really, really successful companies, because if you're trying to do one thing a little bit differently, the big companies will eat you up. They'll copy your strategy and do it better because they have more money and more engineers. So you have to think about what will happen in the future and then build a company for that future.
Glenn：How do you build a company on free open source, in this case Apache Spark, but also succeed in making money? How have you walked that fine line?
Ali：Open source is a double-edged sword. We built Spark at Berkeley, but then when we started Databricks, we built a new proprietary engine called Ignite. We quickly realized only open source would fuel really big growth. We’re an enterprise company but we’ve grown more like a B2C company such as Facebook or Twitter, because we’ve had such incredible viral evangelism from the open source community. The challenge, though, was getting anyone to pay for our product. Developers would come up to us at conferences and want selfies and tell us how much we changed their lives, but then we’d ask them if they’d like to pay for our SaaS services, and they’d say, “why would we do that, you guys give us the software for free.” So we had to figure out what to leave in the open source version, to ensure it was really valuable to developers, and what to include in our SaaS version to make it valuable enough to companies that they’d pay for it.
Glenn：How do you make money?
Ali：We have a pretty unique model we call SaaS open source. It’s very different from on-prem open source where you can download the free software, and an open source company creates paid versions with extra features to download as well. For us, we are only SaaS so we just continually update the product in the background. We charge customers for this development as well as running, operating, and hosting the software. We also contribute constantly to the open source version of Databricks that’s entirely free, but our SaaS offering just has lots more features of interest to enterprises such as reliability, availability, and scalability. We have always been SaaS and only SaaS, with no crutch of on-prem revenue, so we had to get really good at delivering everything in the cloud from day one.
Glenn：What’s your secret to continued innovation?
Ali：One thing that has helped us innovate is staying relatively nimble; we have more than 1,700 employees today. But also, since day one, we always thought of that Steve Jobs quote about how you should “cannibalize yourself before someone else does,” and that you always pick the future technology instead of current revenue. All of our developers have the mindset that you should “kill your darlings”; there is no politics and no attachment to a certain way of doing things. Part of our innovative nature is that the first 20 or 30 people who built the early company were researchers from Berkeley who saw themselves as truth-seekers and believed the data should always decide, not the humans. Sometimes I’ve had to put the brake on innovation because the tech team is always like, “hey, we have this new great thing,” and they want to abandon everything they did yesterday before it even has a chance to become successful.
Glenn：You’ve said you’ll go public this year. What’s next for Databricks?
Ali：We’ve talked about being IPO ready, but it’s just a stepping stone for us. We are going to be around for a long, long time. Our market is gigantic and we’ve only just scratched the surface. We see the things our customers are doing with their data, and it’s incredible, but there’s so much more opportunity. People forget what the “I” means in IPO; it’s for “initial,” and that’s how we see it: just the start of a new journey.
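*About the author: Glenn Solomon is a managing partner at GGV Capital, a global venture capital firm focused on local founders. He focuses on enterprise technology startups from seed stage to growth stage across several key areas, including open source, cloud services, infrastructure, and cybersecurity. He has more than 20 years of venture investing experience and has helped nine companies complete IPOs over the past decade. He is also the host of the podcast “Founder Real Talk,” where he interviews founders and startup executives about the challenges founders face and how they grow through adversity.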