Highlights:

  • Databricks claims that all Delta Lake customers can benefit from unsurpassed query performance with Delta Lake 2.0.

  • Databricks unveiled Project Lightspeed, the next-generation Spark streaming engine, in cooperation with the Spark community.

Databricks, a pioneer of the data lakehouse architecture, recently announced Delta Lake 2.0, the next version of its platform for building data lakehouses, at the Data + AI Summit. It also announced MLflow 2.0, the next generation of its platform for managing the machine learning (ML) pipeline, which now includes MLflow Pipelines with templates for bootstrapping model development. Alongside these came several announcements around the Apache Spark data analytics engine, an essential part of the Databricks platform.

Databricks also stated that it would open-source all the Delta Lake APIs as part of the Delta Lake 2.0 release. It will contribute all features and improvements made to Delta Lake to the Linux Foundation. Furthermore, the company unveiled Project Lightspeed, a new Spark Structured Streaming engine for data streaming on the lakehouse, and Spark Connect, which enables the use of Spark on almost any device.

“From the beginning, Databricks has been committed to open standards and the open-source community. We have created, contributed to, fostered the growth of, and donated some of the most impactful innovations in modern open-source technology,” said Ali Ghodsi, Co-Founder and CEO of Databricks. “Open data lakehouses are quickly becoming the standard for how the most innovative companies handle their data and AI. Delta Lake, MLflow, and Spark are all core to this architectural transformation, and we’re proud to do our part in accelerating their innovation and adoption.”

Delta Lake 2.0 brings the lakehouse to everyone

Databricks claims that all Delta Lake customers can benefit from unsurpassed query performance with Delta Lake 2.0 and will be able to build a highly effective data lakehouse on open standards. Both Databricks customers and open-source community members will have access to Delta Lake 2.0’s full functionality and improved performance. The Delta Lake 2.0 Release Candidate is available now, and the full release is anticipated later this year. The diversity of the Delta Lake ecosystem makes it adaptable and effective across a variety of use cases. That ecosystem is driven by a thriving community of more than 6,400 members, including developers from more than 70 contributing organizations.

Spark anytime and anywhere, with a next-generation streaming engine

As a leading unified engine for large-scale data analytics, Spark scales smoothly to data sets of all sizes. However, its ability to meet the demands of contemporary data applications is hampered by the absence of remote connectivity and by the burden of applications being developed and run on the driver node. To address this, Databricks introduced Spark Connect, a client and server interface for Apache Spark based on the DataFrame API. Spark Connect decouples the client from the server for improved stability and enables built-in remote connectivity, so users will be able to access Spark from any device.

Databricks also unveiled Project Lightspeed, the next-generation Spark streaming engine, in cooperation with the Spark community. As a growing variety of applications move to streaming data, new requirements have surfaced for supporting data streaming, one of the most in-demand workloads on the lakehouse. Spark Structured Streaming has long been widely used because of its ease of use, performance, extensive ecosystem, and developer communities. Through Project Lightspeed, Databricks will work with the community and encourage participation to improve performance, expand ecosystem support for connectors, add functionality for processing data with new operators and APIs, and simplify deployment, operations, monitoring, and troubleshooting.

Experts’ Take:

“Databricks provides Akamai with a table storage format that is open and battle-tested for demanding workloads like ours. The lakehouse powers interactive analytics at scale so that our customers can have near real-time analysis of security events within our Edge platform,” said Aryeh Sivan, VP Engineering at Akamai. “We are very excited about the rapid innovation that Databricks, along with the rapidly growing community, is bringing to Delta Lake. We are also looking forward to collaborating with other developers on the project to move the data community to greater heights.”

“The Delta Lake project is seeing phenomenal activity and growth trends indicating the developer community wants to be a part of the project. Contributor strength has increased by 60% during the last year, and the growth in total commits is up 95%, and the average lines of code per commit are up 900%. We are seeing this upward velocity from contributing organizations like Uber Technologies, Walmart, and CloudBees, Inc., among others,” said Executive Director of the Linux Foundation, Jim Zemlin.