The Databricks Data and AI Summit (DAIS) has once again brought together industry leaders, innovators and data enthusiasts from around the world. There were many exciting announcements, but I’ll focus on the ones that stood out the most and will have the greatest impact on the data community.
Unity Catalog: Advancing Interoperability with Apache Iceberg
Databricks announced full support for Apache Iceberg and aims to make Unity Catalog (UC) the best catalog for working with Iceberg. Any Iceberg processing engine can now read from or write to UC, which means you can use tools outside the Databricks ecosystem that support the format, like Snowflake and DuckDB. The support even extends to cross-organization sharing through Delta Sharing, just like any other UC asset.
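To make that concrete, here is a minimal sketch of open-source Spark connecting to UC through Iceberg’s REST catalog interface. The endpoint path, workspace host, token, catalog name and table names are all placeholders/assumptions on my part, so treat this as a sketch rather than a definitive setup and check the Databricks docs for exact values.

```python
from pyspark.sql import SparkSession

# Sketch: pointing open-source Spark's Iceberg REST catalog at Unity Catalog.
# Host, token, endpoint path and names below are placeholders, not verified values.
spark = (
    SparkSession.builder
    # Iceberg Spark runtime must be on the classpath; version may differ for you
    .config("spark.jars.packages",
            "org.apache.iceberg:iceberg-spark-runtime-3.5_2.12:1.5.0")
    .config("spark.sql.extensions",
            "org.apache.iceberg.spark.extensions.IcebergSparkSessionExtensions")
    .config("spark.sql.catalog.uc", "org.apache.iceberg.spark.SparkCatalog")
    .config("spark.sql.catalog.uc.type", "rest")
    # Assumed UC Iceberg REST endpoint for your workspace
    .config("spark.sql.catalog.uc.uri",
            "https://<workspace-host>/api/2.1/unity-catalog/iceberg")
    .config("spark.sql.catalog.uc.token", "<personal-access-token>")
    # For UC's REST catalog, the warehouse is typically the UC catalog name
    .config("spark.sql.catalog.uc.warehouse", "<uc-catalog-name>")
    .getOrCreate()
)

# Read a UC-governed Iceberg table from outside Databricks
spark.sql("SELECT * FROM uc.my_schema.my_table LIMIT 10").show()
```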
The investments that have gone into Unity Catalog have made it a compelling contender as an open-source governance solution for managing data across multiple tools and commonly used formats.
Lakeflow: Orchestration Enhancements and a No-Code Tool Announced
For those of you who don’t know, Lakeflow is Databricks’ unified data engineering solution for ETL/ELT, built around Lakeflow Declarative Pipelines (LDP, previously known as DLT). With Lakeflow, you can create, orchestrate and manage your pipelines in one central place.
This year, it was announced that Lakeflow Jobs now supports orchestration for Power BI, dbt (Cloud and Core), Snowflake, and any other API! It’s common to want a data transformation or dashboard refresh to kick off only once your ingestion finishes; the sketch below shows what that looks like. The power of centralized orchestration is quite a tempting addition to Lakeflow, and I look forward to testing how robust it is with other APIs.
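Here is a minimal sketch of that ingestion-then-transform dependency, chaining a dbt run after an LDP pipeline with the Databricks Python SDK. The pipeline ID and dbt commands are placeholders, and compute/environment settings (which a dbt task normally needs) are omitted for brevity.

```python
from databricks.sdk import WorkspaceClient
from databricks.sdk.service import jobs

# Picks up host and token from env vars or ~/.databrickscfg
w = WorkspaceClient()

job = w.jobs.create(
    name="ingest-then-transform",
    tasks=[
        # Step 1: run the LDP ingestion pipeline (placeholder pipeline ID)
        jobs.Task(
            task_key="ingest",
            pipeline_task=jobs.PipelineTask(pipeline_id="<ldp-pipeline-id>"),
        ),
        # Step 2: run dbt only after ingestion succeeds; compute config omitted
        jobs.Task(
            task_key="transform",
            depends_on=[jobs.TaskDependency(task_key="ingest")],
            dbt_task=jobs.DbtTask(commands=["dbt deps", "dbt run"]),
        ),
    ],
)
print(f"Created job {job.job_id}")
```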
The other big announcement was the release of Lakeflow Designer, a no-code ETL tool. Lakeflow Designer allows users to share, collaborate and develop LDP pipelines using natural language. In addition, they have added interesting AI enhancements such as “Transform by Example.” Have a desired format or slide you want to populate with your data? Paste a screenshot and have your pipeline create the syntax required to generate that result. The best part about Lakeflow Designer is that the pipelines are LDP pipelines under the hood, which means they include all the features normally available. If an analyst creates a pipeline but you want a data engineer to review it, the engineer can see the underlying code and processes just as they would with any LDP pipeline; the sketch below shows roughly what that code looks like. Lakeflow Designer is not generally available yet and will be released in private preview in the next few months.
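For context, here is a minimal sketch of an LDP pipeline definition in Python. The source path, table names and data-quality expectation are hypothetical; `spark` is provided by the pipeline runtime.

```python
import dlt
from pyspark.sql import functions as F

# Hypothetical source path and table names; `spark` is injected by the runtime
@dlt.table(comment="Raw orders ingested from cloud storage")
def orders_raw():
    return (
        spark.readStream.format("cloudFiles")
        .option("cloudFiles.format", "json")
        .load("/Volumes/main/landing/orders/")
    )

# Drop rows that fail the quality check instead of failing the pipeline
@dlt.table(comment="Cleaned orders with basic quality checks")
@dlt.expect_or_drop("valid_amount", "amount > 0")
def orders_clean():
    return (
        dlt.read_stream("orders_raw")
        .withColumn("ingested_at", F.current_timestamp())
    )
```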
Databricks for Everyone: Free Edition
The old “free” edition of Databricks was the Community Edition. It was extremely limited in what you could actually use and test, to the point where you could not even complete some of Databricks’ free online training with it. Databricks has taken a huge step in a positive direction by releasing a true Free Edition and making all self-paced learning courses available at no cost. The Free Edition comes with many of the bells and whistles of the enterprise edition, like:
- Building AI Agents and applications.
- Analyzing data with Databricks SQL and building dashboards on top of your data.
- Playing around with AI/BI Genie, their compound AI system that lets you ask questions of your data and improve responses with feedback.
- Creating and designing Lakeflow data pipelines.
- Trying out the Databricks Assistant to help troubleshoot code, queries and files.
There are limitations to the Free Edition, but it is an exciting strategic move to help students, enthusiasts and developers alike step into the world of Databricks. I expect that the low barrier to entry will strongly influence the skillsets of the next generation of the data community.
Got more questions about Databricks or how to take advantage of any of the new features I mention here? Reach out and see what we can do for you!