InterWorks recently joined Dremio’s Global Partner Network, so naturally we’re here to give you a rundown of what Dremio is and when you’d look to use it. Check out our post announcing this new partnership.
Dremio is a SQL Lakehouse Platform that features a shared semantic layer and provides high-performance queries for interactive analytics on data lakes. This is possible in part because of its open data architecture, which separates compute from data. In this model, the customer’s data remains in their own cloud storage, and they can adopt best-of-breed technologies on top of it as their needs change.
Sweet Dremios Are Made of This
Dremio builds its internals on several open-source projects:
- Apache Arrow as a columnar memory format
- Apache Arrow Flight for performant data transport
- Apache Iceberg as an open table format (Delta Lake is also supported)
- Project Nessie for atomic data changes
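If "columnar memory format" is a new term for you, here is a minimal sketch of the idea. It uses plain Python lists as a stand-in for Arrow's buffers, so it shows the layout concept only and is not Arrow's actual API. The sample rows and the `to_columns` helper are made up for illustration.

```python
# Toy illustration only: plain Python lists stand in for Arrow's columnar buffers.
rows = [
    {"region": "EU", "sales": 120},
    {"region": "US", "sales": 300},
    {"region": "EU", "sales": 80},
]


def to_columns(records):
    """Pivot a list of row dicts into one list per column (hypothetical helper)."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one column now reads a single list of values
# instead of visiting every field of every row.
total_sales = sum(columns["sales"])
```

Analytical queries tend to read a few columns across many rows, which is why a column-oriented layout suits this kind of workload.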
As for the pieces that developers will see, Dremio mainly focuses on physical datasets and virtual datasets. Physical datasets are the data as it exists in the various data lakes and sources. Virtual datasets are built on top of physical datasets or other virtual datasets, and together they make up the semantic layer. This can be thought of as the presentation layer where users can access the available datasets.
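To make the layering concrete, here is a rough analogy using SQLite views from Python's standard library. This is not Dremio's SQL or API: the table stands in for a physical dataset, the views stand in for virtual datasets, and every name (`raw_orders`, `orders_clean`, `sales_by_region`) is hypothetical.

```python
import sqlite3

# Analogy only: an in-memory SQLite table plays the "physical dataset".
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_orders (order_id INTEGER, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO raw_orders VALUES (?, ?, ?)",
    [(1, "EU", 120.0), (2, "US", 300.0), (3, "EU", 80.0)],
)

# First "virtual dataset": a cleaned view defined directly on the physical data.
conn.execute(
    "CREATE VIEW orders_clean AS "
    "SELECT order_id, region, amount FROM raw_orders WHERE amount > 0"
)

# Second "virtual dataset": built on the first view, not on the raw table.
conn.execute(
    "CREATE VIEW sales_by_region AS "
    "SELECT region, SUM(amount) AS total FROM orders_clean GROUP BY region"
)

# Consumers query the top layer without needing to know what sits underneath.
result = conn.execute(
    "SELECT region, total FROM sales_by_region ORDER BY region"
).fetchall()
```

No data is copied at either layer here. Each view is just a saved query, which is the property that makes stacked definitions workable as a semantic layer.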
Another important feature of Dremio is the ability to create “reflections”. These reflections are pre-aggregated copies of data that can be used in place of the physical datasets if the query optimizer determines the reflection would be more efficient. Reflections can also include sorting, partitioning and various levels of date/time roll-up. Multiple reflections can be set up to cover a variety of scenarios, and Dremio’s query optimizer will use the lowest-cost query that provides the data required.
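The selection idea can be sketched as a toy model. This is our own simplification, not Dremio's optimizer: the `Reflection` class, the `covers` and `choose_plan` functions, and the cost numbers are all hypothetical, and real cost-based planning considers far more than this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reflection:
    """Hypothetical stand-in for a reflection: what it contains and a made-up cost."""
    name: str
    dimensions: frozenset
    measures: frozenset
    estimated_cost: float


def covers(reflection, dimensions, measures):
    """A reflection is usable only if it holds every column the query needs."""
    return set(dimensions) <= reflection.dimensions and set(measures) <= reflection.measures


def choose_plan(reflections, dimensions, measures, base_cost):
    """Return the cheapest usable reflection, or None to scan the physical data."""
    candidates = [
        r for r in reflections
        if covers(r, dimensions, measures) and r.estimated_cost < base_cost
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: r.estimated_cost)


reflections = [
    Reflection("by_region_day", frozenset({"region", "day"}), frozenset({"amount"}), 50.0),
    Reflection("by_region", frozenset({"region"}), frozenset({"amount"}), 5.0),
]

# Both reflections can answer a region-only query; the cheaper one is picked.
region_only = choose_plan(reflections, {"region"}, {"amount"}, base_cost=1000.0)

# Nothing covers a per-customer query, so the planner falls back to the source.
per_customer = choose_plan(reflections, {"customer"}, {"amount"}, base_cost=1000.0)
```

In this model, `region_only` resolves to the `by_region` reflection and `per_customer` is `None`. The practical point is that users keep querying the same datasets while the choice of reflection happens behind the scenes.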
Who Am I to Disagree?
Dremio isn’t for every customer, and it won’t fit everyone’s needs. It also doesn’t eliminate the need for data warehouses across the data ecosystem. In general, Dremio is going to best fit customers who have already made significant investments in large or multiple on-prem/cloud data lakes.
Many clients keep data in cloud data lakes whose potential goes untapped because traditional query engines are typically slow and require users with very specific technical experience. Dremio can help these data lakes and interactive BI efforts flourish by providing high query performance and a user-friendly development experience.
That’s not to say that this is the only fit for Dremio. It can also help with offloading EDWs or migrating clients to the cloud. If a customer has overwhelming amounts of data in their EDW and the pricing no longer makes sense, they could be a good candidate for Dremio. Likewise, if their BI capabilities are being limited by a traditional RDBMS and they’d like to explore migration to a cloud data lake, Dremio could help enable their success.
I Travel the World and the Seven Seas
If your data landscape is a good fit for Dremio, there are plenty of data lake integration options, including:
- Azure Data Lake Store
- Amazon S3
- AWS Glue
- Google Cloud Storage
- HDFS
- NAS
- MapR-FS
Dremio also supports many relational database systems, as well as several NoSQL databases.
Everybody’s Looking for Something
Want to learn more? Visit the Dremio docs, check out Dremio’s list of resources, start your path to proficiency at Dremio University, or reach out to us here at InterWorks!