“The goal is to turn data into information and information into insight.”
This quote from former Hewlett-Packard CEO Carly Fiorina really sums up what we at InterWorks feel is everyone’s ultimate goal, regardless of whether they’re doing statistical analysis, data modeling, visualizations or data reporting. We’re all trying to gain insight into what the data is trying to tell us. Raw data doesn’t inherently contain that insight; you have to work with it, shape it, clean it and create calculations to truly glean the actionable insights you crave.
However, as we’ve spoken with many clients over the years, we’ve discovered that the final step of data prep is where a lot of obstacles emerge. Perhaps you’re currently experiencing some of these speed bumps as well!
Possible Challenges in Data Prep
A few examples of the issues our clients have faced are:
- They have data in a lot of disparate locations, whether they’re different warehouses, a combination of warehouses and local files, or even multiple tables in the same warehouse.
- Those scattered sources have pushed users toward workarounds such as custom SQL, multiple joins and/or blends in Tableau, and slow, unsustainable Tableau workbooks weighed down by an excessive number of calculations and LOD expressions.
- They have employees and team members of various skill levels (some Python and SQL wizards, some who are not there yet) who are having trouble bridging the gap in order to work together, which contributes to duplicated work and miscommunication.
- Getting the data ready for investigation is a slow and often manual process that involves a lot of downloading to Excel and editing by hand.
- They’ve been hearing about data science, machine learning and AI and are very interested in using them but don’t know where to start.
If any of the previous problems resonate with you, may I please introduce you to Dataiku?
What Is Dataiku?
In Dataiku’s own words, it is one central, end-to-end solution for the design, deployment and management of analytics, ML and AI applications. Dataiku is infrastructure agnostic, working with all flavors of cloud and on-prem storage and compute. It is also inclusive of all skillsets, whether on the technical side working in code or on the business side with low to no code.
This is a very apt description of Dataiku, but I want to break it down even further and focus on two specific use cases: last-mile (or analytic) data prep and data science.
Last-Mile (or Analytic) Data Prep
When we discuss this type of data prep, it is not to be confused with enterprise-level ETL/ELT, which is often handled by tools such as Fivetran or Matillion and is typically owned by Data Engineering/IT teams. Instead, last-mile/analytic data prep refers to the data preparation that happens right before a particular report, visualization or analysis is created. This work is more customized and is usually done by individuals or groups of data scientists, business analysts and data analysts.
Dataiku includes over 90 built-in data transformers for common data manipulations like binning, concatenation, currency conversions, date conversions, filtering, splitting, geospatial and more. Even when a transformer doesn’t exist in the library, users can quickly write formulas similar to those used in spreadsheets to accomplish almost any data transformation task.
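To make those transformation types a little more concrete, here is a minimal sketch of four of them (filtering, splitting, date conversion and binning) written by hand in plain Python. To be clear, this is not Dataiku’s API or how its built-in transformers are implemented; the sample rows, column names and bin thresholds are all made up for illustration. It simply shows the kind of last-mile work those transformers are meant to save you from coding yourself.

```python
from datetime import datetime

# Hypothetical raw rows, as they might arrive from a manual export.
raw_rows = [
    {"customer": "Ada Lovelace", "order_date": "03/15/2024", "amount": "120.50"},
    {"customer": "Alan Turing", "order_date": "04/02/2024", "amount": "35.00"},
    {"customer": "Grace Hopper", "order_date": "04/20/2024", "amount": "980.00"},
    {"customer": "Test Account", "order_date": "04/21/2024", "amount": "0.00"},
]


def bin_amount(amount):
    """Binning: map a numeric amount to a labelled bucket (thresholds are assumed)."""
    if amount < 100:
        return "small"
    if amount < 500:
        return "medium"
    return "large"


def prepare(rows):
    """Apply filtering, splitting, date conversion and binning to each row."""
    prepared = []
    for row in rows:
        amount = float(row["amount"])
        # Filtering: drop zero-value rows (for example, test records).
        if amount <= 0:
            continue
        # Splitting: break one name column into two.
        first_name, last_name = row["customer"].split(" ", 1)
        # Date conversion: US-style text to an ISO 8601 date string.
        order_date = datetime.strptime(row["order_date"], "%m/%d/%Y").date()
        prepared.append({
            "first_name": first_name,
            "last_name": last_name,
            "order_date": order_date.isoformat(),
            "amount": amount,
            "amount_bin": bin_amount(amount),
        })
    return prepared


prepared_rows = prepare(raw_rows)
for row in prepared_rows:
    print(row)
```

Even for four small steps, that is a fair amount of code to write, test and explain to a teammate, which is exactly the gap a visual, reusable transformer library aims to close.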
Data Science
Dataiku’s original name was Dataiku DSS, where DSS stood for Data Science Studio. It was created with the intention of being a central location accessible to and usable by the entire range of data scientists, from those who are just starting their data science journey to those who have been on it for a while and write their own models in R or Python. If someone needs assistance creating a model, the Lab section of a workflow walks them through each step in a user-friendly UI. If someone prefers to write their own model, they can upload that instead.
Where Does Dataiku Fit?
To summarize the previous section, Dataiku is a low-barrier-of-entry tool that bridges the gap between data sources/warehouses and data visualizations/reports, allowing users of all levels to enhance the data through last-mile data prep and/or model building. It empowers analysts to work with and build the data they need for their analytics, and it removes some of the burden on the data engineers.
Why Should You Care About Dataiku?
There are three key aspects of Dataiku to keep in mind when deciding if it is right for your company. These qualities will answer many of the problems you may be facing (enumerated above) and some you didn’t even realize you had:
Dataiku was built with collaboration in mind. Through the Git integration built into Dataiku, multiple people can work on the same project without having to worry about actively pushing or pulling through Git. There are also many internal documentation capabilities including, but not limited to, wiki pages, a discussion forum and a shared to-do list where you can tag coworkers.
As stated previously, Dataiku was created to be easily used by coders and non-coders alike. This keeps people and teams from being isolated in silos and allows cross-experience collaboration.
The cloud-based nature of Dataiku means that it can connect to many different data sources and warehouses in an efficient manner. It can also push the computation of every step of the process onto a database, making it so that you are not limited by or reliant on your local machine’s capabilities. A final advantage of Dataiku being cloud-based is that you can run any of the workflows you’ve created on a schedule, and you don’t even have to be logged in to your instance to do it.
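If the idea of pushing computation onto a database feels abstract, the general pattern can be sketched in a few lines of Python. This is not Dataiku’s execution engine or API: an in-memory SQLite database stands in for a real warehouse, and the table, columns and function names are hypothetical. The point is only to contrast pulling every row back to your machine with letting the database do the aggregation.

```python
import sqlite3

# Assumption: an in-memory SQLite database stands in for a remote warehouse.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("east", 100.0), ("east", 50.0), ("west", 75.0), ("west", 25.0), ("west", 10.0)],
)


def totals_computed_locally(connection):
    """Pull every row to the client, then aggregate in Python."""
    totals = {}
    for region, amount in connection.execute("SELECT region, amount FROM orders"):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


def totals_pushed_down(connection):
    """Let the database aggregate; only one row per region comes back."""
    query = "SELECT region, SUM(amount) FROM orders GROUP BY region"
    return dict(connection.execute(query))


local_totals = totals_computed_locally(conn)
pushed_totals = totals_pushed_down(conn)
print(local_totals)
print(pushed_totals)
conn.close()
```

Both functions return the same totals, but the second one moves only one row per region over the wire. With five rows the difference is invisible; with millions of rows in a warehouse, it is the difference between waiting on your laptop and letting the database do the heavy lifting.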
How InterWorks Can Help!
With experts deeply familiar with Dataiku, we’re ready to partner with you and help guide your Dataiku journey, wherever it may lead you. Here are some specifics of how we can do that:
We can assist in the installation and implementation of Dataiku at your company.
We were the first of Dataiku’s partners to lead a training in their stead, and we continue to lead trainings to this day, both on site and virtually. Whether your company is brand new to Dataiku or needs more advanced training, we can tailor the right training plan for you.
With many consultants certified in Dataiku, we are ready to assist you in such tasks as building your first model, moving your manual last-mile data prep processes over or creating a Center of Excellence to help with adoption of Dataiku at your company.