Have you wondered how you can venture into the land of data science without being a data scientist? Most of us are intimidated by the idea of writing thousands of lines of code to build a machine learning algorithm. That may have been true a decade ago, but with Dataiku, Everyday AI is possible!
Dataiku has been instrumental in bringing AI (artificial intelligence) into the everyday workings of an employee, making the future of predictive analytics, as Dataiku says, “an organisation-wide collaboration.”
First, What Is Dataiku?
Dataiku is a one-stop solution for design, deployment and management of AI applications!
Below, we can see all the various spheres in Analytics and ML that Dataiku touches:
I would highly recommend reading the InterWorks blog article “Meet Dataiku, the Everyday AI Powerhouse for Your Data” written by Rachel Kurtz, InterWorks’ first Dataiku Neuron. Here, Rachel explains how Dataiku works as a machine learning and last-mile ETL prep tool.
In our Go! Dataiku webinar (more on that in a second), we took a brief look at machine learning: the science of teaching computers to perform tasks autonomously and learn from experience. It combines massive amounts of data, quick, repeated processing and clever algorithms so that computers learn automatically from patterns or features in the data. There has never been a better opportunity to explore machine learning, thanks to today's high-performance cloud computing and the massive amount of data that has been gathered.
How Does Dataiku Fit into Our Analytics Tools Landscape?
Here at InterWorks, we have partnered with the best in technology, and our current tool landscape showcases how each of our partners fits in:
Dataiku is our preferred tool when it comes to machine learning and AI. It also acts as a last-mile data prep tool, similar to Tableau Prep, Alteryx and Python.
We often hear the last mile described with the metaphor of “adding salt and pepper” to your food: You don’t actually cook a dish with just salt and pepper; it’s more about seasoning. So, if you need to add that salt and pepper to your corporate datasets, you will do it in the last mile.
For example:
- Light cleansing
- Adding formulas and business rules like new measures, derived fields, renaming columns and values, etc.
- Enriching with sources not stored in your data warehouse
- Lightly reshaping data via joining, pivoting/unpivoting, filtering and removing unneeded columns and rows
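To make the “salt and pepper” idea concrete, here is a minimal sketch of those last-mile steps in plain Python. Everything in it is hypothetical and for illustration only: the sample rows, the column names (cust, qty, unit_price, internal_note) and the region lookup are made up, and this is not how Dataiku implements its visual recipes.

```python
# Hypothetical sample rows standing in for a corporate dataset.
orders = [
    {"order_id": 1, "cust": " alice ", "qty": 2, "unit_price": 10.0, "internal_note": "x"},
    {"order_id": 2, "cust": "BOB", "qty": 0, "unit_price": 5.0, "internal_note": "y"},
    {"order_id": 3, "cust": "Carol", "qty": 1, "unit_price": 20.0, "internal_note": "z"},
]

# Hypothetical lookup kept outside the data warehouse (for example, a spreadsheet).
regions = {"alice": "APAC", "bob": "EMEA", "carol": "USCA"}


def last_mile_prep(rows, region_lookup):
    """Apply light, last-mile seasoning to already-prepared rows."""
    prepared = []
    for row in rows:
        # Filtering: drop rows that are not needed (zero-quantity orders).
        if row["qty"] <= 0:
            continue
        # Light cleansing: trim whitespace and normalise case.
        customer = row["cust"].strip().lower()
        prepared.append({
            "order_id": row["order_id"],
            # Renaming: "cust" becomes "customer".
            "customer": customer,
            # Business rule: derive a new measure from existing fields.
            "revenue": row["qty"] * row["unit_price"],
            # Enrichment: join to a source not stored in the warehouse.
            "region": region_lookup.get(customer, "Unknown"),
            # Column removal: "internal_note" is intentionally left out.
        })
    return prepared


result = last_mile_prep(orders, regions)
for row in result:
    print(row)
```

None of these steps changes what the dataset fundamentally is; each one only adjusts it for a specific analysis, which is the point of the metaphor.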
What Is Go! Dataiku, Then?
We at InterWorks developed a 90-minute, hands-on introduction to Dataiku Studio and the power of advanced analytics in driving business transformation. We call this event our Go! Dataiku webinar series. We recently ran our first interactive session on July 14 in the APAC region covering the end-to-end development cycle of a machine learning use case.
In that session, we discussed advanced analytics, which is the automatic or semi-automatic analysis of data using specialised tools and techniques, often in addition to those used in traditional business intelligence (BI), to uncover deeper insights, predict the future or produce recommendations. For businesses, using advanced analytics can mean the difference between staying ahead of the competition and falling further behind.
Here is where we get into the real essence of Go! Dataiku. Our presenter, Azucena Coronel, gave us a 90-minute walkthrough, from start to finish, of a specific ML use case. Our task at hand was to predict which member of the Scooby-Doo Mystery Gang is the best detective!
To complete this task, Azu took us through the step-by-step procedure below using Dataiku DSS and the Scooby-Doo Complete Episode List dataset from Kaggle:
- Data Connections: Explore the data connection options inside of Dataiku and then connect to the Scooby-Doo dataset.
- Exploratory Data Analysis: Identify the target variable, relevant fields and data prep opportunities.
- Data Preparation: Dataiku provides an easy-to-use visual interface that dramatically speeds data preparation. Join and group datasets or aggregate, clean, normalize, enrich and deduplicate records – all with a few clicks.
- Training an ML Model: Here we choose the appropriate ML model to train on our cleaned dataset. Dataiku AutoML uses leading algorithms and frameworks like Scikit-Learn and XGBoost to find the best modelling results in an easy-to-use interface for users across the business.
- Scoring your Data: Let the magic begin! Our ML model is ready to be used to score our prediction dataset.
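For readers who want to see what “train, then score” means outside of a visual interface, here is a minimal sketch in plain Python. It is an illustration under loud assumptions: the two numeric features and all the rows are made up (they are not taken from the Kaggle dataset), and the nearest-centroid classifier is a deliberately simple stand-in, not one of the algorithms Dataiku AutoML runs.

```python
from collections import defaultdict
from math import dist

# Made-up training rows: (features, label). Both features are hypothetical
# scores between 0 and 1; they do not come from the Kaggle dataset.
training = [
    ((1.0, 0.0), "Velma"),
    ((0.9, 0.2), "Velma"),
    ((0.1, 1.0), "Fred"),
    ((0.0, 0.8), "Fred"),
]


def train(rows):
    """Fit a nearest-centroid model: one mean feature vector per label."""
    grouped = defaultdict(list)
    for features, label in rows:
        grouped[label].append(features)
    return {
        label: tuple(sum(column) / len(column) for column in zip(*vectors))
        for label, vectors in grouped.items()
    }


def score(model, features):
    """Return the label whose centroid is closest to the given features."""
    return min(model, key=lambda label: dist(model[label], features))


# Training step: learn from labelled rows.
model = train(training)

# Scoring step: apply the trained model to new, unlabelled rows.
to_score = [(0.95, 0.1), (0.05, 0.9)]
predictions = [score(model, features) for features in to_score]
print(predictions)
```

The split is the important part: the model only ever sees labelled rows during training, and scoring applies it to rows whose label is unknown.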
Frequently Asked Questions (and Answers) About Dataiku
During our Go! Dataiku event, we received several common questions from the audience surrounding Dataiku. Perhaps the same questions are on your mind. Fortunately, we have some answers.
Q: Can a Dataiku instance be installed on-prem?
A: Yes! It can be installed on on-prem Linux servers. You can find more documentation here.
Q: How does licensing work with Dataiku?
A: Dataiku has three editions: Free, Discovery and Enterprise. Check this page for details. The licensing cost is not based on the number of users, use cases or data volume.
Q: Is end-to-end automation possible with Dataiku?
A: Yes, and you can read about that here.
Q: What is the maximum data volume Dataiku can handle?
A: It completely depends on the architecture you implement and the processing power of your data warehouse, as one of the beauties of Dataiku is that it’s possible to push the computation to your data warehouse.
Q: Do we have to do data normalization manually or does Dataiku do it by itself?
A: Dataiku implements it automatically.
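As a rough illustration of what gets handled for you, here is a minimal sketch of one common form of normalization, standard (z-score) rescaling, in plain Python. It assumes that “normalization” in the question means rescaling a numeric column to mean 0 and standard deviation 1; the sample values are made up, and this is not Dataiku’s internal code.

```python
from statistics import mean, pstdev


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (z-scores)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # A constant column carries no spread, so map every value to 0.
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# Made-up numeric column.
raw = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
scaled = standardize(raw)
print(scaled)
```

After rescaling, columns measured on very different scales become comparable, which is why this step usually comes before model training.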
We’re happy to answer any other questions you might have about Dataiku that aren’t listed here. All you need to do is reach out to our team.
When’s the Next Go! Dataiku Event?
If you’ve gotten this far, you likely noticed that we’ve been talking about our Go! Dataiku webinar in the past tense. We have already held a few events, but the good news is that we will be holding more! Even better, we’ll be holding them in various regions. Here’s what we currently have on the schedule, but do check back often for additional dates.
- Go! Dataiku – 21 September – USCA / EMEA
- Go! Dataiku – 13 October – APAC (Link TBD)