New York City: the city that never sleeps. It’s the home of Wall Street, the Empire State Building, limitless talent (shoutout to the cast of Hadestown) and so much more.
At the end of this past June, I, alongside a team of my fellow InterWorkers, had the chance to head to the Big Apple to attend the Everyday AI Conference, hosted by Dataiku, where we got to learn about what’s on the horizon for both Dataiku as a tool and the artificial intelligence / machine learning space as a whole.
Above: Members of the InterWorks crew sent to NYC.
Hot Off the Presses: Dataiku 11
For those unfamiliar with Dataiku, it is essentially a tool that combines the powers of last-mile data prep with data science capabilities. Rather than spending hours writing custom Python code to perfect a single model, you can leverage Dataiku’s AutoML features to train several models and tune hyperparameters in just a few clicks. You can take a deeper dive and find out more in this blog by InterWorks’ resident Dataiku Neuron, Rachel Kurtz, who also spoke at Everyday AI.
Dataiku is continuing to add more features to their product, with some exciting new ones coming soon with Version 11. For a deep dive into the new features of Dataiku 11, check out Dataiku’s presentation here. While definitely not an exhaustive list, here are some of the ones we’re most excited about:
- Visual time series forecasting – Until this version, Dataiku only supported predictive models and not forecasting models. You would find options for your standard logistic regression or random forest models, but not ARIMA models, for example. Forecasters rejoice, as you will now see time series forecasting as an option in the lab next to the other predictive model options.
- Visual deep learning – Image classification has been added to the lab under visual analysis alongside object detection. Pre-built models have been included to save time on manual model development, and data augmentation allows users to generate more training/testing data by altering existing images (e.g., rotating/cropping images, flipping them, altering the colors, etc.).
- Centralized feature store – Dataiku has always been about collaboration among individuals as well as across an organization. This centralized repository can store curated datasets that can then be pulled into other projects, minimizing the time spent manually sharing these same assets.
- Flow documentation generator – As we all (hopefully) know, you should comment your code. Dataiku’s version of commenting code (besides actually commenting any written code) would be creating assets such as project wikis that contain any information you might need to understand a project. With this new generator, you can instantly create a document that includes a screenshot of a project flow and a summary of all datasets, recipes and folders. Features like this can help save so much time on a task that is often seen as mundane and tedious. Users can also provide custom documentation templates personalized to their organization. Lastly, this can be included as an automated task in a project, so documentation can be self-updating.
- Model stress tests – How would your model behave if something in the input data (e.g., the distribution of a feature) were to drastically change? Sure, we could programmatically create our own stress tests. With Dataiku 11’s built-in stress tests, though, we can now set these up much more quickly.
- Expanded visual logic – Whether it’s in Dataiku or Microsoft Excel, most of us have, at one point or another, crammed a long, convoluted string of if/else statements into a single formula. While doable, this obscures the purpose of the formula and makes it more prone to errors. In Dataiku 11, prepare recipes, as well as other filter locations, will include a new UX for building nested if/else statements to remedy these issues. Additionally, the Switch() function has been added for more complex rules, both in written formulas and as a processor in prepare recipes.
Above: InterWorkers manning our booth in between presentations.
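To make the model stress test idea from the list above a bit more concrete, here is a minimal sketch in plain Python. It is not Dataiku's implementation or API. The toy one-feature classifier, the synthetic data and the helper names (train_threshold, accuracy, stress_test) are all assumptions made up for illustration. The idea is simply to shift a feature's distribution and watch how accuracy responds.

```python
import random


def train_threshold(xs, ys):
    """Fit a toy one-feature classifier: predict 1 when x is at or above
    the midpoint between the two class means."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return (mean0 + mean1) / 2


def accuracy(threshold, xs, ys):
    """Share of rows where the thresholded prediction matches the label."""
    correct = sum((x >= threshold) == bool(y) for x, y in zip(xs, ys))
    return correct / len(ys)


def stress_test(threshold, xs, ys, shifts):
    """Re-score the already-trained model after adding each shift to the feature."""
    return {s: accuracy(threshold, [x + s for x in xs], ys) for s in shifts}


# Synthetic data (hypothetical): class 0 centered at 0.0, class 1 centered at 2.0.
rng = random.Random(0)
ys = [i % 2 for i in range(1000)]
xs = [rng.gauss(2.0 if y else 0.0, 1.0) for y in ys]

threshold = train_threshold(xs, ys)
results = stress_test(threshold, xs, ys, shifts=[0.0, 1.0, 3.0])

for shift, acc in results.items():
    print(f"feature shifted by {shift:+.1f}: accuracy {acc:.2f}")
```

The model is trained once and never refit. Only the input data moves, which is what makes this a stress test rather than a retraining exercise. A real test would cover more kinds of change than a simple shift, such as missing values or rare categories.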
What Lies in Store for AI/ML in the Future
While we loved learning about all the cool new tech coming in Dataiku 11, there was also plenty of discussion surrounding high-level concepts and visions for the future. One of the first presentations on the second day of the conference came from Florian Douetteau, CEO of Dataiku, who reminded me of something important and presented an idea of how we can build a world of self-service analytics that includes predictive elements.
First, when we talk about the use of AI in corporations, it’s important to remember that this is not necessarily referring to deep learning or complex neural networks. The term AI, at the end of the day, is a buzzword. As a buzzword, it is commonly associated with the complex, sexy solutions that people think are awesome. However, a linear regression model is just as much AI as a convolutional neural network.
To corporations newer to utilizing AI in your day-to-day operations: Do not be scared by the thought of needing subject matter expertise in ultra-complex solutions. We all have to start somewhere, after all. Florian also discussed how companies can structure themselves to best implement Dataiku and predictive analytics in their self-service ecosystems.
According to him, companies should create “analytics tribes,” where a small group, or groups, can combine domain knowledge and tech skills. These groups would be your primary Dataiku designers, acting as liaisons between business users who know the domain but not the technology and tech experts who know the programs but not the domain.
Our First Neuron, Our First Speaker
Our very own Rachel Kurtz gave a presentation alongside the Head of Marketing at Meta, Nicole Alexander, about how we move towards a world fully driven by analytics. In fact, their proposal is that we should work towards a grand total of one billion knowledge workers. However, how exactly do we get there?
Above: Rachel Kurtz on stage for her presentation.
A common problem that we see today is that working in analytics and data science seems unattainable to most people, since many jobs require master’s degrees, if not doctorates. To combat this, both Kurtz and Alexander suggest that, in the future, data science and analytical knowledge will become a skillset rather than a profession. Rather than seeking a job solely as a data scientist, one can seek a job in a given vertical that would include data science work.
By eliminating this barrier, the world of data becomes far more accessible, allowing people from all sorts of backgrounds to contribute. They also advised companies to prioritize their current employees and foster any interest they may have in data science, being sure to include professional development time that employees can use to progress along their data science journey.
But how do we start working towards one billion knowledge workers? During the conference, there was no shortage of companies detailing how they were using Dataiku to strengthen their analytics ecosystems. We heard stories from companies such as Pfizer, Ralph Lauren, Boeing and more about their digital transformation journeys towards self-service environments. Just about every company talked about the collaborative capabilities of Dataiku, which broke down existing silos and allowed them to democratize their analytics.
However, one thing that was mentioned several times was that many companies held internal trainings, and that these trainings were a key part of their adoption of Dataiku. For companies that have either already invested in Dataiku or are considering it, this is where we, InterWorks, can help out. We have people across the globe licensed to deliver Dataiku trainings, and we can assist in your journey of Dataiku adoption and your journey to self-service analytics. First up, if you haven’t already, give Dataiku a try with a 14-day free trial.