Transcript
Hi all, welcome to today's fireside chat about how businesses are using Tableau and GCP together to help more people see and understand data.
I'm François Zimmerman, the Field CTO for Tableau here in EMEA. And today, I'm joined by Sergej Barkar from InterWorks. We've worked with InterWorks on a number of Google Cloud projects for government and commercial customers. And so I'm really excited to discuss some of the impacts and best practices that we've seen out in the field.
So before we kick off the discussion, let's get to know the panelists. It's a real pleasure to introduce Sergej Barkar. Sergej, can I ask you for a few words about your background?
Of course. Thank you, François.
So my name is Sergej Barkar. I'm a Solutions Lead at InterWorks. I started back in 2014 with the UK team, then transitioned to work with our APAC customers based out of Sydney. Now I'm back in Europe, working out of gloomy Zurich. I mostly work with our larger customers, so I've built up broad experience across a variety of different verticals.
Great, thanks for joining us.
I work as the Field CTO here for Tableau across EMEA, and I divide my time between tech alliance partners like Google, helping to solve data integration challenges and make sure we can connect to the data sources that matter most, and, for the other fifty percent of my time, strategic customers, working on defining joint data strategies and helping to inform the product roadmap.
So that's what we do day to day.
For Tableau as a whole, there's really one key question that matters to us when we're helping our customers leverage the power of cloud data services. That question is: how many people across your business are really using data to support every key decision and customer interaction? I've got some stats on this. We did a survey with IDC that looked at more than four hundred IT decision makers.
What we found is that even though, you know, eighty-seven percent of C-level people said becoming data-driven is super important, only thirty percent of frontline personnel said that their actions were actually driven by data analysis.
And so what we see is there's often this big gap between companies' ambitions to be data-driven and the actual execution, in terms of the number of frontline people using data to support decisions. And if they're not using data, then they're presumably relying on bias, or on what has worked in the past. So you at InterWorks work with a lot of companies to help more people across the whole business see and understand data. What do you think businesses are doing to address this?
Yeah, absolutely. That's a very good question. Well, look, there's definitely been historically the whole idea that BI is mostly for the privileged folk within the company, the ones that know what they're doing, quote-unquote.
But that's definitely changing. Look, ultimately, these report-factory models of BI are no longer really cutting it. There are a few fundamental issues with that model. You really can't have a small group of people, even a fairly large one, building out reports for the whole business. It just doesn't work.
There's always going to be throughput issues.
And never mind the fact that, frequently, it's the business folk who actually know the data. They're the ones who should be doing the analysis and building out meaningful dashboards, because anyone else would take longer to produce work of the same quality. Tableau makes this really quite easy because it's an easy tool to teach, so it makes sense to move towards a more decentralized model to increase both quality and throughput across the business.
IT should become facilitators. Doesn't mean that they're not part of this process. That's definitely not the case. But there should be synergy between the business and IT to make sure that they're basically, you know, pulling the same rope and have the same goals.
What we've found is that the majority of my work in the last couple of years has actually been in coming up with strategies for enablement and governance: how do we make sure the decentralized model works the way we want it to? Those two, we've found, are the key components. And I just want to make sure we understand that decentralized does not mean Wild West. There are still rules, but those rules are better defined to suit the business goals.
Okay, so I guess one of the first things I'd like to follow up on that is, obviously, people have to manage this change from centralized to self-service. Can you describe a recent project just to bring that to life a little bit more?
Yeah, absolutely.
So imagine a major logistics and supply chain company with, well, they probably have over one hundred thousand employees globally.
And essentially, they had these source systems that certain analysts would connect to, and they'd only be able to see that data, and it was all cubes, meaning everything was quite rigid.
And ultimately the business saw a major shift towards Excel-like workflows, where people just didn't find the source-system analytics useful. That created a lot of other issues: you pretty much lost any single point of truth, versioning was a massive pain, and sharing data between departments was almost nonexistent.
And so we came in, we devised a strategy, and it's taken us about a year to get to a place where we already have fifteen hundred plus users onboarded onto the platform, actively building content based on certified data sources created by the core team.
We've also enabled functionality around bringing in your own data, which we've fondly called BYOD. It allows you to simply enrich the certified data that's been brought into the platform, to generate even more insight than was previously possible. Think of it as a hybrid model that's slowly transitioning into a more decentralized way of working. The biggest part of this was creating the COE (center of excellence) and the mandatory enablement plan; those are really what made sure this project was going to be a success.
It was a lot of work, but the framework we've created is incredibly scalable, and it ensures that everyone who is actually doing analytics, number one, understands the tool and how to use it, and number two, understands the intricacies of the certified data sources published on the platform. And in the enterprise world, in the very short period of one year, we've already had over seven hundred and fifty workbooks created by the community.
Of that, around fifty are fully in production.
And as the CTO of that company put it, the velocity of a project of this scope was previously unheard of in their organization. So a major success as far as the business is concerned. We're ramping that up, going from around ten certified data sources to over thirty in the next year. And because the framework is already in place, this scales fairly linearly with the number of people on the team, so the client is very happy.
I'm really interested that we started off discussing how many people are using data, and actually your key success there was how many people could author data and independently create their own content and insights. I think that's an interesting second part to it.
So what I'd like to drill into a little bit more is, from my perspective, if more people are asking more ad hoc questions, what we tend to see is that this move to self-service stimulates demand for the cloud, because those on-prem data sources are just no longer fit for purpose. As an example, we tend to see on-prem environments that are heavily reliant on data marts and OLAP.
And so those fundamentally, you know, they create limits on the granularity that you can interrogate data with.
You know, you have to pre-calculate cubes if you decide you want to do analysis at a different dimension or what have you.
Do you see that sort of thing where when we have more self-service enabled consumers, we start to place new demands on the on-premise infrastructure, and that's what moves people to the cloud? Do you think there's a link between self-service and cloud?
Yeah, I definitely think so.
It definitely feels like, to facilitate this increased activity on the platform, which is of course very good, you sometimes need a little bit more flexibility, and you need to be able to iterate a little bit faster. There are a lot of tools around cloud technologies that enable you to do that. You can use infrastructure as code to iterate on your infrastructure needs in a better-defined way. You can use CI/CD for content, if that's something your organization is accustomed to.
And ultimately, there are much easier ways to be able to scale dynamically depending on the demands of the platform. And whether that's on the infrastructure side or on the data warehousing side, cloud is really what makes this quite easy. Especially for GCP, it is, in my opinion, one of the leading cloud platforms for data handling.
There are almost infinite ways you can go about doing things. They've got their own tools like Dataflow, BigQuery, Cloud Functions, AI Platform, and of course GCS.
But it also allows you to be fairly flexible and build your own tools on top of the infrastructure offerings that they have. And the combination of, I guess, whatever is off the shelf and build-your-own is what seems to be the sweet spot for some of these larger organizations. And without cloud technologies helping us with developing some of these off-the-shelf tools, it doesn't feel like the velocity would have been nearly as rapid.
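The infrastructure-as-code idea mentioned above can be sketched with a small Terraform fragment. This is purely illustrative: the machine type, zone, names, and disk size are assumptions for the sake of the example, not a sizing recommendation from the conversation.

```hcl
# Hypothetical self-managed Tableau Server host on Compute Engine,
# defined as code so changes are versioned and reviewable.
resource "google_compute_instance" "tableau_server" {
  name         = "tableau-server-01" # illustrative name
  machine_type = "n2-standard-16"    # illustrative size; Tableau needs substantial CPU/RAM
  zone         = "europe-west6-a"    # illustrative region

  boot_disk {
    initialize_params {
      image = "ubuntu-os-cloud/ubuntu-2004-lts"
      size  = 500 # GB
    }
  }

  network_interface {
    subnetwork = "tableau-subnet" # deploy inside your own VPC
  }
}
```

Because the definition lives in version control, resizing the host or swapping the image becomes a reviewed change rather than a manual tweak, which is what makes the faster iteration Sergej describes practical.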
Yeah, I think you spoke about velocity a lot and I think we're seeing a lot of that in terms of how cloud data pipelines are more efficient at integrating data sources at scale.
But secondly in terms of these cloud data stores, they are more efficient at holding data at a much more granular level of detail, which I think is fundamental for self-service. I think, you know, pre-aggregated datasets naturally constrain analysis.
That's right.
They naturally limit the number of downstream questions you can ask. If you aggregate datasets too early in the pipeline, you keep having to go back to core IT whenever you want to ask a new question. So it really is one of those things where, once you get that data pipeline working and get used to a scalable cloud data service, the business starts to think very differently about how it generates insights.
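To make the point concrete, here is a toy sketch in Python; the figures are invented for illustration. Once rows are rolled up to monthly totals, a new per-product question can no longer be answered from the aggregate, only from the granular rows.

```python
from collections import defaultdict

# Raw, granular rows: (day, product, revenue)
raw = [
    ("2021-03-01", "widget", 100),
    ("2021-03-01", "gadget", 250),
    ("2021-03-02", "widget", 120),
    ("2021-04-01", "widget", 90),
]

# Early aggregation: roll up to month, dropping day and product.
monthly = defaultdict(int)
for day, product, revenue in raw:
    monthly[day[:7]] += revenue

print(dict(monthly))  # {'2021-03': 470, '2021-04': 90}

# A new downstream question, "revenue per product in March?",
# is answerable from the raw rows but NOT from `monthly`:
by_product = defaultdict(int)
for day, product, revenue in raw:
    if day.startswith("2021-03"):
        by_product[product] += revenue

print(dict(by_product))  # {'widget': 220, 'gadget': 250}
```

If only `monthly` had been kept, answering the second question would mean going back to core IT for a re-cut of the data, which is exactly the round trip granular cloud storage avoids.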
Yeah, absolutely. I think the flexibility is definitely key because as soon as you're rigid, you are moving slowly back towards the centralized model. And we've talked about the drawbacks that that might entail.
Yeah.
Let's dig a little bit deeper into deployment. So I want you to take me through the thought process. If you've got a customer that's running a large Tableau environment on-prem and self-service is taking off.
You know, those customers will typically move to the cloud so that they can scale the processing part of their analytics environment.
So Tableau offers a number of options for deployment in the cloud. You can deploy Tableau Server to Compute Engine or GKE within your own VPC, so you run it self-managed, or you can purchase it as software as a service, which we call Tableau Online. Where do you see your customers going for the full software-as-a-service offering versus self-managed? Any examples you've got?
Yeah, it's a very good question. Look, I can give you an example for Tableau Online, so let's start with that.
Tableau Online is, look, I think of it as a gateway drug, because it allows us to very, very quickly deploy analytics as part of a proof of concept, or even as a production environment for small and medium businesses that don't yet require any of the more advanced functionality that's better supported by a self-managed environment. For example, there was a medical company, relatively small, maybe under five hundred users, that came to us with the intention of standing up analytics as a proof of concept.
But they had some, I guess, maybe political issues with IT. And so, well, we could still do the analytics solution for them because we didn't really need to go down to the IT department and talk architecture and figure out how to deploy the underlying infrastructure. Because ultimately, what this company needed is the analytics piece. The infrastructure is just the facilitator again.
So I really find Tableau Online to be a great place to start for a lot of customers. Once you go down the path of data security requirements that may be regional, you have a little bit more flexibility on the self-managed side. It allows you to tailor the infrastructure to make sure you have that flexibility from a legal perspective, and it allows you to go down the path of embedded analytics, which seems to be very hot these days.
And by embedded you mean analytics facing off to external users outside of the organization.
Absolutely. So inside your applications.
Yeah, and another big benefit of self-managed: for some companies, we can't just rip apart fifty years' worth of data warehousing that's on-prem. There are always these Teradatas and other on-prem technologies. And ultimately, with a self-managed environment, you could put it on-prem, or you could have a cloud environment interconnected with your own network, which gives you easier access to those on-prem data sources since you control the security yourself.
Yes, it's interesting that you drew that distinction, because we don't see the distinction in terms of organization size. We have some customers with twenty thousand users on Tableau Online. The key difference for us seems to be this: when you are cloud-first on your data sources, Tableau Online starts becoming very attractive. The exception is when you have a lot of complex security rules around the access paths from the analytics environment to your data.
So some people require Tableau to be deployed inside their VPC or inside a specific network segment in order to connect to their stuff. Those are the people that tend to keep doing, you know, the server model often. I know you guys are also doing some partner-managed Tableau Server installations to try and bridge the gap between those two worlds. Is that right?
Yeah, that's right. So the way we've found a middle ground is we can actually host the Tableau Server for you. You have all the flexibility that you need, but without the maintenance overhead that might be incurred on your IT department. We'd take care of that for you. We just call it ServiceCare.
That has been a massive hit in the last two years as people are offloading some of these more standardized manual tasks to partners like us because we've automated the majority of the process and we provide the support that these customers need.
Okay. So there's another nuance about managing your own server cluster. If you're buying Tableau SaaS, frankly, you don't care about the discussion about VMs versus containers.
That's right.
But if you're running it in your own environment, we've, at the last Tableau conference, we announced support for Kubernetes.
Specifically, the idea here was in this initial release, we've announced that we can automatically scale backgrounders using Kubernetes, and then over time, more and more of the services within Tableau will be full auto-scale enabled. Do you see a demand for this? Do you see that most people are still running on traditional Compute Engine VMs or in GCP are people looking to containerize?
It's a very good question. Look, we've definitely seen quite a few more inquiries from customers to go down this path. Sure. There is a trend going on here.
I do find that customers on GCP are a little bit more in the know around containerization, and it kind of makes sense, because Kubernetes is native to GCP. It's interesting to see how people are thinking of using it. As we know, the functionality at the moment is just for the backgrounders.
But I guess as soon as we get down the path of more processes auto-scaling, I see a lot more demand for this. For companies that know how to utilize Kubernetes, or that already have teams using it, this could be a game changer.
Yeah.
I do, however, see trends in what people are using it for right now. One of the major ones is spot instances. Because the infrastructure and the containers are decoupled, you can do some pretty funky stuff with cheaper infrastructure, spot or preemptible instances, to lower your overall infrastructure costs without actually losing any performance or availability. That seems to be what's happening with some of these more in-the-know customers. But I do expect this feature to take off massively in the future.
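For a rough sense of why the spot-instance pattern is attractive, here is a back-of-the-envelope sketch in Python. The hourly rates, node count, and refresh window are invented for illustration; they are not real GCP list prices.

```python
# Hypothetical comparison of on-demand vs. spot pricing for burst
# backgrounder capacity. All numbers are illustrative placeholders.
ON_DEMAND_PER_HOUR = 0.80   # hypothetical on-demand hourly rate
SPOT_PER_HOUR      = 0.24   # spot capacity is often substantially cheaper

hours_per_day = 6           # e.g. extract refreshes in a nightly window
days_per_month = 30

def monthly_cost(rate, nodes=4):
    """Cost of running `nodes` machines for the refresh window each day."""
    return rate * hours_per_day * days_per_month * nodes

on_demand = monthly_cost(ON_DEMAND_PER_HOUR)
spot = monthly_cost(SPOT_PER_HOUR)
print(f"on-demand: ${on_demand:.2f}, spot: ${spot:.2f}, "
      f"saving: {100 * (1 - spot / on_demand):.0f}%")
# on-demand: $576.00, spot: $172.80, saving: 70%
```

The point is that burst workloads only pay for the window they run in, and if the container layer tolerates preemption, the cheaper capacity tier costs nothing in availability.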
Yeah, so the reason we focused on backgrounders initially is that they tend to be the lumpier kind of workload. They're used for processing things like Prep flows and extract refreshes. We'll get to the question later of whether, when you're connecting to the cloud, you should really be using extracts much or only tactically. But that sort of activity, running flows and extracts, ramps up and ramps down. The idea is to use backgrounders to provide enough processing muscle for your data pipeline while those activities are running, and then scale back down, which mitigates infrastructure costs.
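As a rough sketch of that ramp-up-and-ramp-down idea, here is what a generic Kubernetes HorizontalPodAutoscaler over a pool of backgrounder pods could look like. The resource names are hypothetical, and a real Tableau Server in a Container deployment has its own specific configuration; this only illustrates the scaling mechanism.

```yaml
# Hypothetical HPA: add backgrounder pods while extract refreshes and
# flow runs drive CPU up, then scale back in once the window passes.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: backgrounder-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: tableau-backgrounder      # illustrative deployment name
  minReplicas: 2                    # baseline capacity off-peak
  maxReplicas: 10                   # ceiling during refresh windows
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

The `min`/`max` bounds capture exactly the trade-off described above: enough muscle for the pipeline during refresh windows, minimal spend the rest of the day.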
Absolutely. Yeah, makes a whole lot of sense to me.
So we've got a little bit of time left and I now want to look at the data layer and some best practices for GCP data services. Really quickly, obviously we've invested a lot in our Connect ecosystem to make sure that we can connect to a huge number of cloud data sources.
We also invest in what we call Hyper connectivity: this idea that you should be able to connect live to any cloud data source, but then seamlessly switch to an extract if you decide you want to, for example, contain costs or provide a performance boost for a specific dashboard. So customers have that choice in the tool; they can switch between the two modes.
That's one dimension. The other dimension is that GCP just offers a lot of options for how you host the data that powers your business. Yep. So let's divide this into three buckets, and let's kind of go through how your customers are using them.
So in bucket one, traditional transactional databases like Google Cloud SQL, or SQL Server inside a Compute Engine instance, that sort of thing. Yep. Bucket two, we're going to look at proper cloud data warehouses like Google BigQuery. And then bucket three, we're going to put things like end-user compute data sources like Google Sheets and things like SaaS data sources like Google Analytics and Google Ads.
So all of those things that are, you know, they're not really core analytics, but they are extremely useful for expanding the number of use cases we can go after. Bucket one, let's start there. Where do you see customers running Tableau on top of traditional relational databases, and what are the limits of that approach?
Yeah, absolutely. So look, these relational databases are typically there for transactional processing, so analytics is sometimes just a secondary use.
They work well, don't get me wrong, but they definitely have their limits.
For these, we normally just suggest taking the data, ETL-ing it, and putting it into bucket two, as you call it. There's definitely a way to bring data from these transactional databases straight into Tableau, and you're able to analyze it in just the same way, but perhaps you wouldn't want to put too much volume through the analytics there, because you may see performance issues long-term.
Yeah, this is the problem of running long-running analytical queries inside your transactional systems. This is actually one of the reasons we have Hyper: we recognize that this is a case where you would probably experiment with a live connection, and if it causes an impact, you turn on an extract. That's right.
Yeah, we sometimes see a lot of pushback from IT departments when we try to plug into these transactional systems directly, because of what you've just mentioned: resource contention is a real thing, and people just need to be aware that it's one of the limitations.
So my perspective is usually the reason people are using those systems in the cloud is just because they've done a kind of a lift-and-shift migration of an existing system.
Yeah, most of the time.
And when they are born in the cloud, they tend to already be using the services in bucket two, so scalable cloud data services. But let's look at when a customer decides they need to overcome the limitations of traditional databases, and so then most of our customers are connecting Tableau to Google BigQuery. That's the big beast that tends to give the best fit for analytical throughput. What are the configuration best practices there? What do you see customers doing?
Well, I guess simply put, it's this whole idea of purpose-built computing, if you can call it that. And Tableau is very good at a certain thing, but it's not a substitute for a data warehouse, right? Extracts only get you so far. Exactly.
And so the pattern we're definitely seeing these days is companies leveraging BigQuery's almost infinitely scalable compute engine to analyze really huge datasets. And they do it live. They do it live not just for the performance uplift, because you can fine-tune the back end to support the queries you're interested in, but also for governance reasons.
I mean, there is definitely an increased interest in making sure that data accesses are traceable and logged, and that technologies like row-level security and column-level security are being utilized, whether for regulatory compliance or just internal requirements. And it's cloud data services that allow us to do this natively; it's just a normal part of the functionality of Google BigQuery. And, of course, we now have the BI Engine integration as well, which could potentially yield some pretty interesting results for certain types of workloads.
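As a concrete illustration of that native row-level security, BigQuery supports row access policies in its DDL. The project, dataset, table, column, and group names below are hypothetical:

```sql
-- Hypothetical row access policy: analysts in the EMEA group only
-- ever see EMEA rows, regardless of which Tableau dashboard issues
-- the live query.
CREATE ROW ACCESS POLICY emea_only
ON `my_project.sales.orders`                    -- illustrative table
GRANT TO ("group:emea-analysts@example.com")    -- illustrative group
FILTER USING (region = "EMEA");
```

Because the filter is enforced in the warehouse itself, the governance travels with the data rather than having to be re-implemented in every downstream tool.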
We already have a company that is experimenting with it right now. We don't have the full data just yet on how well it's going. But they do expect BI Engine to save them about fifty percent of the costs for the Tableau Server environment that they're running right now. So if that's going to be the case, then again, game changer and it's all enabled by these cloud data services.
So just to capture that, what I hear from you, and I absolutely agree with this, is that for BigQuery, live connectivity is absolutely a requirement your analytics platform needs to support. It needs to be able to push queries down so you can do all that management of security, governance, and access. And I'm very excited about the BI Engine support we've just added as well. I think it will help people overcome some of the cost constraints they had around doing lots and lots of scans of the data held inside BigQuery, and make it much more manageable. Really quickly, these other services: where do you see people using, for example, Google Sheets to bring ad hoc data into Tableau? Do you see much of that?
Yeah, so remember how I mentioned bring-your-own-data style frameworks? Well, this is where these backends really shine, because Google Sheets is really flexible and works like Excel. It's just a cloud data source, so you can connect to it a little bit more easily.
And you have a single version of the truth, because everything is versioned in Google Sheets. It allows you to join or blend your own data, which might be departmental data that isn't guarded by some more centralized system, and really enrich the data on the Tableau platform with whatever you need it to be.
Without this avenue, it really feels like we're still in the hybrid era or the more centralized era.
It's actually these types of data sources that enable us to go fully decentralized, as there is never going to be a scenario where a team will be able to cover the data requirements for the whole business at every single point in time.
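A toy Python sketch of that bring-your-own-data enrichment, using invented figures: a certified revenue feed is blended with a departmental attribute that the central model doesn't carry.

```python
# Certified data source, maintained by the core team.
certified_sales = [
    {"store_id": "S1", "revenue": 12000},
    {"store_id": "S2", "revenue": 8500},
    {"store_id": "S3", "revenue": 4300},
]

# Departmental "bring your own data": a mapping the central model
# doesn't have, e.g. kept in a Google Sheet by a regional team.
store_clusters = {"S1": "flagship", "S2": "flagship", "S3": "outlet"}

# The blend: enrich certified rows with the departmental attribute,
# then answer a question the central model alone couldn't.
by_cluster = {}
for row in certified_sales:
    cluster = store_clusters.get(row["store_id"], "unassigned")
    by_cluster[cluster] = by_cluster.get(cluster, 0) + row["revenue"]

print(by_cluster)  # {'flagship': 20500, 'outlet': 4300}
```

The certified feed stays governed and untouched; the departmental sheet only adds context on top, which is the essence of the hybrid model described earlier.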
I absolutely agree. I think if you don't give people the ability to bring these ad hoc data sources in, then what they'll do is inevitably export out of Tableau into a CSV file and take it elsewhere. Exactly. That means you lose control, you lose all of this collaboration, and you're back to the bad old days of stale data all over the place.
So, this has been a really interesting conversation. Thanks so much for joining me. I just want to have a quick wrap-up: do you have one or two takeaways that you'd like anyone who's listened to this call to take from you?
Any thoughts?
Yeah. I think what I'd say is you mustn't be afraid to decentralize, to let go of some aspects. Of course, it helps to have a proper framework that's been configured and set up specifically for your use case. But the transformation we're seeing at some of the companies moving this way is just so immense that I really think it's the future. And cloud platforms like GCP, with the combination of BigQuery and end-to-end identity, are just the way to go.
And my takeaway: I was saying that one of the key measures of success for every cloud project should be how many people are actually using data to support their decision making. And from you, I think we should extend that: it's also how many people can actively author new insights on the platform, because that's really where you deliver this scale of change in operational effectiveness. So if you keep focusing on agile decision making in the front line, your cloud transformation will deliver value to the business faster. Thank you so much for your time today, and thanks everyone for joining us.
And look forward to seeing you on the next session. Thank you.
Thank you.