Using the Matillion API Query Profile

Data

Using the Matillion API Query Profile

by Holt Calder
//

After last week’s Airtable post, I’m sure all of our readers have been anxiously sitting on the edge of their seats. By now, I hope some of you have had the opportunity to explore Airtable and begin building bases. While I touched on the Airtable’s visualization capabilities, it is likely that if you’re a recurring reader of this blog, you will eventually want your data to land in Tableau. While Tableau does not currently have a connector for Airtable, I am excited to show you how we can structure this data in Snowflake. I am going to give you all a comprehensive example of how to configure Matillion’s API profile and how to utilize the API Query component.

Mo’ Matillion

What we will do in this blog is go through the necessary steps to set up the API profile, configure the XML file that maps the data and build an orchestration job to pull data from the API. So, what does the process look like for getting this set up?

  1. Create API profile
  2. Generate RSD file
  3. Build orchestration job
  4. Done!

Why Do I Need Matillion

In the time since I introduced Matillion to the blog, all of the InterWorks data engineers and architects participated in a comprehensive four-day training on the ins and outs of being a “Matillionaire.” Matillion is a cloud-based ELT tool that allows you to extract your data from 58 different source systems, apply transformations to that data and load it into your Snowflake warehouse. While 58 connectors may seem like a lot, there are going to be situations where a native connector for your product just doesn’t exist—and that’s why I’m here to show you how to build an API profile in Matillion.

Adding an API profile can have a significant impact on your data strategy and avoids the need to disperse your ELT process across different platforms; if you have the data, it deserves to live in your data warehouse, and nobody should have to monitor multiple ETL processes, which creates more room for error.

Create the API Profile

When you log into Matillion, you can go and check out the pre-configured API profiles from the Project drop-down menu. We’re going to go ahead and get started by adding a new profile and naming it Airtable Survey:

creating an API profile in Matillion

From our API Profile Management window, we can see a few of the API profiles that Matillion uses for pre-defined Load components. Really, the main significance of this call-out is to show you can generate an API profile that is equally as effective as an out-of-the-box component:

managing API profiles in Matillion

Generate an RSD File

Now that we’ve created our API profile, the fun part begins. We are going to select the gear icon to the right of our profile and begin configuring the connection. Inside of our configuration window, we have a Files and Parameter pane. The Files section is going to contain our RSD file, which holds the connection information for the API request. A single profile can contain multiple RSD files. I want you to think of an RSD file as an API request. If your service has multiple APIs available, they can all live within a single profile.

You have the option to build out different parameters for your RSD files. These are going to be useful if we are passing information with each request. Whether that is an authentication token or a search parameter, it will live here:

generating the RSD file in Matillion

Now that we are in the Configure API Profile window, we have two options pertaining to our RSD files:

Drake Matillion meme

Great choice! Let’s have Matillion do it. In our Files pane, select the option to Generate an RSD file. From here, we will pass the values Matillion needs to generate the RSD file:

  • Table Name: defines the name of our Matillion API request
  • Description: information regarding the request
  • XPath to repeat element: Most important detail; the path to what defines different rows of data (more on this later)
  • Data format: format our data is returned as
  • URI: URL to our API request
  • Properties

After submitting my form, Matillion will generate the RSD for me:

RSD file generation in Matillion

Before we move on to the next step, I do want to touch more on the importance of the repeat element. This is the single most important (and the most overlooked) piece of information required to successfully write an API Profile.

The Repeat Element

You must have your repeat element properly configured when creating an API Profile. If you do not, your data will become a jumbled, unusable mess. I’ll walk you through this in my Airtable API response.

Just by looking at the JSON the Airtable API responds with, you can identify that records is going to determine a new row has been created:

JSON path finder

In a perfect world, my data will come out of Matillion looking like this:

Matillion data output

However, choosing to omit this repeat element causes a world of problems that are difficult to pinpoint. Without the repeat element, instead of receiving different rows from my API call, the RSD file will be generated to identify each record as a distinct column, making my output look like this:

data output in API profile query

Now that we all know the pitfalls of the repeat element, we can continue to the fun stuff: how do we pull this into Snowflake?

Create the Orchestration Job

After configuring the API profile, we can create an orchestration job and test the new API profile. Using our API Query component located in the Load & Unload folder, we can select the Airtable profile we just created. The data source for the component will be the name of the relevant RSD file. Remember, this defines our API request:

API query in Matillion

API query in Matillion

The API Query component is going to send a request to our API profile and pull the data back into a Snowflake table, making the data available in our data warehouse.

Finished!

Being able to pull data from any API is a huge feature that Matillion executes exceptionally well. From defining the API profile to making requests, the complete process doesn’t take long at all.

While the process for connecting a new API profile to Matillion might not directly apply to your business use case yet, it is extremely valuable in understanding the process that Matillion uses to create its own components with. Matillion developers seem to use the tool the same way you and I do by creating “shared jobs,” like the Incremental Load Generator, out of the tools available to every customer.

Transitioning this attitude to the orchestration space, the API profile is a great way to get the most out of your ELT investment and continue improving the way your organization uses Matillion.

If you found this post interesting, please feel free to reach out and let me know how you are currently using Matillion, or where you think Matillion could be useful to you. I’d be happy to act as a sounding board to bounce ideas off of!

More About the Author

Holt Calder

Data Architect
Analytics in Digital Transformation with AWS The year 2020 produced a mass migration of business processes, enterprise software and internal workloads to AWS in a way that was ...
Introducing InterWorks Translate, Our DataDev Hackathon Project At InterWorks, we have always been a group of tinkerers. Near the end of last year, we had the opportunity to launch the Amazon ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!