How to Ingest Excel Files with Dynamic Sheet Names Using Matillion

Data

How to Ingest Excel Files with Dynamic Sheet Names Using Matillion

If you ever want to ingest data from an Excel file into Snowflake using Matillion, you can do so with the Excel Query component. If you know the name of the sheet you want to ingest, you can simply enter the name into the component and ingest that sheet.

But what if your sheet name may change? What if you wish to ingest multiple sheets, and they all may change? This challenge is common as sheet names are often used to identify the data in some way. For example, the sheet name could be the date for the data, the depot/branch the data is relevant to, the employee name … or pretty much anything to be honest.

How to Retrieve Sheet Names from an Excel File with Matillion

So, how do we read the names of all sheets in an Excel file using Matillion? Fortunately, the answer is simple:

  1. Use the Excel Query component.
  2. Configure the component to use Advanced Mode.

3. Enter the following SQL query:

SELECT * FROM sys_tables

As you can see from the sample in this screenshot, this will query the Excel file and retrieve the list of sheets. The sheet names are contained in the TableName field:

Why Does This Work? 

From Matillion’s software licenses, we can see that Matillion’s Excel Query component is supported by CData Cmdlets for Excel, which includes this sys_tables command.

How to Ingest Data from All Sheets in an Excel File

Using the sys_tables approach outlined above in an Excel Query component, we can store this information in a table in Snowflake. After Matillion has ingested this information into a Snowflake table, we can use the Table Iterator component to loop through each sheet name in the table. Pairing this with another Excel Query component, we can query each sheet in turn:

And there we have it. With this simple approach of using sys_tables to store the list of sheet names and iterating over it, we are able to ingest all sheets in an Excel file without knowing the sheet names in advance. Of course, you could make this process more complex. For example, you could:

  • Replace the second Excel Query component with a new orchestration job.
  • Modify the basic sys_tables query to match names that meet a specific naming convention.
  • Ingest the list of sheet names into a grid variable using the Query Result to Grid component, and manipulate that grid variable further.

Let me know if you find any interesting use cases for this approach!

More About the Author

Chris Hastie

Data Lead
Configure Azure Private Endpoints for Snowflake Internal Stages Snowflake and Azure are incredibly capable of storing data securely. You can see all reports available to customers through their pages ...
Configure Azure Private Link Connectivity with Snowflake Snowflake and Azure are incredibly capable of storing data securely. You can see all reports available to customers through their pages ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!