How to Ingest Excel Files with Dynamic Sheet Names Using Matillion

Data

How to Ingest Excel Files with Dynamic Sheet Names Using Matillion

If you ever want to ingest data from an Excel file into Snowflake using Matillion, you can do so with the Excel Query component. If you know the name of the sheet you want to ingest, you can simply enter the name into the component and ingest that sheet.

But what if your sheet name may change? What if you wish to ingest multiple sheets, and they all may change? This challenge is common as sheet names are often used to identify the data in some way. For example, the sheet name could be the date for the data, the depot/branch the data is relevant to, the employee name … or pretty much anything to be honest.

How to Retrieve Sheet Names from an Excel File with Matillion

So, how do we read the names of all sheets in an Excel file using Matillion? Fortunately, the answer is simple:

  1. Use the Excel Query component.
  2. Configure the component to use Advanced Mode.

3. Enter the following SQL query:

SELECT * FROM sys_tables

As you can see from the sample in this screenshot, this will query the Excel file and retrieve the list of sheets. The sheet names are contained in the TableName field:

Why Does This Work? 

From Matillion’s software licenses, we can see that Matillion’s Excel Query component is supported by CData Cmdlets for Excel, which includes this sys_tables command.

How to Ingest Data from All Sheets in an Excel File

Using the sys_tables approach outlined above in an Excel Query component, we can store this information in a table in Snowflake. After Matillion has ingested this information into a Snowflake table, we can use the Table Iterator component to loop through each sheet name in the table. Pairing this with another Excel Query component, we can query each sheet in turn:

And there we have it. With this simple approach of using sys_tables to store the list of sheet names and iterating over it, we are able to ingest all sheets in an Excel file without knowing the sheet names in advance. Of course, you could make this process more complex. For example, you could:

  • Replace the second Excel Query component with a new orchestration job.
  • Modify the basic sys_tables query to match names that meet a specific naming convention.
  • Ingest the list of sheet names into a grid variable using the Query Result to Grid component, and manipulate that grid variable further.

Let me know if you find any interesting use cases for this approach!

More About the Author

Chris Hastie

Data Architect
A Definitive Guide to Creating Python UDTFs in Snowflake Using Snowpark Snowpark for Python is the name for the new Python functionality integration that Snowflake has recently developed. At the Snowflake ...
A Definitive Guide to Creating Python UDTFs Directly within the Snowflake User Interface Snowpark for Python is the name for the new Python functionality integration that Snowflake has recently developed. At the Snowflake ...

See more from this author →

Subscribe to our newsletter

  • I understand that InterWorks will use the data provided for the purpose of communication and the administration my request. InterWorks will never disclose or sell any personal data except where required to do so by law. Finally, I understand that future communications related topics and events may be sent from InterWorks, but I can opt-out at any time.
  • This field is for validation purposes and should be left unchanged.

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072