If you ever want to ingest data from an Excel file into Snowflake using Matillion, you can do so with the Excel Query component. If you know the name of the sheet you want to ingest, you can simply enter the name into the component and ingest that sheet.
But what if your sheet name may change? What if you wish to ingest multiple sheets, and they all may change? This challenge is common as sheet names are often used to identify the data in some way. For example, the sheet name could be the date for the data, the depot/branch the data is relevant to, the employee name … or pretty much anything to be honest.
How to Retrieve Sheet Names from an Excel File with Matillion
So, how do we read the names of all sheets in an Excel file using Matillion? Fortunately, the answer is simple:
- Use the Excel Query component.
- Configure the component to use Advanced Mode.
3. Enter the following SQL query:
SELECT * FROM sys_tables
As you can see from the sample in this screenshot, this will query the Excel file and retrieve the list of sheets. The sheet names are contained in the TableName field:
Why Does This Work?
From Matillion’s software licenses, we can see that Matillion’s Excel Query component is supported by CData Cmdlets for Excel, which includes this sys_tables command.
How to Ingest Data from All Sheets in an Excel File
Using the sys_tables approach outlined above in an Excel Query component, we can store this information in a table in Snowflake. After Matillion has ingested this information into a Snowflake table, we can use the Table Iterator component to loop through each sheet name in the table. Pairing this with another Excel Query component, we can query each sheet in turn:
And there we have it. With this simple approach of using sys_tables to store the list of sheet names and iterating over it, we are able to ingest all sheets in an Excel file without knowing the sheet names in advance. Of course, you could make this process more complex. For example, you could:
- Replace the second Excel Query component with a new orchestration job.
- Modify the basic sys_tables query to match names that meet a specific naming convention.
- Ingest the list of sheet names into a grid variable using the Query Result to Grid component, and manipulate that grid variable further.
Let me know if you find any interesting use cases for this approach!