A client recently had a challenge wanting to use Exasol’s User-Defined Functions with a Python script, but they were having a hard time coming up with the right syntax to get it working properly. There’s a good deal of documentation out there, though much of it seemed a bit convoluted or difficult to comprehend.
I’m not a database guy, so I wanted to come up with a simpler solution for us less-technical Tableau developers. I ended up putting together a quick script (below) that the client found to be easier to understand and extend for their own purposes.
First a few notes:
- The following script is executed in Exaplus to create the UDF
- A UDF must be created within a desired schema, as notated below
- You may add or remove inputs depending on your requirements
CREATE OR REPLACE PYTHON SCALAR SCRIPT my_schema>.(input_1 , input_2 , ... , input_n )
The EMITS clause tells us what we should expect to come out of the script:
EMITS (output_column_1 , output_column_2 , ... , output_column_n ) AS
Once we’ve declared the Exasol parts of the script, we can use Python code with a few tweaks:
import pandas as pd
Ctx (call it whatever you’d like) is an object that’s being created that represents the connection between Exasol and Python. You can access the various inputs by using ctx.<input_name>. Make sure you use the same names specified in the ‘CREATE OR REPLACE’ clause):
def run(ctx): arg1 = ctx.input_1 arg2 = ctx.input_2 ... argn = ctx.input_n
Once you have your inputs, you can use them as normal Python objects:
df = pd.DataFrame(<build some dataframe>)
To output the dataframe, you can use the ctx.emit() function to output each column:
for index, row in df.iterrows(): ctx.emit(row[0], row[1], ..., row[n])
The nice thing here is that we’ve now extended Exasol’s lightning-fast data processing with Python’s massive (and growing) repository of libraries and functions. Have a good use case for using Python + Exasol? Leave a comment below!