This post covers how to automate fetching from and pushing to the remote counterpart of a Matillion project’s local git repository, using the Bash Script component. If you would prefer, I have also written a similar post that leverages the Python Script component instead.
Requirements
- Your Matillion project must already be configured with remote git integration.
- Your Matillion API user and private git SSH key must be stored appropriately on your Matillion virtual machine.
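Before wiring up the job, it can help to confirm that both credential files are actually readable by the user that runs Matillion scripts. The following is a minimal sketch, assuming the file paths used in the script later in this post (yours will likely differ). The names `require_readable` and `check_matillion_credentials` are hypothetical helpers, not part of Matillion.

```shell
# Hypothetical helper: succeed only if the given file exists, is readable and is not empty.
require_readable() {
  if [ ! -r "$1" ] || [ ! -s "$1" ]; then
    echo "Missing, unreadable or empty file: $1" >&2
    return 1
  fi
}

# Assumed locations, taken from the script in this post; adjust them to your own VM.
# Nothing touches the filesystem until this function is called.
check_matillion_credentials() {
  require_readable /matillion_service_account_users/api-user.txt &&
    require_readable /ssh_keys/matillion/id_rsa_gitlab
}
```

Calling `check_matillion_credentials || exit 1` at the top of the script should stop it early rather than sending an empty password or key to the API. This assumes the component treats a non-zero exit code as a failure.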
Key Goal
The key goal of our script is to hit the scm/fetch and scm/push endpoints for a given Matillion project, as described in Matillion’s documentation.
The Matillion group name and project name need to be URL-encoded before they can be passed to the Matillion API, and I found this easier to achieve with a quick Python component beforehand. The Bash script does contain a very rough, commented-out equivalent, but it only replaces spaces with %20.
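For anyone who would rather stay in the shell, here is one possible sketch of a fuller URL-encoder. It is not the approach used in this post: `urlencode` is a hypothetical helper name, and I have only reasoned about ASCII input, so names containing non-ASCII characters should be tested before relying on it.

```shell
# Hypothetical helper: percent-encode everything except the RFC 3986 unreserved characters.
# The body runs in a subshell ( ... ), so the locale change and the variables stay local.
urlencode() (
  LC_ALL=C   # C locale keeps the a-z, A-Z and 0-9 ranges limited to ASCII
  input=$1
  out=
  while [ -n "$input" ]; do
    remaining=${input#?}           # everything after the first character
    char=${input%"$remaining"}     # the first character on its own
    case $char in
      [a-zA-Z0-9.~_-]) out=$out$char ;;                 # unreserved: keep as-is
      *) out=$out$(printf '%%%02X' "'$char") ;;         # anything else: %XX
    esac
    input=$remaining
  done
  printf '%s\n' "$out"
)
```

Used as `GROUP_ENCODED=$(urlencode "$GROUP")`, this would turn `My Group` into `My%20Group`. Unlike a plain space replacement, it also encodes characters such as `&` and `/`.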
Unfortunately, Matillion doesn’t yet support Bash scripts writing back to Matillion variables, so we can’t build in any further functionality based on this output. This is not an issue with the Python equivalent.
Once this is working, you could use one of Matillion’s iterator components to cycle through multiple projects and schedule the job to execute daily, automatically syncing all your Matillion git repositories.
Python Script to Encode Group and Project
This simple Python script will modify Matillion variables to store the URL-encoded versions of the group name and project name:
import urllib.parse

# URL-encode the group and project names and store the results in Matillion variables
context.updateVariable('GROUP_ENCODED', urllib.parse.quote(GROUP))
context.updateVariable('PROJECT_ENCODED', urllib.parse.quote(PROJECT))
Bash Script Overview
The script below performs the following steps:
- Setup: nothing needs to be imported or installed beforehand, as the script relies only on standard command-line tools such as cat and curl.
- Prep and authentication
  - Retrieve the password for the Matillion API service user from the VM backend.
  - Combine the username and password into the user:password string that curl turns into the base64-encoded basic authentication header.
  - Retrieve the private SSH key for git authentication from the VM backend and escape its line breaks so that it fits inside a JSON string.
  - We could URL-encode the name of the project that requires this git sync using Bash, along with the group that the project belongs to. Personally, I couldn’t find a clean way to do this, so I do it in Python quickly first.
  - Configure the authorization component of the JSON body that will be sent to each of the API endpoints.
  - Configure the headers that will be sent with the API requests.
  - Configure the URLs for the API requests, leveraging the local address (127.0.0.1) as this script will be executed from within a Matillion orchestration job.
- The fetch API request
  - Prepare the fetch options: removeDeletedRefs and thinFetch.
  - Execute the fetch API request.
  - Print the fetch response to the task output, as a Bash script cannot store it in a Matillion variable.
- The push API request
  - Prepare the push options: atomic, forcePush and thinPush.
  - Execute the push API request.
  - Print the push response to the task output.
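The body-building steps above can be sketched in isolation, which makes them easier to test away from the API. This sketch uses awk for the newline escaping rather than the parameter expansion in the full script, and both `escape_newlines` and `build_fetch_body` are hypothetical helper names. It assumes the key contains no double quotes or backslashes, which holds for PEM-formatted keys.

```shell
# Hypothetical helper: join the lines of a multi-line value with a literal \n
# (backslash followed by n) so that it fits inside a JSON string.
# Nothing else is escaped here.
escape_newlines() {
  printf '%s\n' "$1" | awk '{ printf "%s%s", sep, $0; sep = "\\n" }'
}

# Hypothetical helper: assemble the fetch request body.
# Arguments: escaped private key, removeDeletedRefs value, thinFetch value.
build_fetch_body() {
  printf '{ "auth": { "authType": "SSH", "privateKey": "%s", "passphrase": "" }, ' "$1"
  printf '"fetchOptions": { "removeDeletedRefs": "%s", "thinFetch": "%s" } }\n' "$2" "$3"
}
```

A call such as `build_fetch_body "$(escape_newlines "$ssh_key")" true false` prints a single-line JSON document. You can inspect it before passing it to curl with `--data`.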
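Once the script executes, the two responses appear in the component’s task output between the FETCH and PUSH markers. Because the Bash Script component cannot write them back to Matillion variables, flagging success with a Matillion IF component is only possible with the Python equivalent.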
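Since a Bash script cannot hand the responses back to Matillion, another way to flag a failed sync is through the script’s exit code. The sketch below is not part of the original script. `is_success_status` and `post_json` are hypothetical helper names, and it assumes both that the endpoints signal failure through their HTTP status and that the component fails on a non-zero exit code. Both assumptions are worth checking against a deliberately failing request.

```shell
# Hypothetical helper: return success only for 2xx HTTP status codes.
is_success_status() {
  case $1 in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# Hypothetical wrapper: POST a JSON body, print the response, and fail on a non-2xx status.
# Arguments: URL, user:password string, JSON body.
post_json() {
  response_file=$(mktemp) || return 1
  # --output sends the response body to a file so that --write-out
  # leaves only the status code on standard output.
  status=$(curl --silent --request POST \
    --user "$2" \
    --header 'Content-Type: application/json' \
    --data "$3" \
    --output "$response_file" \
    --write-out '%{http_code}' \
    "$1")
  cat "$response_file"
  rm -f "$response_file"
  is_success_status "$status"
}
```

In the full script, the fetch call would then become `post_json "$fetch_url" "$api_auth" "$fetch_body_raw" || exit 1`, with the push call changed in the same way.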
Complete Bash Script
Below is the complete script. Of course, you may need to update certain file paths and file names to get this working for your own use cases:
# BASH - Automated Matillion Git Sync

## Description
# This is a bash script to read Matillion api-user credentials and git SSH private key
# from files in the linux backend and pass them into the Matillion API endpoints
# that fetch from and push to the remote repository

## Prep and authentication

### Retrieve password for the Matillion user called api-user, and prepare it for API authentication
api_user_password=$(cat /matillion_service_account_users/api-user.txt)
api_auth="api-user:$api_user_password"

### Retrieve SSH private key and encode it for the API body
ssh_key=$(cat /ssh_keys/matillion/id_rsa_gitlab)
ssh_key_encoded="${ssh_key//$'\n'/\n}"

### URL encode group and project
# This was not reliable so has been replaced with a Python component
#group_encoded="${GROUP// /%20}"
#project_encoded="${PROJECT// /%20}"

### Configure API auth
api_body_auth='"auth": { "authType": "SSH", "privateKey": "'"$ssh_key_encoded"'", "passphrase": "" }'

### Configure the URLs for the API requests
instance_address="http://127.0.0.1:8080"
endpoint_path="group/name/$GROUP_ENCODED/project/name/$PROJECT_ENCODED"
fetch_url="$instance_address/rest/v1/$endpoint_path/scm/fetch"
push_url="$instance_address/rest/v1/$endpoint_path/scm/push"

## Fetch

### Prepare fetch options
fetch_options='"fetchOptions": { "removeDeletedRefs": "'"$FETCH_OPTION_REMOVE_DELETED_REFS"'", "thinFetch": "'"$FETCH_OPTION_THIN_FETCH"'" }'

### Prepare body of the fetch API request
fetch_body_raw='{ '"$api_body_auth"', '"$fetch_options"' }'

### Execute fetch API request
echo ""
echo "-------------FETCH START-------------"
echo ""
curl --request POST \
  -u "$api_auth" \
  --insecure "$fetch_url" \
  --header 'Content-Type: application/json' \
  --data "$fetch_body_raw"
echo ""
echo "--------------FETCH END--------------"
echo ""

## Push

### Prepare push options
push_options='"pushOptions": { "atomic": "'"$PUSH_OPTION_ATOMIC"'", "forcePush": "'"$PUSH_OPTION_FORCE_PUSH"'", "thinPush": "'"$PUSH_OPTION_THIN_PUSH"'" }'

### Prepare body of the push API request
push_body_raw='{ '"$api_body_auth"', '"$push_options"' }'

### Execute push API request
echo ""
echo "--------------PUSH START-------------"
echo ""
curl --request POST \
  -u "$api_auth" \
  --insecure "$push_url" \
  --header 'Content-Type: application/json' \
  --data "$push_body_raw"
echo ""
echo "---------------PUSH END--------------"
echo ""