How to Automate Matillion Git Fetch/Push Syncs with Bash

Data

How to Automate Matillion Git Fetch/Push Syncs with Bash

This post covers how to automate the fetching and pushing of a Matillion project’s local git repository with its remote counterpart and uses the Bash Script component to do so. If you would prefer, I have also written a similar post which leverages the Python Script component instead.

Requirements

Key Goal

The key goal of our script is to hit the scm/fetch and scm/push endpoints for a given Matillion project, as described in Matillion’s documentation.

The Matillion group name and project name need to be URL-encoded before they can be passed to the Matillion API, and I found this easier to achieve with a quick Python component beforehand. The Bash script contains a very rough equivalent though to replace URL encode spaces.

Unfortunately, Matillion doesn’t yet support Bash scripts writing back to Matillion variables, so we can’t build in any further functionality based on this output. This is not an issue with the Python equivalent.

Once this is working, you could use one of Matillion’s iterator components to cycle through multiple projects, scheduling this to execute daily and automatically sync all your Matillion git repositories:

Python Script to Encode Group and Project

This simple Python script will modify Matillion variables to store the URL-encoded versions of the group name and project name:

import urllib.parse
from datetime import datetime, timedelta

context.updateVariable('GROUP_ENCODED', urllib.parse.quote(GROUP))
context.updateVariable('PROJECT_ENCODED', urllib.parse.quote(PROJECT))

Bash Script Overview

The script below performs the following steps:

  1. Import the modules required for this script. All of these modules are native, so nothing needs to be installed beforehand.
  2. Prep and authentication
    1. Retrieve the password for the Matillion API service user from the VM backend.
    2. Convert this username and password combination into the base64 encoded string for API authentication.
    3. Retrieve the private SSH key for git authentication from the VM backend.
    4. We could URL-encode the name of the project that requires this git sync using Bash, along with the group that the project belongs to. Personally, I couldn’t find a clean way to do this, so I do it in Python quickly first.
    5. Configure the authorization component of the JSON body that will be sent to each of the API endpoints.
    6. Configure the headers that will be sent with the API requests.
    7. Configure the URLs for the API requests, leveraging the local references as this script will be executed from within a Matillion orchestration job.
  3. The fetch API request
    1. Prepare the fetch options: removeDeletedRefs and thinFetch.
    2. Execute the fetch API request.
    3. Store the fetch response in a Matillion variable.
  4. The push API request
    1. Prepare the push options: atomic,forcePush and thinPush.
    2. Execute the push API request.
    3. Store the push response in a Matillion variable.

Once the script executes, we can read the values of our two response variables and leverage a Matillion IF component to flag whether or not the job succeeded.

Complete Bash Script

Below is the complete script. Of course, you may need to update certain file paths and file names to get this working for your own use cases:

# BASH - Automated Matillion Git Sync

## Description
# This is a bash script to read Matillion api-user credentials and git SSH private key 
# from files in the linux backend and pass them into the Matillion API endpoints
# that fetch from and push to the remote repository

## Prep and authentication

### Retrieve password for the Matillion user called api-user, and prepare it for API authentication
api_user_password=$(cat /matillion_service_account_users/api-user.txt)
api_auth="api-user:$api_user_password"

### Retrieve SSH private key and encode it for the API body
ssh_key=$(cat /ssh_keys/matillion/id_rsa_gitlab)
ssh_key_encoded="${ssh_key//$'\n'/\n}"

### URL encode group and project
# This was not reliable so has been replaced with a Python component
#group_encoded="${GROUP// /%20}"
#project_encoded="${PROJECT// /%20}"

### Configure API auth
api_body_auth='"auth": {
    "authType": "SSH",
    "privateKey": "'"$ssh_key_encoded"'",
    "passphrase": ""
  }'

### Configure the URLs for the API requests
instance_address="http://127.0.0.1:8080"
endpoint_path="group/name/$GROUP_ENCODED/project/name/$PROJECT_ENCODED"
fetch_url="$instance_address/rest/v1/$endpoint_path/scm/fetch"
push_url="$instance_address/rest/v1/$endpoint_path/scm/push"

## Fetch

### Prepare fetch options
fetch_options='"fetchOptions": {
    "removeDeletedRefs": "'"$FETCH_OPTION_REMOVE_DELETED_REFS"'",
    "thinFetch": "'"$FETCH_OPTION_THIN_FETCH"'"
  }'

### Prepare body of the fetch API request    
fetch_body_raw='{
  '"$api_body_auth"',
  '"$fetch_options"'
}'

### Execute fetch API request

echo ""
echo "-------------FETCH START-------------"
echo ""

curl --request POST \
-u "$api_auth" \
--insecure "$fetch_url" \
--header 'Content-Type: application/json' \
--data "$fetch_body_raw"

echo ""
echo "--------------FETCH END--------------"
echo ""

## Push

### Prepare push options
push_options='"pushOptions": {
    "atomic": "'"$PUSH_OPTION_ATOMIC"'",
    "forcePush": "'"$PUSH_OPTION_FORCE_PUSH"'",
    "thinPush": "'"$PUSH_OPTION_THIN_PUSH"'"
  }'

### Prepare body of the push API request    
push_body_raw='{
  '"$api_body_auth"',
  '"$push_options"'
}'

### Execute push API request

echo ""
echo "--------------PUSH START-------------"
echo ""

curl --request POST \
-u "$api_auth" \
--insecure "$push_url" \
--header 'Content-Type: application/json' \
--data "$push_body_raw"

echo ""
echo "---------------PUSH END--------------"
echo ""

KeepWatch by InterWorks

Whether you need support for one platform or many, our technical experts have you covered.

More About the Author

Chris Hastie

Data Lead
Querying Stock Data with an API Query Profile in Matillion ETL Matillion ETL is a GUI-based orchestration and transformation tool for data movement that has many functionalities, including querying ...
Snowflake External Access: Retrieve Data from an API Recently, Snowflake released external access integrations into general availability. This allows Snowflake admins to create secure ...

See more from this author →

InterWorks uses cookies to allow us to better understand how the site is used. By continuing to use this site, you consent to this policy. Review Policy OK

×

Interworks GmbH
Ratinger Straße 9
40213 Düsseldorf
Germany
Geschäftsführer: Mel Stephenson

Kontaktaufnahme: markus@interworks.eu
Telefon: +49 (0)211 5408 5301

Amtsgericht Düsseldorf HRB 79752
UstldNr: DE 313 353 072

×

Love our blog? You should see our emails. Sign up for our newsletter!