How to Automate Matillion Git Fetch/Push Syncs with Python


This post covers how to automate fetch and push syncs between a Matillion project’s local git repository and its remote counterpart using the Python Script component. If you prefer, I also have a similar post that uses the Bash Script component.

Requirements

Key Goal

The key goal of our script is to call the scm/fetch and scm/push endpoints for a given Matillion project, as described in Matillion’s documentation. We can store the success/failure responses in Matillion variables and then use an IF component to report on whether the job succeeded or failed. This is not currently possible with the Bash equivalent, as the Matillion Bash Script component does not yet support writing to Matillion variables.

Once this is working, you could use one of Matillion’s iterator components to cycle through multiple projects and schedule the job to execute daily, automatically syncing all of your Matillion git repositories.
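
To make the iteration idea concrete, here is a minimal sketch of how the endpoint paths change from one project to the next. The group and project names are hypothetical placeholders, and the helper name scm_endpoints is my own; in the real job, the values would come from the GROUP and PROJECT variables that the iterator sets on each pass.

```python
from urllib.parse import quote


def scm_endpoints(group, project):
    """Return the (fetch, push) endpoint paths for one Matillion project."""
    base = '/rest/v1/group/name/{0}/project/name/{1}/scm'.format(
        quote(group), quote(project))
    return base + '/fetch', base + '/push'


# Hypothetical group/project pairs, standing in for the iterator's values
projects = [('Analytics', 'Sales Pipeline'), ('Analytics', 'Finance')]

for group, project in projects:
    fetch_path, push_path = scm_endpoints(group, project)
    print(fetch_path)
    print(push_path)
```

Each pass only changes the two URL-encoded names, which is why a single parameterised job can cover every project.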

Python Script Overview

The script below performs the following steps:

  1. Import the modules required for this script. All of these modules are part of the Python standard library, so nothing needs to be installed beforehand.
  2. Prep and authentication
    1. Retrieve the password for the Matillion API service user from the VM backend.
    2. Convert this username and password combination into a Base64-encoded string for HTTP Basic authentication against the API.
    3. Retrieve the private SSH key for git authentication from the VM backend.
    4. URL-encode the name of the project that requires this git sync, along with the group that the project belongs to.
    5. Configure the authorization component of the JSON body that will be sent to each of the API endpoints.
    6. Configure the headers that will be sent with the API requests.
    7. Configure the URLs for the API requests, using the local address (127.0.0.1 on port 8080) as this script will be executed from within a Matillion orchestration job.
  3. The fetch API request
    1. Prepare the fetch options: removeDeletedRefs and thinFetch.
    2. Execute the fetch API request.
    3. Store the fetch response in a Matillion variable.
  4. The push API request
    1. Prepare the push options: atomic, forcePush and thinPush.
    2. Execute the push API request.
    3. Store the push response in a Matillion variable.
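
As a quick illustration of step 2.2, the sketch below builds the Basic authentication header from a username and password, then decodes it again to confirm the round trip. The function name and the example password are my own placeholders and are not part of the Matillion API.

```python
from base64 import b64decode, b64encode


def basic_auth_header(username, password):
    """Build the value of an HTTP Basic 'Authorization' header."""
    raw = '{0}:{1}'.format(username, password)
    encoded = b64encode(raw.encode('utf-8')).decode('ascii')
    return 'Basic {0}'.format(encoded)


# Placeholder credentials, for illustration only
header = basic_auth_header('api-user', 'example-password')
print(header)

# Decode the token again to check that it round-trips
token = header.split(' ', 1)[1]
print(b64decode(token).decode('utf-8'))
```

Note that Base64 is an encoding rather than encryption, so the password file and the header should be treated as equally sensitive.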

Once the script has executed, we can then read the values of our two response variables and leverage a Matillion IF component to flag whether or not the job was a success.
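
If you would rather collapse the two flags into a single status before the IF component, something like the following sketch could work. I am assuming here that the success field comes back as either a boolean or the text true/false, so check what your own instance returns; the helper names and the SUCCESS/FAILURE labels are hypothetical.

```python
def is_success(value):
    """Interpret a 'success' field that may be a boolean or text."""
    if isinstance(value, bool):
        return value
    return str(value).strip().lower() == 'true'


def sync_status(fetch_response, push_response):
    """Return 'SUCCESS' only if both the fetch and the push succeeded."""
    fetch_ok = is_success(fetch_response.get('success'))
    push_ok = is_success(push_response.get('success'))
    return 'SUCCESS' if fetch_ok and push_ok else 'FAILURE'


# Example responses, shaped like the parsed JSON in the script
print(sync_status({'success': True}, {'success': True}))
print(sync_status({'success': True}, {'success': False}))
```

A missing success key is treated as a failure, which errs on the side of flagging the job for review.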

Complete Python Script

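Here, you can find the complete script. Naturally, you may need to update certain file paths and file names to get this working for your own use cases. The script also expects the Matillion variables it references (GROUP, PROJECT, the FETCH_OPTION_* and PUSH_OPTION_* flags, FETCH_SUCCESS and PUSH_SUCCESS) to already be defined on the job: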

# Python3 - Automated Matillion Git Sync

## Description
# This is a Python3 script to read Matillion api-user credentials and git SSH private key 
# from files in the linux backend and pass them into the Matillion API endpoints
# that fetch from and push to the remote repository

## Modules
import urllib.parse
import os
import http.client
from base64 import b64encode
import json

## Prep and authentication

### Retrieve password for the Matillion user called api-user, and prepare it for API authentication
api_user_name = 'api-user'
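api_user_password_path = os.path.expanduser('/matillion_service_account_users/{0}.txt'.format(api_user_name))
with open(api_user_password_path) as api_user_password_file:
  api_user_password = api_user_password_file.read().rstrip('\n')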
api_auth_raw = '{0}:{1}'.format(api_user_name, api_user_password)
api_auth_encoded = b64encode(bytes(api_auth_raw, 'utf-8'))
api_auth_decoded = api_auth_encoded.decode('ascii')
api_auth = 'Basic {0}'.format(api_auth_decoded)

### Retrieve SSH private key
with open(os.path.expanduser('/ssh_keys/matillion/id_rsa_gitlab')) as ssh_key_file:
  ssh_key = ssh_key_file.read()

### URL encode group and project
group_encoded = urllib.parse.quote(GROUP)
project_encoded = urllib.parse.quote(PROJECT)

### Configure API auth
api_body_auth = {
  "authType": "SSH",
  "privateKey": ssh_key,
  "passphrase": ""
}

### Configure API headers
api_headers = {
  "Authorization": api_auth,
  "Content-type": "application/json"
} 

### Configure the URLs for the API requests
instance_address = '127.0.0.1'
instance_port = '8080'
endpoint_path = 'group/name/{0}/project/name/{1}'.format(group_encoded, project_encoded)

fetch_endpoint = '/rest/v1/{0}/scm/{1}'.format(endpoint_path, 'fetch')
push_endpoint = '/rest/v1/{0}/scm/{1}'.format(endpoint_path, 'push')

## Fetch

### Prepare fetch options
fetch_options = {
  "removeDeletedRefs": FETCH_OPTION_REMOVE_DELETED_REFS,
  "thinFetch": FETCH_OPTION_THIN_FETCH
}

### Prepare body of the fetch API request
fetch_body = {
  "auth": api_body_auth,
  "fetchOptions": fetch_options
}
  

### Execute fetch API request

print('\n-------------FETCH START-------------\n')

fetch_conn = http.client.HTTPConnection(instance_address, instance_port)

fetch_conn.request("POST", fetch_endpoint, json.dumps(fetch_body), api_headers)

fetch_res = fetch_conn.getresponse()
fetch_data = fetch_res.read()
fetch_response_json = json.loads(fetch_data.decode("utf-8"))
print(fetch_response_json)

### Store fetch response in a Matillion variable
fetch_success_text = fetch_response_json['success']

context.updateVariable('FETCH_SUCCESS', fetch_success_text)

print('\n--------------FETCH END--------------\n')

## Push

### Prepare push options
push_options = {
  "atomic": PUSH_OPTION_ATOMIC,
  "forcePush": PUSH_OPTION_FORCE_PUSH,
  "thinPush": PUSH_OPTION_THIN_PUSH
}

### Prepare body of the push API request
push_body = {
  "auth": api_body_auth,
  "pushOptions": push_options
}

### Execute push API request

print('\n--------------PUSH START-------------\n')

push_conn = http.client.HTTPConnection(instance_address, instance_port)

push_conn.request("POST", push_endpoint, json.dumps(push_body), api_headers)

push_res = push_conn.getresponse()
push_data = push_res.read()
push_response_json = json.loads(push_data.decode("utf-8"))
print(push_response_json)

### Store push response in a Matillion variable
push_success_text = push_response_json['success']

context.updateVariable('PUSH_SUCCESS', push_success_text)

print('\n---------------PUSH END--------------\n')
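
Since the fetch and push requests follow the same pattern, one possible refinement is to wrap them in a shared helper that also copes with error pages that are not valid JSON. This is only a sketch under my own assumptions: the function names, the msg key used to carry unparsed text, and the choice to treat any HTTP status of 400 or above as a failure are all hypothetical and not taken from Matillion's documentation.

```python
import http.client
import json


def parse_scm_response(status, raw_body):
    """Decode a response body, tolerating non-JSON error pages."""
    text = raw_body.decode('utf-8', errors='replace')
    try:
        payload = json.loads(text)
    except ValueError:
        payload = None
    if not isinstance(payload, dict):
        # Hypothetical fallback shape for anything that is not a JSON object
        payload = {'success': False, 'msg': text}
    if status >= 400:
        # Assumption: an HTTP error status always means the sync failed
        payload['success'] = False
    return payload


def post_scm(host, port, path, body, headers, timeout=60):
    """POST a JSON body to an endpoint and return the parsed response."""
    conn = http.client.HTTPConnection(host, int(port), timeout=timeout)
    try:
        conn.request('POST', path, json.dumps(body), headers)
        res = conn.getresponse()
        return parse_scm_response(res.status, res.read())
    finally:
        conn.close()


# Parsing demonstration with sample bytes; no network call is made here
print(parse_scm_response(200, b'{"success": true}'))
print(parse_scm_response(500, b'<html>Server error</html>'))
```

Within the script, the fetch call would then reduce to something like post_scm(instance_address, instance_port, fetch_endpoint, fetch_body, api_headers), with the push call following the same shape.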

 

More About the Author

Chris Hastie

Data Lead