How to Set Up a Code Repository in Azure Data Factory

Why activate a Git Configuration?

The main reasons are:

  1. Source Control: Ensures that all your changes are saved and traceable, and that you can easily go back to a previous version in case of a bug.
  2. Continuous Integration and Continuous Delivery (CI/CD): Allows you to create build and release pipelines to release to other Data Factory instances (DTAP), either manually or triggered automatically.
  3. Collaboration: You can easily work in the same Data Factory together with colleagues.
  4. Performance: Loading your Data Factory from Git is 10 times faster than loading directly from the Data Factory service.

So enough reasons to start enabling your Git Configuration.

How to set up your Code Repository in Azure Data Factory

While creating your Data Factory you can select either Azure DevOps or GitHub as your Git configuration. If you haven't done that, you can still configure this integration afterwards in Azure Data Factory. The procedure is the same for both options.
Create ADF Version Control
In my previous article, Creating an Azure Data Factory Instance, I skipped the Git Configuration. In this article I will explain how to do this in an already created Data Factory.

Azure Data Factory Source Control

When you open your Data Factory, select Set up Code Repository on the right of the splash screen. You can also start configuring your Code Repository from the Management Hub, or from the top left of the authoring canvas in the UX. If you don't see the option, a Code Repository is already configured. You can check this in the Management Hub or in the UX.

We have the option to configure Azure DevOps or GitHub.

Azure DevOps integration

First I will take you through the configuration of Azure DevOps, and then create a similar configuration in GitHub. If you want to start directly in GitHub, skip ahead to the GitHub Integration section.

Select Azure DevOps Git:

Azure Dev Ops Config

  1. Azure Active Directory: Select the AAD tenant where your Azure DevOps environment is located. If you use another AAD tenant, make sure that your account has rights to that environment.
  2. Azure DevOps Account: Select your account.
  3. Project Name: Select the project in which you want to store your repository.
  4. Git Repository: Create a new repository.
  5. Collaboration Branch: Change this to Main.
  6. Publish Branch: Leave this on adf_publish.
  7. Root folder: If you want one repository for a complete project (SQL, Azure Analysis Services, Azure Databricks, and so on), define a root folder so that the Data Factory resources are stored in that folder.
  8. Import: If this is a blank Data Factory, you can disable this option. If you have already created resources in your Data Factory, enable it so that those resources are committed to the repository.
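
The same settings can also be written down outside the UI. As a minimal sketch, and assuming the field names of the `repoConfiguration` property that a factory resource carries in an ARM template (type `FactoryVSTSConfiguration`), the choices above could be captured like this. The helper name `build_devops_repo_configuration` and all values (`my-devops-org`, `DataPlatform`, `adf-repo`, the tenant GUID) are hypothetical placeholders, not taken from this article's environment.

```python
import json


def build_devops_repo_configuration(account_name, project_name, repository_name,
                                    collaboration_branch, root_folder="/",
                                    tenant_id=None):
    """Return a dict shaped like a factory's repoConfiguration for Azure DevOps Git.

    This only builds the data structure; it does not call Azure.
    """
    config = {
        "type": "FactoryVSTSConfiguration",
        "accountName": account_name,                   # Azure DevOps Account
        "projectName": project_name,                   # Project Name
        "repositoryName": repository_name,             # Git Repository
        "collaborationBranch": collaboration_branch,   # Collaboration Branch
        "rootFolder": root_folder,                     # Root folder
    }
    if tenant_id is not None:
        # Only needed when the DevOps organization lives in another AAD tenant.
        config["tenantId"] = tenant_id
    return config


# Hypothetical example values, for illustration only.
devops_config = build_devops_repo_configuration(
    account_name="my-devops-org",
    project_name="DataPlatform",
    repository_name="adf-repo",
    collaboration_branch="Main",
    root_folder="/datafactory",
    tenant_id="00000000-0000-0000-0000-000000000000",
)
print(json.dumps(devops_config, indent=2))
```

Keeping the configuration as data like this makes it easy to review the choices, or to reuse them when you create more factories later.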

Click on Apply and you will see that your repository is connected.

Repo Connected

When you log in to your Azure DevOps environment, you will see that a new repository has been created with a Main branch.

Azure Dev Ops Main

Go back to your Data Factory and click on Publish.

Data Factory Publish

In Azure DevOps the adf_publish branch is now also created.

Azure Dev Ops Publish
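
If you ever need a publish branch other than adf_publish, the Data Factory documentation describes adding a `publish_config.json` file to the root folder of the collaboration branch. The following is a minimal sketch of writing such a file. The function name `write_publish_config` and the branch name `factory/adf_publish` are hypothetical, and a temporary directory stands in for a local clone of your repository.

```python
import json
import tempfile
from pathlib import Path


def write_publish_config(root_folder, publish_branch):
    """Write publish_config.json into the given folder and return its path."""
    path = Path(root_folder) / "publish_config.json"
    content = {"publishBranch": publish_branch}
    path.write_text(json.dumps(content, indent=2) + "\n", encoding="utf-8")
    return path


# A temporary directory stands in for the root folder of a local clone.
with tempfile.TemporaryDirectory() as repo_root:
    config_path = write_publish_config(repo_root, "factory/adf_publish")
    written = json.loads(config_path.read_text(encoding="utf-8"))
    print(written)
```

In a real repository you would commit this file to the collaboration branch. For this walkthrough the default adf_publish branch is fine, so this step is optional.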

GitHub Integration

In the repository screen, select GitHub:

Github login

The first time you connect from your Data Factory, you need to log in to GitHub.

Github authorize

Once connected, you need to authorize your Data Factory.

Github Configuration

The settings are almost the same as in Azure DevOps:

  1. Use GitHub Enterprise Server: If enabled, fill in the GitHub Enterprise root URL.
  2. GitHub Account: Select your account.
  3. Project Name: Select the project in which you want to store your repository.
  4. Git Repository: Create a new repository.
  5. Collaboration Branch: Leave this on Main.
  6. Publish Branch: Leave this on adf_publish.
  7. Root folder: If you want one repository for a complete project (SQL, Azure Analysis Services, Azure Databricks, and so on), define a root folder so that the Data Factory resources are stored in that folder.
  8. Import: If this is a blank Data Factory, you can disable this option. If you have already created resources in your Data Factory, enable it so that those resources are committed to the repository.
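
As with Azure DevOps, these choices can be captured as data. This is a minimal sketch, assuming the field names of the `repoConfiguration` property of a factory resource in an ARM template (type `FactoryGitHubConfiguration`). The helper name `build_github_repo_configuration` and the values (`my-github-account`, `adf-repo`, `https://github.example.com`) are hypothetical placeholders.

```python
import json


def build_github_repo_configuration(account_name, repository_name,
                                    collaboration_branch, root_folder="/",
                                    host_name=None):
    """Return a dict shaped like a factory's repoConfiguration for GitHub.

    This only builds the data structure; it does not call Azure or GitHub.
    """
    config = {
        "type": "FactoryGitHubConfiguration",
        "accountName": account_name,                   # GitHub Account
        "repositoryName": repository_name,             # Git Repository
        "collaborationBranch": collaboration_branch,   # Collaboration Branch
        "rootFolder": root_folder,                     # Root folder
    }
    if host_name is not None:
        # Only set when GitHub Enterprise Server is used (the root URL).
        config["hostName"] = host_name
    return config


# Hypothetical example values, for illustration only.
github_config = build_github_repo_configuration(
    account_name="my-github-account",
    repository_name="adf-repo",
    collaboration_branch="Main",
    root_folder="/datafactory",
)
enterprise_config = build_github_repo_configuration(
    account_name="my-github-account",
    repository_name="adf-repo",
    collaboration_branch="Main",
    host_name="https://github.example.com",
)
print(json.dumps(github_config, indent=2))
print(json.dumps(enterprise_config, indent=2))
```

Compared with the Azure DevOps variant, there is no tenant here, and the host name only appears for GitHub Enterprise Server.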

Click on Apply and you will see that your repository is connected.

Repo Connected

Log in to GitHub and you will see that a new repository has been created with a Main branch. Then go back to your Data Factory and click on Publish.

Data Factory Publish

In GitHub the adf_publish Branch is now also created.

GitHub Publish

As you can see, the setup for Azure DevOps and GitHub is mostly the same. You have now learned how to connect your Data Factory to a Code Repository, and you're ready to start building your build and release pipelines.

Thanks for reading and in case you have some questions, please leave them in the comments below.
