Azure Synapse Analytics Code Repository has arrived

by Erwin | Nov 25, 2020 | Azure DevOps, Azure Synapse Analytics, GitHub

Azure Synapse Analytics Code repository

I just opened my Azure Synapse Analytics Workspace and got a great surprise: after a long wait, the Git Configuration option is available as of today.

The setup isn't much different from the one in Azure Data Factory, which is described in this link.

The difference is that we no longer use an adf_publish branch but a workspace_publish branch, which makes sense if you want to use both Azure services side by side. In this blog I do a quick walkthrough with the Azure DevOps configuration enabled.

Once we have configured everything, we can walk through the Git Configuration options within Azure Synapse Analytics. I'm sure there will be a lot of them, but below is a list of the ones I noticed first.

Synapse live

After you have published your code, it will be available in Synapse Live. As in Azure Data Factory, you develop everything in Azure DevOps branches.

Notebooks

After creating a Notebook, we have the option Commit. Once you have committed, the Notebook is saved directly in your current working branch.

SQL Scripts

As with Notebooks, after creating a SQL Script we can Commit. Once you have committed, the script is saved directly in your current working branch.

Pipelines

Here we now have a Commit option as well.

Workspace_publish

Besides the Notebooks and the SQL Scripts, we can also store the Credentials and Spark Job definitions in Azure DevOps.

Differences

So as we can see the main differences between Azure Data Factory and Azure Synapse Analytics are:

Workspace_publish branch instead of adf_publish branch.

Commit instead of Save.
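If you script against both services, the branch difference can be captured in a small lookup. This is only a sketch: the service keys `data_factory` and `synapse` and the function name are made up for illustration, and the branch names are the defaults mentioned above.

```python
# Default publish branch per service, as described in this post.
# The dictionary keys are hypothetical labels, not Azure identifiers.
PUBLISH_BRANCHES = {
    "data_factory": "adf_publish",
    "synapse": "workspace_publish",
}


def publish_branch(service: str) -> str:
    """Return the default publish branch for the given service key."""
    try:
        return PUBLISH_BRANCHES[service]
    except KeyError:
        raise ValueError(f"unknown service: {service!r}") from None
```

A release script could call `publish_branch("synapse")` to decide which branch to read the generated templates from.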

Azure Data Factory Pipelines

If you move your code from the Azure Data Factory repository to the Azure Synapse Analytics repository in Azure DevOps, your Azure Data Factory configuration becomes available in Azure Synapse Analytics.

I added my ADF code to Azure Synapse in Azure Dev Ops and it looks the same.

After refreshing the Azure Synapse Analytics Workspace, we see the Integration Datasets (ADF Datasets) and the linked storage accounts in the Data Hub.

And in the Integrate Hub, we see all our Pipelines. The same works for our triggers.
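The copy step could be scripted roughly as follows. This is a sketch under assumptions, not a tested migration tool: it assumes both repositories are cloned locally and that both store their JSON definitions in folders named `pipeline`, `dataset`, `linkedService` and `trigger`; check the folder names in your own repositories first. The function name is made up for illustration.

```python
import shutil
from pathlib import Path

# Assumed folder names shared by both repository layouts; verify them
# against your own repositories before relying on this.
SHARED_FOLDERS = ("pipeline", "dataset", "linkedService", "trigger")


def copy_adf_resources(adf_root: Path, synapse_root: Path) -> list[str]:
    """Copy JSON definitions from the shared folders and return the copied paths."""
    copied = []
    for folder in SHARED_FOLDERS:
        source = adf_root / folder
        if not source.is_dir():
            continue  # this resource type is not used in the ADF repository
        target = synapse_root / folder
        target.mkdir(parents=True, exist_ok=True)
        for json_file in sorted(source.glob("*.json")):
            shutil.copy2(json_file, target / json_file.name)
            copied.append(f"{folder}/{json_file.name}")
    return copied
```

After copying, commit the changes in a working branch of the Synapse repository and refresh the workspace to see the resources.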

It looks like we can reuse our code quite easily. I haven't tested everything yet, but I wanted to share this with you as quickly as possible. I'm sure an easier way to migrate from Azure Data Factory to Azure Synapse is on its way; until then, you can use the above as a start.

Integration Runtimes

Does everything work as in Azure Data Factory? No. At the moment you can't use the Azure-SSIS Integration Runtime or a shared Self-Hosted Integration Runtime. Hopefully it won't take long before these arrive.
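Before moving code over, you could scan the ADF repository for runtimes that fall under this limitation. The sketch below rests on assumptions about the JSON layout: that integration runtimes live in an `integrationRuntime` folder, that an Azure-SSIS runtime has type `Managed` with an `ssisProperties` section, and that a shared Self-Hosted runtime has type `SelfHosted` with a `linkedInfo` section. Verify this against your own files.

```python
import json
from pathlib import Path


def unsupported_runtimes(repo_root: Path) -> list[str]:
    """Return names of runtimes that look like Azure-SSIS or shared Self-Hosted."""
    flagged = []
    # Assumed folder name; a missing folder simply yields no files.
    for path in sorted((repo_root / "integrationRuntime").glob("*.json")):
        definition = json.loads(path.read_text(encoding="utf-8"))
        props = definition.get("properties", {})
        type_props = props.get("typeProperties", {})
        # Assumed markers for the two runtime kinds named in the text.
        is_ssis = props.get("type") == "Managed" and "ssisProperties" in type_props
        is_shared = props.get("type") == "SelfHosted" and "linkedInfo" in type_props
        if is_ssis or is_shared:
            flagged.append(definition.get("name", path.stem))
    return flagged
```

An empty list means nothing was flagged by these heuristics; it does not prove that every runtime will work in Synapse.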

Thank you for reading. This was a quick overview of the first changes I discovered. Please feel free to leave a comment if you have discovered more.

If you want to become more familiar with the various possibilities of Azure Synapse Analytics, please read the following articles, which I published a while ago:

✅ Creating your Azure Synapse Analytics Workspace

✅ Exploring the new Azure Synapse Analytics Studio

✅ Creating an Apache Spark Pool

✅ Creating a SQL Pool

✅ Integration with Power BI

How to setup Code Repository in Azure Data Factory

by Erwin | Nov 5, 2020 | Azure, Azure Data Factory, Azure DevOps, Azure Synapse Analytics, GitHub

Why activate a Git Configuration?

The main reasons are:

Source Control: Ensures that all your changes are saved and traceable, and that you can easily go back to a previous version in case of a bug.
Continuous Integration and Continuous Delivery (CI/CD): Allows you to create build and release pipelines for easy release to other Data Factory instances (DTAP), manually or triggered.
Collaboration: You can easily collaborate in the same Data Factory with different colleagues.
Performance: Loading your Data Factory from Git is 10 times faster than loading it directly from the Data Factory service.

So enough reasons to start enabling your Git Configuration.

How to setup your Code Repository in Azure Data Factory!

During the setup of your Data Factory you can select either Azure DevOps or GitHub as your Git Configuration. If you haven't done that, you can still configure this integration in Azure Data Factory afterwards. The procedure is the same for both options.

In my previous article, Creating an Azure Data Factory Instance, I skipped the Git Configuration. In this article I will explain how to do this in an already created Data Factory.

When you open your Data Factory, select Set up Code Repository on the right of the splash screen. You can also start configuring your Code Repository through the Management Hub or from the top left of the authoring canvas in the UX. If you don't see the option, a Code Repository is already configured; you can check this in the Management Hub or the UX.

We have the option to configure Azure DevOps or GitHub.

Azure DevOps integration

First I will take you through the configuration of Azure DevOps and then also create a similar configuration in GitHub. If you want to start directly in GitHub, click here.

Select Azure DevOps Git:

Azure Active Directory: Select the AAD where your Azure DevOps environment is located. If you use another AAD, make sure that your account has rights to that environment.
Azure DevOps Account: Select your Account.
Project Name: Select the Project in which you want to store your repository.
Git Repository: Create a new Project.
Collaboration Branch: Change this to Main.
Publish Branch: Leave this on adf_publish.
Root folder: If you want to create a complete project with SQL, Azure Analysis Services, Azure Databricks, etc., define a root folder and create your repository in that folder.
Import: When this is a blank Data Factory, you can disable this option. When you have already created resources in your Data Factory, you should enable it so that those resources are committed to the repository.
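If you would rather keep these settings in a deployment template than click through the UI, they can be sketched as a repository configuration block. This is a minimal sketch, assuming the ARM `repoConfiguration` shape with type `FactoryVSTSConfiguration`; the function name and example values are hypothetical. The publish branch and the Import option are not part of this block.

```python
import json


def devops_repo_configuration(account: str, project: str, repository: str,
                              collaboration_branch: str = "main",
                              root_folder: str = "/") -> dict:
    """Build a repoConfiguration-style dict for an Azure DevOps Git repository."""
    return {
        "type": "FactoryVSTSConfiguration",  # Azure DevOps Git
        "accountName": account,              # Azure DevOps Account
        "projectName": project,              # Project Name
        "repositoryName": repository,        # Git Repository
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,
    }


if __name__ == "__main__":
    # Hypothetical values, for illustration only.
    config = devops_repo_configuration("my-org", "my-project", "my-adf-repo")
    print(json.dumps(config, indent=2))
```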

Click on Apply and you will see that your repository is connected.

When you log in to your Azure DevOps environment, you will see that a new Repository has been created with a Main branch.

Go back to your Data Factory and click on Publish.

In Azure DevOps the adf_publish Branch is now also created.
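To check from a script that both branches exist, you could parse the output of `git ls-remote --heads <repository-url>`. The sketch below only does the parsing; the function names are made up for illustration, and the expected branch names assume you followed the settings above with a lowercase `main` branch.

```python
def remote_branches(ls_remote_output: str) -> set[str]:
    """Extract branch names from `git ls-remote --heads` output."""
    branches = set()
    for line in ls_remote_output.splitlines():
        parts = line.split()
        # Each line is expected to look like "<commit-sha>\trefs/heads/<branch>".
        if len(parts) == 2 and parts[1].startswith("refs/heads/"):
            branches.add(parts[1][len("refs/heads/"):])
    return branches


def missing_branches(ls_remote_output: str,
                     expected=("main", "adf_publish")) -> list[str]:
    """Return the expected branches that are not present on the remote."""
    found = remote_branches(ls_remote_output)
    return [branch for branch in expected if branch not in found]
```

You can feed it the `stdout` of `subprocess.run(["git", "ls-remote", "--heads", url], capture_output=True, text=True, check=True)`, where `url` is your own repository URL.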

GitHub Integration

In the repository screen, select GitHub:

The first time you connect your Data Factory, you need to log in to GitHub.

Once connected, you need to authorize your Data Factory.

All the settings are almost the same as in Azure DevOps:

Use GitHub Enterprise Server: If enabled, fill in the GitHub Enterprise root URL.
GitHub Account: Select your Account.
Project Name: Select the Project in which you want to store your repository.
Git Repository: Create a new Project.
Collaboration Branch: Leave this on Main.
Publish Branch: Leave this on adf_publish.
Root folder: If you want to create a complete project with SQL, Azure Analysis Services, Azure Databricks, etc., define a root folder and create your repository in that folder.
Import: When this is a blank Data Factory, you can disable this option. When you have already created resources in your Data Factory, you should enable it so that those resources are committed to the repository.
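The one GitHub-specific rule in this list, that the Enterprise root URL is required once the Enterprise option is enabled, can be sketched as a small validation. This assumes the ARM `repoConfiguration` shape with type `FactoryGitHubConfiguration` and its `hostName` property; the function and parameter names are hypothetical.

```python
from typing import Optional


def github_repo_configuration(account: str, repository: str,
                              use_enterprise: bool = False,
                              enterprise_host: Optional[str] = None,
                              collaboration_branch: str = "main",
                              root_folder: str = "/") -> dict:
    """Build a repoConfiguration-style dict for a GitHub repository."""
    if use_enterprise and not enterprise_host:
        raise ValueError("GitHub Enterprise is enabled, so its root URL is required")
    config = {
        "type": "FactoryGitHubConfiguration",
        "accountName": account,        # GitHub Account
        "repositoryName": repository,  # Git Repository
        "collaborationBranch": collaboration_branch,
        "rootFolder": root_folder,
    }
    if use_enterprise:
        # Only set for GitHub Enterprise; omitted for github.com.
        config["hostName"] = enterprise_host
    return config
```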

Click on Apply and you will see that your repository is connected.

Log in to GitHub and you will see that a new Repository has been created with a Main branch. Then go back to your Data Factory and click on Publish.

In GitHub the adf_publish Branch is now also created.

As you can see, the setup for Azure DevOps and GitHub is mostly the same. You have now learned how to connect your Data Factory to a Code Repository, and you're ready to start building your build and release pipelines.

Thanks for reading and in case you have some questions, please leave them in the comments below.

Azure Data Factory Let’s get started

by Erwin | Nov 3, 2020 | Azure, Azure Data Factory, Azure Synapse Analytics

Creating an Azure Data Factory Instance, let’s get started

Many blogs nowadays are about which functionalities we can use within Azure Data Factory.
But how do we create an Azure Data Factory instance in Azure for the first time and what should you take into account? In this article I will take you step by step on how to get started.

First we have to log in to the Azure Portal.

Search for Data Factories and select the Data Factory service.

Secondly we have to create a Data Factory Instance.

Fill in the required fields:

Subscription => Select the Azure subscription in which you want to create the Data Factory.
Resource Group => Select Use existing and pick an existing resource group from the list, or click on Create new and enter the name of a resource group (a new Resource Group will be created).
Region => Select the desired Region/Location. This is where your Azure Data Factory metadata is stored; it has nothing to do with where you create your compute or keep your data stores.
Name => Create a name that is unique in Azure.
Version => Always select V2 here, as it contains the very latest developments and functionalities. V1 is only used for migration from another V1 instance.
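The same fields can be expressed as a deployment template instead of portal clicks. The sketch below is one way to do that, not the only one: it maps Name and Region onto a `Microsoft.DataFactory/factories` resource, uses API version `2018-06-01` (which corresponds to V2), and leaves Subscription and Resource Group to be chosen when the template is deployed. The function names and example values are hypothetical.

```python
import json


def data_factory_resource(name: str, region: str) -> dict:
    """Describe a V2 Data Factory as an ARM template resource."""
    return {
        "type": "Microsoft.DataFactory/factories",
        "apiVersion": "2018-06-01",  # the V2 API version
        "name": name,                # must be unique in Azure
        "location": region,          # where the factory metadata is stored
        "identity": {"type": "SystemAssigned"},
        "properties": {},            # Git is configured later, as in this article
    }


def arm_template(name: str, region: str) -> dict:
    """Wrap the resource in a minimal ARM deployment template."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": [data_factory_resource(name, region)],
    }


if __name__ == "__main__":
    # Hypothetical name and region, for illustration only.
    print(json.dumps(arm_template("my-unique-adf-name", "westeurope"), indent=2))
```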

Select Next: Git configuration

Enable the option Configure Git later; we will configure this later in Azure Data Factory.

Select Next: Networking:

Leave the options as is. I will explain the Connectivity Method in one of my next articles.

Select Next: Review + Create:

Your Azure Data Factory Instance will be created. Once it has been created, it is ready to use and you can open it from the Resource Group you selected above:

Select Author & Monitor:

Encrypt your Azure Data Factory with customer-managed keys

Azure Data Factory encrypts data at rest, including entity definitions and any data cached while runs are in progress. By default, data is encrypted with a randomly generated Microsoft-managed key that is uniquely assigned to your data factory. But you can also Bring Your Own Key (BYOK); more details can be found in my earlier article “Azure Data Factory: How to assign a Customer Managed Key“.

Please be aware that you have to assign this key on an empty Azure Data Factory Instance.

Roles for Azure Data Factory

Data Factory Contributor role:

Assign the built-in Data Factory Contributor role at Resource Group level if the user needs to create a new Data Factory in that Resource Group; otherwise, assign it at Subscription level.

User can:

Create, edit, and delete data factories and child resources including datasets, linked services, pipelines, triggers, and integration runtimes.
Deploy Resource Manager templates. Resource Manager deployment is the deployment method used by Data Factory in the Azure portal.
Manage App Insights alerts for a Data Factory.
Create support tickets.

Reader Role:

Assign the built-in reader role on the Data Factory resource for the user.
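Both assignments come down to a role name and a scope. As a rough sketch, the scopes and the matching `az role assignment create` command line can be built like this; the function names are made up for illustration, and the identifiers you pass in are your own.

```python
def resource_group_scope(subscription_id: str, resource_group: str) -> str:
    """Scope for a role assigned at Resource Group level."""
    return f"/subscriptions/{subscription_id}/resourceGroups/{resource_group}"


def data_factory_scope(subscription_id: str, resource_group: str,
                       factory_name: str) -> str:
    """Scope for a role assigned on a single Data Factory resource."""
    return (f"{resource_group_scope(subscription_id, resource_group)}"
            f"/providers/Microsoft.DataFactory/factories/{factory_name}")


def role_assignment_command(assignee: str, role: str, scope: str) -> list[str]:
    """Build the Azure CLI arguments for a role assignment (does not run it)."""
    return ["az", "role", "assignment", "create",
            "--assignee", assignee, "--role", role, "--scope", scope]
```

For example, `role_assignment_command(user, "Data Factory Contributor", resource_group_scope(sub, rg))` covers the contributor case, and `role_assignment_command(user, "Reader", data_factory_scope(sub, rg, name))` covers the reader case.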