Scottisch Summit 2021(Video)
Data Factory
Recording of my session during Scottisch Summit 2021
Is there a way that we can build our Azure DataFactory all with parameters based on MetaData?
Recording of my session during Scottisch Summit 2021
Is there a way that we can build our Azure DataFactory all with parameters based on MetaData?
This Saturday I've been speaking during Scottisch Summit 2021. It was my first Summit, but is was a great event, with more than  400 sessions covering the full Microsoft Stack in 7 different language English, Spanish, German, French, Italian, Portuguese and Polish. Proud that I was to able to join and to present.
I presented a session on if there is a way that we can build our Azure Data Factory all with parameters based on MetaData?
In the beginning of my sessions the audio wasn't that well. I just double checked my uploaded recording and in there audio was fine.
You can watch the recording of my session Scottisch Summit 2021
Code from my demos can be found here.
You can find my slides below on Slideshare:
In case you have any questions left please feel free to ask them via the comment or Socials.
In my previous article, I explained how you can Pause and Resume your Dedicated SQL Pool with a Pipeline in Azure Synapse Analytics. In this article I will explain how to scale up and down a SQL Pool via a Pipeline in Azure Synapse Analytics. This is actually a necessary functionality during your Data Movement Solutions. In this way you can optimize costs.
The Pipeline can be added before and after your Nightly Run.
As a quick resume from the previous article, a SQL Pool can have different statuses:
To allow the Synapse workspace to call the REST API we need to give the Synapse workspace access to the SQL Pool. In the Access control (IAM) of the SQL Pool assign the contributor role to your Synapse Workspace.
Clone the Pipeline PL_ACT_RESUME_SQLPOOL and rename it to PL_ACT_SCALE_SQLPOOL.
Change the description of the Pipeline, ‘Pipeline to SCALE a Synapse Dedicated SQL Pool‘
Add the PerformanceLevel parameter to the Parameters of the Pipeline:
Action:Â Â RESUME(Leave this on RESUME, if we want to SCALE the SQL Pool must be Online)
WaitTime:Â Wait time in seconds before the Pipeline will finish
WaitTimeUntil: Wait time in seconds for the retry process
Synapse_ResourceGroupName:Â Name of the ResourceGroup of the used Synapse Workspace
SynapseWorkspace:Â SynapseWorkspace
SynapseDedicatedSQLPool: Name of the dedicated SQL Pool
SubsriptionId:Â SubscriptionId of Synapse Workspace
PerformanceLevel:Â The Database Performance level (DW100c, DW200c, DW300c, DW400c DW500c, DW1000c, DW1000c, DW1500c, DW2000c, DW2500c, DW3000c, DW5000c, DW6000c, DW7500c, DW10000c, DW15000c, DW30000c)
We leave the first two activities as is. The Pipeline can only continue when the status is Paused or Online and not one of the other statuses. When the SQL Pool is Paused, the second activity will Resume the SQL Pool.
To Scale the SQL Pool we need add a new Web Activity.
Name = SCALE SQLPOOL
URL=Â https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Synapse/workspaces/XXX/sqlPools/XXX/?api-version=2019-06-01-preview
The <xxx> we need to replace with the Pipeline Parameters. The final Result will be:
https://management.azure.com/subscriptions/@{pipeline().parameters.SubscriptionID}/resourceGroups/@{pipeline().parameters.Synapse_ResourceGroupName}/providers/Microsoft.Synapse/workspaces/@{pipeline().parameters.SynapseWorkspace}/sqlPools/@{pipeline().parameters.SynapseDedicatedSQLPool}/?api-version=2019-06-01-preview
Method = PATCH
Headers = Name = Content-Type   Value= application/json
Body = { “sku”: { “name”: ‘@{pipeline().parameters.PerformanceLevel}’ } }
Resource =https://management.azure.com/
Please feel free to download the Pipeline code here.
Add the above Pipeline as a Start Pipeline before your Daily run and Scale up to the desired Performance Level. When the Daily run is finished you Scale Down to a lower level or can you add the Pipeline to Pause the SQL Pool.
If you’re already using a database where you store your Meta Data, you can create a table where you store the desired Performance Level The only thing you need to do is adding a Lookup Activity to get the parameters from your database and replace the parameters with the output from the lookup activity.
[sql] CREATE TABLE [configuration].[Database_Level]( [Id] [int] IDENTITY(1,1) NOT NULL, [DatabaseName] [varchar](30) NULL, [DatabaseLevel] [varchar](10) NOT NULL, [PerformanceLevel] [varchar](10) NOT NULL, CONSTRAINT [PK_Pipeline_ExecutionLog] PRIMARY KEY CLUSTERED ( [Id] DESC )WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY] ) ON [PRIMARY] [/sql]
A SQL Pool(Former SQL DW) linked to a SQL (Logical) Server has a slightly different approach.
Use the settings below to create a Pipeline to Scale the SQL Pool.
Action:Â Â RESUME
WaitTime:Â Wait time in seconds before the Pipeline will finish
WaitTimeUntil: Wait time in seconds for the retry process
SQLServer_ResourceGroupName: Name of the ResourceGroup of the used SQL(Logical) Server
SQLServer: SQL(Logical) Server name
SQLServerDedicatedSQLPool: Name of the dedicated SQL Pool
SubsriptionId:Â SubscriptionId of Synapse Workspace
DatabaseTier:Â The Database Performance level (DW100c, DW200c, DW300c, DW400c DW500c, DW1000c, DW1000c, DW1500c, DW2000c, DW2500c, DW3000c, DW5000c, DW6000c, DW7500c, DW10000c, DW15000c, DW30000c)
Body: { “requestedServiceObjectiveName”: { “name”: ‘@{pipeline().parameters.PerformanceLevel}’ } }
Hopefully this article has helped you a step further. As always, if you have any questions, leave them in the comments.
Azure Synapse Analytics went GA in beginning of December 2020, with Azure Synapse we can now also create a Dedicated SQL Pool(formerly Azure SQL DW). Please read this document to learn what a Dedicated SQL Pool is. This article describes how to Pause or Resume a SQL Pool within Azure Synapse Analytics. A SQL Pool(Former Azure SQL DW) linked to a SQL (Logical) Server has a slightly different approach.
A SQL Pool is a MPP Database (short for massively parallel processing) and has a different approach of loading data but also different kind of pricing. This are details for another Blogpost.
In this article, we are going to build a Synapse Pipeline which will call a REST API. The concept is based on an earlier post about Analysis Services:Â Use Global Parameters to Suspend and Resume your Analysis Services in ADF.
A SQL Pool can have different statuses:
To allow the Synapse workspace to call the REST API we need to give the Synapse workspace access to the SQL Pool. In the Access control (IAM) of the SQL Pool assign the contributor role to your Synapse Workspace.
Create a new Pipeline with the name PL_ACT_PAUSE_SQLPOOL
Add the following Parameters to the Pipeline:
Above are the generic Parameters used within the Pipeline.
Action:Â Â PAUSE or RESUME
WaitTime:Â Wait time in seconds before the Pipeline will finish
WaitTimeUntil: Wait time in seconds for the retry process
Synapse_ResourceGroupName:Â Name of the ResourceGroup of the used Synapse Workspace
SynapseWorkspace:Â SynapseWorkspace
SynapseDedicatedSQLPool: Name of the dedicated SQL Pool
SubsriptionId:Â SubscriptionId of Synapse Workspace
We can only change the status when the SQL Pool is Paused or Online That’s why we need to add an Until activity to start the Pipeline. It executes a set of activities in a loop until the condition associated with the activity evaluates to true.
With this activity we can check the status of the SQL Pool and wait until it becomes Paused or Online. Let me explain how this works.
Within the Until Activity we need to create a new Web Activity. A Web Activity can be used to call a custom REST API endpoint from a Synapse Data pipeline.
Name = Check for changed SQLPool Status
URL=Â https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Synapse/workspaces/XXX/sqlPools/XXX/?api-version=2019-06-01-preview
The <xxx> we need to replace with the Pipeline Parameters. The final Result will be:
https://management.azure.com/subscriptions/@{pipeline().parameters.SubscriptionID}/resourceGroups/@{pipeline().parameters.Synapse_ResourceGroupName}/providers/Microsoft.Synapse/workspaces/@{pipeline().parameters.SynapseWorkspace}/sqlPools/@{pipeline().parameters.SynapseDedicatedSQLPool}/?api-version=2019-06-01-preview
Method = GET
Resource =https://management.azure.com/
Once we have created the Web Activity, we can define the expression for the Until Activity.
The Pipeline can only continue when the status is Paused or Online and not one of the other statuses. That’s the reason we need to add these 2 two statuses to check for.
Expression: @or(bool(startswith(activity(‘Check for changed SQLPool Status’).Output.Properties.status,’Paused’)),Bool(startswith(activity(‘Check for changed SQLPool Status’).Output.Properties.status,’Online’)))
Time out: 0.00:20:00
The Until Activity will only continue, when the status from the above Web Activity output is Paused or Online, this can take a while and we don’t want to execute the Web Activity every time. That’s why we add a Wait Activity.
A Wait Activity waits for the specified period of time before continuing with execution of subsequent activities. Within the Wait Activity add an expression from above parameters for Wait time seconds.
After the Web Activity, Azure Synapse waits in this case 30 seconds to check if the status has changed before it will check again.
To check if the SQL Pool is paused, we’re adding an If Condition Activity (Name: Check if SQL POOL is Paused)
Add an Expression on the If Condition Activity @bool(startswith(activity(‘Check for changed SQLPool Status’).Output.Properties.status,’Paused’))
This expression will check if the SQL Pool is Paused or not. In this situation we want to Pause our SQL Pool, to Pause the SQL Pool we need to add as Activity for pausing(see below) to False. In case the SQL Pool is already Paused we do nothing(True).
The following settings are set for the Web Activity:
URL: https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Synapse/workspaces/XXX/sqlPools/XXX/{Action}?api-version=2019-06-01-preview
The <xxx> we need to replace with the Pipeline Parameters. The final Result will be:
https://management.azure.com/subscriptions/{pipeline().parameters.SubscriptionID}/resourceGroups/{pipeline().parameters.Synapse_ResourceGroupName}/providers/Microsoft.Synapse/workspaces/{pipeline().parameters.SynapseWorkspace}/sqlPools/{pipeline().parameters.SynapseDedicatedSQLPool}/@{pipeline().parameters.Action}?api-version=2019-06-01-preview
It is almost the same URL but we have to add the action option @{pipeline().parameters.Action}Â
Method = Post
Header = {“Nothing”:”Nothing”}
Resource =https://management.azure.com/
Add a Wait Activity but this time with a different parameter @pipeline().parameters.WaitTime, the purpose of this activity is to wait a period before we start ingestion data(just to be sure the SQL Pool in online)
Clone your PL_ACT_PAUSE_SQLPOOL and rename it to PL_ACT_RESUME_SQLPOOL. Change your action Parameter to “Online”.
Within the IF Condition move the Web Activity Pause SQL Pool and the Wait Activity from False to True and rename to Resume SQL Pool.
You have now learned how to Pause and Resume your SQL Pool Dynamically with the use of Parameters. Both Pipelines can be easily transferred to different customers.
Please feel free to download the Pipeline code here
If you’re already using a database where you store your Meta Data, then you have also the possibility to store the necessary parameters in the database. The only thing you need to do is adding a Lookup Activity to get the parameters from your database(and replace the parameters with the output from the lookup activity)
A SQL Pool(Former SQL DW) linked to a SQL (Logical) Server has a slightly different approach, use the settings below to create a Pipeline to Pause or Resume.
Action:Â Â PAUSE or RESUME
WaitTime:Â Wait time in seconds before the Pipeline will finish
WaitTimeUntil: Wait time in seconds for the retry process
SQLServer_ResourceGroupName: Name of the ResourceGroup of the used SQL(Logical) Server
SQLServer: SQL(Logical) Server name
SQLServerDedicatedSQLPool: Name of the dedicated SQL Pool
SubsriptionId:Â SubscriptionId of Synapse Workspace
Pause: https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Sql/servers/XXX/databases/XXX/Pause?api-version=2020-08-01-preview
Resume: https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Sql/servers/XXX/databases/XXX/Resume?api-version=2020-08-01-preview
Status: https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Sql/servers/XXX/databases/XXX/?api-version=2020-08-01-preview
To allow the Synapse workspace to call the REST API we need to give the Synapse workspace access to the SQL(Logical) Server. In the Access control (IAM) of the SQL(Logical) Server assign the SQL DB Contributor role to your Synapse Workspace.
Hopefully this article has helped you a step further. As always, if you have any questions, leave them in the comments.
The main reasons are:
So enough reasons to start enabling your Git Configuration.
During the configuration/set up of your Data Factory you have the possibility to select either Azure DevOps or GitHub as your Git Configuration. If you haven't done that, you can still configure this integration in Azure Data Factory. The procedure for both options are the same.
In my previous article, Creating an Azure Data Factory Instance, I skipped the Git Configuration. In this article I will explain how to do this in an already created Data Factory.
On the right of your splash screen when opening your Data Factory select the Setup Code Repository. Other options to start configuring your Code Repository are through the Management Hub or in the UX on the top left in the authoring canvas. If you don't see the option, Code Repository is already configured. You can check this in the Management Hub or UX.
We have the option to configure Azure DevOps or GitHub.
First I will take you through the configuration of Azure DevOps and then also create a similar configuration in GitHub. If you want to start directly in GitHub, click here.
Select Azure DevOps Git:
Click on apply and you will see that you repository is connected.
When you log in to your Azure Dev Ops Environment, you will see that a new Repository is created Main Branch.
Go back to your Data Factory and click on Publish.
In Azure DevOps the adf_publish Branch is now also created.
In the repository screen, select GitHub:
The first time you connect with your Data Factory you need to login in GitHub.
Once connect you to need to Authorize your Data Factory.
All the settings are almost the same as in Azure DevOps:
Click on apply and you will see that you repository is connected.
Log in to your GitHub, a new Repository is created Main Branch. If you go back to your Data Factory and click on Publish.
In GitHub the adf_publish Branch is now also created.
As you can see the Setup for Azure Dev Ops and GitHub are mostly the same. You have now learned how to connect your Data Factory to a Code Repository. You're now ready to start building your Release and build pipeline's.
Thanks for reading and in case you have some questions, please leave them in the comments below.