Proud to be an ADF Hackathon winner!

Proud to be an ADF Hackathon winner!

Azure SQL Data Warehouse

by Erwin | Mar 15, 2021

Azure Data Factory Hackathon

Around the clock

During the event 'Around the clock Azure SQL and Azure Data Factory' event on Feb 3, 2021, they kicked-off the Azure Data Factory (ADF) Hackathon. Recording of this event can be found here.

Winner

I submitted a ADF Pipeline Template "Scale Dedicated SQL Pool Dynamically using Azure Data Factory control flow" and my submission was marked as WINNER. I am very proud that a simple template where you can easily save costs has won. See full post of the announcement Announcing the Azure Data Factory Hackathon winners! | LinkedIn

This template will help you can to scale up and down a Dedicated SQL Pool in Azure Synapse Analytics.

The pipelines is designed to Scale a SQL Pool within Azure Synapse Analytics. A SQL Pool(Former Azure SQL DW) linked to a SQL (Logical) Server has a slightly different approach(documentation can be found on Github).

Scaling a SQL Pool is actually a necessary functionality during your Data Movement Solutions, it will help to save and optimize your costs.

ADF Synapse SQL Pool Scale

Documentation of this pipeline can be found on GitHub.

You can also use this template in Azure Synapse and the details can also be found on Github more details can also be found in this Article.

 

 

In case you have unanswered questions please do not hesitate to contact me.

Feel free to leave a comment

Scale your SQL Pool dynamically in Azure Synapse

Scale your SQL Pool dynamically in Azure Synapse

Scale your Dedicated SQL Pool in Azure Synapse Analytics

In my previous article, I explained how you can Pause and Resume your Dedicated SQL Pool with a Pipeline in Azure Synapse Analytics. In this article I will explain how to scale up and down a SQL Pool via a Pipeline in Azure Synapse Analytics. This is actually a necessary functionality during your Data Movement Solutions. In this way you can optimize costs.

The Pipeline can be added before and after your Nightly Run.

SQL Pool Performance Level

As a quick resume from the previous article, a SQL Pool can have different statuses:

  • Pausing: SQL Pool is Pausing and we cannot change the status.
  • Resuming: SQL Pool is Resuming, the SQL Pool starting and during this  process and we cannot change the status.
  • Scaling: SQL Pool is Scaling, the SQL Pool is scaling to a different compute level and during this  process we cannot change the status.
  • Paused: SQLPool is Paused, we can now change the status.
  • Online: SQLPool is Online, we can now change the status.

To allow the Synapse workspace to call the REST API we need to give the Synapse workspace access to the SQL Pool. In the Access control (IAM) of the SQL Pool assign the contributor role to your Synapse Workspace.

Build Pipeline

Clone the Pipeline PL_ACT_RESUME_SQLPOOL and rename it to PL_ACT_SCALE_SQLPOOL.

Change the description of the Pipeline, ‘Pipeline to SCALE a Synapse Dedicated SQL Pool

Synapse SQL Pool Description

Add the PerformanceLevel parameter to the Parameters of the Pipeline:

Synapse SQL Pool Scale Parameters

Action:   RESUME(Leave this on RESUME, if we want to SCALE the SQL Pool must be Online)

WaitTime: Wait time in seconds before the Pipeline will finish

WaitTimeUntil: Wait time in seconds for the retry process

Synapse_ResourceGroupName: Name of the ResourceGroup of the used Synapse Workspace

SynapseWorkspace: SynapseWorkspace

SynapseDedicatedSQLPool: Name of the dedicated SQL Pool

SubsriptionId: SubscriptionId of Synapse Workspace

PerformanceLevel:  The Database Performance level (DW100c, DW200c, DW300c, DW400c DW500c, DW1000c, DW1000c, DW1500c, DW2000c, DW2500c, DW3000c, DW5000c, DW6000c, DW7500c, DW10000c, DW15000c, DW30000c)

Web Activity

Synapse SQL Pool Scale pipeline

We leave the first two activities as is. The Pipeline can only continue when the status is Paused or Online and not one of the other statuses. When the SQL Pool is Paused, the second activity will Resume the SQL Pool.

To Scale the SQL Pool we need add a new Web Activity.

Synapse SQL Pool Scale Header

Name = SCALE SQLPOOL

URL= https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Synapse/workspaces/XXX/sqlPools/XXX/?api-version=2019-06-01-preview

The  <xxx> we need to replace with the Pipeline Parameters. The final Result will be:

https://management.azure.com/subscriptions/@{pipeline().parameters.SubscriptionID}/resourceGroups/@{pipeline().parameters.Synapse_ResourceGroupName}/providers/Microsoft.Synapse/workspaces/@{pipeline().parameters.SynapseWorkspace}/sqlPools/@{pipeline().parameters.SynapseDedicatedSQLPool}/?api-version=2019-06-01-preview

Method = PATCH

Headers =  Name = Content-Type     Value= application/json

Body = { “sku”: { “name”: ‘@{pipeline().parameters.PerformanceLevel}’ } }

Resource =https://management.azure.com/

Please feel free to download the Pipeline code here.

DAILY RUN

Add the above Pipeline as a Start Pipeline before your Daily run and Scale up to the desired Performance Level. When the Daily run is finished you Scale Down to a lower level or can you add the Pipeline to Pause the SQL Pool.

Synapse SQL Pool Scale RUN1

Metadata

If you’re already using a database where you store your Meta Data, you can create a table where you store the desired Performance Level The only thing you need to do is adding a Lookup Activity to get the parameters from your database and replace the parameters with the output from the lookup activity.

[sql]
CREATE TABLE [configuration].[Database_Level](
[Id] [int] IDENTITY(1,1) NOT NULL,
[DatabaseName] [varchar](30) NULL,
[DatabaseLevel] [varchar](10) NOT NULL,
[PerformanceLevel] [varchar](10) NOT NULL,
CONSTRAINT [PK_Pipeline_ExecutionLog] PRIMARY KEY CLUSTERED
(
[Id] DESC
)WITH (STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]

[/sql]

Performance Database table

Synapse SQL Pool MetaData

A SQL Pool(Former SQL DW)

A SQL Pool(Former SQL DW) linked to a SQL (Logical) Server has a slightly different approach.

Use the settings below to create a Pipeline to Scale the SQL Pool.

Action:   RESUME

WaitTime: Wait time in seconds before the Pipeline will finish

WaitTimeUntil: Wait time in seconds for the retry process

SQLServer_ResourceGroupName: Name of the ResourceGroup of the used SQL(Logical) Server

SQLServer: SQL(Logical) Server name

SQLServerDedicatedSQLPool: Name of the dedicated SQL Pool

SubsriptionId: SubscriptionId of Synapse Workspace

DatabaseTier:  The Database Performance level (DW100c, DW200c, DW300c, DW400c DW500c, DW1000c, DW1000c, DW1500c, DW2000c, DW2500c, DW3000c, DW5000c, DW6000c, DW7500c, DW10000c, DW15000c, DW30000c)

Body: { “requestedServiceObjectiveName”: { “name”: ‘@{pipeline().parameters.PerformanceLevel}’ } }

 

Hopefully this article has helped you a step further. As always, if you have any questions, leave them in the comments.

Azure Synapse Pause and Resume SQL Pool

Azure Synapse Pause and Resume SQL Pool

Pause or Resume your Dedicated SQL Pool in Azure Synapse Analytics

Azure Synapse Analytics went GA in beginning of December 2020, with Azure Synapse we can now also create a Dedicated SQL Pool(formerly Azure SQL DW). Please read this document to learn what a Dedicated SQL Pool is. This article describes how to Pause or Resume a SQL Pool within Azure Synapse Analytics. A SQL Pool(Former Azure SQL DW) linked to a SQL (Logical) Server has a slightly different approach.

A SQL Pool is a MPP Database (short for massively parallel processing) and has a different approach of loading data but also different kind of pricing. This are details for another Blogpost.

 

Azure Synapse Dedicated SQL Pool

In this article, we are going to build a Synapse Pipeline which will call a REST API. The concept is based on an earlier post about Analysis Services:  Use Global Parameters to Suspend and Resume your Analysis Services in ADF.

A SQL Pool can have different statuses:

  • Pausing: SQL Pool is Pausing and we cannot change the status.
  • Resuming: SQL Pool is Resuming, the SQL Pool starting and during this  process and we cannot change the status.
  • Scaling: SQL Pool is Scaling, the SQL Pool is scaling to a different compute level and during this  process we cannot change the status.
  • Paused: SQLPool is Paused, we can now change the status.
  • Online: SQLPool is Online, we can now change the status.

To allow the Synapse workspace to call the REST API we need to give the Synapse workspace access to the SQL Pool. In the Access control (IAM) of the SQL Pool assign the contributor role to your Synapse Workspace.

Build Pipeline

Create a new Pipeline with the name PL_ACT_PAUSE_SQLPOOL

Add the following Parameters to the Pipeline:

Azure Synapse Dedicated SQL Pool Parameter

Above are the generic Parameters used within the Pipeline.

Action:   PAUSE or RESUME

WaitTime: Wait time in seconds before the Pipeline will finish

WaitTimeUntil: Wait time in seconds for the retry process

Synapse_ResourceGroupName: Name of the ResourceGroup of the used Synapse Workspace

SynapseWorkspace: SynapseWorkspace

SynapseDedicatedSQLPool: Name of the dedicated SQL Pool

SubsriptionId: SubscriptionId of Synapse Workspace

Until Activity

We can only change the status when the SQL Pool is Paused or Online That’s why we need to add  an Until activity to start the Pipeline. It executes a set of activities in a loop until the condition associated with the activity evaluates to true.

Azure Synapse Until

With this activity we can check the status of the SQL Pool and wait until it becomes Paused or Online. Let me explain how this works.

Web Activity

Within the Until Activity we need to create a new Web Activity. A Web Activity can be used to call a custom REST API endpoint from a Synapse Data pipeline.

Name = Check for changed SQLPool Status

URL= https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Synapse/workspaces/XXX/sqlPools/XXX/?api-version=2019-06-01-preview

The  <xxx> we need to replace with the Pipeline Parameters. The final Result will be:

https://management.azure.com/subscriptions/@{pipeline().parameters.SubscriptionID}/resourceGroups/@{pipeline().parameters.Synapse_ResourceGroupName}/providers/Microsoft.Synapse/workspaces/@{pipeline().parameters.SynapseWorkspace}/sqlPools/@{pipeline().parameters.SynapseDedicatedSQLPool}/?api-version=2019-06-01-preview

Method = GET

Resource =https://management.azure.com/

Azure Synapse Web Activity

Once we have created the Web Activity, we can define the expression for the Until Activity.

Azure Synapse Until expresssion

The Pipeline can only continue when the status is Paused or Online and not one of the other statuses. That’s the reason we need to add these 2 two statuses to check for.

Expression: @or(bool(startswith(activity(‘Check for changed SQLPool Status’).Output.Properties.status,’Paused’)),Bool(startswith(activity(‘Check for changed SQLPool Status’).Output.Properties.status,’Online’)))

Time out: 0.00:20:00

The Until Activity will only continue, when the status from the above Web Activity output is Paused or Online, this can take a while and we don’t want to execute the Web Activity every time. That’s why we add a Wait Activity.

Wait Activity

A Wait Activity  waits for the specified period of time before continuing with execution of subsequent activities. Within the Wait Activity add an expression from above parameters for Wait time seconds.

Azure Synapse Wait Activity

After the Web Activity, Azure Synapse waits in this case 30 seconds to check if the status has changed before it will check again.

Azure Synapse Until inside

Check for the SQL Pool Status

To check if the SQL Pool is paused, we’re adding an If Condition Activity (Name: Check if SQL POOL is Paused)

Azure Synapse Paused

Add an Expression on the If Condition Activity @bool(startswith(activity(‘Check for changed SQLPool Status’).Output.Properties.status,’Paused’))

This expression will check if the SQL Pool is Paused or not. In this situation we want to Pause our SQL Pool, to Pause the SQL Pool we need to add as  Activity for pausing(see below) to False. In case the SQL Pool is already Paused we do nothing(True).

Azure Synapse If inside

The following settings are set for the Web Activity:

URL: https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Synapse/workspaces/XXX/sqlPools/XXX/{Action}?api-version=2019-06-01-preview

The  <xxx> we need to replace with the Pipeline Parameters. The final Result will be:

https://management.azure.com/subscriptions/{pipeline().parameters.SubscriptionID}/resourceGroups/{pipeline().parameters.Synapse_ResourceGroupName}/providers/Microsoft.Synapse/workspaces/{pipeline().parameters.SynapseWorkspace}/sqlPools/{pipeline().parameters.SynapseDedicatedSQLPool}/@{pipeline().parameters.Action}?api-version=2019-06-01-preview

It is almost the same URL but we have to add the action option @{pipeline().parameters.Action} 

Method = Post

Header = {“Nothing”:”Nothing”}

Resource =https://management.azure.com/

Add a Wait Activity but this time with a different parameter @pipeline().parameters.WaitTime, the purpose of this activity is to wait a period before we start ingestion data(just to be sure the SQL Pool in online)

Create Pipeline to Resume your SQL Pool

Clone your PL_ACT_PAUSE_SQLPOOL and rename it to PL_ACT_RESUME_SQLPOOL. Change your action Parameter to “Online”.

Within the IF Condition move the Web Activity Pause SQL Pool and the Wait Activity  from False to True and rename to Resume SQL Pool.

You have now learned how to Pause and Resume your SQL Pool Dynamically with the use of Parameters. Both Pipelines can be easily transferred to different customers.

Please feel free to download the Pipeline code here

MetaData

If you’re already using a database where you store your Meta Data, then you have also the possibility to store the necessary parameters in the database. The only thing you need to do is adding a Lookup Activity to get the parameters from your database(and replace the parameters with the output from the lookup activity)

Azure Synapse MetaData

A SQL Pool(Former SQL DW)

A SQL Pool(Former SQL DW) linked to a SQL (Logical) Server has a slightly different approach, use the settings below to create a Pipeline to Pause or Resume.

Action:   PAUSE or RESUME

WaitTime: Wait time in seconds before the Pipeline will finish

WaitTimeUntil: Wait time in seconds for the retry process

SQLServer_ResourceGroupName: Name of the ResourceGroup of the used SQL(Logical) Server

SQLServer: SQL(Logical) Server name

SQLServerDedicatedSQLPool: Name of the dedicated SQL Pool

SubsriptionId: SubscriptionId of Synapse Workspace

Pause: https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Sql/servers/XXX/databases/XXX/Pause?api-version=2020-08-01-preview

Resume: https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Sql/servers/XXX/databases/XXX/Resume?api-version=2020-08-01-preview

Status: https://management.azure.com/subscriptions/XXX/resourceGroups/XXX/providers/Microsoft.Sql/servers/XXX/databases/XXX/?api-version=2020-08-01-preview

To allow the Synapse workspace to call the REST API we need to give the Synapse workspace access to the SQL(Logical) Server. In the Access control (IAM) of the SQL(Logical) Server assign the SQL DB Contributor role to your Synapse Workspace.

Hopefully this article has helped you a step further. As always, if you have any questions, leave them in the comments.

Exploring Azure Synapse Analytics Studio

Exploring Azure Synapse Analytics Studio

Azure Synapse Workspace Settings

In my previous article, I walked you through “how to create your Azure Synapse Analytics Workspace”. It’s now time to explore the brand new Synapse Studio.

Most configuration and settings can be done through the Synapse Studio. In your Workspace you need to set the SQL Active Directory Admin, like you have to do for a Logical Server.

SQL Active Directory Admin

SQL Active Directory admin

Firewall

Azure Synapse FIrewall

Change the IP-address to you own IP-address or to one of your employer if you work from the office. Make sure that you enable the option “Allow Azure service and Resources to access this Workspace” is enabled. Every trusted Azure Service or Resource can connect to this Workspace. Not all Public Preview Azure Services or Resources are Trusted yet.

Private Endpoint Connections

Private Endpoint Connections

Define your Private Endpoint Connections to your Services to use a Private IP-address form your Virtual Network. More details on how to setup a Private Endpoint Connection can be found here.

Launch Azure Synapse Analytics Studio

After you have opened the portal, the following screen will appear. Personally I am very charmed of this brand new Portal, you now get 1 place where you can access all your data. But also an integration with your Power BI reporting. But more about that later. Let’s walk through each tab below.

Azure Synapse Analytics Studio

Data

Azure Synapse Analytics Data Tab

The data Tab is Divided into 2 different parts Linked and Workspace

Linked

All Dataset likes you’re used to create in Azure Data Factory are stored here.

Azure Synapse Analytics Data Tab LInked

Your can now directly browse files within your Azure Data Lake Storage.

Azure Synapse Analytics Data Tab Linked to Browse Files

But you can also connect to External Data.

Connect to External Data

Workspace

Here you will find all your databases which you have created with Sparks, SQL on Demand or SQL Pools. How to create these database I will explain later in another article.

Synapse Studio Data Workspace

Develop

Azure Synapse Analytics Develop

The Develop tab, is the location, where your SQL Scripts, Notebooks, Dataflows, Spark Job Definitions and Power BI Reports are stored. In a later stage you can commit your work to Azure Dev Ops or GitHub.

Orchestrate

Azure Synapse Analytics Orchestrate

What do we see here , nothing more then you were used to see in Azure Data Factory except the addition of the Synapse Activities.

Azure Synapse Analytics Orchestrate Notebooks

Monitor

In the monitor tab you will find similar things to ADF except for the SQL requests and the Apache Spark Applications

Orchestration: Pipeline runs Overview of all Executed Pipelines
  Trigger Runs Overview of all Executed Trigger Runs
  Integration runtimes Overview of all created Integration Runtimes
Activities: Apache Spark applications Monitor your Apache Spark Executions
  SQL requests Monitor your SQL on Demand or SQL Pool queries

SQL request

All running(currently) SQl request for your SQL on Demand and your SQL Pools

Azure Synpase SQL Monitoring

Apache Spark Applications

All Apache Spark request.

Azure Synpase Apache Spark Monitoring

A detailed explanation of the Apache Spark Application monitor can be found here.

Management HUB

Azure Synapse Analytics Management Hub

The Synapse Analytics Management HUB offers the following options:

Analytic Pools: SQL pools Here  you can manage(Scale up or Down) your previously created SQL pools or create new ones.
  Apache Spark pools Create multiple instances of Spark pools depending on the workload requirements. Once you have your Instance created you also change Auto Scaling, Node Size and Number of Nodes from here.
External Connections: Linked Services Create and manage connections to different services, same as in Azure Data Factory
Orchestration: Triggers Create and manage Triggers for your Pipelines, same as in Azure Data Factory
  Integration runtimes Create and manage your different types of Integration Runtime:
Azure: execute workloads between Azure services or Azure Data Factory Mapping Data Flows.
Self-Hosted: execute workloads between on-premises environments and Azure.
Azure-SSIS: execute SQL Server Integration Services packages in Azure Data Factory.
Security: Access control Azure Synapse Analytics comes with role-based access control.
The available roles:
Workspace admin
Apache Spark admin
SQL admin
  Managed Private Endpoint Private link enables you to access Azure services from your Azure VNet securely.
More details on Managed Private Endpoints can be found here 

 

At the beginning of this article, I indicated that I was very charmed of this new Portal. Microsoft has ensured that we can now approach almost all our services seamlessly from 1 Portal and there will be much more to come in the future. And as you can see you don’t have to use the SQL Pools which is a reasonably expensive solution for customers who use a lot of data. But you can use Azure Synapse Analytics for almost every customer. I would say if you haven’t started trying Azure Synapse Analytics yet, start today and see how you can help your customers with this. If there are any questions I would like to hear them.

Thank you for reading.

Azure Synapse Analytics

Azure Synapse Analytics

Azure SQL Data Warehouse

by Erwin | Jun 15, 2020

Azure Synapse Analytics

 

Insights for all

Azure Synapse provides a breathtaking view of your data across data warehouses and big data analytics systems. Bringing these two worlds together into a single service is challenging as it requires unifying similar concepts that operate differently in each world such as security, privacy, and performance. With Azure Synapse, this seamless unification of data warehousing and big data not only simplifies a business’s analytics platform, but also breaks down silos that exist today because of teams, data, and skills. (source Azure blog)

Azure Synapse Analytics Workspace

During Ignite 2019 we already saw the first announcement about Azure Synapse Analytics. The first Public Preview was announced during Build 2020.

Immediately after Build 2020, I started playing and exploring with Azure Synapse Analytics Workspace.
Fortunately, I was off for a few days and was able to use this free time to dive a little bit into Azure Synapse.

A few days later during the Analytics in a Day workshops that I gave for my employer InSpark in collaboration with Microsoft, I immediately took the time to give a Live demo. I found the inspiration for this Live demo during a YouTube session presented by Simon Whiteley.

For many participants it is more imaginative,  to walk through the product Live than to tell a story via PowerPoint Slides.

Upcoming Articles

In the coming days I will try to write a number of articles so that you become more familiar with the various possibilities of Azure Synapse Analytics.

I have the following articles in mind:

✅ Creating your Azure Synapse Analytics Workspace

✅ Exploring the new Azure Synapse Analytics Studio

Creating an Apache Spark Pool

Creating a SQL Pool

Integration with Power BI

And if you have more subject which needs to be explained feel free to leave them in the comments.

Happy reading!

 

 

Feel free to leave a comment