Azure Data Factory Let’s get started

Creating an Azure Data Factory Instance, let’s get started

Many blogs nowadays are about the functionality we can use within Azure Data Factory.
But how do we create an Azure Data Factory instance in Azure for the first time, and what should you take into account? In this article I will take you through it step by step.

First, log in to the Azure portal.

Azure Data Factory

Search for Data Factories and select the Data Factory service.

Create ADF

Secondly, we create a Data Factory instance.

Create ADF names

Fill in the required fields:

  1. Subscription => Select the Azure subscription in which you want to create the data factory.
  2. Resource Group => Select Use existing and pick an existing resource group from the list, or click Create new and enter the name of a resource group (a new resource group will be created).
  3. Region => Select the desired region/location. This is where your Azure Data Factory metadata will be stored; it has nothing to do with where your compute runs or where your data stores live.
  4. Name => Enter a name that is unique across Azure.
  5. Version => Always select V2 here; it contains the latest developments and functionality. V1 is only used for migration from another V1 instance.
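On top of being globally unique, the Name field has format rules of its own. The minimal Python sketch below checks a candidate name before you submit the wizard. It assumes the documented Azure rules for data factory names: 3 to 63 characters, only letters, numbers and hyphens, a letter or number at both ends, and every hyphen surrounded by letters or numbers. The function name is hypothetical. Uniqueness itself can only be confirmed by Azure.

```python
import re

# Assumed format rules for data factory names (see the lead-in):
# alphanumeric first and last character, hyphens only between alphanumerics.
_FACTORY_NAME = re.compile(r"^[A-Za-z0-9](?:[A-Za-z0-9]|-(?=[A-Za-z0-9]))*[A-Za-z0-9]$")


def is_valid_factory_name(name: str) -> bool:
    """Return True if `name` satisfies the assumed format rules.

    Global uniqueness is not checked; Azure rejects duplicate names at creation time.
    """
    return 3 <= len(name) <= 63 and _FACTORY_NAME.fullmatch(name) is not None
```

For example, "adf-contoso-dev" passes. "adf--dev" is rejected because of the double hyphen, and "-adf" is rejected because it starts with a hyphen.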

Select Next: Git configuration

Azure Data Factory Git Configuration

Enable the option Configure Git later; we will configure Git later from within Azure Data Factory.

Select Next: Networking:

Create Azure Data Factory Networking

Leave the options as is. I will explain the Connectivity Method in one of my next articles.

Select Next: Review + Create:

Create Azure Data Factory Validation

Your Azure Data Factory instance will now be created. Once it has been created, it is ready to use, and you can open it from the resource group you selected above:
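You can also capture the same choices in an ARM template. This is handy once you want to create instances from a deployment pipeline instead of the portal. The Python sketch below builds such a template under a few assumptions:

- It uses the Microsoft.DataFactory/factories resource type with API version 2018-06-01 (Data Factory V2).
- It adds a system-assigned managed identity.
- It leaves out any Git (repoConfiguration) settings, matching the "configure Git later" choice.

The factory name and region are hypothetical.

```python
import json


def factory_resource(name: str, location: str) -> dict:
    """Build an ARM template resource for an empty Data Factory V2 instance."""
    return {
        "type": "Microsoft.DataFactory/factories",
        "apiVersion": "2018-06-01",  # assumed API version for Data Factory V2
        "name": name,
        "location": location,  # region where the factory metadata is stored
        # System-assigned managed identity, usable later to call other Azure services.
        "identity": {"type": "SystemAssigned"},
        # No repoConfiguration: Git is configured later from the ADF UI.
        "properties": {},
    }


def arm_template(resources: list) -> dict:
    """Wrap resources in a minimal resource-group deployment template."""
    return {
        "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentTemplate.json#",
        "contentVersion": "1.0.0.0",
        "resources": resources,
    }


template = arm_template([factory_resource("adf-contoso-dev", "westeurope")])
print(json.dumps(template, indent=2))
```

You could save the printed JSON to a file such as template.json and deploy it with az deployment group create --resource-group <your-resource-group> --template-file template.json.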

Open Azure Data Factory

Select Author & Monitor:

Azure Data Factory let's get started

Encrypt your Azure Data Factory with customer-managed keys

Azure Data Factory encrypts data at rest, including entity definitions and any data cached while runs are in progress. By default, data is encrypted with a randomly generated Microsoft-managed key that is uniquely assigned to your data factory. But you can also bring your own key (BYOK); more details can be found in my previous article “Azure Data Factory: How to assign a Customer Managed Key”.

Please be aware that the key can only be assigned to an empty Azure Data Factory instance.

Roles for Azure Data Factory

Data Factory Contributor role:

Assign the built-in Data Factory Contributor role. If the user should be able to create new data factories in a resource group, the role must be assigned at resource group level; otherwise, assign it at subscription level.

User can:

  1. Create, edit, and delete data factories and child resources including datasets, linked services, pipelines, triggers, and integration runtimes.
  2. Deploy Resource Manager templates. Resource Manager deployment is the deployment method used by Data Factory in the Azure portal.
  3. Manage App Insights alerts for a Data Factory.
  4. Create support tickets.

Reader Role:

Assign the built-in Reader role on the Data Factory resource for the user.

User can:

  1. View and monitor the selected data factory, but cannot edit or change it.

More on how to assign roles and permissions can be found here.
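A role assignment is always made at a scope, and that scope is simply an Azure resource ID. The minimal Python sketch below shows the three levels discussed above: subscription, resource group and a single data factory. The function name and the example IDs are hypothetical.

```python
from typing import Optional


def role_scope(subscription_id: str,
               resource_group: Optional[str] = None,
               factory_name: Optional[str] = None) -> str:
    """Return the scope string for a role assignment.

    subscription only       -> subscription level
    plus resource group     -> resource group level (e.g. Data Factory Contributor)
    plus factory name       -> one data factory (e.g. Reader)
    """
    if factory_name and not resource_group:
        raise ValueError("a factory scope also needs its resource group")
    scope = f"/subscriptions/{subscription_id}"
    if resource_group:
        scope += f"/resourceGroups/{resource_group}"
    if factory_name:
        scope += f"/providers/Microsoft.DataFactory/factories/{factory_name}"
    return scope
```

You could pass the resulting string to az role assignment create --assignee <user> --role "Data Factory Contributor" --scope <scope>.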

Thanks for reading. In my next blog I will describe how to set up your code repository.

My Virtual Session at SQLBits 2020

SQL BITS 2020, the greatest data show 

Last week was SQL BITS week.

After the event was moved from April to September, it eventually became a virtual event. Setting up a virtual event requires a lot of adjustments from the organization.

Recording

All regular sessions had to be recorded in advance, so that nothing could go wrong during the event itself.
For some of us this was new, while others had done it before. It was new to me, but the organization did everything it could to help us. There were several sessions in which everything was explained and in which we could ask all kinds of questions. Thanks for that.

Is it strange to pre-record a session?

Yes, it is. You try to find an environment without ambient noise, with a good microphone and a good camera, but you don't always have control over ambient sounds. And presenting to a webcam is strange.

After practicing my session once more, I recorded it in one go and did not edit anything. After all, something can always go wrong or go differently in a session, even if you record it in advance. Once you start adjusting or editing that, there is no end to it and a lot of time goes into it. The charm of a session is also lost. After all, we are data professionals, not movie stars.

My Session

Back to the day itself: half an hour in advance I could log in to my session, and I must say it was quite exciting. Would the video start? How would I come across? Some more questions were going through my head.
But the nerves were for nothing; the session started right on time. In the meantime I had created a few polls for the audience to answer during the session.
Answering questions live during the session, sometimes even with a link to some extra information, was also easy.

SQL BITS Poll

I was delighted to see from the outcome of the last poll that most of the people who attended my session will now start using Azure Key Vault in their day-to-day work. In the end, that's why we do it: to help and advise others.

My presentation can be found here.

Conclusion

It was a great event, and the quality of the sessions was very high. There was so much choice, but luckily the sessions will soon be available to watch on demand.
A big round of applause to the entire organization: you have organized a fantastic event with a super nice portal, including an exhibitor hall, networking, chat rooms and much more. Thank you for having me, and see you next year.

 

 

Use Global Parameters to Suspend and Resume your Analysis Services in ADF

Suspend or Resume your Azure Analysis Services in Azure Data Factory

Last week one of my customers asked me whether they could start or stop their Azure Analysis Services from within Azure Data Factory. After searching the internet I came across a blog by Joost, which I'm using as input for this post, so most of the credit goes to him. My focus was more on parameterizing it, so that I can reuse these pipelines for all of my customers. A couple of weeks ago the ADF team released global parameters, and in this post I'm going to use them.

Global Parameters

Global parameters are constants across a data factory that can be consumed by a pipeline in any expression. They are useful when you have multiple pipelines with identical parameter names and values.

Creation and management of global parameters is done in the management hub.

ADF and GlobalParameter

Create the global parameters shown above to build the pipeline.

The following parameters can now be used across all your Data Factory Activities:

@pipeline().globalParameters.AAS_ResourceGroupName
@pipeline().globalParameters.AAS_ServerName
@pipeline().globalParameters.SubscriptionId
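Global parameters are stored as part of the factory definition. The Python sketch below illustrates this under stated assumptions, with hypothetical values:

- It models the three parameters in the name/type/value shape used for globalParameters in the factory's ARM definition.
- It shows what an expression such as @pipeline().globalParameters.AAS_ServerName would resolve to at run time.

It is not a general ADF expression evaluator.

```python
# Hypothetical values; in ADF you maintain these in the management hub.
GLOBAL_PARAMETERS = {
    "SubscriptionId": {"type": "String", "value": "00000000-0000-0000-0000-000000000000"},
    "AAS_ResourceGroupName": {"type": "String", "value": "rg-analysis"},
    "AAS_ServerName": {"type": "String", "value": "aascontoso"},
}

_PREFIX = "@pipeline().globalParameters."


def resolve_global_parameter(expression: str) -> str:
    """Resolve '@pipeline().globalParameters.<name>' against GLOBAL_PARAMETERS.

    Only this single expression form is supported (illustration only).
    """
    if not expression.startswith(_PREFIX):
        raise ValueError(f"not a global parameter reference: {expression}")
    name = expression[len(_PREFIX):]
    return GLOBAL_PARAMETERS[name]["value"]
```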

Build Pipeline

Create a new Pipeline PL_ACT_AAS_SUSPEND_GP 

Add a parameter named Action (default value “Suspend”) to the pipeline, so that the same pipeline can easily be reused to resume our AAS.

ADF and GlobalParameters

Add a Web Activity.

Name = Suspend_AAS (depending on the Action).

As Joost mentioned in his blog, we first have to define the REST API URL in the Settings tab:

https://management.azure.com/subscriptions/<xxx>/resourceGroups/<xxx>/providers/Microsoft.AnalysisServices/servers/<xxx>/<ACTION>?api-version=2017-08-01

Replace each <xxx> with the corresponding global parameter, and <ACTION> with the pipeline parameter. The final result will be:

https://management.azure.com/subscriptions/@{pipeline().globalParameters.SubscriptionId}/resourceGroups/@{pipeline().globalParameters.AAS_ResourceGroupName}/providers/Microsoft.AnalysisServices/servers/@{pipeline().globalParameters.AAS_ServerName}/@{pipeline().parameters.Action}?api-version=2017-08-01
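To see what the expression above evaluates to, here is a minimal Python sketch that assembles the same URL from plain values. The first three arguments stand in for the global parameters and action stands in for the pipeline parameter. The function name and the example values are hypothetical; the api-version is the one used in the URL above.

```python
API_VERSION = "2017-08-01"  # api-version used in the URL above
ALLOWED_ACTIONS = {"suspend", "resume"}


def aas_action_url(subscription_id: str, resource_group: str,
                   server_name: str, action: str) -> str:
    """Build the management URL that the Web activity calls.

    The first three arguments play the role of the global parameters,
    `action` the role of the pipeline parameter Action.
    """
    if action.lower() not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return (
        f"https://management.azure.com/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.AnalysisServices/servers/{server_name}"
        f"/{action}?api-version={API_VERSION}"
    )


print(aas_action_url("00000000-0000-0000-0000-000000000000",
                     "rg-analysis", "aascontoso", "suspend"))
```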

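Method = POST

Body = A dummy JSON message, for example {"dummy": "dummy"}. The body is not used by the REST API, but the Web activity requires one for a POST.

Authentication = Managed Identity (MSI) with Resource = https://management.azure.com/, so that ADF calls the REST API with its own identity.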
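For reference, the Python sketch below builds a definition for the Web activity configured above, as it might appear in the pipeline JSON. It makes these assumptions:

- The activity type is WebActivity.
- The URL is stored as a dynamic-content expression.
- Authentication uses the factory's managed identity (MSI) against https://management.azure.com/, which is what the Contributor role assignment described next relies on.

Treat it as an illustration, not an export from ADF.

```python
import json

# The URL expression from the Settings tab, stored as dynamic content.
URL_EXPRESSION = (
    "https://management.azure.com/subscriptions/@{pipeline().globalParameters.SubscriptionId}"
    "/resourceGroups/@{pipeline().globalParameters.AAS_ResourceGroupName}"
    "/providers/Microsoft.AnalysisServices/servers/@{pipeline().globalParameters.AAS_ServerName}"
    "/@{pipeline().parameters.Action}?api-version=2017-08-01"
)


def suspend_resume_activity(name: str = "Suspend_AAS") -> dict:
    """Build a Web activity definition that calls the AAS suspend/resume endpoint."""
    return {
        "name": name,
        "type": "WebActivity",
        "typeProperties": {
            "url": {"value": URL_EXPRESSION, "type": "Expression"},
            "method": "POST",
            # Dummy body: required for POST, ignored by the REST API.
            "body": {"dummy": "dummy"},
            # Assumed: authenticate with the factory's managed identity.
            "authentication": {"type": "MSI", "resource": "https://management.azure.com/"},
        },
    }


if __name__ == "__main__":
    print(json.dumps(suspend_resume_activity(), indent=2))
```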

ADF and GlobalParameters

Add Azure Data Factory as Contributor to Azure Analysis Services

Before you can debug or test your pipelines, you have to give your ADF instance (its managed identity) the Contributor role on your Azure Analysis Services resource.

ADF and GlobalParameters

After you have done this, you can Debug your Pipeline.

ADF and GlobalParameters

Error

I got an error because my AAS was already suspended (the same happens when you resume a server that is already running). We can solve this by first checking the status of Analysis Services.

Check Analysis Services Status

To check whether our Analysis Services is already suspended or running, we can add a Web activity that retrieves its status.

Add a Web activity to your pipeline, or make a copy of the existing Web activity:

Name = Check_Status_AAS

URL = https://management.azure.com/subscriptions/@{pipeline().globalParameters.SubscriptionId}/resourceGroups/@{pipeline().globalParameters.AAS_ResourceGroupName}/providers/Microsoft.AnalysisServices/servers/@{pipeline().globalParameters.AAS_ServerName}?api-version=2017-08-01

Method = GET

ADF and GlobalParameters

Add an If Condition activity (check whether AAS is running).

ADF and GlobalParameters

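Add the following expression to the If Condition activity: @bool(startswith(activity('Check_Status_AAS').output.properties.state,'Paused'))

The activity name in the expression must match the name of the status Web activity (Check_Status_AAS), and the expression must use straight single quotes.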

This expression checks whether Analysis Services is paused. To suspend Analysis Services, move the Suspend_AAS Web activity into the False branch: cut it from the main canvas and paste it into the False activities. If Analysis Services is already suspended (True), we do nothing.
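Put together, the status check and the If Condition implement a simple rule. The Python sketch below restates that rule for illustration. Like the expression above, it assumes that a suspended server reports a state starting with "Paused". The function name is hypothetical. "Succeeded" in the tests below is only an assumed example of a running server's state.

```python
def should_call_action(state: str, action: str) -> bool:
    """Decide whether the Suspend/Resume Web activity should run.

    Mirrors the If Condition: is_paused = state starts with 'Paused'.
    Suspend sits in the False branch (server not paused yet);
    Resume sits in the True branch (server currently paused).
    """
    is_paused = state.startswith("Paused")
    if action.lower() == "suspend":
        return not is_paused
    if action.lower() == "resume":
        return is_paused
    raise ValueError(f"unsupported action: {action!r}")
```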

ADF and GlobalParameter

Debug your pipeline to see what happens:

Analysis Services was running, so the Suspend_AAS Web activity is called:

ADF and GlobalParameter

Analysis Services was already Paused/Suspended, no action required:

ADF and GlobalParameter

Create Pipeline to Resume your Analysis Services

Clone PL_ACT_AAS_SUSPEND_GP and rename the clone to PL_ACT_AAS_RESUME_GP. Change the value of the Action parameter to “Resume”.

ADF and GlobalParameter

Within the If Condition, move the Suspend_AAS Web activity from the False branch to the True branch, and rename it to Resume_AAS.

Debug to see if everything is working fine:

ADF and GlobalParameter

You have now learned how to suspend and resume your Azure Analysis Services dynamically with the use of global parameters. Both pipelines can easily be transferred to different customers.

Please feel free to download the pipeline templates here.

If you already use a database to store your metadata, you can also store the necessary parameters in that database. All you need to do is add a Lookup activity that gets the parameters from your database, and replace the global parameters with the output of the Lookup activity. For example, you could use @activity('LKP_AAS_Parameters').output.firstRow.AAS_ServerName, where LKP_AAS_Parameters is a hypothetical Lookup activity with First row only enabled.

ADF and GlobalParameter

 

Hopefully this article has helped you a step further. As always, if you have any questions, leave them in the comments.

 

Based on the article above, you should now also be able to build a pipeline to process your Analysis Services model, with some help from this blog. Alternatively, you can download the pipeline template from here.

 

Speaking(Virtual) at SQL Saturday #963 Denmark

SQL Saturday 963 Denmark

PASS SQLSaturday is a free training event for professionals who use the Microsoft data platform. These community events offer content across data management, cloud and hybrid architecture, analytics, business intelligence, AI, and more.

My first virtual event

I like to interact during my sessions, so I'm curious whether that will work at a virtual event. Last week I recorded my session for SQL Bits, and looking back, that was quite strange. I am really looking forward to it; my session starts at 14:30.

The complete schedule can be found here. And there is still time to register!

My Session

Session Title:

Azure Key Vault, Azure Dev Ops and Data Factory how do these Azure Services work perfectly together!

Session Details

Can we store our connection strings, Blob Storage keys or other secret values somewhere other than in Azure Data Factory (ADF)? Yes, you can! You can store these valuable secrets in Azure Key Vault (AKV). But how can we achieve this in ADF? And finally, how do we deploy our data factories with Azure DevOps to Test, Acceptance and Production environments with these secrets? Can this be set up dynamically? During this session I will answer all of these questions. You will learn how to set up your Azure Key Vault, connect these secrets in ADF, and finally deploy these secrets dynamically with Azure DevOps. As you can see, there is a lot to talk about during this session.

Will I see you on Saturday 26th of September?

 

 

Azure Data Factory and Azure Synapse Analytics Naming Conventions

Azure Naming Conventions

The more projects use Azure Data Factory and Azure Synapse Analytics, the more important it is to apply a correct, standard naming convention. Standard naming conventions give you recognizable results across different projects and create clarity for your colleagues. In addition, because standards are used, it is easier to hand these projects over to other services such as Managed Services or Azure DevOps.

To get started with these naming conventions, I have made a list of suggestions for the most common linked services. The list is not exhaustive, but it does provide guidance for new linked services.

There are a few standard naming conventions that apply to all elements in Azure Data Factory and in Azure Synapse Analytics.

  • Names are case-insensitive (not case sensitive). For that reason I only use CAPITALS.
  • Maximum number of characters in a table name: 260.
  • All object names must begin with a letter, number or underscore (_).
  • The following characters are not allowed: “.”, “+”, “?”, “/”, “<”, “>”, “*”, “%”, “&”, “:”, “\”
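The four general rules above translate into a simple check. The minimal Python sketch below assumes exactly those rules and nothing more; the function names are hypothetical. Because names are case-insensitive, it also includes a helper that flags two names differing only in case.

```python
FORBIDDEN_CHARS = set(".+?/<>*%&:\\")  # . + ? / < > * % & : and backslash
MAX_LENGTH = 260


def check_object_name(name: str) -> list:
    """Return the rule violations for an ADF/Synapse object name (empty list if valid)."""
    if not name:
        return ["name is empty"]
    problems = []
    if len(name) > MAX_LENGTH:
        problems.append(f"longer than {MAX_LENGTH} characters")
    if not (name[0].isalnum() or name[0] == "_"):
        problems.append("must begin with a letter, number or underscore")
    bad = sorted(FORBIDDEN_CHARS.intersection(name))
    if bad:
        problems.append("forbidden characters: " + " ".join(bad))
    return problems


def names_collide(a: str, b: str) -> bool:
    """Names are case-insensitive, so names differing only in case collide."""
    return a.casefold() == b.casefold()
```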

These rules are also described at the following link.

This post has been updated on Feb 2nd, 2023 with the latest connectors.

Azure

Connector Abbreviation Linked Service Dataset
Azure Blob Storage ABLB_ LS_ABLB_ DS_ABLB_
Azure Cosmos DB SQL API ACSA_ LS_ACSA_ DS_ACSA_
Azure Cosmos DB MongoDB API ACMA_ LS_ACMA_ DS_ACMA_
Azure Data Explorer ADEX_ LS_ADEX_ DS_ADEX_
Azure Data Lake Storage Gen1 ADLS_ LS_ADLS_ DS_ADLS_
Azure Data Lake Storage Gen2 ADLS_ LS_ADLS_ DS_ADLS_
Azure Database for MariaDB AMDB_ LS_AMDB_ DS_AMDB_
Azure Database for MySQL AMYS_ LS_AMYS_ DS_AMYS_
Azure Database for PostgreSQL APOS_ LS_APOS_ DS_APOS_
Azure File Storage AFIL_ LS_AFIL_ DS_AFIL_
Azure Search ASER_ LS_ASER_ DS_ASER_
Azure SQL Database ASQL_ LS_ASQL_ DS_ASQL_
Azure SQL Database Managed Instance ASQM_ LS_ASQM_ DS_ASQM_
Azure Synapse Analytics (formerly Azure SQL DW) ASDW_ LS_ASDW_ DS_ASDW_
Azure Table Storage ATBL_ LS_ATBL_ DS_ATBL_
Azure DataBricks ADBR_ LS_ADBR_ DS_ADBR_
Azure Cognitive Search ACGS_ LS_ACGS_ DS_ACGS_
Azure Synapse Analytics ASA_ LS_ASA_ DS_ASA_
Azure Cognitive Service ACG_ LS_ACG_ N/A

 

Database

     
Connector Abbreviation Linked Service Dataset
SQL Server MSQL_ LS_MSQL_ DS_MSQL_
Oracle ORAC_ LS_ORAC_ DS_ORAC_
Oracle Eloqua ORAE_ LS_ORAE_ DS_ORAE_
Oracle Responsys ORAR_ LS_ORAR_ DS_ORAR_
Oracle Service Cloud ORSC_ LS_ORSC_ DS_ORSC_
MySQL MYSQ_ LS_MYSQ_ DS_MYSQ_
DB2 DB2_ LS_DB2_ DS_DB2_
Teradata  TDAT_ LS_TDAT_ DS_TDAT_
PostgreSQL POST_ LS_POST_ DS_POST_
Sybase SYBA_ LS_SYBA_ DS_SYBA_
Cassandra CASS_ LS_CASS_ DS_CASS_
MongoDB MONG_ LS_MONG_ DS_MONG_
Amazon Redshift ARED_ LS_ARED_ DS_ARED_
SAP Business Warehouse SAPW_ LS_SAPW_ DS_SAPW_
SAP Cloud for Customer (C4C) SAPC_ LS_SAPC_ DS_SAPC_
SAP Table SAPT_ LS_SAPT_ DS_SAPT_
SAP HANA HANA_ LS_HANA_ DS_HANA_
Drill DRILL_ LS_DRILL_ DS_DRILL_
Google BigQuery GBQ_ LS_GBQ_ DS_GBQ_
Greenplum GRPL_ LS_GRPL_ DS_GRPL_
HBase HBAS_ LS_HBAS_ DS_HBAS_
Hive HIVE_ LS_HIVE_ DS_HIVE_
Apache Impala IMPA_ LS_IMPA_ DS_IMPA_
Informix INMI_ LS_INMI_ DS_INMI_
MariaDB MDB_ LS_MDB_ DS_MDB_
Microsoft Access MACS_ LS_MACS_ DS_MACS_
Netezza NETZ_ LS_NETZ_ DS_NETZ_
Phoenix PHNX_ LS_PHNX_ DS_PHNX_
Presto (Preview) PRST_ LS_PRST_ DS_PRST_
Spark SPRK_ LS_SPRK_ DS_SPRK_
Vertica VERT_ LS_VERT_ DS_VERT_
Snowflake SNWF_ LS_SNWF_ DS_SNWF_
MongoDB Atlas MONG_ATLAS_ LS_MONG_ATLAS_ DS_MONG_ATLAS_
Amazon RDS for Oracle RDSORAC_ LS_RDSORAC_ DS_RDSORAC_
Amazon RDS for SQL Server RDSSQL_ LS_RDSSQL_ DS_RDSSQL_

 

Files

     
Connector Abbreviation Linked Service Dataset
File System FILE_ LS_FILE_ DS_FILE_
HDFS HDFS_ LS_HDFS_ DS_HDFS_
Amazon S3  AMS3_ LS_AMS3_ DS_AMS3_
FTP FTP_ LS_FTP_ DS_FTP_
SFTP SFTP_ LS_SFTP_ DS_SFTP_
Google Cloud Storage GCS_ LS_GCS_ DS_GCS_
Oracle Cloud Storage OCS_ LS_OCS_ DS_OCS_
Amazon S3 Compatible Storage SMS3C_ LS_SMS3C_ DS_SMS3C_

 

Generic

     
Connector Abbreviation Linked Service Dataset
Generic ODBC ODBC_ LS_ODBC_ DS_ODBC_
Generic OData  ODAT_ LS_ODAT_ DS_ODAT_
Generic REST REST_ LS_REST_ DS_REST_
Generic HTTP HTTP_ LS_HTTP_ DS_HTTP_

 

Services and Apps

Connector Abbreviation Linked Service Dataset
Salesforce SAFC_ LS_SAFC_ DS_SAFC_
Salesforce Service Cloud SAFCSC_ LS_SAFCSC_ DS_SAFCSC_
Salesforce Marketing Cloud SAFOMC_ LS_SAFOMC_ DS_SAFOMC_
GitHub GITH_ LS_GITH_ DS_GITH_
Jira JIRA_ LS_JIRA_ DS_JIRA_
Web Table (table from HTML)  WEBT_ LS_WEBT_ DS_WEBT_
Amazon Marketplace Web Service AMSMWS_ LS_AMSMWS_ DS_AMSMWS_
Xero XERO_ LS_XERO_ DS_XERO_
SharePoint Online List SHAREPOINT_ LS_SHAREPOINT_ DS_SHAREPOINT_
ServiceNow SERVICENOW_ LS_SERVICENOW_ DS_SERVICENOW_
Dynamics (Microsoft Dataverse) DATAVERSE_ LS_DATAVERSE_ DS_DATAVERSE_
Dynamics 365 D365_ LS_D365_ DS_D365_
Dynamics AX DAX_ LS_DAX_ DS_DAX_
Dynamics CRM DCRM_ LS_DCRM_ DS_DCRM_
Microsoft 365 M365_ LS_M365_ DS_M365_
SAP Cloud for Customer (C4C) SAPC4C_ LS_SAPC4C_ DS_SAPC4C_
SAP ECC SAPE_ LS_SAPE_ DS_SAPE_
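Every row in these tables follows the same pattern: LS_ or DS_, then the connector abbreviation, then a logical name of your own. The minimal Python sketch below composes names that way. The mapping holds only a few rows from the tables above, and the function name and example descriptions are hypothetical. It upper-cases the logical name because the convention uses capitals.

```python
# A subset of the abbreviations from the tables above; extend it with the connectors you use.
ABBREVIATIONS = {
    "Azure Blob Storage": "ABLB",
    "Azure Data Lake Storage Gen2": "ADLS",
    "Azure SQL Database": "ASQL",
    "Snowflake": "SNWF",
    "Generic REST": "REST",
}

PREFIXES = {"linked_service": "LS", "dataset": "DS"}


def connector_object_name(kind: str, connector: str, description: str) -> str:
    """Compose a name such as LS_ASQL_SALES or DS_ABLB_RAW_ORDERS.

    `kind` is "linked_service" or "dataset"; `description` is your own logical name.
    """
    return f"{PREFIXES[kind]}_{ABBREVIATIONS[connector]}_{description.upper()}"
```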

If your connector is not listed (mostly connectors that are in preview), please let me know. For more details on all the different connectors, check the connector overview.

Pipeline

You can also define naming conventions for pipelines. I think the most important thing is that you always start your pipeline names with PL_, followed by a logical name of your choice. You can, for example, use:

TRANS: Pipeline with transformations

SSIS: Pipeline with SSIS Packages

DATA: Pipeline with DataMovements

COPY: Pipeline with Copy Activities

Miscellaneous

NB: Notebook 

DF: Mapping Dataflows

SQL: SQL Scripts

KQL: KQL Scripts

JOB: Spark job definition
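The minimal Python sketch below puts these suggestions together. Pipelines get PL_, a category code and a logical name; the other artifacts get their own prefix and a logical name. The category and prefix sets are the examples listed above, and the function names are hypothetical. Extend the sets with your own codes as needed; the ACT category used earlier in PL_ACT_AAS_SUSPEND_GP would be one such addition.

```python
PIPELINE_CATEGORIES = {"TRANS", "SSIS", "DATA", "COPY"}
ARTIFACT_PREFIXES = {"NB", "DF", "SQL", "KQL", "JOB"}


def pipeline_name(category: str, logical_name: str) -> str:
    """Compose PL_<CATEGORY>_<LOGICAL_NAME>, in capitals."""
    category = category.upper()
    if category not in PIPELINE_CATEGORIES:
        raise ValueError(f"unknown pipeline category: {category}")
    return f"PL_{category}_{logical_name.upper()}"


def artifact_name(prefix: str, logical_name: str) -> str:
    """Compose <PREFIX>_<LOGICAL_NAME> for notebooks, data flows, scripts and Spark jobs."""
    prefix = prefix.upper()
    if prefix not in ARTIFACT_PREFIXES:
        raise ValueError(f"unknown artifact prefix: {prefix}")
    return f"{prefix}_{logical_name.upper()}"
```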

Once again, these naming conventions are just suggestions. The most important thing is that you start using naming conventions and that you use folders (categories) to structure your pipelines, as in the example picture below.

FolderStructure

If you have suggestions just let me know by leaving a comment below.