Azure Data Factory Let’s get started

by Erwin | Nov 3, 2020 | Azure, Azure Data Factory, Azure Synapse Analytics

Creating an Azure Data Factory Instance, let’s get started

Many blogs nowadays are about which functionalities we can use within Azure Data Factory.
But how do we create an Azure Data Factory instance in Azure for the first time and what should you take into account? In this article I will take you step by step on how to get started.

First, we have to log in to the Azure Portal.

Search for Data Factories and select the Data Factory service.

Secondly, we have to create a Data Factory instance.

Fill in the required fields:

Subscription => Select the Azure subscription in which you want to create the Data Factory.
Resource Group => Select Use existing and pick an existing resource group from the list, or click on Create new and enter the name of a resource group (a new Resource Group will be created).
Region => Select the desired Region/Location. This is where your Azure Data Factory metadata will be stored; it has nothing to do with where your compute runs or where your Data Stores are located.
Name => Enter a name that is unique across Azure.
Version => Always select V2 here, as this contains the very latest developments and functionalities. V1 is only used for migration from another V1 instance.
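
For readers who prefer to see these fields as a request rather than a form, here is a minimal sketch in Python. It only builds the URL and body of an Azure Resource Manager call and does not send anything. The provider path Microsoft.DataFactory/factories, the api-version 2018-06-01 and the function name are assumptions that do not come from this article, and all identifiers in the usage example are placeholders.

```python
import json

ARM_ENDPOINT = "https://management.azure.com"
# Assumption: api-version commonly used for Data Factory V2; not taken from this article.
API_VERSION = "2018-06-01"


def build_create_factory_request(subscription_id, resource_group, factory_name, region):
    """Return (url, json_body) for a PUT request that would create a data factory.

    The four arguments correspond to the Subscription, Resource Group, Name
    and Region fields of the portal form.
    """
    required = {
        "subscription": subscription_id,
        "resource group": resource_group,
        "name": factory_name,
        "region": region,
    }
    for label, value in required.items():
        if not value:
            raise ValueError(f"{label} is required")

    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.DataFactory/factories/{factory_name}"
        f"?api-version={API_VERSION}"
    )
    # The region only determines where the factory metadata is stored.
    body = json.dumps({"location": region})
    return url, body


if __name__ == "__main__":
    # Placeholder values, for illustration only.
    url, body = build_create_factory_request(
        "00000000-0000-0000-0000-000000000000", "rg-demo", "adf-demo", "westeurope"
    )
    print(url)
    print(body)
```

Sending the request would additionally require an authenticated client, which is left out here on purpose.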

Select Next: Git configuration

Enable the option to configure Git later; we will set this up afterwards from within Azure Data Factory.

Select Next: Networking:

Leave the options as is. I will explain the Connectivity Method in one of my next articles.

Select Next: Review + Create:

Your Azure Data Factory instance will now be created. Once it has been created, it is ready to use and you can open it from the Resource Group you selected above.

Select Author & Monitor.

Encrypt your Azure Data Factory with customer-managed keys

Azure Data Factory encrypts data at rest, including entity definitions and any data cached while runs are in progress. By default, data is encrypted with a randomly generated Microsoft-managed key that is uniquely assigned to your data factory. But you can also Bring Your Own Key (BYOK); more details can be found in my previous article “Azure Data Factory: How to assign a Customer Managed Key”.

Please be aware that you have to assign this key on an empty Azure Data Factory Instance.

Roles for Azure Data Factory

Data Factory Contributor role:

Assign the built-in Data Factory Contributor role on Resource Group level if you want the user to be able to create a new Data Factory in that Resource Group; otherwise you need to assign it on Subscription level.

User can:

Create, edit, and delete data factories and child resources including datasets, linked services, pipelines, triggers, and integration runtimes.
Deploy Resource Manager templates. Resource Manager deployment is the deployment method used by Data Factory in the Azure portal.
Manage App Insights alerts for a Data Factory.
Create support tickets.

Reader Role:

Assign the built-in reader role on the Data Factory resource for the user.

User can:

View and monitor the selected Data Factory, but not edit or change it.

More on how to assign roles and permissions can be found here.

Thanks for reading. In my next blog I will describe how to set up your Code Repository.

My Virtual Session at SQLBits 2020

by Erwin | Oct 5, 2020 | Events, SQLBits

SQL BITS 2020, the greatest data show

Last week was SQL BITS week.

After the event was moved from April to September, it eventually became a Virtual event. Setting up a Virtual event requires a lot of adjustments in the Organization.

Recording

All regular sessions had to be recorded in advance so that nothing could go wrong during the event itself.
For some of us this was new, while others had done it before. It was certainly new to me, but the organization did everything it could to help us, with various sessions in which everything was explained and in which we could ask all kinds of questions. Thanks for that.

Is it strange to pre-record a session?

Yes, it is. You try to find an environment without ambient noise, with a good microphone and a good camera, but you don't always have influence on ambient sounds. And presenting to a webcam is strange.

After practicing my session once more, I recorded it in one go and did not edit anything. After all, something can always go wrong or go differently in a session, even if you record it in advance. Once you start adjusting or editing, there is no end to it and a lot of time goes into it. The charm of a session is also gone. After all, we are data professionals and not movie stars.

My Session

Back to the day itself: half an hour in advance I could log in to my session, and I must say that it was quite exciting. Would the video start? How would I come across? These and some more questions were in my head.
But all the nerves were for nothing; the session started right on time. In the meantime I had created a few polls that the audience could answer in between.
Being able to answer questions live during the session, sometimes even with a link to some extra information, was now easy as well.

I was delighted to see from the outcome of the last poll that most of the people who attended my session will now start using Azure Key Vault in their day-to-day work. In the end, that's what we do it for (to help or advise others).

My presentation can be found here.

Conclusion

I saw a great event, and the quality of the sessions was very high. There was so much choice, but luckily the sessions will soon be available to watch On-Demand.
A big round of applause to the entire organization, you have organized a fantastic event with a super nice portal, including exhibitor hall, networking, chat rooms and much more. Thank you for having me and see you next year.

Use Global Parameters to Suspend and Resume your Analysis Services in ADF

by Erwin | Sep 16, 2020 | Azure, Azure Analysis Services, Azure Data Factory

Suspend or Resume your Azure Analysis Services in Azure Data Factory

Last week one of my customers asked me if they could start or stop their Azure Analysis Services from within Azure Data Factory. After a search on the internet I came across a blog from Joost, which I’m using as input for this post. Most of the credit goes to him. For me the focus was more on making it parameterized, so that I can reuse these Pipelines for all of my customers. A couple of weeks ago the ADF team released Global Parameters, and in this post I’m going to use them.

Global Parameters

Global parameters are constants across a data factory that can be consumed by a pipeline in any expression. They are useful when you have multiple pipelines with identical parameter names and values.

Creation and management of global parameters is done in the management hub.

Create the Global Parameters AAS_ResourceGroupName, AAS_ServerName and SubscriptionId to build the Pipeline.

The following parameters can now be used across all your Data Factory Activities:

@pipeline().globalParameters.AAS_ResourceGroupName
@pipeline().globalParameters.AAS_ServerName
@pipeline().globalParameters.SubscriptionId

Build Pipeline

Create a new Pipeline named PL_ACT_AAS_SUSPEND_GP.

Add a Parameter named Action to the Pipeline, so that you can easily reuse this Pipeline to Resume your AAS.

Add a Web Activity.

Name = Suspend_AAS (depends on the Action).

As Joost mentioned in his blog, we first have to define the REST API URL in the Settings tab.

https://management.azure.com/subscriptions/<xxx>/resourceGroups/<xxx>/providers/Microsoft.AnalysisServices/servers/<xxx>/<ACTION>?api-version=2017-08-01

We need to replace each <xxx> with the matching Global Parameter and <ACTION> with the Pipeline Parameter. The final result will be:

https://management.azure.com/subscriptions/@{pipeline().globalParameters.SubscriptionId}/resourceGroups/@{pipeline().globalParameters.AAS_ResourceGroupName}/providers/Microsoft.AnalysisServices/servers/@{pipeline().globalParameters.AAS_ServerName}/@{pipeline().parameters.Action}?api-version=2017-08-01

Method = POST

Body = Create a dummy json message, it is not used by the Rest API.
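
To make the URL composition explicit, the sketch below rebuilds the same string in Python from the three Global Parameters and the Action parameter. It is only an illustration of what the ADF expression resolves to; the function name and the example values are hypothetical, and restricting the action to suspend and resume is my own assumption based on the two pipelines in this post.

```python
ARM_ENDPOINT = "https://management.azure.com"
API_VERSION = "2017-08-01"  # same api-version as in the URL above
ALLOWED_ACTIONS = {"suspend", "resume"}  # assumption: the two actions used in this post


def build_aas_action_url(subscription_id, resource_group, server_name, action):
    """Mirror the Web Activity URL: globals fill the path, Action is the last segment."""
    if action.lower() not in ALLOWED_ACTIONS:
        raise ValueError(f"unsupported action: {action!r}")
    return (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.AnalysisServices/servers/{server_name}"
        f"/{action}?api-version={API_VERSION}"
    )


if __name__ == "__main__":
    # Placeholder values standing in for the Global Parameters.
    print(build_aas_action_url("sub-id", "rg-aas", "aasdemo", "suspend"))
```

Because only the last path segment changes, the same pipeline can serve both actions by changing a single parameter.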

Add Azure Data Factory as Contributor to Azure Analysis Services

Before you can debug or test your Pipelines, you have to add your ADF instance with the Contributor role to your Azure Analysis Services.

After you have done this, you can debug your Pipeline.

Error

I got an error because my AAS was already Suspended (or already Resumed). We can solve this by adding a check on the Status of Analysis Services.

Check Analysis Services Status

To check if our Analysis Services is already Suspended or Resumed, we can add a Web Activity that retrieves the Status.

Add a Web Activity to your Pipeline or make a copy of the existing Web Activity.

Name = Check_Status_AAS

URL = https://management.azure.com/subscriptions/@{pipeline().globalParameters.SubscriptionId}/resourceGroups/@{pipeline().globalParameters.AAS_ResourceGroupName}/providers/Microsoft.AnalysisServices/servers/@{pipeline().globalParameters.AAS_ServerName}?api-version=2017-08-01

Method = GET

Add an If Condition Activity (check if AAS is running).

Add an Expression on the If Condition Activity: @bool(startswith(activity('Check_Status_AAS').output.properties.state,'Paused'))

This expression returns True when Analysis Services is already Paused and False when it is not. If we want to Suspend our Analysis Services, we have to move the Web Activity Suspend_AAS into the False branch (cut it from the main canvas and paste it into the False Activity). In case Analysis Services is already Suspended (True), we do nothing.
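
The branching can be summarised in a few lines of Python. This is a sketch of the decision only, not of the pipeline itself: the function name is hypothetical, and the state value "Succeeded" used in the example for a running server is an assumption, since the post only relies on the state starting with "Paused".

```python
def should_call_web_activity(action, state):
    """Decide whether the suspend/resume call is needed.

    Mirrors the If Condition: the expression is True when the state
    starts with 'Paused'. Suspend runs in the False branch, and the
    Resume variant of the pipeline runs in the True branch.
    """
    is_paused = state.startswith("Paused")
    normalized = action.lower()
    if normalized == "suspend":
        return not is_paused
    if normalized == "resume":
        return is_paused
    raise ValueError(f"unsupported action: {action!r}")


if __name__ == "__main__":
    # "Succeeded" is an assumed state name for a running server.
    for action, state in [("suspend", "Succeeded"), ("suspend", "Paused"), ("resume", "Paused")]:
        print(action, state, should_call_web_activity(action, state))
```

Checking the state first keeps the pipeline from failing when it is triggered twice in a row.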

Debug your Pipeline to see what is happening:

If Analysis Services was running, the Web Activity Suspend_AAS is called.

If Analysis Services was already Paused/Suspended, no action is required.

Create Pipeline to Resume your Analysis Services

Clone your PL_ACT_AAS_SUSPEND_GP and rename it to PL_ACT_AAS_RESUME_GP. Change the value of your Action Parameter to “Resume”.

Within the If Condition, move the Web Activity Suspend_AAS from False to True and rename it to Resume_AAS.

Debug to check that everything works.

You have now learned how to Suspend and Resume your Azure Analysis Services Dynamically with the use of Global Parameters. Both Pipelines can be easily transferred to different customers.

Please feel free to download the Pipeline Templates here

If you’re already using a database where you store your metadata, then you also have the possibility to store the necessary parameters in that database. The only thing you need to do is to add a Lookup Activity to get the parameters from your database (and replace the global parameters with the output from the Lookup Activity).

Hopefully this article has helped you a step further. As always, if you have any questions, leave them in the comments.

Based on the above article you should now also be able to build a Pipeline to Process your Analysis Services Model with some help from this blog, or you can download the Pipeline Template from here.

Speaking (Virtual) at SQL Saturday #963 Denmark

by Erwin | Sep 10, 2020 | Events, SQL Saturday

SQL Saturday 963 Denmark

PASS SQLSaturday is a free training event for professionals who use the Microsoft data platform. These community events offer content across data management, cloud and hybrid architecture, analytics, business intelligence, AI, and more.

My first virtual event

I like to interact during my session, so I’m curious whether that will work. Last week I recorded my session for SQL Bits, and that is quite strange when you look back at it. I am really looking forward to it; my session starts at 14:30.

The complete schedule can be found here. And there is still time to register!

My Session

Session Title:

Azure Key Vault, Azure Dev Ops and Data Factory how do these Azure Services work perfectly together!

Session Details

Can we store our connection strings, Blob Storage keys or other secret values somewhere other than in Azure Data Factory (ADF)? Yes, you can! You can store these valuable secrets in Azure Key Vault (AKV). But how can we achieve this in ADF? And finally, how do we deploy our Data Factories with Azure DevOps to Test, Acceptance and Production environments with these secrets? Can this be set up dynamically? During this session I will answer all of these questions. You will learn how to set up your Azure Key Vault, connect these secrets in ADF and finally deploy these secrets dynamically with Azure DevOps. As you can see, there is a lot to talk about during this session.

Will I see you on Saturday, the 26th of September?

Azure Data Factory and Azure Synapse Analytics Naming Conventions

by Erwin | Jul 5, 2020 | Azure, Azure Data Factory, Azure Synapse Analytics

Naming Conventions

As more and more projects use Azure Data Factory and Azure Synapse Analytics, it becomes more important to apply a correct and standard naming convention. When using standard naming conventions you create recognizable results across different projects, and you also create clarity for your colleagues. In addition, it is easier to add these projects to other services such as Managed Services, Azure DevOps, etc., because standards are used.

To start with these naming conventions, I have made a list of suggestions with most common Linked Services. The list is not exhaustive, but it does provide guidance for new Linked Services.

There are a few standard naming conventions that apply to all elements in Azure Data Factory and in Azure Synapse Analytics.

* Names are case insensitive (not case sensitive). For that reason I’m only using CAPITALS.
* Maximum number of characters in a table name: 260.
* All object names must begin with a letter, number or underscore (_).
* The following characters are not allowed: “.”, “+”, “?”, “/”, “<”, “>”, “*”, “%”, “&”, “:”, “\”
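
These rules are easy to check before you create an object. Below is a minimal sketch in Python; the function names are hypothetical, applying the 260 character limit to every object name is an assumption on my side, and the backslash is included in the forbidden set on the assumption that it is the last entry of the list above.

```python
MAX_NAME_LENGTH = 260  # assumption: applied here to every object name
# Characters from the list above; the backslash is an assumed last entry.
FORBIDDEN_CHARACTERS = set(".+?/<>*%&:\\")


def find_name_problems(name):
    """Return a list of rule violations for a proposed object name (empty if none)."""
    if not name:
        return ["name is empty"]
    problems = []
    if len(name) > MAX_NAME_LENGTH:
        problems.append(f"longer than {MAX_NAME_LENGTH} characters")
    first = name[0]
    if not (first.isalnum() or first == "_"):
        problems.append("must begin with a letter, number or underscore")
    bad = sorted(set(name) & FORBIDDEN_CHARACTERS)
    if bad:
        problems.append("contains forbidden characters: " + " ".join(bad))
    return problems


def is_same_name(left, right):
    """Names are case insensitive, so compare them without regard to case."""
    return left.casefold() == right.casefold()


if __name__ == "__main__":
    for candidate in ["LS_ASQL_SALES", "LS.ASQL", "-PL_COPY"]:
        print(candidate, find_name_problems(candidate))
```

Writing names in capitals only, as suggested above, makes the case insensitive comparison a non-issue in practice.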

These rules are also defined on the following link

This post has been updated on Feb 2nd, 2023 with the latest connectors.

Azure
	Abbreviation	Linked Service	Dataset
Azure Blob Storage	ABLB_	LS_ABLB_	DS_ABLB_
Azure Cosmos DB SQL API	ACSA_	LS_ACSA_	DS_ACSA_
Azure Cosmos DB MongoDB API	ACMA_	LS_ACMA_	DS_ACMA_
Azure Data Explorer	ADEX_	LS_ADEX_	DS_ADEX_
Azure Data Lake Storage Gen1	ADLS_	LS_ADLS_	DS_ADLS_
Azure Data Lake Storage Gen2	ADLS_	LS_ADLS_	DS_ADLS_
Azure Database for MariaDB	AMDB_	LS_AMDB_	DS_AMDB_
Azure Database for MySQL	AMYS_	LS_AMYS_	DS_AMYS_
Azure Database for PostgreSQL	APOS_	LS_APOS_	DS_APOS_
Azure File Storage	AFIL_	LS_AFIL_	DS_AFIL_
Azure Search	ASER_	LS_ASER_	DS_ASER_
Azure SQL Database	ASQL_	LS_ASQL_	DS_ASQL_
Azure SQL Database Managed Instance	ASQM_	LS_ASQM_	DS_ASQM_
Azure Synapse Analytics (formerly Azure SQL DW)	ASDW_	LS_ASDW_	DS_ASDW_
Azure Table Storage	ATBL_	LS_ATBL_	DS_ATBL_
Azure DataBricks	ADBR_	LS_ADBR_	DS_ADBR_
Azure Cognitive Search	ACGS_	LS_ACGS_	DS_ACGS_
Azure Synapse Analytics	ASA_	LS_ASA_	DS_ASA_
Azure Cognitive Service	ACG_	LS_ACG_	N/A

Database
	Abbreviation	Linked Service	Dataset
SQL Server	MSQL_	LS_SQL_	DS_SQL_
Oracle	ORAC_	LS_ORAC_	DS_ORAC_
Oracle Eloqua	ORAE_	LS_ORAE_	DS_ORAE_
Oracle Responsys	ORAR_	LS_ORAR_	DS_ORAR_
Oracle Service Cloud	ORSC_	LS_ORSC_	DS_ORSC_
MySQL	MYSQ_	LS_MYSQ_	DS_MYSQ_
DB2	DB2_	LS_DB2_	DS_DB2_
Teradata	TDAT_	LS_TDAT_	DS_TDAT_
PostgreSQL	POST_	LS_POST_	DS_POST_
Sybase	SYBA_	LS_SYBA_	DS_SYBA_
Cassandra	CASS_	LS_CASS_	DS_CASS_
MongoDB	MONG_	LS_MONG_	DS_MONG_
Amazon Redshift	ARED_	LS_ARED_	DS_ARED_
SAP Business Warehouse	SAPW_	LS_SAPW_	DS_SAPW_
SAP Cloud for Customer (C4C)	SAPC_	LS_SAPC_	DS_SAPC_
SAP Table	SAPT_	LS_SAPT_	DS_SAPT_
SAP HANA	HANA_	LS_HANA_	DS_HANA_
Drill	DRILL_	LS_DRILL_	DS_DRILL_
Google BigQuery	GBQ_	LS_GBQ_	DS_GBQ_
Greenplum	GRPL_	LS_GRPL_	DS_GRPL_
HBase	HBAS_	LS_HBAS_	DS_HBAS_
Hive	HIVE_	LS_HIVE_	DS_HIVE_
Apache Impala	IMPA_	LS_IMPA_	DS_IMPA_
Informix	INMI_	LS_INMI_	DS_INMI_
MariaDB	MDB_	LS_MDB_	DS_MDB_
Microsoft Access	MACS_	LS_MACS_	DS_MACS_
Netezza	NETZ_	LS_NETZ_	DS_NETZ_
Phoenix	PHNX_	LS_PHNX_	DS_PHNX_
Presto (Preview)	PRST_	LS_PRST_	DS_PRST_
Spark	SPRK_	LS_SPRK_	DS_SPRK_
Vertica	VERT_	LS_VERT_	DS_VERT_
Snowflake	SNWF_	LS_SNWF_	DS_SNWF_
MongoDB Atlas	MONG_ATLAS_	LS_MONG_ATLAS_	DS_MONG_ATLAS_
Amazon RDS for Oracle	RDSORAC_	LS_RDSORAC_	DS_RDSORAC_
Amazon RDS for SQL Server	RDSSQL_	LS_RDSSQL_	DS_RDSSQL_

Files
	Abbreviation	Linked Service	Dataset
File System	FILE_	LS_FILE_	DS_FILE_
HDFS	HDFS_	LS_HDFS_	DS_HDFS_
Amazon S3	AMS3_	LS_AMS3_	DS_AMS3_
FTP	FTP_	LS_FTP_	DS_FTP_
SFTP	SFTP_	LS_SFTP_	DS_SFTP_
Google Cloud Storage	GCS_	LS_GCS_	DS_GCS_
Oracle Cloud Storage	OCS_	LS_OCS_	DS_OCS_
Amazon S3 Compatible Storage	SMS3C_	LS_SMS3C_	DS_SMS3C_

Generic
	Abbreviation	Linked Service	Dataset
Generic ODBC	ODBC_	LS_ODBC_	DS_ODBC_
Generic OData	ODAT_	LS_ODAT_	DS_ODAT_
Generic REST	REST_	LS_REST_	DS_REST_
Generic HTTP	HTTP_	LS_HTTP_	DS_HTTP_

Services and Apps
	Abbreviation	Linked Service	Dataset
Salesforce	SAFC_	LS_SAFC_	DS_SAFC_
Salesforce Service Cloud	SAFCSC_	LS_SAFCSC_	DS_SAFCSC_
Salesforce Marketing Cloud	SAFOMC_	LS_SAFOMC_	DS_SAFOMC_
GitHub	GITH_	LS_GITH_	DS_GITH_
Jira	JIRA_	LS_JIRA_	DS_JIRA_
Web Table (table from HTML)	WEBT_	LS_WEBT_	DS_WEBT_
Amazon Marketplace Web Service	AMSMWS_	LS_AMSMWS_	DS_AMSMWS_
Xero	XERO_	LS_XERO_	DS_XERO_
SharePoint Online List	SHAREPOINT_	LS_SHAREPOINT_	DS_SHAREPOINT_
ServiceNow	SERVICENOW_	LS_SERVICENOW_	DS_SERVICENOW_
Dynamics (Microsoft Dataverse)	DATAVERSE_	LS_DATAVERSE_	DS_DATAVERSE_
Dynamics 365	D365_	LS_D365_	DS_D365_
Dynamics AX	DAX_	LS_DAX_	DS_DAX_
Dynamics CRM	DCRM_	LS_DCRM_	DS_DCRM_
Microsoft 365	M365_	LS_M365_	DS_M365_
SAP Cloud for Customer (C4C)	SAPC4C_	LS_SAPC4C_	DS_SAPC4C_
SAP ECC	SAPE_	LS_SAPE_	DS_SAPE_

If your connector is not described (mostly connectors which are in Preview), please let me know. For more details on all the different connectors, check the connector overview.

Pipeline

You can define naming conventions for Pipelines as well. I think the most important thing is that you always start your pipeline name with PL_, followed by a logical name that is meaningful to you. You can for example use:

TRANS: Pipeline with transformations

SSIS: Pipeline with SSIS Packages

DATA: Pipeline with DataMovements

COPY: Pipeline with Copy Activities
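
If you want to generate names instead of typing them, a small helper is enough. The sketch below is in Python with hypothetical function names. The pattern prefix, then abbreviation or category, then a description, all joined with underscores and written in capitals, is my reading of the conventions in this post and not a fixed rule.

```python
import re


def _normalize(text):
    """Upper-case the text and collapse anything that is not a letter or digit into '_'."""
    return re.sub(r"[^A-Za-z0-9]+", "_", text).strip("_").upper()


def pipeline_name(category, description):
    """Build a name such as PL_COPY_SALES_ORDERS."""
    return f"PL_{_normalize(category)}_{_normalize(description)}"


def linked_service_name(abbreviation, description):
    """Build a name such as LS_ASQL_SALES from an abbreviation like 'ASQL_'."""
    return f"LS_{_normalize(abbreviation)}_{_normalize(description)}"


def dataset_name(abbreviation, description):
    """Build a name such as DS_ABLB_CUSTOMER_FILES from an abbreviation like 'ABLB_'."""
    return f"DS_{_normalize(abbreviation)}_{_normalize(description)}"


if __name__ == "__main__":
    print(pipeline_name("COPY", "sales orders"))
    print(linked_service_name("ASQL_", "sales"))
    print(dataset_name("ABLB_", "customer files"))
```

Generating the names in one place keeps the prefixes consistent across projects, which is the whole point of the convention.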

Miscellaneous

NB: Notebook

DF: Mapping Dataflows

SQL: SQL Scripts

KQL: KQL Scripts

JOB: Spark job definition

Once again, these naming conventions are just suggestions. The most important thing is that you start using naming conventions and that you use the folder structure within the Pipelines (categories).

If you have suggestions just let me know by leaving a comment below.
