Connect Azure Synapse Analytics with Azure Purview

Erwin

How do you integrate Azure Purview in Azure Synapse Analytics?

This article explains how to integrate Azure Purview into your Azure Synapse workspace for data discovery and exploration. Follow the steps below to connect your Azure Purview account in your Azure Synapse Workspace.

In the Manage Hub you will now see a new option called Azure Purview.

Azure Purview Management Hub

Click on the option “Connect to a Purview Account”. Please be aware that you need the Contributor role in your Azure Synapse workspace and access to your Azure Purview account (Purview Reader or Purview Curator).

Find the Purview account you want to connect to in the drop-down list, or add it manually by entering its resource ID.

Azure Purview Connect Resource ID
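If you add the account manually, it helps to check the resource ID before pasting it in. Below is a minimal sketch, assuming the standard Azure Resource Manager ID layout for a Purview account; the function names and the sample subscription, resource group and account values are made up for illustration.

```python
import re

# Standard Azure Resource Manager ID layout for a Purview account.
_PURVIEW_ID = re.compile(
    r"^/subscriptions/(?P<subscription>[^/]+)"
    r"/resourceGroups/(?P<resource_group>[^/]+)"
    r"/providers/Microsoft\.Purview/accounts/(?P<account>[^/]+)$",
    re.IGNORECASE,
)


def build_purview_resource_id(subscription_id, resource_group, account_name):
    """Assemble the resource ID string from its three parts."""
    return (
        f"/subscriptions/{subscription_id}"
        f"/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Purview/accounts/{account_name}"
    )


def parse_purview_resource_id(resource_id):
    """Split a resource ID into its parts, or raise ValueError if it is malformed."""
    match = _PURVIEW_ID.match(resource_id.strip())
    if match is None:
        raise ValueError(f"Not a Purview account resource ID: {resource_id!r}")
    return match.groupdict()


# Hypothetical values, for illustration only.
resource_id = build_purview_resource_id(
    "00000000-0000-0000-0000-000000000000", "rg-data", "my-purview"
)
print(parse_purview_resource_id(resource_id))
```

A check like this only catches typos in the ID itself; it does not tell you whether you have the right role on the account.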

If the connection is successful, you will see the following screen. If not, make sure you have the correct role to connect to your Azure Purview account.

Azure Purview Connected

 

Data discovery:

If you select the Data, Develop or Integrate Hub, you will see a search bar at the top center.

Azure Purview Search

Using Azure Purview in Azure Synapse

To use Azure Purview in Azure Synapse, you need access to the connected Azure Purview account. Azure Synapse then passes through the correct Azure Purview permissions.

Purview Reader role

  • Can read all content in Azure Purview

Purview Curator role

  • Can read all content in Azure Purview
  • Can edit assets, classifications and glossary terms
  • Can apply classifications and glossary terms to assets.
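To make the difference between the two roles concrete, here is a small sketch that models the lists above as a lookup table. This is only an illustration of what each role allows, not how Azure Purview implements its permissions; the action names are my own.

```python
# Illustration only: the two roles described above, modelled as sets of actions.
# The action names are hypothetical and do not come from the Azure Purview API.
ROLE_ACTIONS = {
    "Purview Reader": {"read_content"},
    "Purview Curator": {
        "read_content",
        "edit_assets",
        "edit_classifications",
        "edit_glossary_terms",
        "apply_classifications",
        "apply_glossary_terms",
    },
}


def is_allowed(role, action):
    """Return True if the given role includes the given action."""
    return action in ROLE_ACTIONS.get(role, set())


print(is_allowed("Purview Reader", "edit_assets"))
print(is_allowed("Purview Curator", "edit_assets"))
```

So a Purview Reader can search and browse from within Synapse, while editing and applying classifications or glossary terms needs Purview Curator.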

Azure Purview actions

The following Azure Purview features are available in Azure Synapse Analytics (based on your role):

  • Overview of the metadata
  • View and edit schema of the metadata with classifications, glossary terms, data types, and descriptions
  • View lineage to understand dependencies and do impact analysis. For more information, see lineage
  • View and edit Contacts to know who is an owner or expert over a dataset
  • View Related to understand the hierarchical dependencies of a specific dataset. This experience is helpful for browsing through the data hierarchy.

Azure Purview Integration

Connect data to Azure Synapse

In addition to the features above, you can also connect directly to the assets you have found in your search.

Linked Service

  • You need to create a new Linked Service to copy data to Synapse or to make it available in your Data Hub (for supported data sources like ADLS Gen 2)

Integration Dataset

  • For objects like files, folders, or tables, you can directly create a new Integration Dataset and leverage an existing linked service if already created.
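To give an idea of what ends up behind these two options, here is a rough sketch of the JSON definitions for an ADLS Gen 2 linked service and a Parquet integration dataset that reuses it. The layout follows the definitions as I know them from Azure Data Factory; all names (ls_datalake, mydatalake, ds_sales and the folder and file) are made up, and authentication settings are left out on purpose.

```python
import json


def adls_gen2_linked_service(name, account_name):
    """Linked service definition for an ADLS Gen 2 account (authentication omitted)."""
    return {
        "name": name,
        "properties": {
            "type": "AzureBlobFS",
            "typeProperties": {
                "url": f"https://{account_name}.dfs.core.windows.net"
            },
        },
    }


def parquet_dataset(name, linked_service_name, file_system, folder_path, file_name):
    """Integration dataset definition that points at one Parquet file."""
    return {
        "name": name,
        "properties": {
            # Reuse an existing linked service by reference.
            "linkedServiceName": {
                "referenceName": linked_service_name,
                "type": "LinkedServiceReference",
            },
            "type": "Parquet",
            "typeProperties": {
                "location": {
                    "type": "AzureBlobFSLocation",
                    "fileSystem": file_system,
                    "folderPath": folder_path,
                    "fileName": file_name,
                }
            },
        },
    }


# Hypothetical names, for illustration only.
linked_service = adls_gen2_linked_service("ls_datalake", "mydatalake")
dataset = parquet_dataset(
    "ds_sales", linked_service["name"], "raw", "sales/2021", "sales.parquet"
)
print(json.dumps(dataset, indent=2))
```

The dataset only holds a reference to the linked service, which is why an existing linked service can be reused for every asset in the same storage account.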

Develop in Azure Synapse

There are three actions that you can perform: New SQL Script, New Notebook, and New Data Flow.

SQL Script

  • View the top 100 rows in order to understand the shape of the data.
  • Create an external table from Synapse SQL database.
  • Load the data into a Synapse SQL database.
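For the first option, the script boils down to a SELECT TOP 100 over the file in the data lake. The sketch below builds such a query as a string, assuming a Parquet file queried through OPENROWSET on the serverless SQL pool; the storage URL is a made-up example and the exact script Synapse generates for you may differ.

```python
def top_rows_query(storage_url, limit=100):
    """Build a serverless SQL query that previews the first rows of a Parquet file."""
    if limit <= 0:
        raise ValueError("limit must be a positive number")
    # Escape single quotes so the URL is a valid T-SQL string literal.
    escaped_url = storage_url.replace("'", "''")
    return (
        f"SELECT TOP {limit} *\n"
        "FROM OPENROWSET(\n"
        f"    BULK '{escaped_url}',\n"
        "    FORMAT = 'PARQUET'\n"
        ") AS [result];"
    )


# Hypothetical storage account, container and file.
print(top_rows_query("https://mydatalake.dfs.core.windows.net/raw/sales/2021/sales.parquet"))
```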

Notebook

  • Load data into a Spark DataFrame.
  • Create a Spark Table (if you do that over Parquet format, it also creates a serverless SQL pool table).
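In a notebook, both actions come down to a few lines of PySpark. This is a minimal sketch, assuming a Parquet folder in ADLS Gen 2 and the spark session that a Synapse notebook provides; the helper names, the storage account, the container and the table name are hypothetical.

```python
def abfss_path(account_name, container, relative_path):
    """Build the abfss:// path Spark uses to address ADLS Gen 2."""
    return (
        f"abfss://{container}@{account_name}.dfs.core.windows.net/"
        f"{relative_path.lstrip('/')}"
    )


def load_parquet_as_table(spark, path, table_name):
    """Load a Parquet location into a DataFrame and save it as a Spark table."""
    df = spark.read.load(path, format="parquet")
    # Overwrites an existing table with the same name.
    df.write.mode("overwrite").saveAsTable(table_name)
    return df


# In a Synapse notebook the `spark` session already exists, so you could run:
# df = load_parquet_as_table(
#     spark, abfss_path("mydatalake", "raw", "sales/2021"), "default.sales"
# )
```

Because the table is saved over Parquet here, this is the case where the serverless SQL pool table mentioned above gets created as well.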

Data flow

  • Create an integration dataset that can be used as a source in a data flow pipeline.

Azure Purview Integration Develop

These new features make the integration between Azure Purview and Azure Synapse Analytics even more powerful. More details can be found here.

Useful links

Create a Synapse workspace

Create an Azure Purview account

Thank you for reading. Please feel free to ask questions; I’m more than happy to answer them.

Azure Purview Public Preview Starts billing

by Erwin | Jan 18, 2021

Billing for Azure Purview (Public Preview)

As of January 20, 2021, 0:00 UTC, Azure Purview will start billing.

Preview

From January 20, 2021, Azure Purview will start billing. During the Public Preview, you will only be billed if you exceed the 4 capacity units for Azure Data Map and the 16 vCore hours for scanning. These 4 capacity units and 16 vCore hours are free until February 28, 2021.
So keep an eye on this so that you are not faced with surprises after February 28. What the prices will look like after February 28 is not yet known.

An update on pricing as of February 27, 2021 can be found here.

Below is an overview.

Azure Purview Data Map

Item | Price
Capacity Unit (provisioned API throughput; 1 capacity unit = 1 API/sec) | €0.289 per 1 Capacity Unit Hour. Includes 4 capacity units for free until February 28, 2021*.
Metadata Storage | Free
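To get a feeling for what the Data Map price means per month, here is a small worked calculation. It is a sketch under a few assumptions: the €0.289 price and the 4 free capacity units come from the overview above, the month is taken as 30 days (720 hours), and the capacity units are assumed to be provisioned for the whole month.

```python
PRICE_PER_CAPACITY_UNIT_HOUR = 0.289  # euro, from the price overview above
FREE_CAPACITY_UNITS = 4               # free until February 28, 2021
HOURS_PER_MONTH = 30 * 24             # assumption: a 30-day month


def data_map_monthly_cost(capacity_units, free_units=FREE_CAPACITY_UNITS):
    """Monthly Data Map cost in euro for a constant number of capacity units."""
    billable_units = max(0, capacity_units - free_units)
    return round(billable_units * PRICE_PER_CAPACITY_UNIT_HOUR * HOURS_PER_MONTH, 2)


# 4 capacity units while the free units apply, and the same 4 units without them.
print(data_map_monthly_cost(4))
print(data_map_monthly_cost(4, free_units=0))
```

The second number only shows what 4 capacity units would cost at the preview price without the free quantity; it is not a prediction of the price after February 28.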

 

Scanning and Classification

Data source | Price
Power BI online | Free in preview
SQL Server on-prem | Free in preview
Other data sources | €0.532 per 1 vCore Hour. Includes 16 vCore-hours for free every month until February 28, 2021**.

Please find below the updated pricing details, as published on the Azure Purview pricing page on February 1, 2021.

*The 4 free capacity units are only available for customers on the Pay-As-You-Go (MS-AZR-0003P), Microsoft Azure Enterprise (MS-AZR-0017P), Microsoft Azure Plan (MS-AZR-0017G), Azure in CSP (MS-AZR-0145P), and Enterprise Dev/Test (MS-AZR-0148P) offer types. Free quantities are applied at the enrollment level for enterprise customers. Free quantities are applied at the subscription level for pay-as-you-go customers.
**The 16 vCore-hours of free scanning are only available for customers on the Pay-As-You-Go (MS-AZR-0003P), Microsoft Azure Enterprise (MS-AZR-0017P), Microsoft Azure Plan (MS-AZR-0017G), Azure in CSP (MS-AZR-0145P), and Enterprise Dev/Test (MS-AZR-0148P) offer types. Free quantities are applied at the enrollment level for enterprise customers. Free quantities are applied at the subscription level for pay-as-you-go customers.

Note: Azure Purview provisions a storage account and an Azure Event Hubs account as managed resources. This may incur separate charges that in most cases will not exceed 2% of charges for scanning. Refer to the Managed Resources section in the Azure portal within Azure Purview Resource JSON.

Note:

Be aware that if you add a lot of Azure data sources and scan them every day, you will quickly use up the free vCore hours. My advice is to choose weekly or manual scans.
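The difference between daily and weekly scans is easy to estimate. The sketch below uses the €0.532 price and the 16 free vCore-hours from the overview above; the number of data sources and the 0.5 vCore-hours per scan are made-up figures, so replace them with what you see for your own scans.

```python
PRICE_PER_VCORE_HOUR = 0.532   # euro, "Other data sources" in the overview above
FREE_VCORE_HOURS = 16          # per month, until February 28, 2021


def monthly_scan_cost(sources, scans_per_month, vcore_hours_per_scan):
    """Return (total vCore-hours, billable cost in euro) for one month of scanning."""
    total_hours = sources * scans_per_month * vcore_hours_per_scan
    billable_hours = max(0.0, total_hours - FREE_VCORE_HOURS)
    return total_hours, round(billable_hours * PRICE_PER_VCORE_HOUR, 2)


# Hypothetical setup: 10 data sources, each scan using 0.5 vCore-hours.
daily = monthly_scan_cost(sources=10, scans_per_month=30, vcore_hours_per_scan=0.5)
weekly = monthly_scan_cost(sources=10, scans_per_month=4, vcore_hours_per_scan=0.5)
print("daily scans: ", daily)
print("weekly scans:", weekly)
```

With these assumed numbers the daily schedule goes far beyond the 16 free vCore-hours, while the weekly schedule only just passes them.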

Azure Purview vCore overview

Azure Purview Data Catalog

Tier | Features | Price
C0 | Search and browse of data assets | Included with the Data Map
C1 | Business glossary, lineage visualization and catalog insights | Free in preview
D0 | Sensitive data identification insights | Free in preview

 

Azure Purview Pricing Overview

More details on pricing: Pricing - Azure Purview

Azure Purview Documentation: Documentation - Azure Purview

Azure Purview Q&A: Q&A - Azure Purview

 

In case you have unanswered questions please do not hesitate to contact me.

Feel free to leave a comment

Goodbye 2020 Hello 2021

by Erwin | Jan 3, 2021

Goodbye 2020 

Started to work for InSpark

Last year was certainly an eventful year. I started a new job at InSpark, and after 10 weeks we all know what happened: the first intelligent lockdown. The Netherlands was partially locked down, and our office closed immediately. Fortunately, all our applications run in the cloud, so we were able to switch easily. But building a team in these times is not easy. I am really very proud of all my colleagues in the Data and AI team and, of course, all my other colleagues from InSpark. We made it a great year together.

Managed Oxygen

With Managed Oxygen, our Data Platform as a Service, we've made such great improvements that I wouldn't have thought possible at the beginning of this year. Really cool, and compliments to the ones who worked so hard on it. On top of Managed Oxygen, we have worked with the whole team on our Sparkhouse Data Accelerator, a metadata framework that we can use to automatically extract data from different sources, build history with Delta Lake and load data into an Azure SQL Database/Pool for further transformation.

Cool and innovative projects

We worked closely with Microsoft NL and Corp, but also with our parent company KPN. We're now seeing the effect of this: we've done great projects and we're well on our way for 2021, with projects using the latest Azure Data Services, image and photo recognition, IoT, Azure Synapse Analytics and Power BI. And for the first quarter we will
In any case, there is enough to look forward to. We are still looking for reinforcements for our team, so if you want to be part of these super cool projects, definitely let me know.
I'm happy with the step I made a year ago and can definitely recommend it to anyone.

And when I look at myself.

Blog

My intention was to write more blogs and articles this year; in the end, this only partly succeeded. It turned out to be 24, which is an average of 2 per month. Sometimes I just lacked inspiration. Hopefully it will come back in 2021.
My top post on my website is still Azure Data Factory Naming Conventions. Nice to see that more and more people are implementing the right standards within their organization.

Certifications

This year I wanted to get my DP-200 and DP-201; in the end it became AZ-900 and DP-200. It had been far too long since I'd done a certification. Everyone really knows the right answers to the questions, but when you see them on a screen and have to give the correct answer, it is secretly quite difficult.
In any case, the DP-201 is on my agenda this year.

Events

My last in-person event this year was SQL Konferenz in Darmstadt, Germany, in early March. Wow, how I miss these in-person events. In addition to exchanging knowledge, the personal contacts that you build with everyone are also very valuable. With virtual events this is a lot more difficult.
I was quite proud to be selected for SQL Bits; it remains one of the biggest events in Europe. However, it was a disappointment that it ended up becoming virtual instead of physical. Obviously, that was the correct decision by the organization. You had to record the session yourself beforehand. During the event itself it was broadcast and you had to moderate your own session. Very strange to do that, but in the end I was happy with the result. The platform they used was really good. Compliments to the organization and the volunteers who made it such a success.
In addition to SQL Bits, I have spoken at a number of Virtual SQL Saturdays. I only started speaking virtually at the end of the year, so in the end there weren't that many. I didn't like virtual speaking at first, but eventually I started to like it anyway. For 2021, I have registered for a number of DataSaturdays and will be speaking at the Scottish Summit. Nice things to look forward to.

Whatever a year looks like, the most important thing is that everyone is healthy and safe. I look forward to a great collaboration with everyone.

Feel free to leave a comment

10 Days of Azure Synapse Analytics

by Erwin | Dec 9, 2020

10 Days of Azure Synapse Analytics

For the next 10 days, a different Azure Synapse Analytics topic is explained every day. It is the shortest and easiest way to see how Azure Synapse Analytics can help you make decisions within your data landscape.

Day 1

👉 Unleash the power of predictive analytics in Azure Synapse with machine learning and AI

Day 2

👉 Quickly Get Started with Azure Synapse Studio

Day 3

👉 Access and analyze all data from the Data Hub in Azure Synapse Analytics

Day 4

👉 Using the Develop Hub with Knowledge Center to accelerate development with Azure Synapse Analytics

Day 5

👉 Ingest and Transform Data with Azure Synapse Analytics With Ease

Day 6

👉 Explore the Monitor Hub in Synapse Studio to keep track of all activities in your Synapse workspace

Day 7

👉 Explore the Manage Hub in Synapse Studio to provision and secure resources

Day 8

👉 Analyze and explore data with T-SQL in Azure Synapse Analytics

Day 9

👉 Kickstart your Apache Spark learning in Azure Synapse with immediately available samples

Day 10

👉 Integrate Power BI with Azure Synapse Analytics

Hopefully this collection of different blogs will save you some time and get you as excited about Azure Synapse Analytics as I am. And as always, if you have any questions, feel free to ask them. I'm more than happy to answer them.

Get started with Azure Synapse Analytics

Do you want to know more about how to get started with Azure Synapse Analytics? Then please read my blog series.

Feel free to leave a comment

Azure Synapse Analytics Code Repository has arrived

Azure Synapse Analytics Code repository

I just opened my Azure Synapse Analytics Workspace and got a great surprise: the option Git Configuration is available as of today.

 

 

After a long wait, today the Git Configuration option became available in Azure Synapse Analytics.

The setup isn't much different from Azure Data Factory, which can be found in this link.

The difference is that we no longer use an adf_publish branch but a workspace_publish branch, which makes sense if you want to use both Azure services side by side. In this blog I do a quick walkthrough with the Azure DevOps configuration enabled.

Azure Synapse GIT Config

Once we have configured everything, we can walk through the Git Configuration options within Azure Synapse Analytics. I'm sure there will be a lot of them, but below is a list of the ones I noticed first.

Synapse live

After you have published your code, it will be available in Synapse Live. As in Azure Data Factory, you develop everything in Azure DevOps branches.

Synapse Live

Notebooks

After creating a Notebook, we have the option Commit. Once you have committed, the Notebook is saved directly in your current working branch.

Azure Synapse Notebooks

SQL Scripts

As with Notebooks, after creating a SQL Script we can Commit. Once you have committed, the script is saved directly in your current working branch.

Azure Synapse SQL Script

Azure Synapse SQL Scripts

Pipelines

Here, too, we now have a Commit option.

Azure Synapse Pipelines

Workspace_publish

Besides the Notebooks and the SQL Scripts, we can also store the Credentials and Spark Job definitions in Azure DevOps.

Azure Synapse Publish

Differences

So as we can see the main differences between Azure Data Factory and Azure Synapse Analytics are:

Workspace_publish branch instead of adf_publish branch.

Commit instead of Save.

Azure Data Factory Pipelines

By moving your code from Azure Data Factory to Azure Synapse Analytics in Azure DevOps, your Azure Data Factory configuration becomes available in Azure Synapse Analytics.

I added my ADF code to Azure Synapse in Azure DevOps and it looks the same.
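If you want to script this instead of copying files by hand, it could look something like the sketch below. It assumes that both repositories are checked out locally and that the JSON definitions live in top-level folders named pipeline, dataset, linkedService and trigger; those folder names and the function itself are my assumptions, so check them against your own repositories first.

```python
import shutil
from pathlib import Path

# Assumed folder names; verify them in your own ADF and Synapse repositories.
ARTIFACT_FOLDERS = ("pipeline", "dataset", "linkedService", "trigger")


def copy_adf_artifacts(adf_repo, synapse_repo, folders=ARTIFACT_FOLDERS):
    """Copy ADF JSON definitions into the matching folders of a Synapse repo.

    Files that already exist in the Synapse repo are skipped, never overwritten.
    Returns the list of files that were copied.
    """
    copied = []
    for folder in folders:
        source_dir = Path(adf_repo) / folder
        if not source_dir.is_dir():
            continue  # this artifact type is not used in the ADF repo
        target_dir = Path(synapse_repo) / folder
        target_dir.mkdir(parents=True, exist_ok=True)
        for json_file in sorted(source_dir.glob("*.json")):
            target = target_dir / json_file.name
            if target.exists():
                continue
            shutil.copy2(json_file, target)
            copied.append(target)
    return copied


# Hypothetical local checkouts:
# copy_adf_artifacts("repos/my-adf", "repos/my-synapse")
```

After committing the copied files to your working branch, refresh the workspace to see them appear.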

Azure Synapse Dev Ops Example

After refreshing the Azure Synapse Analytics Workspace, we see the Integration Datasets (ADF Datasets) and the linked storage accounts in the Data Hub.

Azure Synapse Data Example

And in the Integrate Hub, we see all our Pipelines. The same works for our triggers.

Azure Synapse PipeLine Example

It looks like we can reuse our code quite easily. I haven't tested everything yet, but I wanted to share this with you as quickly as possible. I'm sure an easier way to migrate from Azure Data Factory to Azure Synapse is on its way; until then, you can use the above as a start.

Integration Runtimes

Does everything work as in Azure Data Factory? No. At this moment you can't use the Azure SSIS Integration Runtime or the shared Self Hosted Integration Runtime. Hopefully it will not take long before they arrive.

Thank you for reading. This was a quick overview of the first changes I discovered. Please feel free to leave a comment if you have discovered more.

Do you want to become more familiar with the various possibilities of Azure Synapse Analytics? Then please read the following articles, which I published a while ago:

✅ Creating your Azure Synapse Analytics Workspace

✅ Exploring the new Azure Synapse Analytics Studio

Creating an Apache Spark Pool

Creating a SQL Pool

Integration with Power BI