Last night I was preparing for a demo with Azure Purview. As always, I walked through all the activity hubs to see if there were any new options. This time I noticed that the Pattern Rules option was greyed out.
Resource Set
To enable Pattern Rules, you need to turn on the Advanced Resource Sets option in the Management activity tab.
Resource sets were already available in my Purview account that was created before August 19th, so it was a surprise to me that Pattern Rules were greyed out. My demo Purview account was created after August 19th, and there are differences between the two versions in the available options and features. What has changed in Azure Purview after August 19th can be read in my previous blog post.
Once you have enabled this feature, the Azure Purview team recommends waiting an hour before scanning in new Data Lake data. After scanning your Data Lake data, manually or on a schedule, you will see the Resource Sets.
When the Advanced Resource Sets feature is on, asset and classification insights will only update twice a day (every 12 hours).
More details on how to create Resource Set Pattern Rules can be found here.
Costs
When you have enabled the Advanced Resource Sets feature, you will be charged €0.18 per 1 vCore Hour (free in preview). Billing for processing the resource set data assets is serverless and based on the duration of the processing, which can vary based on the change in partitioned files and the resource set profile configured.
If you have any questions regarding the above, please let me know.
Note: Billing for Azure Purview will commence November 1, 2021.
Updated October 31st, 2021
Pricing for the Elastic Data Map and for scanning other sources has changed and is updated in the blog post below.
Since my last post on Azure Purview announcements and new functionality, I got some questions regarding pricing. In the meantime the pricing page has been updated, and I've also created a new Azure Purview instance in my subscription (after August 18th). Currently most of the Azure Purview components are still free until further notice. To get more details, I still recommend everyone to watch the Azure Purview event from September 28th, 2021: https://azuredatagovernance.eventcore.com/
Updated September 29th, 2021
Yesterday Microsoft announced the General Availability of Azure Purview. More on the announcement can be found in the blog from Rohan Kumar.
Since September 28, 2021, the price of Azure Purview has been adjusted. The main change is that the use of the Elastic Data Map will remain free until November 1, 2021. To encourage trial of the Elastic Data Map, Microsoft is providing all customers free usage of the Data Map from August 16, 2021 to October 31, 2021. I've updated the pricing details below.
As a small recap:
Azure Purview Elastic Data Map | Price
Capacity Unit | €0.353 per 1 Capacity Unit Hour
Billing for Data Map capacity unit consumption will commence November 1, 2021.
If you created your Azure Purview account after August 18th, you will see that you are currently not charged for the Data Map units.
As you can see, I am no longer charged for the Data Map. I'm only charged for scanning, which I only do manually to save some costs.
Automated Scanning & Classification | Price
For Power BI online | Free for a limited time
For SQL Server on-prem | Free for a limited time
For other data sources | €0.540 per 1 vCore Hour

Other features | Price
Resource Set | €0.18 per 1 vCore Hour
Billing for scanning duration will commence November 1, 2021.
Pricing Example
Based on the example published on the pricing page, I've done a calculation:
Example Scenario: Data Map can scale capacity elastically based on the request load. Request load is measured in terms of data map operations per second. As a cost control measure, a Data Map is configured by default to elastically scale up to a peak of 8 times the steady state capacity.
For dev/trial usage:
Data Map (Always on): 1 capacity unit x Price per capacity unit per hour x 730 hours per month
Scanning (Pay as you go): Total duration (in minutes) of all scans in a month / 60 min per hour x 32 vCore per scan x €0.540 per vCore per hour
Resource Set: Total duration (in hours) of processing resource set data assets in a month * Price per vCore per hour
The total cost per month for Azure Purview = cost of Data Map + cost of Scanning + cost of Resource Set
Assume, for the scenario above, that we only use 1 capacity unit, use no more than 2 GB of metadata storage, and scan our data once a week for 2 hours.
Data Map: 1 CU x €0.353 x 730 hours = €257.69
Scanning: 4 scans x 2 hours x 32 vCores x €0.540 per vCore per hour = €138.24
Resource Set: 4 scans x 1 hour x €0.18 per vCore per hour = €0.72
In total €396.65, including 4 scans. If you leave Azure Purview as is and do no scanning, your base fee will be €257.69.
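For those who want to play with other numbers, the calculation above can be sketched in a few lines of Python. This is only my own illustration, not an official calculator: the function name and parameters are made up for this post, the rates are the ones from the tables above, and I assume the resource set processing adds up to 4 vCore hours per month (4 scans x 1 hour on 1 vCore).

```python
# Rates in EUR, taken from the pricing tables in this post.
DATA_MAP_RATE = 0.353      # per capacity unit hour
SCAN_RATE = 0.540          # per vCore hour
RESOURCE_SET_RATE = 0.18   # per vCore hour
HOURS_PER_MONTH = 730      # hours used in the pricing page example
VCORES_PER_SCAN = 32       # vCores per scan used in the pricing page example


def monthly_cost(capacity_units, scans, hours_per_scan, resource_set_vcore_hours):
    """Return (data_map, scanning, resource_set, total) in EUR, rounded to cents.

    Hypothetical helper: it only mirrors the formulas written out above.
    """
    data_map = capacity_units * DATA_MAP_RATE * HOURS_PER_MONTH
    scanning = scans * hours_per_scan * VCORES_PER_SCAN * SCAN_RATE
    resource_set = resource_set_vcore_hours * RESOURCE_SET_RATE
    total = data_map + scanning + resource_set
    return tuple(round(x, 2) for x in (data_map, scanning, resource_set, total))


# 1 capacity unit, 4 scans of 2 hours, 4 vCore hours of resource set processing.
costs = monthly_cost(capacity_units=1, scans=4, hours_per_scan=2,
                     resource_set_vcore_hours=4)
print(costs)  # (257.69, 138.24, 0.72, 396.65)
```

Setting scans and resource_set_vcore_hours to 0 gives the base fee for the Data Map only.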
Like always, in case you have questions, leave them in the comments or send me a message.
This Saturday I spoke at DataSaturday #4 Oslo. If you want to attend more Data Saturday events, please visit the Data Saturdays event page.
Azure Purview
I presented a session on Azure Purview, Microsoft's answer to Data Governance and Data Lineage.
More clarity about pricing and when Azure Purview goes to GA is likely to become clear during the event on September 28. You can register for this event via the link below.
EVENT=>Achieve unified data governance with Azure Purview
As always, in case you have any questions, please feel free to contact me.
If you have any questions left, please feel free to ask them in the comments or via social media.
This week the Azure Purview product team added some new functionality: new connectors (these connectors were added during my holiday), Azure Synapse Data Lineage, better Power BI integration and the introduction of the Elastic Data Map. Slowly we are on our way to GA status; on September 28th, 2021 there will be a digital event. Below you will find some of the announcements in detail.
New connectors in Azure Purview
Over the past period, the Azure Purview team has worked hard and has already added a number of new connectors, such as ERWIN, Looker, Cassandra and Google BigQuery.
This week it was time for some new functionalities.
Azure Synapse Analytics Data Lineage:
This functionality currently only works for the copy activity, but the first step has been made. Whereas for lineage from Azure Data Factory you still had to create the link in Azure Purview, for lineage from Azure Synapse it is the other way around: you create the link to Azure Purview in Azure Synapse. I described how to create this link a couple of months ago in one of my posts, which can be found here.
Some known limitations on copy activity lineage based on the docs.
Currently, if you use the following copy activity features, the lineage is not yet supported:
Copy data into Azure Data Lake Storage Gen1 using Binary format.
Copy data into Azure Synapse Analytics using PolyBase or COPY statement.
Compression setting for Binary, delimited text, Excel, JSON, and XML files.
Source partition options for Azure SQL Database, Azure SQL Managed Instance, Azure Synapse Analytics, SQL Server, and SAP Table.
Source partition discovery option for file-based stores.
Copy data to file-based sink with setting of max rows per file.
Add additional columns during copy.
In addition to lineage, the data asset schema (shown in the Asset -> Schema tab) is reported for the following connectors:
CSV and Parquet files on Azure Blob, Azure File Storage, ADLS Gen1, ADLS Gen2, and Amazon S3
The Power BI integration now supports automated discovery of columns, measures and data types in Power BI.
To enable this functionality, you must enable the following settings on the Power BI tenant settings page (be aware that you need to be a Power BI Admin):
Allow service principals to use read-only Power BI admin APIs.
To use this setting, create a security group or use an existing one, and add your Purview account to this security group.
Enhance admin APIs responses with detailed metadata
Elastic data map in Azure Purview
All Purview accounts created after August 18th, 2021 are now created with the new Elastic data map concept. With this new concept, your Purview account comes with one capacity unit by default and grows elastically based on usage. Each Data Map capacity unit includes a throughput of 25 operations/sec and a 2 GB metadata storage limit. So now, when you're not using Purview, you're no longer paying for the previous default of 4 capacity units.
The Data Map is billed on an hourly basis. You are billed for the maximum number of Data Map capacity units needed within the hour. At times, you may need more operations/second within the hour, and this will increase the number of capacity units needed within that hour. At other times, your operations/second usage may be low, but you may still need a large volume of metadata storage. In that case, the metadata storage is what determines how many capacity units you need within the hour. Please read the documentation for a more detailed explanation and some examples.
All existing Azure Purview accounts will be migrated to the Elastic data map concept in September/October.
The big question that remains open is: what exactly does this capacity unit cost? For the time being, during the Preview, it is still free, as can be read on the updated pricing page of Azure Purview.
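To make the hourly billing rule a bit more tangible, here is a minimal Python sketch of how I read it. It is an illustration under my own assumptions, not Microsoft's billing logic: the function name is hypothetical, and I assume that usage is rounded up to whole capacity units and that you always have at least the one default unit.

```python
import math

OPS_PER_CU = 25        # operations/sec included in one capacity unit
STORAGE_GB_PER_CU = 2  # GB of metadata storage included in one capacity unit


def capacity_units_for_hour(peak_ops_per_sec, metadata_gb):
    """Estimate the capacity units billed for one hour (hypothetical helper).

    Assumption: whichever of throughput or storage needs more units
    determines the number of units for that hour, with a minimum of 1.
    """
    by_throughput = math.ceil(peak_ops_per_sec / OPS_PER_CU)
    by_storage = math.ceil(metadata_gb / STORAGE_GB_PER_CU)
    return max(1, by_throughput, by_storage)


# Quiet hour with little metadata: the single default unit is enough.
print(capacity_units_for_hour(peak_ops_per_sec=10, metadata_gb=1))
# Busy hour: throughput determines the number of units.
print(capacity_units_for_hour(peak_ops_per_sec=60, metadata_gb=1))
# Low traffic but a lot of metadata: storage determines the number of units.
print(capacity_units_for_hour(peak_ops_per_sec=5, metadata_gb=7))
```

The three calls correspond to the three situations described above: a quiet hour, an hour with a burst of operations, and an hour where metadata storage is the deciding factor.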
More clarity about pricing and when Azure Purview goes to GA is likely to become clear during the event on September 28. You can register for this event via the link below.
EVENT=>Achieve unified data governance with Azure Purview
As always, in case you have any questions, please feel free to contact me.