Goodbye 2021, Hello 2022

Goodbye 2021

Recap

First of all, I would like to wish everyone a very beautiful and healthy 2022.

We are now three days into the new year, and it is always good to look back at what happened last year. It's certainly been an eventful year, topped off with my MVP Award, which I'm super proud of.
Within InSpark there were a number of changes in our Management Team, and they had a significant effect, partly expected and partly not. Everyone has now found their way and the various teams are making a lot of progress.
Our office is currently closed due to the lockdown in the Netherlands, and we communicate through Teams again. I look back fondly on the months when our office was open and you also saw colleagues outside your own team. I still find meetings via Teams very difficult and regularly have trouble finding the right drive and inspiration there, but unfortunately there is no alternative right now, so let's hope for better.

Managed Oxygen


With Managed Oxygen, our Data Platform as a Service, we once again made improvements so major that I would not have thought them possible at the beginning of the year, and confirmation came in July 2021. We submitted Managed Oxygen for the Microsoft Partner of the Year awards. We just missed out on the award, but we did become a finalist in the Analytics category, out of 4,400 entries.
Wow, we were so happy with this recognition; it shows what you work so hard for every day. Compliments to all my colleagues who work every day with such drive and energy on the development of Managed Oxygen.

In addition to our Managed Oxygen, we continued working with the whole team on our Nitrogen Accelerator.

  • Metadata-driven framework for Azure Data Factory and Azure Synapse Analytics, which allows us to automatically extract data from various sources and build a Lakehouse.
  • Monitoring, Logging and Audit Pipelines
  • Build and release pipelines for all the necessary Azure Data services and Power BI in DevOps.
  • Data quality and privacy patterns.
  • Automated Documentation and other best practices.

An Accelerator that greatly benefits our customers and to which we as a team provide input from all disciplines.

Cool and innovative projects

As InSpark, we are the Cloud Incubator for our parent company KPN, which has the advantage that we can work a lot on innovation. We did a lot of connected projects this year, such as Connected Vehicles, Connected Ships and Connected Containers, with some projects processing more than 70 million messages per day. It is still pretty cool to see how easily it all works and fits within Managed Oxygen. In addition to these connected projects, we did projects in which we help cities and local governments with their ever-growing demand for data and data solutions: the Urban Data Platform. We have taken the first steps with Azure Percept and I'm looking forward to starting our first Azure Percept project this year.

We are still looking for new colleagues to help us with these cool projects. If you want to know more about InSpark and the other cool projects we do, let me know.

Blog

Just like last year, I wanted to write more blogs and articles, but unfortunately the counter stopped at 20 this time. My blogs and articles were mostly about Azure Synapse Analytics and Azure Purview. It was good to see that the community is finding my blogs and articles more and more, and that's what it's all about in the end.

MVP Award

In October last year I became a Data Platform MVP, a great recognition from Microsoft for all the input and feedback I provide on the various Azure Data Services and for my contribution to the community. When I saw the message in my mailbox I couldn't believe my eyes. I was so happy that I immediately called my colleagues to share the news; they have always supported me in everything I do.

ADF Hackathon

I submitted an ADF pipeline template, “Scale Dedicated SQL Pool Dynamically using Azure Data Factory control flow“, to the ADF Hackathon in March, and my submission was chosen as a WINNER. I am very proud that a simple template that lets you save costs so easily has won. The template helps you scale a Dedicated SQL Pool in Azure Synapse Analytics up and down. See the full announcement post here.

Events

In 2021 I regularly spoke at virtual events such as SQL Bits, Scottish Summit, DataWeekender, Data Toboggan, Cloud Lunch and Learn Marathon and various DataSaturdays. In October I volunteered at DataMinds Connect, a physical event (my only one in 2021) held in Mechelen. This event gave me so much energy because I saw so many great sessions and talked to so many people again. The event was perfectly organized according to the conditions that applied in Belgium at the time. My brain and body were no longer used to it, and I was completely exhausted after those two fantastic days.

SQL Bits, Scottish Summit, DataMinds and Datagrillen are already planned for this year. I look forward to seeing everyone again.

Whatever a year looks like, the most important thing is that everyone is healthy and safe. I look forward to a great collaboration with everyone.

Migrate Azure Storage to Azure Data Lake Gen2

Migrate Azure Storage to Storage Account with Azure Data Lake Gen2 capabilities

Do you sometimes come across a Storage Account where the hierarchical namespace is not enabled, or one that is still a general-purpose v1 account? In the tutorial below I describe the steps to perform this migration, which has only recently become possible.

Azure Storage V1

The first step is to check which account kind is currently deployed. If it is Storage (general purpose v1), we first need to upgrade the storage account to V2; if it is already V2, go to the next step.
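The check above comes down to a small decision on two account properties. Below is a minimal sketch. It assumes you have already read the account kind and the hierarchical namespace flag, for example from the `kind` and `is_hns_enabled` properties that the `azure-mgmt-storage` SDK returns for a storage account. The function name and the returned strings are hypothetical, made up for this example.

```python
def next_migration_step(kind: str, is_hns_enabled: bool) -> str:
    """Return the next migration step for a storage account (hypothetical helper).

    `kind` is the account kind as reported by Azure, e.g. "Storage" for
    general purpose v1 or "StorageV2" for general purpose v2.
    """
    if kind == "Storage":
        # General purpose v1: upgrade to v2 first.
        return "upgrade to StorageV2"
    if kind == "StorageV2":
        # Already v2: only the hierarchical namespace may still be missing.
        return "done" if is_hns_enabled else "enable hierarchical namespace"
    # Other kinds (e.g. BlobStorage) are not covered by this tutorial.
    return "not covered by this tutorial"


print(next_migration_step("Storage", False))    # upgrade to StorageV2
print(next_migration_step("StorageV2", False))  # enable hierarchical namespace
```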

Storage V1 Account

Click Change and a new window will pop up.

Upgrade Storage Account

Note: Choosing a storage access tier during account upgrade is free. Changing the storage access tier after the upgrade operation may result in changes to your bill.

Select the access tier you want to migrate to, then start the upgrade.

Start Migration

When the upgrade is successful, you will see that the Account kind is now StorageV2. We can now continue to the next step.
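If you would rather script the v1 to v2 upgrade than click through the portal, the same change is an Azure Resource Manager PATCH on the storage account. It sets `kind` to `StorageV2` and picks the access tier, just like the pop-up. Below is a minimal sketch that only builds the request and does not send it. The subscription, resource group and account names are placeholders, and the api-version is an assumption, so check it against the Storage Accounts - Update reference before using it.

```python
import json
import urllib.request

ARM_ENDPOINT = "https://management.azure.com"
# Assumption: a recent Microsoft.Storage api-version; verify before use.
API_VERSION = "2021-04-01"


def build_upgrade_request(subscription_id: str, resource_group: str,
                          account_name: str, access_tier: str = "Hot",
                          token: str = "<access-token>") -> urllib.request.Request:
    """Build (but do not send) the ARM request that upgrades a v1 account to v2."""
    url = (f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
           f"/resourceGroups/{resource_group}"
           f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
           f"?api-version={API_VERSION}")
    # Same choices as in the portal: the new account kind plus the access tier.
    body = {"kind": "StorageV2", "properties": {"accessTier": access_tier}}
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="PATCH",
        headers={"Authorization": f"Bearer {token}",
                 "Content-Type": "application/json"},
    )


# Placeholder names; sending it with urllib.request.urlopen(req)
# requires a valid Azure AD access token.
req = build_upgrade_request("<subscription-id>", "my-rg", "mystorageaccount")
print(req.get_method(), req.full_url)
```

The same note as in the portal applies: choosing the tier during the upgrade is free, but changing it afterwards may affect your bill.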


Azure Storage V2

To start the migration, click Data Lake Gen2 upgrade in the taskbar, or click ‘Disabled’ next to the Hierarchical namespace property in the Blob service properties.

The Migration window will open and we can start with step 1.


Take note of the unsupported features and functionality.


Agree to the implications of upgrading your storage account to Azure Data Lake Storage. Once this step is done, we can continue with step 2, the validation.

If the validation succeeds, you can start the upgrade in step 3. If it fails, check the errors: download the error.json file to see which blobs are failing. Most of the time the cause is unsupported functionality or incompatible features, for example:

{
  "startTime": "2021-08-04T18:40:31.8465320Z",
  "id": "45c84a6d-6746-4142-8130-5ae9cfe013a0",
  "incompatibleFeatures": [
    "Blob Delete Retention Enabled"
  ],
  "blobValidationErrors": [],
  "scannedBlobCount": 0,
  "invalidBlobCount": 0,
  "endTime": "2021-08-04T18:40:34.9371480Z"
}
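When the validation report is longer than this one, it helps to list the problems quickly. Below is a minimal sketch that reads the fields shown in the sample above. The helper name is hypothetical. The sample has an empty `blobValidationErrors` list, so the exact shape of its entries is an assumption, and each entry is printed as-is.

```python
import json
from typing import List


def summarize_validation(report: dict) -> List[str]:
    """List the problems reported in a validation error.json (hypothetical helper)."""
    problems = [f"Incompatible feature: {feature}"
                for feature in report.get("incompatibleFeatures", [])]
    # Entry format of blobValidationErrors is not shown in the sample,
    # so each entry is reported as-is.
    problems += [f"Blob validation error: {error}"
                 for error in report.get("blobValidationErrors", [])]
    return problems


sample = json.loads("""
{
  "incompatibleFeatures": ["Blob Delete Retention Enabled"],
  "blobValidationErrors": [],
  "scannedBlobCount": 0,
  "invalidBlobCount": 0
}
""")
for problem in summarize_validation(sample):
    print(problem)  # Incompatible feature: Blob Delete Retention Enabled
```

For a downloaded report, load it with `json.load(open("error.json", encoding="utf-8"))` and pass the result to the same function.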


The upgrade will take a while; how long mostly depends on how much data needs to be migrated.

At the end of the process you will notice that the Hierarchical namespace is now enabled and cannot be changed anymore.
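The validation (step 2) and the upgrade (step 3) can also be triggered outside the portal. If I read the Azure REST API reference correctly, both map to a `hnsonmigration` POST on the storage account, with the request types `HnsOnValidationRequest` and `HnsOnHydrationRequest`. Treat those names and the api-version below as assumptions to verify before use. Below is a minimal sketch that only builds the URLs. The account names are placeholders.

```python
ARM_ENDPOINT = "https://management.azure.com"
# Assumption: verify the api-version against the REST API reference.
API_VERSION = "2021-06-01"

# Assumed request types of the hierarchical namespace migration operation.
REQUEST_TYPES = {
    "validate": "HnsOnValidationRequest",  # step 2 in the portal
    "upgrade": "HnsOnHydrationRequest",    # step 3 in the portal
}


def hns_migration_url(subscription_id: str, resource_group: str,
                      account_name: str, step: str) -> str:
    """Build the URL to POST to for the validation or upgrade step."""
    request_type = REQUEST_TYPES[step]  # raises KeyError for unknown steps
    return (f"{ARM_ENDPOINT}/subscriptions/{subscription_id}"
            f"/resourceGroups/{resource_group}"
            f"/providers/Microsoft.Storage/storageAccounts/{account_name}"
            f"/hnsonmigration?requestType={request_type}"
            f"&api-version={API_VERSION}")


# Placeholder names; the POST itself needs an Azure AD bearer token.
print(hns_migration_url("<subscription-id>", "my-rg", "mystorageaccount", "validate"))
```

Run the validation first and only start the upgrade when it passes, just like in the portal.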


Post Migration

Create new linked services in Azure Data Factory and Azure Synapse Analytics to make sure they use the DFS endpoint (Azure Data Lake Storage Gen2) instead of the Blob endpoint.

Point any other applications to the correct endpoint.
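Switching to the DFS endpoint is mostly a matter of changing `blob` to `dfs` in the host name. Spark code addresses the same data with an `abfss://` URI. Below is a minimal sketch for rewriting stored URLs. The helper names are hypothetical, and the `core.windows.net` suffix assumes the public Azure cloud.

```python
from urllib.parse import urlsplit, urlunsplit


def to_dfs_url(blob_url: str) -> str:
    """Rewrite a Blob endpoint URL to the matching DFS (Data Lake Gen2) endpoint."""
    parts = urlsplit(blob_url)
    host = parts.hostname or ""
    if ".blob." not in host:
        raise ValueError(f"not a Blob endpoint: {blob_url}")
    dfs_host = host.replace(".blob.", ".dfs.", 1)
    netloc = dfs_host if parts.port is None else f"{dfs_host}:{parts.port}"
    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))


def to_abfss_uri(account: str, container: str, path: str = "") -> str:
    """Build the abfss:// URI Spark uses for an ADLS Gen2 path (public cloud assumed)."""
    return f"abfss://{container}@{account}.dfs.core.windows.net/{path.lstrip('/')}"


print(to_dfs_url("https://myaccount.blob.core.windows.net/raw/sales/2021.csv"))
print(to_abfss_uri("myaccount", "raw", "sales/2021.csv"))
```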

Test, test and test all your workloads to make sure everything works as expected.

Start by migrating your development storage account and test all workloads before you start migrating your production storage account.


As always, if you have questions, leave them in the comments or send me a message.

Useful links

Upgrade to a general-purpose v2 storage account

Upgrade Azure Blob Storage with Azure Data Lake Storage Gen2 capabilities

Azure Synapse Analytics cost analysis for Integration Runtime

AutoResolveIntegrationRuntime!

Over the last few days I’ve been following some discussions on Twitter about using a separate Integration Runtime in Azure Synapse Analytics that runs in a selected region instead of auto-resolve. The AutoResolveIntegrationRuntime is deployed automatically with its region set to Auto Resolve, and that setting cannot be changed. If you create a separate Integration Runtime, you can set the region yourself.


The blog from Asanka Padmakumara explains well why you should choose a new Integration Runtime with a dedicated region, so I won't go into detail on that here.

I was more interested in what happens to the costs when Managed Virtual Network is enabled and you run the same pipeline on either the AutoResolveIntegrationRuntime or a manually created Integration Runtime. The final result was quite surprising to me.

Case:

Azure Synapse Analytics deployed with Managed Virtual Network and Private Links in West-Europe region.

Copy data from an Azure SQL server to the Data Lake.


Result:

Pipeline Consumption with AutoResolveIntegrationRuntime

Pipeline Consumption with Integration Runtime created in West-Europe


I didn’t expect the consumption of these 2 Integration Runtimes to be different.

The next step is to see how that compares in terms of costs, based on the Azure Pricing Calculator. In the example below, I based the calculation on the pipelines above, assuming each pipeline runs once a day for one month (30 days).

Azure Synapse cost calculation

Conclusion:

When running all my Linked Services on the AutoResolveIntegrationRuntime, the pipeline seems to be a little faster than on an Integration Runtime created in West Europe. But there was a huge difference in costs: you pay 350% more if you run on the West Europe Integration Runtime. That is quite a lot, especially if you run 100 of these pipelines per day, which comes to almost €270 per month. These differences probably won’t be there if you don’t use the Managed Virtual Network.
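The arithmetic behind this comparison is simple: the monthly cost is the cost per run multiplied by the runs per day and the number of days, and "350% more" means 4.5 times the baseline. Below is a minimal sketch of that calculation. The per-run costs are hypothetical placeholders chosen only so the ratio matches the 350% above; they are not the figures from my test. Read your own numbers from the pipeline consumption and the Azure Pricing Calculator.

```python
def monthly_cost(cost_per_run: float, runs_per_day: int, days: int = 30) -> float:
    """Cost of a pipeline that runs `runs_per_day` times a day for `days` days."""
    return cost_per_run * runs_per_day * days


def percent_more(baseline: float, other: float) -> float:
    """How many percent `other` exceeds `baseline`; 350 means 4.5 times as much."""
    return (other - baseline) / baseline * 100


# Hypothetical per-run costs in EUR, NOT the figures from my test.
auto_resolve = monthly_cost(0.10, runs_per_day=100)
west_europe = monthly_cost(0.45, runs_per_day=100)
print(f"extra per month: €{west_europe - auto_resolve:.2f}, "
      f"{percent_more(auto_resolve, west_europe):.0f}% more")
```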

Remarks:

During my test I also found out that in Azure Synapse Analytics you cannot switch a Data Flow to an Integration Runtime that does not use auto resolve.


If you enable Managed Virtual Network for auto-resolve Azure IR, the IR in the Data Factory or Synapse Workspace region is used.

=> Integration runtime – Azure Data Factory & Azure Synapse | Microsoft Docs


As always, if you have any questions, let me know.

My Virtual session DataWeekender 4.2

DataWeekender 4.2

This Saturday I joined the Van and spoke at DataWeekender.

Azure Purview

I presented a session on Azure Purview, Microsoft's answer to data governance and data lineage.

You can find my slides below on Slideshare:

Some useful links:

As always, if you have any questions, please feel free to contact me via the comments or socials.

Enable Pattern Rules in Azure Purview

How can I enable Pattern Rules?

Pattern Rules

Last night I was preparing for a demo with Azure Purview. As always, I walk through all the activity hubs to see if there are any new options. This time I noticed that the Pattern Rules option was greyed out.


Resource Set

To enable Pattern Rules, you need to turn on the Advanced resource sets option in the Management activity hub.


The resource set feature was already present in my Purview account, which was created before August 19th, so it was a surprise to me that the pattern rules were greyed out.
My demo Purview account was created after August 19th, and there are differences between the two versions in the available options and features. You can read what has changed in Azure Purview after August 19th in my previous blog post.

Once you have enabled this feature, the Azure Purview team recommends waiting an hour before scanning new Data Lake data. After scanning your Data Lake data, manually or on a schedule, you will see the resource sets.
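A resource set groups many partitioned files that share the same layout into a single asset, and pattern rules let you steer that grouping. Below is a minimal sketch that only illustrates the idea with made-up regex rules. It is not how Purview implements resource sets, and the `{yyyy}`-style placeholders are not Purview's pattern rule syntax. The rules, function names and paths are all hypothetical.

```python
import re
from collections import defaultdict
from typing import Dict, List

# Illustrative rules only; this is NOT Purview's pattern rule syntax.
RULES = [
    (re.compile(r"/\d{4}/\d{2}/\d{2}/"), "/{yyyy}/{mm}/{dd}/"),  # date folders
    (re.compile(r"part-\d+"), "part-{n}"),                      # numbered part files
]


def pattern_for(path: str) -> str:
    """Replace the variable parts of a path with placeholders."""
    for regex, placeholder in RULES:
        path = regex.sub(placeholder, path)
    return path


def group_into_resource_sets(paths: List[str]) -> Dict[str, List[str]]:
    """Group file paths that share the same normalized pattern."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for path in paths:
        groups[pattern_for(path)].append(path)
    return dict(groups)


files = [
    "raw/sales/2021/08/04/part-0001.parquet",
    "raw/sales/2021/08/05/part-0002.parquet",
    "raw/customers.csv",
]
for pattern, members in group_into_resource_sets(files).items():
    print(pattern, len(members))
```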


When the advanced resource sets feature is on, asset and classification insights are only updated twice a day (every 12 hours).

More details on how to create Resource Set Pattern Rules can be found here.

Costs

When you have enabled the Advanced Resource Set feature, you will be charged €0.18 per vCore hour (free during preview). Billing for processing the resource set data assets is serverless and based on the duration of the processing, which can vary depending on changes in the partitioned files and the configured resource set profile.
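Because billing is based on processing duration, a rough estimate is simply vCore hours multiplied by the price above. Below is a minimal sketch. The number of vCores, the duration per run and the runs per day are hypothetical inputs you have to fill in yourself. In particular, running the processing twice a day is an assumption, not something stated above.

```python
PRICE_PER_VCORE_HOUR_EUR = 0.18  # price quoted above (free during preview)


def monthly_resource_set_cost(vcores: float, hours_per_run: float,
                              runs_per_day: int, days: int = 30,
                              price: float = PRICE_PER_VCORE_HOUR_EUR) -> float:
    """Estimate the monthly cost as vCore hours multiplied by the hourly price."""
    vcore_hours = vcores * hours_per_run * runs_per_day * days
    return vcore_hours * price


# Hypothetical workload: 4 vCores for 30 minutes, assumed to run twice a day.
print(f"€{monthly_resource_set_cost(4, 0.5, runs_per_day=2):.2f}")  # €21.60
```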

If you have any questions regarding the above, please let me know.