Migrate Azure Storage to Azure Data Lake Gen2


Migrate Azure Storage to Storage Account with Azure Data Lake Gen2 capabilities

Do you sometimes come across a Storage Account where the Hierarchical namespace is not enabled, or one that is still a general-purpose v1 account? In the tutorial below I describe the steps, which have only recently become possible, to perform this migration.

Azure Storage V1

The first step is to check which Account kind is currently deployed. If it is Storage (general purpose v1), we first need to upgrade the Storage Account to general-purpose v2; if it is already v2, go to the next step.
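If you have several accounts to check, the same decision can be scripted. The sketch below is a hypothetical helper (next_migration_step and its messages are mine, not part of any Azure SDK); it only assumes the Account kind strings Azure reports, such as "Storage" for general-purpose v1 and "StorageV2" for general-purpose v2. In the azure-mgmt-storage Python package these values are exposed on the StorageAccount model as kind and is_hns_enabled, so you could feed them in from there.

```python
# Hypothetical helper: decide the next migration step from the account kind
# and the hierarchical namespace flag. The kind strings are those reported
# by Azure ("Storage" = general-purpose v1, "StorageV2" = general-purpose v2).

def next_migration_step(kind: str, is_hns_enabled: bool) -> str:
    """Return a short description of what to do next for one account."""
    if kind == "Storage":
        # General-purpose v1: upgrade to v2 before anything else.
        return "upgrade to StorageV2"
    if kind == "StorageV2":
        if is_hns_enabled:
            # Hierarchical namespace already on: nothing left to migrate.
            return "already Data Lake Gen2"
        return "run Data Lake Gen2 upgrade"
    # Other kinds (BlobStorage, BlockBlobStorage, FileStorage, ...) are
    # outside the scope of this walkthrough.
    return "not covered by this walkthrough"


if __name__ == "__main__":
    print(next_migration_step("Storage", False))
    print(next_migration_step("StorageV2", False))
```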

[Screenshot: Storage V1 Account]

Click Change and a new window will pop up.

[Screenshot: Upgrade Storage Account]

Note: Choosing a storage access tier during account upgrade is free. Changing the storage access tier after the upgrade operation may result in changes to your bill.

Select the access tier you want to use, then start the upgrade.

[Screenshot: Start Migration]

When the upgrade is successful, you will see that the Account kind is now StorageV2. We can now continue to the next step.

[Screenshot: Blob migration V1 result]

Azure Storage V2

To start the migration, click Data Lake Gen2 upgrade in the menu, or click ‘Disabled’ next to the Hierarchical namespace property in the Blob service properties.

The Migration window will open and we can start with step 1.

[Screenshot: Blob migration V2]

Take note of the unsupported features and functionality.

[Screenshot: Blob migration V2, step 1]

Agree to the implications of upgrading to Azure Data Lake Storage Gen2. Once this step is done, we can continue with step 2, the validation.

If the validation succeeds, you can start the upgrade (step 3). If it fails, check the errors: download the error.json file to see which blobs are failing. In most cases the cause is an unsupported or incompatible feature, such as the blob delete retention setting in the example below.

{
  "startTime": "2021-08-04T18:40:31.8465320Z",
  "id": "45c84a6d-6746-4142-8130-5ae9cfe013a0",
  "incompatibleFeatures": [
    "Blob Delete Retention Enabled"
  ],
  "blobValidationErrors": [],
  "scannedBlobCount": 0,
  "invalidBlobCount": 0,
  "endTime": "2021-08-04T18:40:34.9371480Z"
}
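With many blobs the error.json can get long, so it can help to summarize it. A minimal sketch, assuming the file has the same fields as the sample above (incompatibleFeatures, blobValidationErrors, invalidBlobCount); summarize_validation is a hypothetical helper name, and because the format of individual blobValidationErrors entries isn't shown in the sample, they are printed as-is.

```python
import json
from typing import List

# Sample copied from the error.json above (straight quotes).
SAMPLE = """{
  "startTime": "2021-08-04T18:40:31.8465320Z",
  "id": "45c84a6d-6746-4142-8130-5ae9cfe013a0",
  "incompatibleFeatures": ["Blob Delete Retention Enabled"],
  "blobValidationErrors": [],
  "scannedBlobCount": 0,
  "invalidBlobCount": 0,
  "endTime": "2021-08-04T18:40:34.9371480Z"
}"""


def summarize_validation(raw: str) -> List[str]:
    """Turn the contents of a validation error.json into readable findings."""
    report = json.loads(raw)
    findings: List[str] = []
    # Account-level features that block the upgrade.
    for feature in report.get("incompatibleFeatures", []):
        findings.append(f"incompatible feature: {feature}")
    # Per-blob problems; entry format unknown here, so keep each entry as-is.
    for error in report.get("blobValidationErrors", []):
        findings.append(f"blob error: {error}")
    invalid = report.get("invalidBlobCount", 0)
    if invalid:
        findings.append(f"invalid blobs: {invalid}")
    return findings


if __name__ == "__main__":
    for line in summarize_validation(SAMPLE):
        print(line)  # prints "incompatible feature: Blob Delete Retention Enabled"
```

For the sample above the only finding is the incompatible feature, which points to blob delete retention as the setting to look at before rerunning the validation.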

 

The upgrade can take a while; the duration depends mostly on how much data needs to be migrated.

At the end of the process you will notice that the Hierarchical namespace is now enabled and can no longer be changed.

[Screenshot: Blob migration V2 finished]

Post Migration

Create new linked services in Azure Data Factory and Azure Synapse Analytics that use the Azure Data Lake Storage Gen2 connector, so that they connect through the DFS endpoint.

Update any other applications to use the correct endpoint.
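When updating connection strings or linked services, the change is often just the service segment of the host name: in the public Azure cloud, the Blob endpoint is https://<account>.blob.core.windows.net and the Data Lake Storage Gen2 endpoint is https://<account>.dfs.core.windows.net. A small sketch of that rewrite; to_dfs_endpoint and the account name mystorageaccount are hypothetical, and the domain suffix after ".blob." is kept unchanged, so other clouds' suffixes pass through as they are.

```python
from urllib.parse import urlparse, urlunparse


def to_dfs_endpoint(blob_url: str) -> str:
    """Rewrite a Blob endpoint URL to the matching DFS endpoint URL."""
    parts = urlparse(blob_url)
    # Split "<account>.blob.<suffix>" into account and domain suffix.
    account, sep, suffix = parts.netloc.partition(".blob.")
    if not sep:
        raise ValueError(f"not a Blob endpoint: {blob_url}")
    # Keep scheme, path and query; only the service segment changes.
    return urlunparse(parts._replace(netloc=f"{account}.dfs.{suffix}"))


if __name__ == "__main__":
    print(to_dfs_endpoint("https://mystorageaccount.blob.core.windows.net/raw/sales"))
```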

Test, test and test again: make sure all your workloads work as expected.

Start by migrating your development Storage Account and test all workloads before you migrate your production Storage Account.

Like always, in case you have questions, leave them in the comments or send me a message.

Useful links

Upgrade to a general-purpose v2 storage account

Upgrade Azure Blob Storage with Azure Data Lake Storage Gen2 capabilities

Azure Synapse Analytics cost analysis for Integration Runtime


AutoResolveIntegrationRuntime!

Over the last few days I’ve been following some discussions on Twitter about using a separate Integration Runtime in Azure Synapse Analytics that runs in a selected region instead of auto-resolve. The AutoResolveIntegrationRuntime is deployed automatically with its region set to Auto Resolve, and that setting cannot be changed. If you create a separate Integration Runtime, you can set the region yourself.
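For reference, this is roughly what such an Integration Runtime looks like as a JSON definition. The sketch builds it in Python and assumes the shape used by Azure Data Factory and Synapse for a Managed (Azure) Integration Runtime: the region sits in typeProperties.computeProperties.location, where "AutoResolve" is the auto-resolve setting, and the managed virtual network is referenced by the name "default". Verify the fields against the current Microsoft documentation before relying on them; the helper azure_ir_definition and the name IR-WestEurope are hypothetical.

```python
import json


def azure_ir_definition(name: str, location: str, managed_vnet: bool = True) -> dict:
    """Build an Azure IR definition dict (assumed shape, verify against the docs)."""
    properties = {
        "type": "Managed",
        "typeProperties": {
            # "AutoResolve" lets the service pick the region; a region name pins it.
            "computeProperties": {"location": location}
        },
    }
    if managed_vnet:
        # Assumed reference to the workspace's managed virtual network.
        properties["managedVirtualNetwork"] = {
            "type": "ManagedVirtualNetworkReference",
            "referenceName": "default",
        }
    return {"name": name, "properties": properties}


if __name__ == "__main__":
    # Hypothetical IR pinned to West Europe inside the managed virtual network.
    print(json.dumps(azure_ir_definition("IR-WestEurope", "West Europe"), indent=2))
```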

[Screenshot: Azure Synapse Integration Runtime]

The blog post from Asanka Padmakumara explains well why you should choose a new Integration Runtime with a dedicated region, so I won’t go into detail on that here.

I was more interested in what this does to the costs when Managed Virtual Network is enabled and the same pipeline runs either on the AutoResolveIntegrationRuntime or on a manually created Integration Runtime. The final result quite surprised me.

Case:

Azure Synapse Analytics deployed with Managed Virtual Network and Private Link in the West Europe region.

Copy data from an Azure SQL server to the data lake.

[Screenshot: Azure Synapse pipeline]

Result:

Pipeline Consumption with AutoResolveIntegrationRuntime

[Screenshot: Azure Synapse Pipeline AutoResolve]

Pipeline Consumption with Integration Runtime created in West Europe

[Screenshot: Azure Synapse Pipeline West Europe]

I didn’t expect the consumption of these two Integration Runtimes to differ.

The next step is to see how that compares in terms of costs, using the Azure Pricing Calculator. In the example below, I based the calculation on the pipelines above, assuming the pipeline runs every day for one month (30 days).

[Screenshot: Azure Synapse cost calculation]

Conclusion:

When running all my linked services on the AutoResolveIntegrationRuntime, the pipeline appears to be slightly faster than on an Integration Runtime created in West Europe. But there was a huge difference in costs: you pay 350% more when running on the Integration Runtime created in West Europe. That is quite a lot, especially if you run 100 of these pipelines per day, which comes to almost €270 on a monthly basis. These differences probably won’t be there if you don’t use Managed Virtual Network.
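The arithmetic behind these numbers is easy to redo for your own run counts. The sketch below uses hypothetical helper names and no figures beyond the ones in this post: 100 runs per day for 30 days is 3,000 runs, so if the almost €270 is read as the monthly extra cost, each run costs about €0.09 more. Note also that, read literally, "350% more" means 4.5 times the AutoResolve cost, which percent_more reproduces.

```python
def monthly_cost(cost_per_run: float, runs_per_day: int, days: int = 30) -> float:
    """Monthly cost of a pipeline that runs runs_per_day times every day."""
    return cost_per_run * runs_per_day * days


def percent_more(cost: float, baseline: float) -> float:
    """How many percent `cost` is above `baseline` (4.5x the baseline -> 350)."""
    return (cost / baseline - 1) * 100


if __name__ == "__main__":
    runs = 100 * 30  # 100 pipeline runs per day for 30 days
    # Assumption: the ~EUR 270 in the text is the monthly *extra* cost.
    print(270 / runs)  # about 0.09 EUR extra per run
    print(percent_more(4.5, 1.0))  # 350.0
```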

Remarks:

During my tests of the Integration Runtime I also found out that you cannot switch a Data Flow in Azure Synapse Analytics to an Integration Runtime that doesn’t use auto-resolve.

[Screenshot: Azure Synapse Data Flow IR]

If you enable Managed Virtual Network for auto-resolve Azure IR, the IR in the Data Factory or Synapse Workspace region is used.

=> Integration runtime – Azure Data Factory & Azure Synapse | Microsoft Docs

As always, if you have any questions, let me know.

My Virtual session DataWeekender 4.2


DataWeekender 4.2

This Saturday I joined the Van and spoke at DataWeekender.

Azure Purview

I presented a session on Azure Purview, Microsoft’s answer to data governance and data lineage.

You can find my slides below on Slideshare:

Some useful links:

As always, in case you have any questions, please feel free to contact me.

If you have any questions left, feel free to ask them in the comments or via social media.