Blog Series: Provision users and groups from AAD to Azure Databricks

by Jan 26, 2023

Blog Series

This blog post series covers how to provision users and groups from Azure Active Directory to Azure Databricks using the Enterprise Application (SCIM). This post is a summary of all the blogs I posted over the last couple of days. I am very happy with all the feedback and tips I have received about this blog series. Thank you.

  1. Configure the Enterprise Application(SCIM) for Azure Databricks Account Level provisioning
  2. Assign and Provision users and groups in the Enterprise Application(SCIM)
  3. Creating a metastore in your Azure Databricks account to assign an Azure Databricks Workspace
  4. Assign Users and groups to an Azure Databricks Workspace and define the correct entitlements
  5. Add Service Principals to your Azure Databricks account using the account console
  6. Configure the Enterprise Application(SCIM) for Azure Databricks Workspace provisioning

Key Takeaways

There are two options for provisioning users and groups from Azure Active Directory (AAD) to Azure Databricks: at the Azure Databricks account level or at the Azure Databricks workspace level.

Azure Databricks account level

 

Azure Databricks workspace level

Databricks recommends using SCIM provisioning to sync users and groups automatically from Azure Active Directory to your Azure Databricks account. 

Preview

Update 23-02: Azure Databricks account level is out of preview. Azure Databricks workspace level is still in preview.

We can define 3 different identities:
• Users: User identities recognized by Azure Databricks and represented by email addresses.
• Service principals: Identities for use with jobs, automated tools, and systems such as scripts, apps, and CI/CD platforms.
• Groups: Groups simplify identity management, making it easier to assign access to workspaces, data, and other securable objects.
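To make these identities a bit more concrete, here is a minimal sketch of what a user and a group look like as SCIM 2.0 resources. The schema URNs and field names come from the SCIM core schema (RFC 7643). The helper functions scim_user and scim_group and all example values are my own illustration; they are not the exact payload the Enterprise Application sends.

```python
import json

# Schema URNs defined by the SCIM 2.0 core schema (RFC 7643).
USER_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:User"
GROUP_SCHEMA = "urn:ietf:params:scim:schemas:core:2.0:Group"


def scim_user(email: str, display_name: str) -> dict:
    """Build a minimal SCIM User resource; the email address is the userName."""
    return {
        "schemas": [USER_SCHEMA],
        "userName": email,
        "displayName": display_name,
        "active": True,
    }


def scim_group(display_name: str, member_ids: list[str]) -> dict:
    """Build a minimal SCIM Group resource that references its members by id."""
    return {
        "schemas": [GROUP_SCHEMA],
        "displayName": display_name,
        "members": [{"value": member_id} for member_id in member_ids],
    }


if __name__ == "__main__":
    # Hypothetical example values, not taken from a real tenant.
    print(json.dumps(scim_user("jane.doe@example.com", "Jane Doe"), indent=2))
    print(json.dumps(scim_group("data-engineers", ["123"]), indent=2))
```

Service principals are left out of this sketch on purpose, because they are not part of the standard SCIM core schema.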

As you can read in the various blog posts, setting up account-level provisioning is a bit more work, but it provides many more benefits now and in the future. If you only use one Azure Databricks workspace, then I would simply apply workspace-level provisioning. The most important thing is that you set up SCIM so that users are not added manually in Azure Databricks. Adding service principals is much easier with the account-level setup.

Metastore

Only one metastore can be created per region. Pay close attention to where you create it (the same or a separate Subscription/Resource Group) and whether the metastore should be part of the Data Management Landing Zone.

Users, Service Principals and Groups

  • Users with the Contributor or Owner role on the workspace resource in Azure are automatically added as workspace administrators.
  • Azure Active Directory does not support the automatic provisioning of service principals to Azure Databricks.
  • Users removed manually from an Azure Databricks workspace will not be synced again by the Azure Active Directory provisioning.
  • The sync runs every 40 minutes.
  • Updates to a username or email address need to be made in AAD.
  • Nested groups are not supported by Azure Active Directory automatic provisioning.
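Because nested groups are not provisioned, the members of a nested group have to end up in a group that is assigned directly. A minimal sketch of how you could work out the full list of users behind a group is shown below. The function flatten_members and the dictionary layout are hypothetical and only illustrate the idea; in practice you would read the memberships from AAD yourself.

```python
def flatten_members(group: str, groups: dict[str, list[str]]) -> set[str]:
    """Return every user reachable from `group`, following nested groups.

    `groups` maps a group name to its direct members. A member that is
    itself a key in `groups` is treated as a nested group.
    """
    users: set[str] = set()
    seen: set[str] = set()
    stack = [group]
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # guard against groups that contain each other
        seen.add(current)
        for member in groups.get(current, []):
            if member in groups:
                stack.append(member)  # nested group: expand it later
            else:
                users.add(member)  # plain user
    return users


if __name__ == "__main__":
    # Hypothetical memberships, for illustration only.
    memberships = {
        "analytics": ["alice@example.com", "data-science"],
        "data-science": ["bob@example.com", "carol@example.com"],
    }
    for user in sorted(flatten_members("analytics", memberships)):
        print(user)
```

The resulting flat list is what you would assign to a single, non-nested group in the Enterprise Application.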

Scoping Filters

My colleague Pim Jacobs gave me a tip that you can also use scoping filters. A scoping filter allows you to include or exclude any users who have an attribute that matches a specific value. For example, you may want to sync only a subset of the users in a group to Databricks, based on a specific attribute you have defined in your AAD (for instance, only users in the department Advanced Analytics).
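The effect of such a filter can be sketched in a few lines of Python. This is only a model of the include/exclude logic on a single attribute. The real filter is configured in the provisioning settings of the Enterprise Application, not in code, and the class ScopingFilter, the function users_to_sync and the sample users are hypothetical names I made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopingFilter:
    """Model of a single-attribute scoping filter (illustration only)."""

    attribute: str
    value: str
    include: bool = True  # False means: exclude users that match

    def in_scope(self, user: dict) -> bool:
        matches = user.get(self.attribute) == self.value
        return matches if self.include else not matches


def users_to_sync(users: list[dict], scoping_filter: ScopingFilter) -> list[dict]:
    """Keep only the users that the filter leaves in scope."""
    return [user for user in users if scoping_filter.in_scope(user)]


if __name__ == "__main__":
    # Hypothetical users, for illustration only.
    users = [
        {"userName": "alice@example.com", "department": "Advanced Analytics"},
        {"userName": "bob@example.com", "department": "Finance"},
    ]
    only_analytics = ScopingFilter("department", "Advanced Analytics")
    for user in users_to_sync(users, only_analytics):
        print(user["userName"])
```

Setting include=False flips the filter, so everyone except the Advanced Analytics department would be in scope.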


Documentation

For the blog series I partly used the documentation below. The documentation is fairly scattered, which is what gave me the idea to start this blog series.

Configure SCIM provisioning using Microsoft Azure Active Directory – Azure Databricks | Microsoft Learn

Manage users, service principals, and groups – Azure Databricks | Microsoft Learn

Manage users – Azure Databricks | Microsoft Learn

Manage groups – Azure Databricks | Microsoft Learn

Manage service principals – Azure Databricks | Microsoft Learn

Sync users and groups from Azure Active Directory – Azure Databricks | Microsoft Learn

Create a Unity Catalog metastore – Azure Databricks | Microsoft Learn

 

