Create an Azure Synapse Analytics Apache Spark Pool

by Jun 17, 2020

Adding a new Apache Spark Pool

There are 2 options to create an Apache Spark Pool.
Go to your Azure Synapse Analytics Workspace in de Azure Portal and add a new Apache Spark Pool.

Azure Portal Apache Spark Pool

Or go to the Management Tab in your Azure Synapse Analytics Workspace and add a new Apache Spark Pool.

Azure Synapse Apache Spark Pool

Create an Apache Spark Pool

Creating a new Apache Spark Pool

Apache Spark pool name

Note that there are specific limitations for the names that Apache Spark Pools can use. Names must contain letters or numbers only, must be 15 or less characters, must start with a letter, not contain reserved words, and be unique in the workspace.

Node size

Small(4vCPU)

Medium(8vCPU)

Large(16vCPU)

Autoscale

Enabled:   Based on your workloads the Spark Pool will scale up or down.

Disabled:   You have to define a fix number of nodes.

Number of nodes.

You can select 3 up to 200 nodes

Make sure that

Contact an Owner of the storage account, and verify that the following role assignments have been made:

  • Assign the workspace MSI to the Storage Blob Data Contributor role on the storage account
  • Assign you and other users to the Storage Blob Data Contributor role on the storage account

Once those assignments are made, the following Spark features can be used: (1) Spark Library Management, (2) Read and Write data to SQL pool databases via the Spark SQL connector, and (3) Create Spark databases and tables.

If you haven’t assign the Storage Blob Data Contributor role to your user, you will get the following error when you want to browse the date in your Linked Workspace.

Error Storage Synapse Studio

 

Apache Spark Pool Details

Currently you can only select Apache Spark version 2.4.

Make sure you enable the Auto Pause settings. If will save you a lot of money. Your cluster will turn off after the configured Idle minutes.

Python packages can be added at the Spark pool level and .jar based packages can be added at the Spark job definition level.

  • If the package you are installing is large or takes a long time to install, this affects the Spark instance start up time.
  • Packages which require compiler support at install time, such as GCC, are not supported.
  • Packages can not be downgraded, only added or upgraded.

How to install these packages can be found here.

Review your settings and your Apache Spark Pool will be created.

You have now created your Apache Spark Pool.

Thanks for reading, in my next article I will explain how to create a SQL Pool, the formerly Azure SQL DW Instance

 

Feel free to leave a comment

1 Comment

  1. JOHN DEVEROS

    HI, what does this part mean

    Assign the workspace MSI to the Storage Blob Data Contributor role on the storage account

    What is the workspace MSI,

    Reply

Submit a Comment

Your email address will not be published.

19 − 3 =

This site uses Akismet to reduce spam. Learn how your comment data is processed.

Scale SQL Database dynamically with Metadata

Scale SQL Database Dynamically with Metadata Use this template to scale up and down an Azure SQL Database in Azure Synapse Analytics or in Azure Data Factory. This article describes a solution template how you can Scale up or down a SQL Database within Azure Synapse...

Exploring Azure Synapse Analytics Studio

Azure Synapse Workspace Settings In my previous article, I walked you through "how to create your Azure Synapse Analytics Workspace". It's now time to explore the brand new Synapse Studio. Most configuration and settings can be done through the Synapse Studio. In your...

SSMS 18.xx: Creating your Azure Data Factory SSIS IR directly in SSMS

Creating your Azure Data Factory(ADF) SSIS IR in SSMS Since  version 18.0 we could see our Integration Catalog on Azure Instances directly. Yesterday I wrote an article how to Schedule your SSIS Packages in ADF, during writing that article I found out that you can...

Migrate Azure Storage to Azure Data Lake Gen2

Migrate Azure Storage to Storage Account with Azure Data Lake Gen2 capabilities Does it sometimes happen that you come across a Storage Account where the Hierarchical namespace is not enabled or that you still have a Storage Account V1? In the tutorial below I...

Azure Data Factory Let’s get started

Creating an Azure Data Factory Instance, let's get started Many blogs nowadays are about which functionalities we can use within Azure Data Factory. But how do we create an Azure Data Factory instance in Azure for the first time and what should you take into account? ...

Service Healths in Azure

Creating Service Health Alerts in AzureAzure Portal In the Azure Portal go to Monitor – Service Health – Health alerts If you have created alerts before you will see them over here. Assuming you haven’t created an Alert before, we will start to create an Alert.1...

Azure Data Factory and Azure Synapse Analytics Naming Conventions

Naming Conventions More and more projects are using Azure Data Factory and Azure Synapse Analytics, the more important it is to apply a correct and standard naming convention. When using standard naming conventions you create recognizable results across different...

Scale your SQL Pool dynamically in Azure Synapse

Scale your Dedicated SQL Pool in Azure Synapse Analytics In my previous article, I explained how you can Pause and Resume your Dedicated SQL Pool with a Pipeline in Azure Synapse Analytics. In this article I will explain how to scale up and down a SQL Pool via a...

Azure Purview announcements and new functionalities

This week the Azure Purview Product team added some new functionalities, new connectors(these connectors where added during my holiday), Azure Synapse Data Lineage, a better Power BI integration and the introduction of Elastics Data Map. Slowly we are on our way to a...

Azure SQL Data Warehouse: Reserved Capacity versus Pay as You go

How do I use my Reserved Capacity correctly? Update 11-11-2020: This also applies to Azure Synapse SQL Pools. In my previous article you were introduced, how to create a Reserved Capacity for an Azure SQL Datawarehouse (SQLDW). Now it's time to take a look at how this...