ERWIN & BUSINESS ANALYTICS

Create an Azure Synapse Analytics Apache Spark Pool

by Jun 17, 2020

Adding a new Apache Spark Pool

There are 2 options to create an Apache Spark Pool.
Go to your Azure Synapse Analytics Workspace in de Azure Portal and add a new Apache Spark Pool.

Azure Portal Apache Spark Pool

Or go to the Management Tab in your Azure Synapse Analytics Workspace and add a new Apache Spark Pool.

Azure Synapse Apache Spark Pool

Create an Apache Spark Pool

Creating a new Apache Spark Pool

Apache Spark pool name

Note that there are specific limitations for the names that Apache Spark Pools can use. Names must contain letters or numbers only, must be 15 or less characters, must start with a letter, not contain reserved words, and be unique in the workspace.

Node size

Small(4vCPU)

Medium(8vCPU)

Large(16vCPU)

Autoscale

Enabled:   Based on your workloads the Spark Pool will scale up or down.

Disabled:   You have to define a fix number of nodes.

Number of nodes.

You can select 3 up to 200 nodes

Make sure that

Contact an Owner of the storage account, and verify that the following role assignments have been made:

  • Assign the workspace MSI to the Storage Blob Data Contributor role on the storage account
  • Assign you and other users to the Storage Blob Data Contributor role on the storage account

Once those assignments are made, the following Spark features can be used: (1) Spark Library Management, (2) Read and Write data to SQL pool databases via the Spark SQL connector, and (3) Create Spark databases and tables.

If you haven’t assign the Storage Blob Data Contributor role to your user, you will get the following error when you want to browse the date in your Linked Workspace.

Error Storage Synapse Studio

 

Apache Spark Pool Details

Currently you can only select Apache Spark version 2.4.

Make sure you enable the Auto Pause settings. If will save you a lot of money. Your cluster will turn off after the configured Idle minutes.

Python packages can be added at the Spark pool level and .jar based packages can be added at the Spark job definition level.

  • If the package you are installing is large or takes a long time to install, this affects the Spark instance start up time.
  • Packages which require compiler support at install time, such as GCC, are not supported.
  • Packages can not be downgraded, only added or upgraded.

How to install these packages can be found here.

Review your settings and your Apache Spark Pool will be created.

You have now created your Apache Spark Pool.

Thanks for reading, in my next article I will explain how to create a SQL Pool, the formerly Azure SQL DW Instance

 

Feel free to leave a comment

0 Comments

Submit a Comment

Your email address will not be published. Required fields are marked *

2 × four =

This site uses Akismet to reduce spam. Learn how your comment data is processed.

SSMS 18.1: Schedule your SSIS Packages in Azure Data Factory

Schedule your SSIS Packages with SSMS in Azure Data Factory(ADF) This week SQL Server Management Studio version 18.1 was released, which can be downloaded from here. In version 18.1 the Database diagrams are back and from now on we can also schedule SSIS Packages in...

Create an Azure Synapse Analytics Apache Spark Pool

Adding a new Apache Spark Pool There are 2 options to create an Apache Spark Pool.Go to your Azure Synapse Analytics Workspace in de Azure Portal and add a new Apache Spark Pool. Or go to the Management Tab in your Azure Synapse Analytics Workspace and add a new...

Azure Data Factory: How to assign a Customer Managed Key

Customer key With this new functionality you can add extra security to your Azure Data Factory environment. Where the data was first encrypted with a randomly generated key from Microsoft, you can now use the customer-managed key feature. With this Bring Your Own Key...

Scale your SQL Pool dynamically in Azure Synapse

Scale your Dedicated SQL Pool in Azure Synapse Analytics In my previous article, I explained how you can Pause and Resume your Dedicated SQL Pool with a Pipeline in Azure Synapse Analytics. In this article I will explain how to scale up and down a SQL Pool via a...

Azure Synapse Pause and Resume SQL Pool

Pause or Resume your Dedicated SQL Pool in Azure Synapse Analytics Azure Synapse Analytics went GA in beginning of December 2020, with Azure Synapse we can now also create a Dedicated SQL Pool(formerly Azure SQL DW). Please read this document to learn what a Dedicated...

Azure Purview Costs in Public Preview explained

Why are you potentially charged for Azure Purview during the public preview? Since I published my post that Azure Purview started billing as of January 21th, I got a lot of questions how billing was working. UPDATE 27th of February: We are extending the Azure Purview...

Azure DevOps and Azure Feature Pack for Integration Services

Azure Feature Pack for Integration ServicesAzure Blob Storage A great addition for SSIS is using extra connectors like  Azure Blob Storage or Azure Data Lake Store which are added by the Azure Feature Pack. This Pack needs to be installed on your local machine. Are...

Using Azure Automation to generate data in your WideWorldImporters database

CASE: For my test environment I want to load every day new increments into the WideWorldImporters Azure SQL Database with Azure Automation. The following Stored Procedure is available to achieve this. EXECUTE DataLoadSimulation.PopulateDataToCurrentDate...

Exploring Azure Synapse Analytics Studio

Azure Synapse Workspace Settings In my previous article, I walked you through "how to create your Azure Synapse Analytics Workspace". It's now time to explore the brand new Synapse Studio. Most configuration and settings can be done through the Synapse Studio. In your...

Connect Azure Synapse Analytics with Azure Purview

How do you integrate Azure Purview in Azure Synapse Analytics? This article explains how to integrate Azure Purview into your Azure Synapse workspace for data discovery and exploration. Follow the steps below to connect your Azure Purview account in your Azure Synapse...