How to use concurrency in Azure Synapse pipelines?

Jan 12, 2022

How to prevent concurrent pipeline execution?

Concurrency

This week I had a discussion with a colleague about how to make sure that a pipeline does not start while an earlier run of it is still in progress.

He then asked: have you ever thought of the concurrency option? I had seen this option before but never paid attention to it.

How does the concurrency work?

If you read the Microsoft documentation it says the following:
The maximum number of concurrent runs the pipeline can have. By default, there is no maximum. If the concurrency limit is reached, additional pipeline runs are queued until earlier ones complete.

The concurrency option works in both Azure Synapse Analytics and Azure Data Factory.

I started to test this functionality and there are certainly some nice use cases for it:

  • If the pipeline was started by a schedule and someone else triggers the same pipeline manually, the manual run is placed in a queue.
  • Sometimes processing is delayed or more data is delivered than usual. If you process this data every 30 minutes and the 2nd run starts while the 1st run has not finished yet, this could result in incorrect data. With a concurrency limit, the 2nd run is placed in a queue and only starts when the previous one has finished.
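
To make the queueing behaviour in these two cases concrete, here is a small simulation. It is only a simplified model of the description above, not how the service schedules runs internally: the function name `simulate_runs`, the time unit (minutes) and the first-in-first-out order of the queue are all assumptions made for illustration.

```python
import heapq

def simulate_runs(trigger_times, durations, concurrency=None):
    """Return (start, end) per run, in trigger order.

    trigger_times must be sorted ascending. concurrency=None models a blank
    setting (no limit); an integer caps how many runs may be in progress at
    once, and later runs wait in FIFO order (an assumption of this sketch).
    """
    running = []  # min-heap with the end times of runs that hold a slot
    schedule = []
    for triggered, duration in zip(trigger_times, durations):
        # Free the slots of runs that finished before this trigger fired.
        while running and running[0] <= triggered:
            heapq.heappop(running)
        start = triggered
        if concurrency is not None and len(running) >= concurrency:
            # All slots are busy: wait until the earliest run ends.
            start = heapq.heappop(running)
        end = start + duration
        heapq.heappush(running, end)
        schedule.append((start, end))
    return schedule

# A trigger every 30 minutes; the first run takes 45 minutes.
print(simulate_runs([0, 30, 60], [45, 20, 20], concurrency=1))
# [(0, 45), (45, 65), (65, 85)]
print(simulate_runs([0, 30, 60], [45, 20, 20]))
# [(0, 45), (30, 50), (60, 80)]
```

With a limit of 1 the second run waits until minute 45 instead of overlapping with the first one. Without a limit the runs overlap between minute 30 and minute 45.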

It is a fairly simple process but can be quite useful especially in the case of short loading windows.

Azure-Synapse-Concurrency

Please note that this setting has no effect on a run in Debug mode; a debug run starts immediately.
Check the monitoring regularly to see whether runs are being queued all the time. If so, it is better to change the recurrence of the trigger for this pipeline. You still have the option to cancel a queued pipeline run.
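
As a rough illustration of such a check, the sketch below computes per pipeline which share of the runs in a monitoring snapshot is waiting in the queue. It works on plain dictionaries and does not call any Azure API. The field names `pipelineName` and `status`, the pipeline names and the sample data are assumptions for this example, so adapt them to whatever your monitoring export looks like.

```python
from collections import Counter

def queued_share(runs):
    """Return, per pipeline, the fraction of runs whose status is "Queued"."""
    total = Counter()
    queued = Counter()
    for run in runs:
        name = run["pipelineName"]
        total[name] += 1
        if run["status"] == "Queued":
            queued[name] += 1
    return {name: queued[name] / total[name] for name in total}

# Made-up snapshot of pipeline runs, only for illustration.
sample_runs = [
    {"pipelineName": "LoadSales", "status": "Succeeded"},
    {"pipelineName": "LoadSales", "status": "InProgress"},
    {"pipelineName": "LoadSales", "status": "Queued"},
    {"pipelineName": "LoadSales", "status": "Queued"},
    {"pipelineName": "LoadStock", "status": "Succeeded"},
]
print(queued_share(sample_runs))
# {'LoadSales': 0.5, 'LoadStock': 0.0}
```

A share that stays high over several checks suggests that the trigger fires more often than the pipeline can keep up with.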

How to enable concurrency?

 

To enable concurrency in an Azure Synapse pipeline, use the Concurrency property in the pipeline settings. If you leave the property blank, there is no maximum and pipeline runs will not be queued. If you set it to 1, only one run of the pipeline is active at a time. Once the concurrency limit is reached, additional pipeline runs are queued until earlier ones complete. Setting the concurrency to a higher value allows that many runs of the pipeline at the same time, which can shorten the total processing time, provided the data source and the compute can handle the increased load.
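
If you keep your pipelines in Git, the setting also shows up in the JSON definition of the pipeline. The sketch below assumes that it is stored as `concurrency` under `properties`, which is how I understand the pipeline JSON format. The helper `with_concurrency` and the pipeline name `LoadSales` are made up for this example.

```python
import json

def with_concurrency(pipeline, limit):
    """Return a copy of a pipeline definition with the concurrency limit set.

    limit=None removes the property, which corresponds to leaving the field
    blank in the pipeline settings (no maximum, runs are not queued).
    """
    properties = dict(pipeline.get("properties", {}))
    if limit is None:
        properties.pop("concurrency", None)
    else:
        if limit < 1:
            raise ValueError("concurrency must be a positive integer")
        properties["concurrency"] = limit
    return {**pipeline, "properties": properties}

# Hypothetical, minimal pipeline definition without any activities.
pipeline = {"name": "LoadSales", "properties": {"activities": []}}
print(json.dumps(with_concurrency(pipeline, 1), indent=2))
```

The function returns a copy and leaves the original definition untouched, so you can compare both versions before you commit the change.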

Enable-concurrency-Azure-Synapse

If you have any questions regarding concurrency, please let me know.

Feel free to leave a comment

6 Comments

  1. Santosh

    Thanks.
    Quick question:
    What happens if my pipeline is scheduled to run every hour and the current run takes longer than an hour and is still running?
    Does the second run get the Queued state?
    Or, since it is schedule-based, does the second run also start running and perform its activities?

    • Erwin

      Hi Santosh,

      If you have an hourly schedule, all scheduled pipeline runs will be queued, as you can see in the picture. With this option no scheduled runs get mixed up.
      If you have a lot of queued pipeline runs, you should consider changing the trigger time. Hopefully this answers your question.

      Erwin

  2. Palak

    Hi Erwin,

    I have a similar use case in ADF. I want to trigger the same pipeline concurrently with multiple values of the same parameter. I use a Lookup activity to read my config from ADLS in JSON format, and then, based on that list of input parameters, I want to trigger my pipeline multiple times in parallel. For the parallel processing I use a ForEach activity that executes sub-activities such as a .py script, a notebook, etc. The issue I am facing is that when I run my pipeline with a concurrency count of 2 in the general pipeline settings, it still initiates 20 runs in parallel. I did not set a batch count on the ForEach activity, because with a batch count my pipeline takes a lot of time to complete.

    Can you please help me understand why the concurrency count is not working?

    Thanks a lot!
    Palak

  3. David

    Hi Erwin, thanks for your post.

    QQ: I have 70 pipelines using the same pipeline template, and I left the concurrency setting blank. One strange thing is that each time 43 pipelines were triggered first, and then, once one of the 43 pipelines was done, a new pipeline was triggered to run. Why were only 43 pipelines triggered instead of all 70? Thanks.

    Regards,
    David


