Connecting Event Hubs in Microsoft Fabric


by Erwin | May 26, 2023

Connecting Azure Event Hubs with Eventstream in Microsoft Fabric

In my previous blog I gave you an introduction to the possibilities of Real-Time Analytics in Microsoft Fabric.

In this blog we will take a closer look at how we can connect data from one of our existing Azure Event Hubs.

End to End scenario Microsoft Fabric Real-Time Analytics

Looking at the picture above, you see an end-to-end workflow for a Real-Time Analytics scenario. We can directly see which Fabric artifacts we need to build the solution. Building the complete solution below took me at most 20 minutes.

Loading data from Azure Event Hubs to Lakehouse

Requirements:

  • An existing Azure Event Hub.
  • A new consumer group (never use an existing one). If you reuse an existing consumer group, your existing environment may stop receiving messages from the Event Hub.
  • A Fabric workspace.

Note:

Adding a consumer group is not possible in the Basic tier; you need the Standard tier or higher.

Creating a Shared Access Policy on the Event Hub

Create a new Shared Access Policy on the Event Hub, with the manage option enabled.

Event Hubs Shared Access Policy

Note down the SAS Policy name and the Primary Key. We will need these later to set up the connection in Microsoft Fabric.

Create a Data Connection in Microsoft Fabric

In the menu bar (top right), open the settings menu and select the Manage Connection option.

Microsoft Fabric Manage Connection

Search for Event Hub.

Microsoft Fabric add Event Hub connection

Connection name: Name of the connection
Event Hub Namespace: https://xxxxxxx.servicebus.windows.net:443/
Authentication username: Name of the SAS Policy
Authentication password: Primary Key of the SAS Policy
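As a rough sketch of how these values relate, the Python helper below builds the namespace URL in the form entered above and, as an extra, the standard SAS-based Event Hubs connection string for the same policy, which can be handy if you want to test the policy with an SDK before configuring Fabric. The namespace `contoso-ns`, the policy name `fabric-policy`, the key placeholder and the hub name `telemetry` are made-up values for illustration, not values from this blog.

```python
def namespace_endpoint(namespace: str) -> str:
    """Build the Event Hub Namespace URL in the form used in the connection dialog."""
    return f"https://{namespace}.servicebus.windows.net:443/"


def connection_string(namespace: str, policy_name: str, primary_key: str, event_hub: str) -> str:
    """Build a SAS-based Event Hubs connection string for a single event hub."""
    return (
        f"Endpoint=sb://{namespace}.servicebus.windows.net/;"
        f"SharedAccessKeyName={policy_name};"
        f"SharedAccessKey={primary_key};"
        f"EntityPath={event_hub}"
    )


# Placeholder values for illustration only (assumptions, not real credentials).
print(namespace_endpoint("contoso-ns"))
print(connection_string("contoso-ns", "fabric-policy", "<primary-key>", "telemetry"))
```

Keep the real Primary Key out of source control; the placeholder above is only there to show where it goes.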

Now that we have created a connection to our Azure Event Hub, we're ready to receive our streaming data and to set up an Eventstream.

So let's start by opening the Synapse Real-Time Analytics experience. It can be found in the bottom-left corner of your Microsoft Fabric environment.

Microsoft Fabric

Microsoft Fabric Real-Time Analytics

Fabric Capacity

Make sure you have a Microsoft Fabric or Power BI Premium capacity assigned to this workspace. 

Create Eventstream in Microsoft Fabric

Within our Fabric workspace, select New in the upper-left corner and select Eventstream.

Microsoft Fabric Create Event Stream

Define a name for the Eventstream and click on Create.

Microsoft Fabric Name Eventstream

This can take a couple of minutes to set up, but don't worry: a lot of things are happening in the background. Microsoft Fabric is a SaaS application, so things need to be deployed for you.

The great advantage for you is that things are much easier to set up.

So once everything is ready you will see this new screen:

Microsoft Fabric Overview Eventstream

Create the Eventstream Source

The next step is to connect our source, in this case the connection to the Event Hub.

Microsoft Fabric create source Eventstream

Select Azure Event Hubs; a new pane will open.

Microsoft Fabric configure source eventstream

   
Source name: Define a name for your source; you can use the name of the Event Hub or a custom name.
Cloud connection: Select the connection you created at the beginning of this blog.
Data format: Define the correct format based on your event stream.

Microsoft Fabric Configure data format Eventstream

Consumer group: Select the group you created at the beginning of this blog, or create a new one.

Microsoft Fabric set consumer group

 

Note: Never use an existing consumer group, because the application currently connected to that consumer group will stop receiving data.

Once all the required fields are filled in, click on Create. The source of your Eventstream will now be created.

Microsoft Fabric Create Source

After the connection is set up successfully, you can click on Data Preview to see what kind of data is coming in and whether it is the correct data.

Microsoft Fabric Created Source Successful

Microsoft Fabric Preview Source Eventstream

If your data is not shown correctly, you can change the data format to CSV or Avro.
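If you are unsure which format to pick, you could inspect a sample event body outside Fabric first. The sketch below is only a simple heuristic and not how Eventstream itself decides: it assumes an Avro payload starts with the Avro object container header, and it treats text as CSV only when every row has the same number of columns (more than one). The function name `guess_format` is hypothetical.

```python
import csv
import io
import json

AVRO_MAGIC = b"Obj\x01"  # header bytes of an Avro object container file


def guess_format(payload: bytes) -> str:
    """Return 'avro', 'json', 'csv' or 'unknown' for a raw event body."""
    if payload.startswith(AVRO_MAGIC):
        return "avro"
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return "unknown"
    try:
        json.loads(text)
        return "json"
    except json.JSONDecodeError:
        pass
    # Treat the text as CSV only if all rows share the same column count (> 1).
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if rows and len(rows[0]) > 1 and all(len(row) == len(rows[0]) for row in rows):
        return "csv"
    return "unknown"


print(guess_format(b'{"deviceId": "sensor-1", "temperature": 21.5}'))
print(guess_format(b"deviceId,temperature\nsensor-1,21.5\n"))
```

The sample payloads are invented; with your own events, the result gives a first hint of which Data format option to try in the source configuration.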

Destination

One of the last steps in our configuration is to set up the destination for the Eventstream.

In this blog we will use a Lakehouse (more destinations are available), so that we can store our data and use it at a later stage to build reports on top of it.

Lakehouse

You can choose if you want to create a new Lakehouse or use an existing one.

If you have not created a Lakehouse yet, you need to create one.

In the bottom-left corner, select the Data Engineering option.

Microsoft Fabric Data Engineering

Create a new Lakehouse, define a name and click on Create.

Microsoft Fabric Create Lakehouse

After creating a Lakehouse, you will see that a Dataset and a SQL Endpoint are created automatically as well. How easy is that!

Microsoft Fabric Lakehouse artifact

Create the Eventstream Destination

Create Lakehouse as Eventstream Destination

Microsoft Fabric Eventstream Destination

A new window will open where we can configure the Lakehouse connection/destination.

Microsoft Fabric create table in Lakehouse for Eventstream

Destination name: The name of the destination
Workspace: The workspace where your Lakehouse is located
Lakehouse: The Lakehouse you want to use (you can have more than one in the same workspace)
Delta table: The Delta table where you want to store the data; you can also create a new table from here
Data format: Usually the same format as the one you selected for the source

Event Processing

Before you create the destination, you can transform and preview the data that is being ingested for the destination with the Event Processor. The event processor editor is a no-code experience that provides you with the drag and drop experience to design the event data processing logic.

Microsoft Fabric Real-Time Analytics Event Processing detailed

As you can see, there are a lot of operations/transformations available to shape your data correctly; renaming a field is a matter of seconds with the no-code experience.

The last step is to create the destination. It is as easy as clicking on Create.

Microsoft Fabric Real-Time Analytics Eventstream working

The Eventstream is ready: the source is streaming data and the destination is ingesting data.

Navigate to your Lakehouse to verify the ingested data.

Microsoft Fabric Eventstream Lakehouse

If you prefer to verify with a T-SQL query, you can easily switch to the SQL Endpoint mode, which is located in the upper-right corner.

Microsoft Fabric switch to sql endpoint

And now you can run any type of query you want.

Microsoft Fabric Warehouse tsql querie

 

Next Steps

Build a Power BI report with the ingested event data in the Lakehouse. As mentioned before, a default dataset is already created.

In my next blog I will explain how we can start using the KQL database as a destination, so stay tuned.

Documentation

Click below to read more about Microsoft Fabric and Real-Time Analytics.

Microsoft Fabric Real-Time Analytics documentation

Exploring the Fabric technical documentation

OneLake in Fabric blog


More information about Microsoft Fabric can be found at:

Microsoft Fabric Content Hub

 

As always, in case you have any questions left, do not hesitate to contact me.

Feel free to leave a comment

Introduction to Real-Time Analytics in Microsoft Fabric

by Erwin | May 24, 2023

Real-Time Analytics is one of the data and analytical workloads/experiences available in Microsoft Fabric, the new platform currently in Public Preview at Microsoft. With Real-Time Analytics, companies and developers can gain valuable insights and analysis from real-time data streams.

A unified analytics solution for the era of AI

Microsoft Fabric brings a unified SaaS-based solution that stores all organizational data where analytics workloads operate. Microsoft Fabric brings together existing offerings such as Data Factory, Azure Synapse Analytics, and Power BI into one unified product for all data and analytics workloads.

Key pillars:

  • Complete analytics platform
  • Lake centric and open
  • Empower every Office user
  • AI Powered

If Microsoft Fabric is not yet activated in your tenant, you can activate it in the Admin Portal. Please note that a Microsoft Fabric capacity (trial) or a Power BI Premium capacity is required to get started with Microsoft Fabric.

Microsoft Build

Now that we have seen the initial sessions during Microsoft Build, it's time to delve deeper into a topic. But what an announcement! We have all worked hard on this in the last couple of months. We have done a lot of testing and provided a lot of feedback. And personally, I can say that all feedback has been listened to carefully.

Microsoft-Fabric-Workloads

In this blog, I will delve deeper into Real-Time Analytics, one of the available experiences in Microsoft Fabric. An experience is a look and feel of various Fabric artifacts for a specific role such as a Data Engineer, Data Analyst or Data Scientist. For all available experiences, see the picture above.

Real-Time Analytics

Real-Time Analytics is critical in today's fast-paced business environment. It enables organizations to react immediately to events and trends as they happen, rather than reacting to historical data afterwards. The Real-Time Analytics workload allows users to monitor, analyze, and visualize data in real-time to make fast and data-driven decisions.

Here are some key features and functionalities of Real-Time Analytics in Microsoft Fabric:

  1. Real-time data processing: The workload supports processing large amounts of data in real-time, giving users instant access to up-to-date information.
  2. Advanced analytics: Built-in analytics capabilities enable users to apply complex calculations and statistical models to real-time data for deep insights.
  3. Flexible visualizations: The app offers a wide range of visualization options, such as graphs, charts, and dashboards, to present data in a clear and understandable manner.
  4. Notifications and alerts: With Data Activator (coming soon), users can set up custom notifications and alerts based on predefined criteria, keeping them informed of important events or anomalies in real-time.

As you can see, you can use Real-Time Analytics for a range of solutions, such as IoT analytics, Telemetry data, human and system logs and in many scenarios including manufacturing operations, cybersecurity, oil and gas, automotive and many more.

Benefits

One of the great benefits of using Real-Time Analytics in Microsoft Fabric is that you have seamless integration with other artifacts in Fabric, such as Lakehouse, Data Warehouse and Machine Learning Models for predictive analytics. Another benefit is that you don't have to start from scratch: it is very easy to connect to existing Event Hubs to load your streaming events into Fabric, which I will explain in my next blog.

Real-Time Analytics Artifacts

Currently the Real-Time Analytics workload supports 3 different artifacts:

KQL Database:          A Kusto database exactly the same as you were used to in Azure Data Explorer

KQL Queryset:          Collection of queries which you can run on top of your KQL Database

Eventstream:            Capture, transform and route real-time event stream to various destinations with a no-code experience. Similar to Azure Stream Analytics

OneLake: The foundation for Microsoft Fabric

OneLake eliminates today's pervasive and chaotic data silos by providing a data lake as a service without you needing to build it yourself. OneLake is the OneDrive for data and, like OneDrive, OneLake is provisioned automatically with every Fabric tenant with no infrastructure to manage. All Fabric artifacts, such as those mentioned above for Real-Time Analytics, are deployed/provisioned automatically into OneLake upon creation. How easy is that?

Happy Path - Real-time Analytics

Having a closer look at the picture above, you see an end to end workflow for a Real-Time Analytics scenario.

  • Ingest the data from Event Hub, custom apps, and structured and unstructured data sources with pipelines and Dataflows.
  • Store the data in a KQL Database or Lakehouse.
  • Expose the data in Power BI and/or make it available in Notebooks and KQL Querysets.
  • Train and test the data with Machine Learning Models and Experiments.

With this end to end workflow you can directly see which artifacts you need to use to build your Real-Time Analytics Solution.

Public Preview

It's important to note that as Microsoft Fabric is currently in Public Preview, additional functionality is still being developed, and feedback is being incorporated. This presents a great opportunity for users to get involved early, provide feedback, and contribute to the further development of Microsoft Fabric.

When you decide to start using Microsoft Fabric and encounter any issues with the Real-Time Analytics workload, please don't hesitate to reach out to me. I’m here to assist and appreciate your feedback to further enhance the platform.

Click below to read more about Microsoft Fabric and Real-Time Analytics.

Microsoft Fabric Real-Time Analytics documentation

Exploring the Fabric technical documentation

OneLake in Fabric blog

More information about Microsoft Fabric can be found at:

Microsoft Fabric Content Hub

In my next blog I will go a bit deeper into how easily you can connect existing Event Hubs to Microsoft Fabric. So stay tuned (published on May 26th, 2023).

Note:

Please be aware that Microsoft Fabric is currently not authorized for production use as it is still in the Public Preview phase. It's important to consider this when planning deployments or making critical business decisions.

In the video below, Tzvia Gitlin Troyna, a Principal Manager with Synapse Real-Time Analytics experience in Microsoft Fabric, shares a first look at what's included in the first release of Real-Time Analytics in Microsoft Fabric.


Custom comments in Azure Synapse Analytics


by Erwin | May 16, 2023

Add custom comments to your Azure DevOps and GitHub commits

Finally

Finally, and somewhat hidden, we can now add a comment to our commits from Azure Synapse Analytics and Azure Data Factory to Azure DevOps.

How do you activate this custom comment option in your existing environment? Read on below.

Existing environment

In Azure Synapse Analytics, go to the Git Configuration in the Management Activity Hub.

Synapse-overiview-custom-command

If the custom comment option is not yet enabled, you will see that this new feature is available.

When you click on Edit you can enable this new feature. Make sure you are allowed to make changes to your current branch. Otherwise, create a new feature branch and make the change there before you merge it (via a pull request) into your develop branch.

Enable-custom-command

You will get a warning; this is mainly because you're updating a configuration file.

Update-repo-custom-comment

Once the option is enabled, you can add a custom comment to your commit, which can be very useful.

add-custon-command-in-synapse

This message will be pushed to Azure DevOps as well and can be found on your commit, including the custom comments you added.

Overview-comments-in-DevOps

custom-command-synapse

New environment

The new option is also available when you connect Azure DevOps for the first time; just enable the option in the configuration pane.

custom-command-new-connection

Azure DevOps

You can also enable this option directly in Azure DevOps by adding the option "enableGitComment":true to the publish_config.json file, which is located in the root folder of your Azure Synapse repository.
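If you would rather script this change than edit the file by hand, a minimal Python sketch could look like the one below. It only works on the JSON text and leaves reading and writing the file in your repository to you. The sample content with `publishBranch` is an assumed example; your publish_config.json may hold different settings, and the helper name `with_git_comment` is hypothetical.

```python
import json


def with_git_comment(config_text: str) -> str:
    """Return the JSON text of publish_config.json with enableGitComment set to true."""
    # An empty file is treated as an empty configuration object.
    config = json.loads(config_text) if config_text.strip() else {}
    config["enableGitComment"] = True
    return json.dumps(config, indent=4)


# Assumed example content; your publish_config.json may contain other settings.
original = '{"publishBranch": "workspace_publish"}'
print(with_git_comment(original))
```

Existing settings are preserved; only the `enableGitComment` key is added or overwritten.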

Azure Data Factory

The above steps work the same way in Azure Data Factory.

GitHub

Custom comments are also available in GitHub and work the same way as in Azure DevOps.

Remarks

The custom comment option only works when you use the Commit All button; it does not work on a single-artifact commit.

Have fun with it and let me know your findings!

It's a fairly simple process, but you just need to know it. And it will ultimately make the collaboration with your team members much easier.

If you have any questions regarding the above, please let me know.

Documentation:

Source control in Synapse Studio


Microsoft Purview Pricing and Applications


by Erwin | Apr 25, 2023

Microsoft Purview Pricing and introduction of Purview Applications

The Microsoft Purview pricing page has been updated. Below I have listed most of the changes. The most important changes are the introduction of the Microsoft Purview Applications and the pricing of Insights Generation. In addition, the metadata storage included in 1 capacity unit (which supports 25 operations per second) has been increased from 2 GB to 10 GB.

Post has been updated on April 25th.

Microsoft Purview Data Map

The Microsoft Purview Data Map stores metadata, annotations and relationships associated with data assets in a searchable knowledge graph.

Data Map is billed across three types of activities:

  • Data Map Population: examples include metadata & lineage extraction or classification based on metadata & content inspection.
  • Data Map Enrichment: examples include use of resource sets to optimize storage of data lake assets, or aggregation of classifications to generate insights.
  • Data Map Consumption: examples include serving up search results or rendering lineage graph. This also includes the use of Apache Atlas API to build apps on Data Map.

Data Map Population

Automated Scanning, Ingestion & Classification

Data Map population is serverless and billed based on the duration of scans (includes metadata extraction and classification) and ingestion jobs. Automated scans using native connectors trigger both scan and ingestion jobs. Push based updates from a Microsoft Purview client (e.g., lineage push from Azure Data Factory or Azure Synapse Analytics) only trigger ingestion jobs.

Price
For Power BI online: Free for a limited time
For SQL Server on-prem: Free for a limited time
For other data sources: €0.582 per 1 vCore Hour

Data Map Enrichment

Advanced Resource Set

Advanced Resource Set is a built-in feature of the Data Map used to optimize the storage and search of data assets associated with partitioned files in data lakes. Billing for processing the resource set data assets is serverless and based on the duration of the processing, which can vary based on the change in partitioned files and resource set profile configured. In the Management Center you have an option to toggle it on or off.

Note:  By default, the advanced resource set processing is run every 12 hours for all the systems configured for scanning with resource set toggle enabled.

Price
Advanced Resource Set: €0.194 per 1 vCore Hour

Insights Generation

Insights Generation aggregates metadata and classifications in the raw Data map into enriched, executive-ready reports that can be visualized in the Data Estate Insights application and granular asset level information in business-friendly format that can be exported. Report visualization and export incurs charges from Insights Report Consumption in the Data Estate Insights application.

Price
Report Generation: €0.758 per 1 vCore Hour

Insights Generation is new to me; currently it looks like it comes to around 70,00.

Note: By default, Insights Generation is enabled at provisioning and can be turned off in the Management Center of the Microsoft Purview governance portal. The Management Center now has an option to toggle Insights Generation on or off. If the toggle is on and the report frequency is off, then you can still see the reports from the latest report generation. If set to automatic, your reports will be refreshed based on your scanning and activities in the portal. Currently the automatic refresh is weekly.

Microsoft Purview Data Estate Insights Feature enabling

 

If the toggle is off, the Insights Generation activity will give you the following warning:

Microsoft Purview Data Estate Insights Feature disabled

Data Map Consumption

Elastic Data Map

By default, a Microsoft Purview account is provisioned with a Data Map of at least 1 Capacity Unit. 1 Capacity Unit supports requests of up to 25 data map operations per second and includes storage of up to 10 GB of metadata about data assets.
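To get a feeling for what these limits mean, here is a small Python sketch that estimates how many Capacity Units a given load would need. It assumes that capacity is allocated in whole units and that whichever limit is reached first (operations per second or metadata storage) determines the number of units; the function name `required_capacity_units` is hypothetical.

```python
import math

OPS_PER_CU = 25         # data map operations per second supported by 1 Capacity Unit
STORAGE_GB_PER_CU = 10  # GB of metadata storage included in 1 Capacity Unit


def required_capacity_units(ops_per_second: float, metadata_gb: float) -> int:
    """Estimate the Capacity Units needed for a given request load and metadata size."""
    by_ops = math.ceil(ops_per_second / OPS_PER_CU)
    by_storage = math.ceil(metadata_gb / STORAGE_GB_PER_CU)
    # An account is always provisioned with at least 1 Capacity Unit.
    return max(1, by_ops, by_storage)


print(required_capacity_units(ops_per_second=10, metadata_gb=35))
```

In this example the storage is the limiting factor: 35 GB of metadata does not fit in 3 units, even though the request load would fit in 1.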

Price
Capacity Unit: €0.380 per 1 Capacity Unit Hour

Note: Until last week, the storage size for 1 Capacity Unit was 2 GB; it has now been increased to 10 GB, so that is a major change.

Microsoft Purview Data Map Capacity Unit

Microsoft Purview Applications

Microsoft Purview Applications are replacing the C0, C1 and D0 options which we had previously. Microsoft Purview Applications are a set of independently adoptable, but highly integrated user experiences built on the Data Map including Data Catalog, Data Estate Insights and more. These applications are used by data consumers, producers, data stewards and officers that enable enterprises to ensure that data is easily discoverable, understood, high quality, and all use is per corporate and regulatory requirements.

Data Catalog

Data Catalog is an application built on Data Map for use by business users, data engineers and stewards to discover data, identify lineage relationships and assign business context quickly and easily.

Price
Search and browse of data assets: Included with the Data Map
Business Glossaries: Included with the Data Map
Lineage Visualization: Included with the Data Map
Self-Service Data Access: Free in preview

Data Estate Insights

Price
Insights Consumption: €0.194 per API call

Note: Insights consumption is billed per API call. One API call returns up to 10,000 rows of tabular result. As with Insights Generation, I have no idea yet what this will do to the cost. As soon as this is available, I will update this article.
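Until real figures are available, a rough estimate can be made from the two numbers above. The Python sketch below assumes that exporting a result of N rows takes N / 10,000 API calls, rounded up; how many calls the Data Estate Insights application actually makes is not something I can confirm, so treat this as a lower-bound illustration. The function name is hypothetical.

```python
import math

PRICE_PER_API_CALL_EUR = 0.194  # price per API call from the table above
ROWS_PER_API_CALL = 10_000      # one API call returns up to 10,000 rows


def insights_consumption_cost(total_rows: int) -> float:
    """Estimate the consumption cost in euros for retrieving total_rows rows."""
    calls = math.ceil(total_rows / ROWS_PER_API_CALL)
    return round(calls * PRICE_PER_API_CALL_EUR, 3)


print(insights_consumption_cost(25_000))
```

Note that 25,000 rows need three calls, because the last call is billed in full even when it returns fewer than 10,000 rows.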

Data Access Policies for SQL and Data Lakes (Preview)

Data owners can centrally manage thousands of SQL Servers and data lakes to enable quick and easy access to data assets mapped in the Data Map for performance monitors, auditors, and data users.

Price
SQL DevOps access: Free in preview
Data Lake data asset access: Free in preview

Workflows (Preview)

Data owners and stewards can automate commonly used repetitive tasks associated with business processes like glossary curation and approval tracking using workflow management.

Price
Business Workflows: Free in preview

Data Sharing (Preview)

In-place Data Sharing lets users share data easily from within Microsoft Purview governance portal both within and between organizations, providing near real-time access to data without duplication.

Price
In-place sharing for Azure Blob Storage and Azure Data Lake Storage (ADLS Gen2) storage accounts: Free

Purview Data Share

More details on data sharing in Microsoft Purview can be found here.

Pricing Example

Based on the example published on the pricing page, I've done a calculation:

Example Scenario:
Data Map can scale capacity elastically based on the request load. Request load is measured in terms of data map operations per second. As a cost control measure, a Data Map is configured by default to elastically scale up to a peak of 8 times the steady state capacity.

For dev/trial usage:

Data Map (Always on): average of 2 capacity units x Price per capacity unit per hour x 730 hours per month

Scanning (Pay as you go): Total duration (in minutes) of all scans in a month / 60 min per hour x 32 vCore per scan x €0.582 per vCore per hour

Resource Set: Total duration (in hours) of processing resource set data assets in a month * Price per vCore per hour

The total cost per month for Azure Purview = cost of Data Map + cost of Scanning + cost of Resource Set

Assuming in the above scenario that we use only 1 Capacity Unit, use no more than 10 GB of metadata storage, and scan our data once a week for 2 hours:

Data Map: 2 CU x €0.380 x 730 hours = €554

Scanning: 4 scans x 4 hours x 32 vCore x €0.582 per vCore per hour = €297

Resource Set: 30 days x every 12 hrs x 8 vCore x €0.194 per vCore per hour = €93

In total €944 including 4 scans, Data Estate Insights excluded. If you leave Microsoft Purview as is and do no scanning, your base fee will be €277 for 1 CU, and the Resource Set toggle needs to be switched off.

Data Estate Insights: every week (4) x 8 vCore x 4 hours x €0.758 = €97
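The calculation above can be reproduced with a few lines of Python, which also makes it easy to plug in your own numbers. The quantities are taken from the calculation lines as written (2 CU, 4 scans of 4 hours on 32 vCores, 60 resource set runs on 8 vCores, 4 insights runs of 4 hours on 8 vCores); the variable names are my own. The amounts in the text are rounded down to whole euros, which is why the total there is €944 while the unrounded sum is slightly higher.

```python
import math

HOURS_PER_MONTH = 730
PRICE_CU_HOUR = 0.380                  # Data Map, per capacity unit per hour
PRICE_SCAN_VCORE_HOUR = 0.582          # scanning, per vCore per hour
PRICE_RESOURCE_SET_VCORE_HOUR = 0.194  # advanced resource set, per vCore per hour
PRICE_INSIGHTS_VCORE_HOUR = 0.758      # insights report generation, per vCore per hour

data_map = 2 * PRICE_CU_HOUR * HOURS_PER_MONTH            # 2 CU, always on
scanning = 4 * 4 * 32 * PRICE_SCAN_VCORE_HOUR             # 4 scans x 4 hours x 32 vCores
resource_set = 30 * 2 * 8 * PRICE_RESOURCE_SET_VCORE_HOUR # 30 days x 2 runs per day x 8 vCores
insights = 4 * 8 * 4 * PRICE_INSIGHTS_VCORE_HOUR          # 4 weeks x 8 vCores x 4 hours
base_fee_1cu = 1 * PRICE_CU_HOUR * HOURS_PER_MONTH        # 1 CU without any scanning

# Total as in the text: each component rounded down to whole euros first.
total_rounded_down = sum(math.floor(x) for x in (data_map, scanning, resource_set))
total_exact = data_map + scanning + resource_set

for label, amount in [
    ("Data Map", data_map),
    ("Scanning", scanning),
    ("Resource Set", resource_set),
    ("Data Estate Insights", insights),
    ("Base fee (1 CU)", base_fee_1cu),
    ("Total, unrounded", total_exact),
]:
    print(f"{label}: EUR {amount:.2f}")
print(f"Total, components rounded down: EUR {total_rounded_down}")
```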

Like always, in case you have questions, leave them in the comments or send me a message.


SQLBits session: Microsoft Purview Data Policy App


by Erwin | Mar 17, 2023

SQLBits 2023

Thanks everyone for visiting my session during SQLBits. It's great to see such a full room and that so many people have started using Microsoft Purview.

 

SLIDES

The slides can be downloaded via the link below, so that you can view them again at home.
It could well be that it was a lot of information in 20 minutes. If you have any questions, be sure to let me know.

SQLBits presentation

During the session we discussed a number of tables where the policies are stored. Below is an overview of the different options:

-- Lists generally supported actions
SELECT * FROM sys.dm_server_external_policy_actions

-- Lists the roles that are part of a policy published to this server
SELECT * FROM sys.dm_server_external_policy_roles

-- Lists the links between the roles and actions, could be used to join the two
SELECT * FROM sys.dm_server_external_policy_role_actions

-- Lists all Azure AD principals that were given connect permissions
SELECT * FROM sys.dm_server_external_policy_principals

-- Lists Azure AD principals assigned to a given role on a given resource scope
SELECT * FROM sys.dm_server_external_policy_role_members

-- Lists Azure AD principals, joined with roles, joined with their data actions
SELECT * FROM sys.dm_server_external_policy_principal_assigned_actions

-- Force immediate download of latest published policies
exec sp_external_policy_refresh reload

 
