Microsoft Fabric Content Hub Update September

by Erwin | Sep 20, 2023 | Microsoft Fabric Content HUB

Synapse Data Engineering


Stay up-to-date with the latest and most valuable content about Microsoft Fabric, all in one place! From insightful articles and tutorials to engaging videos and community blogs, you’ll find a treasure trove of resources to deepen your understanding.


Azure OpenAI and Microsoft Fabric

by Erwin | Sep 14, 2023 | Microsoft Fabric


Get ready for data enrichment in Microsoft Fabric

Azure OpenAI is fun and exciting, and we can use it to do amazing things. In combination with Spark on Microsoft Fabric or Azure Synapse Analytics, we can transform and generate large amounts of text data while using OpenAI’s flexibility to define the transformation. The SynapseML library, which comes pre-installed on all Synapse Spark pools and Fabric workspaces, includes an OpenAI module that lets you perform OpenAI transformations on Spark DataFrames, enabling OpenAI at scale.

Together with Floris Berends, we explored the possibilities and wrote the post below.

Requirements

To run this example you need to have:

An Azure OpenAI service
A chat completion model deployment (for example, gpt-35-turbo)
A Microsoft Fabric workspace (alternatively, a Synapse Analytics workspace)
A Spark Notebook

Extracting text fields from raw social media posts

Let’s say we are scraping social media posts and are interested in some of their details. Scraping text fields usually results in pretty messy data. For this example, we use the 20 Newsgroups open dataset, available through scikit-learn.

Set up a Spark Dataframe

To load the open dataset into a Spark DataFrame, we first load it into a pandas DataFrame. Of course, if you are using your own data, you can load it from anywhere, as long as it fits into a Spark DataFrame.

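import pandas as pd
from sklearn.datasets import fetch_20newsgroups
# Download the training split of 20 Newsgroups, limited to one category
newsgroups = fetch_20newsgroups(subset="train", categories=["talk.politics.misc"])

# One row per message, with the raw text in a column named "data"
pd_df = pd.DataFrame(newsgroups["data"], columns=["data"])

# `spark` is the SparkSession that Fabric and Synapse notebooks provide
df = spark.createDataFrame(pd_df)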

Set up our parameters

To prepare the OpenAI transformation, we need to provide the API with a number of connection and configuration parameters: the Azure OpenAI service name, the service key, the name of the model deployment, and a prompt that specifies our transformation. You can find the service name and key in the Azure Portal on your Azure OpenAI resource (under Keys and Endpoint). If you have not yet deployed a model, do this now. Note that the prompt specifies not only what we want the model to do, but also the format in which we want the model to respond. This is crucial for getting reliable results from the model, and it is what enables us to use the transformation as part of a pipeline.

# Connection parameters from your Azure OpenAI resource in the Azure Portal.
# Avoid hard-coding the key in shared notebooks; a secret store such as Azure Key Vault is safer.
openai_service_name = "<YOUR SERVICE NAME>"
openai_deployment_name = "<YOUR DEPLOYMENT NAME>"
openai_key = "<YOUR SERVICE KEY>"

# Column in df that holds the raw text to transform
source_content_column = "data"

# Plain (non-f) string, so the JSON braces are written as single braces
system_prompt = """
You will read the raw text of an e-mail and extract the sender's e-mail
address and subject from the text. You will also list the topics of the e-mail, provide a short one-sentence summary, and output the sentiment of the e-mail. Ensure that the sentiment is one of the following: negative, neutral, positive.

Your response will be in the following format:
{
"EMAILADDRESS": "",
"SUBJECT": "",
"SUMMARY": "",
"SENTIMENT": "",
"TOPICS": []
}
"""

Set up the prompt column

Because OpenAI needs a prompt in order to generate a completion, we need to set up a prompt column that includes both the instruction (system_prompt) we defined earlier and our data. Azure OpenAI chat completions take the ‘chat history’ as a list of messages, which we provide as a message column; this column is the input for the transformation. Each chat completion message also has a ‘role’ parameter that specifies who sent the message. In a normal chat interaction, there are two roles: the user and the assistant (i.e. the model). However, you can also provide a ‘system’ message that instructs the model how to behave. We use a ‘system’ message to instruct the model on how to transform our data. To do this, we set up the prompt column as follows:

A message with the ‘system’ role and our instruction as content.
A message with the ‘user’ role and our data as content.

import pyspark.sql.functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

# Schema of the prompt column: an array of chat messages (name, role, content)
message_schema = ArrayType(
    StructType([
        StructField("name", StringType(), False),
        StructField("role", StringType(), False),
        StructField("content", StringType(), False),
    ])
)

# Per row: a 'system' message with the instruction and a 'user' message with the row's text
build_messages = F.udf(
    lambda system_prompt, content: [
        {"name": "system", "role": "system", "content": system_prompt},
        {"name": "user", "role": "user", "content": content},
    ],
    message_schema,
)

df = df.withColumn("prompt", build_messages(F.lit(system_prompt), F.col(source_content_column)))

Calling the Azure OpenAI API

Now that the input DataFrame has the data and prompt just how we want them, we can set up the call to the Azure OpenAI API. Note that Spark does not execute the transformation immediately; it only builds the plan for the DataFrame. The API is called only when the data is actually needed (e.g. when we save or display the DataFrame).

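# In SynapseML 1.0 and later, this class is imported from synapse.ml.services.openai
from synapse.ml.cognitive import OpenAIChatCompletion

completion = (
    OpenAIChatCompletion()
    .setSubscriptionKey(openai_key)
    .setDeploymentName(openai_deployment_name)
    .setUrl(f"https://{openai_service_name}.openai.azure.com/")
    .setMessagesCol("prompt")  # input: the array of chat messages per row
    .setErrorCol("error")      # per-row API errors end up here
    .setOutputCol("output")    # raw completion response per row
)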
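Under the hood, each row roughly corresponds to one call to the Azure OpenAI chat completions REST endpoint. The sketch below only builds such a request for a single row so you can see its shape; it assumes api-version 2023-05-15 and hypothetical service and deployment names, and it leaves out the optional name field used in the prompt column. SynapseML constructs and sends these requests for you, and the api-version it uses may differ.

```python
import json


def build_chat_request(service_name, deployment_name, api_key, system_prompt, content,
                       api_version="2023-05-15"):
    """Build (but do not send) one Azure OpenAI chat completions request for a single row."""
    url = (
        f"https://{service_name}.openai.azure.com/openai/deployments/"
        f"{deployment_name}/chat/completions?api-version={api_version}"
    )
    headers = {"api-key": api_key, "Content-Type": "application/json"}
    # Same two messages as the prompt column: the instruction, then the row's text
    body = {
        "messages": [
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": content},
        ]
    }
    return url, headers, json.dumps(body)


# Hypothetical service and deployment names
url, headers, payload = build_chat_request(
    "my-openai", "my-gpt35-deployment", "<YOUR SERVICE KEY>",
    "Extract the subject of the e-mail.", "Subject: Tax policy\n\nI disagree with...",
)
print(url)
```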

Transforming the results

The OpenAIChatCompletion transformer puts the complete API response into the output column, but we want each field of the model’s JSON answer in a separate column. Before we can do this, we need to define the output schema.

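# One nullable string column per field the prompt asks the model to return
output_columns = "EMAILADDRESS,SUBJECT,SUMMARY,SENTIMENT,TOPICS"
prompt_schema = StructType(
    [StructField(col, StringType(), True) for col in output_columns.split(",")]
)

# Call the API for the first 10 rows only, then parse the first choice's message content as JSON
df_result = (
    completion.transform(df.limit(10))
    .withColumn(
        "response",
        F.from_json(F.col("output.choices.message.content").getItem(0), prompt_schema),
    )
    .select("response.*", "error")
)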
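To make the Spark expression above concrete, the same extraction can be sketched in plain Python for a single API response: take the first choice’s message content, parse it as JSON, and map it onto the five output columns. This is an illustration only; the response below is a made-up example shaped like a chat completion result, and the all-None fallback only loosely mirrors how from_json treats unparsable input.

```python
import json

OUTPUT_COLUMNS = ["EMAILADDRESS", "SUBJECT", "SUMMARY", "SENTIMENT", "TOPICS"]


def parse_completion(response):
    """Flatten one chat completion response (as a dict) into the five output columns."""
    try:
        # Same path as output.choices.message.content, first item
        content = response["choices"][0]["message"]["content"]
        parsed = json.loads(content)
    except (KeyError, IndexError, TypeError, json.JSONDecodeError):
        # No usable answer: every column is None
        return {col: None for col in OUTPUT_COLUMNS}
    if not isinstance(parsed, dict):
        return {col: None for col in OUTPUT_COLUMNS}
    # Missing keys become None; extra keys are ignored
    return {col: parsed.get(col) for col in OUTPUT_COLUMNS}


# Made-up response in the shape of a chat completion result
sample_response = {
    "choices": [
        {
            "message": {
                "role": "assistant",
                "content": '{"EMAILADDRESS": "someone@example.com", "SUBJECT": "Tax policy", '
                           '"SUMMARY": "The sender argues against a tax change.", '
                           '"SENTIMENT": "negative", "TOPICS": ["taxes", "politics"]}',
            }
        }
    ]
}
print(parse_completion(sample_response))
```

One design difference: because prompt_schema declares TOPICS as StringType, Spark’s from_json keeps the raw JSON array text as a string, whereas this sketch returns a Python list. Declaring that field as ArrayType(StringType()) in prompt_schema would give you a proper array column in Spark.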

Displaying and Verifying the results

There are a number of things that can go wrong. For any row, errors returned by the API are put into the error column you configured with .setErrorCol. If the model’s answer is not valid JSON, from_json returns null values for that row. We can display the DataFrame to inspect the results:

display(df_result)
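Because the model’s answers are free text under the hood, it can pay off to check them before using the results downstream. A minimal plain-Python sketch of such a check for one result row, assuming the row is a dict with the five output columns plus error; the problem labels (such as invalid_sentiment) are made up for this example.

```python
ALLOWED_SENTIMENTS = {"negative", "neutral", "positive"}
REQUIRED_COLUMNS = ("EMAILADDRESS", "SUBJECT", "SUMMARY", "SENTIMENT", "TOPICS")


def validate_row(row):
    """Return a list of problems for one result row (an empty list means it looks OK)."""
    problems = []
    if row.get("error") is not None:
        problems.append("api_error")
    for col in REQUIRED_COLUMNS:
        if row.get(col) in (None, ""):
            problems.append(f"missing_{col.lower()}")
    sentiment = row.get("SENTIMENT")
    # Only judge the value if there is one; a missing value is reported above
    if isinstance(sentiment, str) and sentiment.strip():
        if sentiment.strip().lower() not in ALLOWED_SENTIMENTS:
            problems.append("invalid_sentiment")
    return problems


# Made-up example row, shaped like a row of df_result
example_row = {
    "EMAILADDRESS": "someone@example.com",
    "SUBJECT": "Tax policy",
    "SUMMARY": "The sender argues against a proposed tax change.",
    "SENTIMENT": "Negative",
    "TOPICS": '["taxes", "politics"]',
    "error": None,
}
print(validate_row(example_row))  # → []
```

For the small sample in this post, you could apply it in the notebook with [validate_row(r.asDict()) for r in df_result.collect()]; for large data, the same rules are better expressed as Spark column expressions so they run distributed.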

Final

It might seem that this setup is so versatile that you can use it to apply any transformation you desire on any column in any dataset. Although this might not be far from the truth, there are a couple of things you need to consider:

Cost: Azure OpenAI transformations are more expensive than those that do not rely on external APIs (e.g. Spark-native transformations like map(), flatten(), explode(), or regular expressions).
Complexity: This example applies a transformation with a simple output schema. Asking an LLM to output data in a very complex schema may well produce less reliable results.
Language: This example applies a primarily language-based transformation: extracting and summarizing information that is available as natural language. Using LLMs for math-based, logic-based, or code-based transformations might not give reliable results.
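To get a feel for the cost point, a back-of-the-envelope estimate helps. This sketch assumes the common rule of thumb of roughly 4 characters per token for English text and a fixed number of completion tokens per row; both are assumptions, and the per-1,000-token prices are parameters you should take from the current Azure OpenAI pricing for your model and region.

```python
import math


def estimate_cost(texts, system_prompt, price_per_1k_prompt, price_per_1k_completion,
                  completion_tokens_per_row=150, chars_per_token=4):
    """Back-of-the-envelope cost of sending every text (plus the system prompt) to the model.

    Assumptions: ~chars_per_token characters per token (a rough heuristic; real counts
    depend on the model's tokenizer) and a fixed completion length per row.
    """
    # The system prompt is sent again with every row, so it counts once per row
    prompt_tokens = sum(
        math.ceil((len(system_prompt) + len(text)) / chars_per_token) for text in texts
    )
    completion_tokens = completion_tokens_per_row * len(texts)
    return (
        prompt_tokens / 1000 * price_per_1k_prompt
        + completion_tokens / 1000 * price_per_1k_completion
    )


# Illustrative only: 10,000 texts of ~2,000 characters and placeholder prices
print(estimate_cost(["x" * 2000] * 10_000, "y" * 400, 0.0015, 0.002))
```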

The main takeaway is that using Azure OpenAI to transform text fields through natural language operations like summarization, description, and extraction can be done quickly and reliably. We are looking forward to seeing where this technology will take us.

Learn more

Fabric (preview) trial

Data science in Microsoft Fabric

Azure OpenAI for big data

Questions

If you have any further questions, feel free to ask them in the comments below.


How to enable Microsoft Fabric

by Erwin | Jun 12, 2023 | Microsoft Fabric


Microsoft Fabric

I have received questions from customers who did not know how to enable Microsoft Fabric and who only saw Power BI items, not the newly announced experiences. In this short blog, I explain how you can easily enable Microsoft Fabric.

How to enable Fabric

If you want to try Fabric in your tenant, you need to enable the Fabric features in your Power BI admin portal.

To do so, go to https://app.powerbi.com/.

Note: You must be a Power BI administrator.

Please note that a Microsoft Fabric capacity (or trial) or a Power BI Premium capacity is required to get started with Microsoft Fabric.

Open the Microsoft Fabric admin portal.

By default, Microsoft Fabric is disabled (if you do not change the setting, it will be turned ON after July 1, 2023).

You can enable Microsoft Fabric for the whole organization, or start with a small group (Specific security groups). My advice is to start with a small group: Microsoft Fabric is in public preview and not ready for production environments.
These settings can take up to 15 minutes to apply, though it is usually much faster. After that, the new experiences will be available.

Select Data Engineering

At the top of the page, you can directly see which experience you are using.

In this case, it is the Synapse Data Engineering experience: check out the logo on the left side and the text after home?experience in the URL.

Start Trial

Have a look at this page to see how easy it is to get started with a free Fabric (preview) trial.

Guy in the Cube

You can also watch the video that Adam Saxton created:

Documentation

If you have any questions, I’d love to hear them.
More information about Microsoft Fabric can be found at my Content Hub:

Microsoft Fabric Content Hub


Create capacity for Microsoft Fabric

by Erwin | Jun 5, 2023 | Microsoft Fabric


Microsoft Fabric Capacity

Since June 1, 2023, we can create Fabric capacities in Azure. Currently, only pay-as-you-go pricing is available; Azure reservations will follow later this year. OneLake storage pricing is comparable to ADLS (Azure Data Lake Storage) pricing and is not included in the prices below. The prices below are for the West Europe region; prices can differ across regions.

Note: As you can see, the prices for F1024 and F2048 are not correct; they should be 2 × and 4 × the price of F512. The error has already been reported to the Fabric team.

Read the announcement below:

Announcing Microsoft Fabric capacities are available for purchase | Microsoft Fabric Blog

Microsoft Fabric Capacity is a distinct pool of resources allocated to Microsoft Fabric that resides on a tenant. The size of the capacity determines the amount of computation power your organization gets.

Microsoft Fabric offers an array of capacities that you can buy. The capacities are split into SKUs. Each SKU provides a different amount of computing power, measured by its Capacity Unit (CU) value.
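The F SKU names encode their size: F2 provides 2 CUs, F64 provides 64 CUs, and so on, doubling up to F2048. Since pay-as-you-go pricing is per CU, prices should scale linearly with the SKU number, which is why F1024 and F2048 should cost 2 and 4 times F512. A minimal sketch of that arithmetic, with a made-up reference price:

```python
def capacity_units(sku):
    """Return the capacity units (CUs) of an F SKU: the number in the name, e.g. 'F64' -> 64."""
    if not sku.upper().startswith("F") or not sku[1:].isdigit():
        raise ValueError(f"not an F SKU: {sku!r}")
    return int(sku[1:])


def expected_price(sku, reference_sku, reference_price):
    """Scale the price of a reference SKU linearly by the ratio of capacity units."""
    return reference_price * capacity_units(sku) / capacity_units(reference_sku)


# Hypothetical F512 price, only to show the expected ratios
f512_price = 1000.0
for sku in ("F512", "F1024", "F2048"):
    print(sku, capacity_units(sku), "CUs", expected_price(sku, "F512", f512_price))
```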

Creating Fabric Capacity in Azure

Search for the Fabric Capacity in the Azure Marketplace.

Select the appropriate subscription and resource group. You can move the Fabric capacity to another resource group later if needed.
Provide a name for the capacity.
Define the region for the capacity.
Choose the desired size, starting from F2. F64 is equivalent to a Power BI Premium P1 capacity. You can learn more on this page.
Assign a Fabric capacity administrator.
Click "Create" to start the capacity creation process. Once the capacity is created, you will see the relevant information on the screen.
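The portal steps above can also be scripted. As a minimal sketch, assuming the Azure Resource Manager resource type Microsoft.Fabric/capacities and api-version 2023-11-01 (check the current Fabric REST reference before relying on either), this builds the PUT request that creates a capacity; the subscription, resource group, capacity name, and administrator are hypothetical. Sending it requires an Azure Resource Manager access token in the Authorization header.

```python
import json

ARM_ENDPOINT = "https://management.azure.com"


def fabric_capacity_request(subscription_id, resource_group, capacity_name, location,
                            sku, admin_members, api_version="2023-11-01"):
    """Build (not send) the ARM PUT request that creates or updates a Fabric capacity."""
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Fabric/capacities/{capacity_name}?api-version={api_version}"
    )
    body = {
        "location": location,
        "sku": {"name": sku, "tier": "Fabric"},
        # Fabric capacity administrators (user principal names)
        "properties": {"administration": {"members": list(admin_members)}},
    }
    return "PUT", url, json.dumps(body)


# Hypothetical values
method, url, payload = fabric_capacity_request(
    "00000000-0000-0000-0000-000000000000", "rg-fabric", "fabriccapdev",
    "westeurope", "F2", ["admin@contoso.com"],
)
print(method, url)
```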

Assign capacity in Microsoft Fabric

After creating the Fabric capacity, you need to assign it to a Workspace by following these steps:

Open the Microsoft Fabric admin portal.
Select the capacity option on the right side.
Locate the recently created capacity in the list.

Assign capacity to a Workspace

The last step is to assign the capacity to a Workspace.

On the workspace level, click Settings.
Go to the Premium tab and select Fabric capacity as the license mode.
Select your Fabric capacity from the list.

That's all: you are now using the new capacity.
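Assigning a workspace to a capacity can also be done with the Power BI REST API operation Groups - Assign To Capacity. A minimal sketch that only builds the request, with hypothetical workspace and capacity GUIDs; the capacity ID is the one returned by the Power BI capacities endpoint (GET https://api.powerbi.com/v1.0/myorg/capacities), and sending the request requires a Power BI access token with sufficient permissions on both the workspace and the capacity.

```python
import json


def assign_to_capacity_request(workspace_id, capacity_id):
    """Build (not send) a Power BI REST 'Groups - Assign To Capacity' request."""
    url = f"https://api.powerbi.com/v1.0/myorg/groups/{workspace_id}/AssignToCapacity"
    return "POST", url, json.dumps({"capacityId": capacity_id})


# Hypothetical workspace and capacity IDs
method, url, payload = assign_to_capacity_request(
    "11111111-1111-1111-1111-111111111111", "22222222-2222-2222-2222-222222222222"
)
print(method, url, payload)
```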

Capacity Pause/Resume

With the Fabric capacity set up, you can take advantage of the Pause/Resume feature, which lets you temporarily halt and resume the capacity. This is useful for development and testing. However, please note that this option will not work if you purchase an Azure reservation in the future.
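Pausing and resuming can be automated too, for example to pause a development capacity outside office hours. Assuming the same Microsoft.Fabric/capacities ARM resource and api-version 2023-11-01, the suspend and resume operations are POST requests; this sketch only builds the request, with hypothetical identifiers, and sending it again requires an Azure Resource Manager access token.

```python
ARM_ENDPOINT = "https://management.azure.com"


def fabric_capacity_action(subscription_id, resource_group, capacity_name, action,
                           api_version="2023-11-01"):
    """Build (not send) the ARM request that pauses ('suspend') or resumes a Fabric capacity."""
    if action not in ("suspend", "resume"):
        raise ValueError("action must be 'suspend' or 'resume'")
    url = (
        f"{ARM_ENDPOINT}/subscriptions/{subscription_id}/resourceGroups/{resource_group}"
        f"/providers/Microsoft.Fabric/capacities/{capacity_name}/{action}"
        f"?api-version={api_version}"
    )
    return "POST", url


# Hypothetical identifiers, e.g. for pausing a dev/test capacity at the end of the day
print(*fabric_capacity_action(
    "00000000-0000-0000-0000-000000000000", "rg-fabric", "fabriccapdev", "suspend"
))
```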

Microsoft Fabric app

To monitor the usage of your Microsoft Fabric capacities, you can use the Microsoft Fabric Capacity Metrics app.

To install the Microsoft Fabric Capacity Metrics app for the first time, follow these steps:

Get the app from AppSource in one of these ways: go to AppSource > Microsoft Fabric Capacity Metrics and select Get it now, or, in Microsoft Fabric:
1. Select Apps.
2. Select Get apps.
3. Search for Microsoft Fabric.
4. Select the Microsoft Fabric Capacity Metrics app.
5. Select Get it now.
When prompted, sign in to AppSource using your Microsoft account and complete the registration screen. The app takes you to Microsoft Fabric to complete the process. Select Install to continue.
In the Install this Power BI app window, select Install.
Wait a few seconds for the app to install.

It's a pretty simple process to set it up.

Documentation

If you have any questions, I'd love to hear them.
More information about Microsoft Fabric can be found at:

Microsoft Fabric Content Hub


Connecting Event Hubs in Microsoft Fabric

by Erwin | May 26, 2023 | Microsoft Fabric


Connecting Azure Event Hubs with Eventstream in Microsoft Fabric

In my previous blog, I gave an introduction to the possibilities of Real-Time Analytics in Microsoft Fabric.

In this blog, we take a closer look at how to connect data from one of our existing Azure Event Hubs.

The picture above shows an end-to-end workflow for a Real-Time Analytics scenario, where we can directly see which Fabric artifacts we need to build the solution. Building the complete solution below took me at most 20 minutes.

Loading data from Azure Event Hubs to Lakehouse

Requirements:

An existing Azure Event Hub.
A new consumer group (never use an existing one). If you use an existing consumer group, the application that currently reads from it may stop receiving messages.
Fabric Workspace

Note:

Adding a consumer group is not available in the Basic tier; you need the Standard tier (or higher).

Creating a Shared Access Policy on the Event Hub

Create a new Shared Access Policy on the Event Hub, with the manage option enabled.

Note down the SAS policy name and the primary key. We will need them later to set up the connection in Microsoft Fabric.
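The policy name and key are used to sign Shared Access Signature (SAS) tokens. You do not need to do this for the Fabric connection, but if you want to test the policy yourself (for example against the Event Hubs REST API), the standard SAS scheme is an HMAC-SHA256 over the URL-encoded resource URI and an expiry timestamp, keyed with the policy key. A minimal sketch; the namespace, event hub, policy name, and key are hypothetical.

```python
import base64
import hashlib
import hmac
import time
import urllib.parse


def generate_sas_token(resource_uri, policy_name, policy_key, ttl_seconds=3600, now=None):
    """Create a Shared Access Signature token from a shared access policy name and key."""
    # Expiry as Unix time in seconds; `now` can be fixed for reproducible output
    expiry = int(time.time() if now is None else now) + ttl_seconds
    encoded_uri = urllib.parse.quote_plus(resource_uri)
    # Sign "<URL-encoded resource URI>\n<expiry>" with HMAC-SHA256, keyed with the policy key
    string_to_sign = f"{encoded_uri}\n{expiry}".encode("utf-8")
    digest = hmac.new(policy_key.encode("utf-8"), string_to_sign, hashlib.sha256).digest()
    signature = urllib.parse.quote_plus(base64.b64encode(digest).decode("utf-8"))
    return f"SharedAccessSignature sr={encoded_uri}&sig={signature}&se={expiry}&skn={policy_name}"


# Hypothetical namespace, event hub, policy name, and key
token = generate_sas_token(
    "https://mynamespace.servicebus.windows.net/myhub", "fabric-policy", "<PRIMARY KEY>", now=0
)
print(token)
```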

Create a Data Connection in Microsoft Fabric

In the menu bar (top right), open the Settings menu and select the Manage connections option.

Search for Event Hub.

Connection name	Name of the connection
Event Hub namespace	https://xxxxxxx.servicebus.windows.net:443/
Authentication	Username: name of the SAS policy; Password: primary key of the SAS policy

Now that we have created a connection to our Azure Event Hub, we’re ready to receive our streaming data and to set up an Eventstream.

So let’s open the Synapse Real-Time Analytics experience, which can be found in the bottom-left corner of your Microsoft Fabric environment.

Microsoft Fabric

Fabric Capacity

Make sure you have a Microsoft Fabric or Power BI Premium capacity assigned to this workspace.

Create Eventstream in Microsoft Fabric

Within our Fabric workspace, select New in the upper-left corner and select Eventstream.

Define a name for the Eventstream and click Create.

This can take a couple of minutes to set up, but don’t worry: a lot is happening in the background. Microsoft Fabric is a SaaS application, so the required resources are deployed for you.

The big advantage for you is that things are much easier to set up.

Once everything is ready, you will see this new screen:

Create the Eventstream Source

The next step is to connect our source, in this case the connection to the Event Hub.

Select Azure Event Hubs; a new pane will open.


Source name	Define a name for your source; you can use the name of the Event Hub or a custom name
Cloud Connection	Select the connection you created at the beginning of this blog
Data Format	Select the format that matches the events in your Event Hub
Consumer group	Select the consumer group you created at the beginning of this blog, or create a new one

Note: Never use an existing consumer group, because the application currently connected to that consumer group may stop receiving data.

Once all the required fields are filled in, click Create. The source of your Eventstream will now be created.

After the connection is set up successfully, you can click Data preview to see what kind of data is coming in and whether it is the correct data.

If your data is not shown correctly, you can change the data format to CSV or Avro.

Destination

One of the last steps in our configuration is to set up the destination for the Eventstream.

In this blog, we will use a Lakehouse (more destinations are available), so that we can store our data and use it at a later stage to build reports on top of it.

Lakehouse

You can choose if you want to create a new Lakehouse or use an existing one.

If you have not created a Lakehouse yet, you need to create one.

In the bottom-left corner, select Data Engineering.

Create a new Lakehouse, define a name, and click Create.

After creating a Lakehouse, you will see that a default dataset and a SQL endpoint are created automatically. How easy is that!

Create the Eventstream Destination

Create Lakehouse as Eventstream Destination

A new window opens where we can configure the Lakehouse connection/destination.

Destination Name	The name of the destination
Workspace	The workspace where your Lakehouse is located
Lakehouse	The Lakehouse you want to use (you can have more than one in the same workspace)
Delta table	The Delta table where you want to store the data; you can also create a new table from here
Data format	Usually the same format as the one you selected for the source

Event Processing

Before you create the destination, you can transform and preview the data that is being ingested into the destination with the event processor. The event processor editor is a no-code experience that lets you design the event data processing logic with drag and drop.

As you can see, many operations are available to transform your data; renaming a field takes only seconds with the no-code experience.

The last step is to create the destination: simply click Create.

The Eventstream is ready: the source is streaming data and the destination is ingesting it.

Navigate to your Lakehouse to verify the ingested data.

If you prefer to verify with a T-SQL query, you can switch to SQL endpoint mode in the upper-right corner.

And now you can run any type of query you want.

Next Steps

Build a Power BI report on the ingested event data in the Lakehouse. As mentioned before, a default dataset has already been created.

In my next blog, I will explain how we can start using a KQL database as a destination, so stay tuned.

Documentation

Click below to read more about Microsoft Fabric and Real-Time Analytics.

Microsoft Fabric Real-Time Analytics documentation

Exploring the Fabric technical documentation

OneLake in Fabric blog


More information about Microsoft Fabric can be found at:

Microsoft Fabric Content Hub

As always, in case you have any questions left, don’t hesitate to contact me.



Latest Microsoft blog posts:

Latest Microsoft Learning posts:

Latest Community blog posts:

Latest Video’s/ Podcasts:
