Azure Open AI and Microsoft Fabric

Azure Open AI and Microsoft Fabric

Erwin

by Erwin | Sep 14, 2023

Get ready for data enrichement in Microsoft Fabric

Azure OpenAI is fun and exciting and we can use it to do amazing stuff. In combination with Spark on Microsoft Fabric or Azure Synapse Analytics, we can transform and generate large amounts of text data and make use of OpenAI’s flexibility in defining the transformation. The SynapseML library that comes pre-installed on all Synapse Spark pools and Fabric workspaces includes an OpenAI module that allows you to perform OpenAI transformations on spark dataframes, enabling OpenAI at scale. Azure OpenAI is fun and exciting and we can use it to do amazing stuff. In combination with Spark on Microsoft Fabric or Azure Synapse Analytics, we can transform and generate large amounts of text data and make use of OpenAI’s flexibility in defining the transformation. 

Together with Floris Berends we had a look into the possibilities and wrote the post below

Requirements

To run this example you need to have:

  • An Azure OpenAI service
  • A model deployment
  • A Microsoft Fabric workspace Alternatively, a Synapse Analytics workspace
  • A Spark Notebook

Extracting text fields from raw social media posts

Let’s say we are scraping social media posts and are interested in some of the details. Usually, scraping text fields results in some pretty messy data. For this example, we are using the Scikit-Learn newsgroups open dataset.

Set up a Spark Dataframe

In order to load the open dataset into a spark dataframe, we first load it into a pandas dataframe. Of course if you are using your own data, you can load the data from anywhere, as long as it fits into a spark dataframe

import pandas as pd
from sklearn.datasets import fetch_20newsgroups
newsgroups = fetch_20newsgroups(subset="train", categories=['talk.politics.misc'])
pd_df = pd.DataFrame(newsgroups["data"], columns=["data"])
df = spark.createDataFrame(pd_df)

Set up our parameters

To prepare the OpenAI transformation, we need to provide the API with a number of connection and configuration parameters. These include the Azure OpenAI service name, the name of the model deployment, and a prompt that will specify our transformation. The parameters can be found in the Azure Portal, on your Azure OpenAI resource. If you have not yet deployed a model, do this now. Note that the prompt specifies what we want the model to do, but also specifies the format in which we want the model to respond. This is crucial in getting reliable results from the model and this is what enables us to use the transformation as part of a pipeline.

openai_service_name = "<YOUR SERVICE NAME>"
openai_deployment_name = "<YOUR DEPLOYMENT NAME>"
openai_key = "<YOUR SERVICE KEY>"
source_content_column = "data"
system_prompt = """
You will read the raw text of an e-mail and extract the senders e-mail
address and subject from the text. You will also list the topics of the email, provide a short one-sentence summary, and output the sentiment of the email. Ensure that the sentiment is one of the following: negative, neutral, positive.

Your response will be in the following format
{{
"EMAILADDRESS": "",
"SUBJECT": "",
"SUMMARY": "",
"SENTIMENT": "",
"TOPICS: []
}}
"""

Set up the prompt column

Because OpenAI needs a prompt in order to generate a completion, we need to setup a prompt column that includes both the instruction (system_prompt) we set up earlier and our data. The way that Azure OpenAI chat completions work, is that you can provide the ‘chat history’ as a message column. This column is what we will use as input for the transformation. Additionally, Azure OpenAI chat completion messages include a ‘role’ parameter. The role specifies who sent the message. In a normal chat interaction, there are 2 roles: the user and the assistant (i.e. the model). However, it is possible to provide a ‘system’ message that will instruct the model how to behave. We will use a ‘system’ message in order to instruct the model on how to transform our data. In order to do this, we need to set up the prompt column in the following way:

  1. A message with the ‘system’ role and our instruction as content.
  2. A message with the ‘user’ role and our data as content.
import pyspark.sql.functions as F

from pyspark.sql.types import ArrayType, StructType, StructField, StringType
df = df.withColumn("prompt", F.udf(
    lambda system_prompt, content: [{"name":"system", "role":"system", "content": system_prompt},{"name":"user", "role":"user", "content": content}],
        ArrayType(
            StructType([
                StructField("name", StringType(),False),
                StructField("role", StringType(),False),
                StructField("content", StringType(),False)
                ]
            )
        )
    )(F.lit(system_prompt),F.col(source_content_column)))

Calling the Azure OpenAI API

Now that we have the input dataframe with the data and prompt just how we want it, we can set up the call to the Azure OpenAI API. Note that Spark will not immediately execute the transformation, but will simply setup the plan for the dataframe. The API will only be called when we actually need the data (e.g. when we save or display the dataframe).

from synapse.ml.cognitive import OpenAIChatCompletion
completion = (
    OpenAIChatCompletion()
        .setSubscriptionKey(openai_key)
        .setDeploymentName(openai_deployment_name)
        .setUrl(f"https://{openai_service_name}.openai.azure.com/")
        .setMessagesCol("prompt")
        .setErrorCol("error")
        .setOutputCol("output")
        .limit(10)
)

Transforming the results

The OpenAIChatCompletion mehthod simply puts the completion results into the output column, but we want to have the results in separate columns. Before we can do this  we need to define the output schema.

output_columns = "EMAILADDRESS,SUBJECT,SUMMARY,SENTIMENT,TOPICS"
prompt_schema = StructType(
                   [StructField(col, StringType(), True)
                      for col in output_columns.split(",")
                   ])
df_result = completion.transform(df.limit(10)).withColumn(
                 "response",
                  F.from_json(
  F.col("output.choices.message.content").getItem(0)
  ,prompt_schema)
                  ).select("response.*","error")

Displaying and Verifying the results

There are a number of things that can go wrong. For any row, errors returned by the API will be put into the error column that you provided by .setErrorCol. We can display the dataframe to inspect the results:

display(df_result)

Microsoft-Fabric-Open-AI

Final

It might seem that this setup is so versatile that you can use it to apply any transformation you desire on any column in any dataset. Although this might not be far from the truth, there are a couple of things you need to consider:

  1. Cost: Azure OpenAI transformations are more expensive then those that do not rely on external APIs (e.g. Spark Native transformation like map(), flatten(), explode(), or using regular expressions and the like).
  2. Complexity: This example applies a transformation with a simple output schema. It might very well be the case that asking a LLM to output data in a very complex schema will not turn out well.
  3. Language: This example applies a transformation that is primarily a language based transformation: extracting and summarizing information that is available as natural language. Using LLMs to apply math-based, logic-based, or code-based transformations might not show reliable results.

 

The main take-away is that using Azure OpenAI to transform text-fields though natural language operations like summarization, description and extraction can be done fast and reliable. We are looking forward to seeing where this technology will take us.

Learn more

Fabric (preview) trial

Data science in Microsoft Fabric

Azure OpenAI for big data

Questions

If you have any further questions, feel free to ask them in the comments below.

Feel free to leave a comment

Microsoft Purview new Experience is coming

Microsoft Purview new Experience is coming

Erwin

by Erwin | Jun 28, 2023

Get ready for the next enhancement in Microsoft Purview

Get ready for the next enhancement in Microsoft Purview, as it brings a range of exciting new features and capabilities. To ensure the best experience with Purview, it is recommended that you tag your existing Microsoft Purview accounts appropriately.

Mark your Purview accounts with the proper tags

In short, mark each Purview account with the following tags: Name: "Purview Environment," Value: "Production."

In the long story, you have several tag options available to you. The table below outlines the different tags and their purposes:

In all purposes the tag name is "Purview Environment"

Production - This tag signifies that the account is or will be used for cataloging and governance requirements in a production environment. It is a candidate to be selected as the primary account for the tenant.

Pre-production - This tag indicates that the account is or will be used to validate cataloging and governance requirements before making them available in production. It is a candidate to be merged as a domain or can be deleted.

Test - This tag suggests that the account is or will be used for testing capabilities in Microsoft Purview Governance. It is a candidate to be merged as a domain or can be deleted.

Dev - This tag signifies that the account is or will be used to test capabilities or develop custom code and scripts in Microsoft Purview Governance. It is a candidate to be merged as a domain or can be deleted.

Proof Of Concept - This tag indicates that the account is or will be used to test capabilities or develop custom code and scripts in Microsoft Purview Governance. It can be deleted in the future.

Deprecate - This tag is for accounts that were created a while back but are not in use today. They can be deleted in the future.

Create tag

The following instructions will allow you to successfully tag your Purview account(s) with the desired classification:
1. Sign in to the Azure portal using your Azure account credentials.
2. Use the search bar to find and select your Microsoft Purview account.
3. Locate the Tags (edit) section on the left of the Purview account overview page.
4. Click on the Tags (edit) link to open the tags editor.
5. Add a new tag with a Name and Value according to the provided options (e.g., Name: "Purview Environment," Value: "Production").
6. Click on Save to apply the tag to your Purview account.

What's Next!

The next Microsoft Purview enhancement is coming to you in a few weeks. Along with the features you've enjoyed with Microsoft Purview so far, this enhancement will provide these additional capabilities:
Centralized organization-wide data governance that automatically gives you visibility across your Microsoft Cloud.
Configuration and set up is no longer required to capture metadata for Microsoft Cloud - Microsoft Purview is auto-attached to Microsoft Fabric and Azure SQL.
A clean, crisp, and more intuitive user interface to navigate the platform and apps.
These new features will be turned on automatically and added to your existing capabilities.

How can you use our new experience?

These new features will be turned on automatically and seamlessly integrated with your existing capabilities. The new experience will be available once your organization has been enabled. The exact steps to get started will depend on your organization's current structure, and more information will be provided in the coming weeks.

Follow one of these guides below:

More information will be available in the coming weeks.

You can learn more about the enhancements by going to https://learn.microsoft.com/azure/purview/account-upgrades.
I'm looking forward to new the experience with hopefully an even better integration with Microsoft Fabric.
If you have any further questions, feel free to ask them in the comments below.

Feel free to leave a comment

How to enable Microsoft Fabric

How to enable Microsoft Fabric

Erwin

by Erwin | Jun 12, 2023

Microsoft Fabric

I got some questions from customers that didn’t know how to enable Microsoft Fabric and that they only see Power BI Items and not the new announced Experiences. In this short blog I will explain how you can easily enable Microsoft Fabric.

How to enable Fabric

If you want to try Fabric in your tenant, you need to enable the Fabric features in your Power BI admin portal.

To do, go to https://app.powerbi.com/.

Note: You must be an Power BI administrator

Please note that Microsoft Fabric Capacity(Trial) or Power BI Premium Capacity is required to get started with Microsoft Fabric.

  • Open the Microsoft Fabric admin portal.

Microsoft-Fabric-Admin-portal

  • By default, Microsoft Fabric is disabled (if you do not change the setting, it will be set to ON after July 1st 2023).

Microsoft-Fabric-Enabled

  • You enable Microsoft Fabric for the whole organization or you can just start with a small group(Specify Security Groups). My advice is to start with a small group. Microsoft Fabric is in Public Preview and not ready for Production Environments.
  • It will take up to 15 minutes to deploy these setting, mostly much faster. After that the new experiences will be available.

Microsoft-Fabric-Experiences

  • Select Data Engineering

In the top of the page you can directly, see which experience you use.

Microsoft-Fabric-Data-Engineering-experience

In this case Synapse Data Engineering Experience, check out the logo on the left side and the text behind home?experience 

Start Trial

Have a look to this page how easily it is to get started with a free Trial Fabric (preview)

Guy in the Cube

You also watch the video, who Adam Saxton created:

 

 

Documentation

If you have any questions, I’d love to hear them.
More information about Microsoft Fabric can be found at my Content Hub:

Microsoft Fabric Content Hub

Feel free to leave a comment

Create capacity for Microsoft Fabric

Create capacity for Microsoft Fabric

Erwin

by Erwin | Jun 5, 2023

Microsoft Fabric Capacity

Since the first of June 2023, we can create Fabric capacities in Azure. This are currently the Pay as You go pricing, later this year the Azure Reservation will follow. OneLake storage pricing is comparable to Azure ADLS (Azure Data Lake Storage) pricing and is not included in the price below. These prices are prices in the West-Europe region, prices can be different across regions.

Microsoft Fabric Pricing new

Note: As you can see, the F1024 and F2028 are not having the correct prices, it should be 2 or 4 x F512. The error is already is report to the Fabric Team.

Read the announcement below:

Announcing Microsoft Fabric capacities are available for purchase | Microsoft Fabric Blog

Microsoft Fabric Capacity is a distinct pool of resources allocated to Microsoft Fabric that resides on a tenant. The size of the capacity determines the amount of computation power your organization gets.

Microsoft Fabric has an array of capacities that you can buy. The capacities are split into SKU's. Each SKU provides a different amount of computing power, measured by its Capacity Unit (CU) value.

Creating Fabric Capacity in Azure

  • Search for the Fabric Capacity in the Azure Marketplace.

Microsoft Fabric create capacity

 

  • Select the appropriate Subscription and resource group. You can move the Fabric capacity to another Resource Group later if needed.Microsoft Fabric configure capacity
  • Provide a name for the capacity.
  • Define the region for the capacity.
  • Choose the desired size, starting from F2. F64 is equivalent to a Power BI Premium capacity. You can learn more on this page.
  • Assign a Fabric capacity Administrator.
  • Click on "Create" to initiate the capacity creation process. Once created, you will see the relevant information on the screen.

Microsoft Fabric capacity in Azure

Assign capacity in Microsoft Fabric

After creating the Fabric capacity, you need to assign it to a Workspace by following these steps:

  • Open the Microsoft Fabric admin portal.
  • Select the capacity option on the right side.
  • Locate the recently created capacity in the list.Microsoft Fabric capacity

Assign capacity to a Workspace

The last step is to assign the capacity to a Workspace.

  • On the Workspace level, click on settings.
  • Go to the Premium tab and select the Fabric capacity.
  • Define the correct License capacity for the Fabric capacity.

Microsoft Fabric Capacity assigned

Select the Fabric Capacity and define the correct License capacity to it. That's all, you are now using the new capacity.

Capacity Pause/Resume

With the Fabric capacity set up, you can take advantage of the Pause/Resume feature, which allows you to temporarily halt and resume the capacity, making it useful for development and testing purposes. However, please note that this option will not work if you purchase Azure Reservation in the future.

Microsoft Fabric app

To monitor usage and related to Microsoft Fabric capacities, you can use the Microsoft Fabric utilization and metrics app.

To install the Microsoft Fabric Capacity Metrics app for the first time, follow these steps:

  1. Select one of these options to get the app from AppSource:Go to AppSource > Microsoft Fabric Capacity Metrics and select Get it now.In Microsoft Fabric:
    1. Select Apps.
    2. Select Get apps.
    3. Search for Microsoft Fabric.
    4. Select the Microsoft Fabric Capacity Metrics app.
    5. Select Get it now.
  2. When prompted, sign in to AppSource using your Microsoft account and complete the registration screen. The app takes you to Microsoft Fabric to complete the process. Select Install to continue.
  3. In the Install this Power BI app window, select Install.
  4. Wait a few seconds for the app to install.

It's a pretty simple process to set it up.

Documentation

If you have any questions, I'd love to hear them.
More information about Microsoft Fabric can be found at:

Microsoft Fabric Content Hub

Feel free to leave a comment

Microsoft Fabric pricing (Preview)

Microsoft Fabric pricing (Preview)

Erwin

by Erwin | Jun 3, 2023

Microsoft Fabric Pricing

Microsoft Fabric pricing in Public Preview is announced as of the 1 st of June.

These are currently the Pay as You go pricing, later this year the Azure Reservation will follow.

OneLake storage pricing is comparable to Azure ADLS (Azure Data Lake Storage) pricing and is not included in the price below. These prices are prices in the West-Europe region, prices can be different across regions.

Microsoft Fabric capacity overview new

Updated on June 13th.

Guy in the Cube did create a great video on the Microsoft Fabric Licensing and Cost. The way he tells it is pretty easy and simple, I would definitely check out the video.

Microsoft-Fabric-Sku Overview

 

Microsoft-Fabric-Licensing-Overview

 

 

Feel free to leave a comment