Microsoft Fabric Content Hub Update November

Microsoft Purview

by Erwin | Nov 30, 2023

Stay up-to-date with the latest and most valuable content about Microsoft Fabric, all in one place!  From insightful articles and tutorials to engaging videos and community blogs, you’ll find a treasure trove of resources to deepen your understanding.

This time there was so much content, so sorry if I missed yours. Attending Ignite in person, and following sessions while short on sleep from jet lag, was the main reason.

Microsoft Fabric is now generally available

One of the most anticipated announcements at Ignite 2023 was the general availability of Microsoft Fabric, a unified data platform that enables organizations to prepare their data for AI innovation. Microsoft Fabric was first introduced at Microsoft Build 2023 as “perhaps the biggest launch of a data product from Microsoft since the launch of SQL Server”, according to Satya Nadella, CEO and Chairman of Microsoft.

Microsoft Fabric integrates Power BI, Azure Synapse Analytics, and Azure Data Factory into a single service, with a common capacity pricing model and a unified user experience. With Microsoft Fabric, users can access, analyze, transform, and govern their data across multiple sources and formats, using familiar tools and languages. Microsoft Fabric also supports the creation and consumption of foundation models, which are large-scale AI models that can be customized and applied to various domains and scenarios.

Microsoft Fabric has been adopted by thousands of organizations around the world, including 67 percent of the Fortune 500, since its preview announcement.

InSpark Feature Partner for Microsoft Fabric

Proud to announce that my employer, InSpark | Innovate to Accelerate, has been recognized by Microsoft and named on their list of partners. Our hard work and dedication to onboarding customers on Microsoft Fabric has paid off, and we’re excited to continue this amazing journey. Let’s keep pushing the boundaries and achieving great things together!

Latest Community blog posts:

What is Fabric DWH Bursting  – Project Controls blog (datamonkeysite.com)

Bursting and Smoothing - Yin and Yang of the Fabric Capacity! - Data Mozart (data-mozart.com)

Programmatically Creating, Managing Lakehouses in Fabric

Visualizing JSON Structure In Fabric Notebook

Microsoft Ignite 2023 – Fabric Round-up — Advancing Analytics

Demystifying the Data Lakehouse in Microsoft Fabric – justB smart

dbt Cloud is now available for Microsoft Fabric (getdbt.com)

Controlling Direct Lake Fallback Behavior (fabric.guru)

Delta Lake Change Data Feed in Fabric Lakehouses (serverlesssql.com)

Measure Maze: Visualizing Measure Dependencies Using Semantic Link & Network Analysis (fabric.guru)

Fabric : Engines Resource Consumption. – Project Controls blog (datamonkeysite.com)

Fabric Lakehouse Loading using Data Pipelines & Notebooks – Inspired by MS End-to-End Tutorials (serverlesssql.com)

Services that I recommend when working with Microsoft Fabric - Kevin Chant (kevinrchant.com)

Thoughts about the DP-600 exam for the new Microsoft Fabric certification - Kevin Chant (kevinrchant.com)

Spreading your SQL Server wings with Microsoft Fabric Data Warehouses - Kevin Chant (kevinrchant.com)

Debunking Myths and Embracing Innovation with Microsoft Fabric – Data – Marc (data-marc.com)

Kusto Query Language (KQL) Databases in Microsoft Fabric (mssqltips.com)

Copy Activity, Dataflows Gen2, and Notebooks vs. SharePoint Lists (datameerkat.com)

Understanding Storage Costs for Microsoft Fabric – The White Pages (bifocal.show)

Brian Bonk | KQL Data live copy to OneLake

Working with tables in Microsoft Fabric Lakehouse - Everything you need to know! - Data Mozart (data-mozart.com)

CHANGE (IN THE HOUSE OF LAKES) - It's Not About The Cell (itsnotaboutthecell.com)

Latest Videos / Podcasts:

Microsoft Fabric - YouTube

Feel free to leave a comment

How to Discover and Govern Your Data with Microsoft Purview and Microsoft Fabric

Microsoft Purview

by Erwin | Nov 29, 2023

How Microsoft Purview and Microsoft Fabric work together to empower data discovery and governance

Microsoft Purview is a unified data governance service that helps you manage and govern your on-premises, multi-cloud, and software as a service (SaaS) data. Microsoft Fabric is a new cloud-based data platform that enables you to create, share, and collaborate on data-driven insights with your team. Together, Microsoft Purview and Microsoft Fabric offer a seamless integration that allows you to browse and search Fabric items, access metadata from Fabric items, and apply data policies and classifications to Fabric items.

New Portal Experience

A few months ago, Microsoft announced the new portal experience in Microsoft Purview, which offers a range of exciting new features and capabilities. Data governance, risk, and compliance are increasingly coming together in a unified experience. Microsoft Fabric will have a native integration with Microsoft Purview.

Browse and search Fabric items

Just like Microsoft Azure, Microsoft Fabric is a new source for Microsoft Purview. Since Microsoft Purview is attached to every Fabric instance by default, you can click on the tile “Microsoft Fabric” on the front page of Microsoft Purview Data Catalog to start browsing your Fabric items. Automatically, any user can see the workspaces and Fabric items based on the permission setting they have in Fabric. You can also use the search bar to find Fabric items by keywords, filters, or facets.

Access metadata from Fabric items

In the coming weeks, Microsoft Purview Enterprise customers can provide broader access to metadata from Fabric items by scanning Fabric. When a Fabric tenant is scanned, Microsoft Purview writes information about Fabric items to the Purview data map, and access to that metadata is governed by Microsoft Purview access control. This allows administrators to give users metadata access for data discovery or governance, without requiring those users to have read permissions on the underlying data sources.

Live view in Microsoft Purview

Resources in live view in the Microsoft Purview Data Catalog automatically have this metadata available:

  • Name
  • Properties
  • Schema
  • Lineage

A new workspace created in Fabric automatically appears in Microsoft Purview.

Available resources

The following Fabric items will be available in Microsoft Purview as part of this public preview release.

Experience: Fabric items

  • Real-Time Analytics: KQL Database, KQL Queryset
  • Data Science: Experiment, ML Model
  • Data Factory: Data pipeline, Dataflow Gen2
  • Data Engineering: Lakehouse, Notebook, Spark Job Definition, SQL analytics endpoint
  • Data Warehouse: Warehouse
  • Power BI: Dashboard, Dataflow, Datamart, Dataset, Report, Paginated report*

* Only available by scanning

Conclusion

Microsoft Purview and Microsoft Fabric are two powerful services that work together to empower data discovery and governance. By integrating Microsoft Purview and Microsoft Fabric, you can leverage the benefits of both services, such as:

  • Browse and search Fabric items in the Microsoft Purview Data Catalog
  • Access metadata from Fabric items without requiring data access permissions

If you want to learn more about Microsoft Purview and Microsoft Fabric, you can visit the following links:

I hope you find this blog post helpful. Please let me know if you have any feedback or questions.

Feel free to leave a comment

Microsoft Fabric Content Hub Update October

Microsoft Purview

by Erwin | Oct 16, 2023

Stay up-to-date with the latest and most valuable content about Microsoft Fabric, all in one place!  From insightful articles and tutorials to engaging videos and community blogs, you’ll find a treasure trove of resources to deepen your understanding.

Latest Community blog posts:

What Happens When You Clone A Fabric Warehouse Table? – Serverless SQL

Flatten Nested JSON in Microsoft Fabric – Turning data into direction (storybi.com)

What does it mean to refresh a Direct Lake Power BI dataset in Fabric? (crossjoin.co.uk)

Tear down walls, no data silos any longer using Microsoft Fabric, and finally, export to Excel will become a breeze - Mincing Data - Gain Insight from Data (minceddata.info)

Does it feel like too much? — DATA GOBLINS (data-goblins.com)

How do you set up your Data Governance in Microsoft Fabric? – Data Ascend (data-ascend.com)

Fabric, Power BI, Power Platform, Data Platform: Pausing a Fabric Capacity - What Does It Actually Mean? (nickyvv.com)

Understanding data temperature with Direct Lake in Fabric – Data – Marc (data-marc.com)

Exploring Direct Lake Framing and warm-up data using Semantic Link in Fabric Notebooks – Data – Marc (data-marc.com)

Microsoft Fabric: setting your spark compute pool size – Reitse's blog (sqlreitse.com)

Microsoft Fabric, capacity usage and a design – Reitse's blog (sqlreitse.com)

Lightening Fast Copy In Fabric Notebook

Fabric Semantic Link and Use Cases

Keep your existing Power BI data and add new data to it using Fabric (crossjoin.co.uk)

Connect Power BI and Spark notebooks with Microsoft Fabric Semantic Link – Seequality

Recommended Microsoft Learn material for Microsoft Fabric - Kevin Chant (kevinrchant.com)

Data Science in Microsoft Fabric - RADACAD

What is OneLake in Microsoft Fabric, and Why You Should Care? - RADACAD

Feel free to leave a comment

Microsoft Fabric Content Hub Update September

Microsoft Purview

by Erwin | Sep 20, 2023

Stay up-to-date with the latest and most valuable content about Microsoft Fabric, all in one place!  From insightful articles and tutorials to engaging videos and community blogs, you’ll find a treasure trove of resources to deepen your understanding.

Feel free to leave a comment

Azure Open AI and Microsoft Fabric

Microsoft Purview

by Erwin | Sep 14, 2023

Get ready for data enrichment in Microsoft Fabric

Azure OpenAI is fun and exciting, and we can use it to do amazing things. In combination with Spark on Microsoft Fabric or Azure Synapse Analytics, we can transform and generate large amounts of text data and make use of OpenAI’s flexibility in defining the transformation. The SynapseML library, which comes pre-installed on all Synapse Spark pools and Fabric workspaces, includes an OpenAI module that lets you perform OpenAI transformations on Spark dataframes, enabling OpenAI at scale.

Together with Floris Berends, I had a look at the possibilities, and we wrote the post below.

Requirements

To run this example you need to have:

  • An Azure OpenAI service
  • A model deployment
  • A Microsoft Fabric workspace (alternatively, a Synapse Analytics workspace)
  • A Spark Notebook

Extracting text fields from raw social media posts

Let’s say we are scraping social media posts and are interested in some of the details. Usually, scraping text fields results in some pretty messy data. For this example, we are using the Scikit-Learn newsgroups open dataset.

Set up a Spark Dataframe

To load the open dataset into a Spark dataframe, we first load it into a pandas dataframe. Of course, if you are using your own data, you can load it from anywhere, as long as it fits into a Spark dataframe.

import pandas as pd
from sklearn.datasets import fetch_20newsgroups
newsgroups = fetch_20newsgroups(subset="train", categories=['talk.politics.misc'])
pd_df = pd.DataFrame(newsgroups["data"], columns=["data"])
df = spark.createDataFrame(pd_df)

Set up our parameters

To prepare the OpenAI transformation, we need to provide the API with a number of connection and configuration parameters. These include the Azure OpenAI service name, the name of the model deployment, and a prompt that will specify our transformation. The parameters can be found in the Azure Portal, on your Azure OpenAI resource. If you have not yet deployed a model, do this now. Note that the prompt specifies what we want the model to do, but also specifies the format in which we want the model to respond. This is crucial in getting reliable results from the model and this is what enables us to use the transformation as part of a pipeline.

openai_service_name = "<YOUR SERVICE NAME>"
openai_deployment_name = "<YOUR DEPLOYMENT NAME>"
openai_key = "<YOUR SERVICE KEY>"
source_content_column = "data"
system_prompt = """
You will read the raw text of an e-mail and extract the sender's e-mail
address and subject from the text. You will also list the topics of the email, provide a short one-sentence summary, and output the sentiment of the email. Ensure that the sentiment is one of the following: negative, neutral, positive.

Your response will be in the following format
{
"EMAILADDRESS": "",
"SUBJECT": "",
"SUMMARY": "",
"SENTIMENT": "",
"TOPICS": []
}
"""

Set up the prompt column

Because OpenAI needs a prompt in order to generate a completion, we need to set up a prompt column that includes both the instruction (system_prompt) we set up earlier and our data. Azure OpenAI chat completions take the ‘chat history’ as a message column, and this column is what we will use as input for the transformation. Each chat completion message also includes a ‘role’ parameter, which specifies who sent the message. In a normal chat interaction, there are 2 roles: the user and the assistant (i.e. the model). However, it is possible to provide a ‘system’ message that instructs the model how to behave. We will use a ‘system’ message to instruct the model on how to transform our data. To do this, we need to set up the prompt column in the following way:

  1. A message with the ‘system’ role and our instruction as content.
  2. A message with the ‘user’ role and our data as content.

import pyspark.sql.functions as F
from pyspark.sql.types import ArrayType, StructType, StructField, StringType

# Schema of one chat message: who sent it (name, role) and what it says (content)
message_type = StructType([
    StructField("name", StringType(), False),
    StructField("role", StringType(), False),
    StructField("content", StringType(), False),
])

# UDF that builds the chat history for a row: the system instruction, then the data
build_prompt = F.udf(
    lambda system_prompt, content: [
        {"name": "system", "role": "system", "content": system_prompt},
        {"name": "user", "role": "user", "content": content},
    ],
    ArrayType(message_type),
)

df = df.withColumn("prompt", build_prompt(F.lit(system_prompt), F.col(source_content_column)))
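To see what a single row of the prompt column holds, the same structure can be built in plain Python without Spark. This is only a sketch: build_messages is a hypothetical helper name, and the instruction and e-mail text below are made-up sample values.

```python
def build_messages(system_prompt, content):
    """Build the chat history for one row: the instruction first, then the data."""
    return [
        {"name": "system", "role": "system", "content": system_prompt},
        {"name": "user", "role": "user", "content": content},
    ]


# Made-up sample values, only to show the shape of one prompt cell
messages = build_messages(
    "Extract the subject of the e-mail.",
    "From: jane@example.com\nSubject: Hello\n\nBody text",
)

for message in messages:
    # Print the role and the first line of each message
    print(message["role"], "->", message["content"].splitlines()[0])
```

The order matters: the ‘system’ message comes first so that the instruction is read before the data it applies to.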

Calling the Azure OpenAI API

Now that we have the input dataframe with the data and prompt just how we want it, we can set up the call to the Azure OpenAI API. Note that Spark will not immediately execute the transformation, but will simply set up the plan for the dataframe. The API will only be called when we actually need the data (e.g. when we save or display the dataframe).

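from synapse.ml.cognitive import OpenAIChatCompletion

# Configure the transformer; no API call is made until the dataframe is evaluated.
# The row limit belongs on the dataframe (df.limit(10) in the transform step),
# not on the transformer, which has no limit() method.
completion = (
    OpenAIChatCompletion()
        .setSubscriptionKey(openai_key)
        .setDeploymentName(openai_deployment_name)
        .setUrl(f"https://{openai_service_name}.openai.azure.com/")
        .setMessagesCol("prompt")
        .setErrorCol("error")
        .setOutputCol("output")
)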

Transforming the results

The OpenAIChatCompletion transformer simply puts the completion results into the output column, but we want to have the results in separate columns. Before we can do this, we need to define the output schema.

output_columns = "EMAILADDRESS,SUBJECT,SUMMARY,SENTIMENT,TOPICS"
prompt_schema = StructType(
    [StructField(col, StringType(), True) for col in output_columns.split(",")]
)
df_result = (
    completion.transform(df.limit(10))
    .withColumn(
        "response",
        F.from_json(F.col("output.choices.message.content").getItem(0), prompt_schema),
    )
    .select("response.*", "error")
)
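Because the prompt fixes the response format, a raw reply can also be checked against that format before the extracted columns are relied on. The following is a minimal plain-Python sketch, assuming the model returns the JSON object as plain text; validate_response is a hypothetical helper, and the sample reply is made up for illustration.

```python
import json

# Keys and sentiment values requested in the system prompt
EXPECTED_KEYS = {"EMAILADDRESS", "SUBJECT", "SUMMARY", "SENTIMENT", "TOPICS"}
ALLOWED_SENTIMENTS = {"negative", "neutral", "positive"}


def validate_response(raw_reply):
    """Return the parsed reply if it matches the requested format, else None."""
    try:
        parsed = json.loads(raw_reply)
    except json.JSONDecodeError:
        return None  # the model answered with something that is not JSON
    if not isinstance(parsed, dict) or set(parsed) != EXPECTED_KEYS:
        return None  # missing or unexpected fields
    if parsed["SENTIMENT"] not in ALLOWED_SENTIMENTS:
        return None  # sentiment outside the three allowed values
    if not isinstance(parsed["TOPICS"], list):
        return None  # topics should be a list
    return parsed


# Made-up reply in the requested format
sample_reply = (
    '{"EMAILADDRESS": "jane@example.com", "SUBJECT": "Budget", '
    '"SUMMARY": "A short note about the budget.", '
    '"SENTIMENT": "neutral", "TOPICS": ["budget"]}'
)

print(validate_response(sample_reply) is not None)
print(validate_response("Sorry, I cannot help with that.") is None)
```

A check like this could be run on a small sample of replies while tuning the prompt, before the transformation is applied to the full dataset.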

Displaying and Verifying the results

There are a number of things that can go wrong. For any row, errors returned by the API will be put into the error column that you provided by .setErrorCol. We can display the dataframe to inspect the results:

display(df_result)
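When inspecting the results, it can help to separate rows where the API returned an error from rows where the reply could not be parsed into the schema. The sketch below works on plain dictionaries, assuming a small sample of rows has been collected to the driver and converted (for example with row.asDict()). It also assumes that an unparseable reply leaves every extracted column empty. classify_row is a hypothetical helper, and the sample rows are made up.

```python
from collections import Counter

RESPONSE_FIELDS = ["EMAILADDRESS", "SUBJECT", "SUMMARY", "SENTIMENT", "TOPICS"]


def classify_row(row):
    """Label one result row as 'api_error', 'unparsed', or 'ok'."""
    if row.get("error") is not None:
        return "api_error"  # the API call itself failed for this row
    if all(row.get(field) is None for field in RESPONSE_FIELDS):
        return "unparsed"  # a reply came back but did not fit the schema
    return "ok"


# Made-up sample rows, standing in for a collected sample of df_result
sample_rows = [
    {"EMAILADDRESS": "jane@example.com", "SUBJECT": "Budget", "SUMMARY": "A note.",
     "SENTIMENT": "neutral", "TOPICS": '["budget"]', "error": None},
    {"EMAILADDRESS": None, "SUBJECT": None, "SUMMARY": None,
     "SENTIMENT": None, "TOPICS": None, "error": "request failed"},
    {"EMAILADDRESS": None, "SUBJECT": None, "SUMMARY": None,
     "SENTIMENT": None, "TOPICS": None, "error": None},
]

counts = Counter(classify_row(row) for row in sample_rows)
print(dict(counts))
```

A high share of unparsed rows usually points at the prompt or the schema rather than at the service, so the two cases are worth counting separately.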

Final

It might seem that this setup is so versatile that you can use it to apply any transformation you desire on any column in any dataset. Although this might not be far from the truth, there are a couple of things you need to consider:

  1. Cost: Azure OpenAI transformations are more expensive than those that do not rely on external APIs (e.g. native Spark transformations like map(), flatten(), explode(), or regular expressions).
  2. Complexity: This example applies a transformation with a simple output schema. It might very well be the case that asking an LLM to output data in a very complex schema will not turn out well.
  3. Language: This example applies a transformation that is primarily a language based transformation: extracting and summarizing information that is available as natural language. Using LLMs to apply math-based, logic-based, or code-based transformations might not show reliable results.

 

The main takeaway is that using Azure OpenAI to transform text fields through natural language operations like summarization, description, and extraction can be done quickly and reliably. We are looking forward to seeing where this technology will take us.

Learn more

Fabric (preview) trial

Data science in Microsoft Fabric

Azure OpenAI for big data

Questions

If you have any further questions, feel free to ask them in the comments below.

Feel free to leave a comment