<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ayaka Hara</title>
    <description>The latest articles on DEV Community by Ayaka Hara (@aykhara).</description>
    <link>https://dev.to/aykhara</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F619368%2Fc81c45b3-9f29-4936-b90f-418b97af6291.jpg</url>
      <title>DEV Community: Ayaka Hara</title>
      <link>https://dev.to/aykhara</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aykhara"/>
    <language>en</language>
    <item>
      <title>How to Deploy a PDF Chatbot as a REST Endpoint and Test with Postman</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Thu, 25 Jan 2024 00:39:11 +0000</pubDate>
      <link>https://dev.to/aykhara/how-to-deploy-a-pdf-chatbot-as-a-rest-endpoint-and-test-with-postman-4jp3</link>
      <guid>https://dev.to/aykhara/how-to-deploy-a-pdf-chatbot-as-a-rest-endpoint-and-test-with-postman-4jp3</guid>
      <description>&lt;p&gt;Utilizing Azure AI Studio's Prompt Flow, we were able to easily implement a chatbot that answers questions about PDF documents. Furthermore, we conducted an evaluation of this chatbot to verify its accuracy. In this blog post, we will take you through the next step: moving beyond just testing the chatbot within Azure AI Studio's chat feature. We will demonstrate how to deploy this chatbot as a REST endpoint and test it using Postman, a popular tool for API development and testing by developers and test engineers.&lt;/p&gt;




&lt;p&gt;This post is a part of a series that serves as a step-by-step guide to developing a chatbot with RAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-easily-build-a-pdf-chatbot-with-rag-retrieval-augmented-generation-using-azure-ai-studios-prompt-flow-boc"&gt;How to Easily Build a PDF Chatbot with RAG (Retrieval-Augmented Generation) Using Azure AI Studio's Prompt Flow&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-evaluate-a-pdf-chatbot-response-with-prompt-flow-c56"&gt;How to Evaluate a PDF Chatbot Response with Prompt Flow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 3&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-deploy-a-pdf-chatbot-as-a-rest-endpoint-and-test-with-postman-4jp3"&gt;How to Deploy a PDF Chatbot as a REST Endpoint and Test with Postman&lt;/a&gt; &lt;strong&gt;← YOU ARE HERE!&lt;/strong&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Table of Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;
1. Modify a flow

&lt;ul&gt;
&lt;li&gt;1-1. Replace with vector db lookup&lt;/li&gt;
&lt;li&gt;1-2. Delete an unnecessary node&lt;/li&gt;
&lt;li&gt;1-3. Change the value of search result node&lt;/li&gt;
&lt;li&gt;1-4. Try chat&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
2. Deploy a flow

&lt;ul&gt;
&lt;li&gt;2-1. Deploy a flow from Prompt Flow tab&lt;/li&gt;
&lt;li&gt;2-2. Test the deployed endpoint&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
3. Test the endpoint with Postman

&lt;ul&gt;
&lt;li&gt;3-1. Copy REST endpoint and Primary key&lt;/li&gt;
&lt;li&gt;3-2. Configure on Postman&lt;/li&gt;
&lt;li&gt;3-3. Test the endpoint&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Complete creating a flow (see &lt;a href="https://dev.to/aykhara/how-to-easily-build-a-pdf-chatbot-with-rag-retrieval-augmented-generation-using-azure-ai-studios-prompt-flow-boc"&gt;How to Easily Build a PDF Chatbot with RAG (Retrieval-Augmented Generation) Using Azure Prompt Flow&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Download &lt;a href="https://www.postman.com/downloads/"&gt;Postman&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Optional: complete evaluating a flow (see &lt;a href="https://dev.to/aykhara/how-to-evaluate-a-pdf-chatbot-response-with-prompt-flow-c56"&gt;How to Evaluate a PDF Chatbot Response with Prompt Flow&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. Modify a flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1-1. Replace with vector db lookup
&lt;/h3&gt;

&lt;p&gt;First, we need to replace the "Index Lookup" tool with "Vector DB Lookup".&lt;br&gt;
Add "Vector DB Lookup" from More tools.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffncv6z1pppevintq3r0s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffncv6z1pppevintq3r0s.png" alt="Image description" width="800" height="336"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, enter a node name (e.g. search) and add it to the flow.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzzm5qufytsy8o478gda.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frzzm5qufytsy8o478gda.png" alt="Image description" width="800" height="139"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, enter the input values for the node and save the change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qu31vm328c4bgba2fmw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8qu31vm328c4bgba2fmw.png" alt="Image description" width="800" height="498"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to know where these values come from, open Azure AI Search in the Azure portal and navigate to Indexes, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9plihf4omcrynx28uzf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fj9plihf4omcrynx28uzf.png" alt="Image description" width="800" height="505"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F400buqxl41rhd65b4kgy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F400buqxl41rhd65b4kgy.png" alt="Image description" width="800" height="523"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After creating the "Vector DB Lookup" node, your graph will look like the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvk5fyhj3jzcxe4zhs3x6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvk5fyhj3jzcxe4zhs3x6.png" alt="Image description" width="496" height="579"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1-2. Delete an unnecessary node
&lt;/h3&gt;

&lt;p&gt;Delete the "Vector Index Lookup - search_question_from_indexed_docs" node and save the change.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nv0i1w1tbukii9ak0ir.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1nv0i1w1tbukii9ak0ir.png" alt="Image description" width="800" height="348"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1-3. Change the value of search result node
&lt;/h3&gt;

&lt;p&gt;Change the value of search_result to "${search.output}" in the generate_prompt_context node and save the change.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94k9eeo7e0ktoh1ekwgq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F94k9eeo7e0ktoh1ekwgq.png" alt="Image description" width="800" height="371"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After completing all the steps to replace the "index lookup tool" with "vector db lookup", your graph will look like the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxbvurpzxe9b2edyiq1s.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqxbvurpzxe9b2edyiq1s.png" alt="Image description" width="382" height="604"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1-4. Try chat
&lt;/h3&gt;

&lt;p&gt;Try the chat with your PDFs to make sure it works as expected.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjt6xpa4zfqvp09uc5xe5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjt6xpa4zfqvp09uc5xe5.png" alt="Image description" width="472" height="801"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Deploy a flow
&lt;/h2&gt;

&lt;h3&gt;
  
  
  2-1. Deploy a flow from Prompt Flow tab
&lt;/h3&gt;

&lt;p&gt;Click "Deploy",&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsa7jy87u1y9buklnh57d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsa7jy87u1y9buklnh57d.png" alt="Image description" width="800" height="125"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and change basic settings if you wish.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F183fwqqk6c7b3w7d7si3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F183fwqqk6c7b3w7d7si3.png" alt="Image description" width="800" height="654"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Review the deployment settings and click "Create".&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6lr08zr7p95oo7ko59p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo6lr08zr7p95oo7ko59p.png" alt="Image description" width="800" height="545"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will take around 10 minutes to complete the deployment.&lt;/p&gt;

&lt;h3&gt;
  
  
  2-2. Test the deployed endpoint
&lt;/h3&gt;

&lt;p&gt;Move to the Deployments tab and select the deployed endpoint,&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid2ezbc2u56x5xdyrmc0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fid2ezbc2u56x5xdyrmc0.png" alt="Image description" width="800" height="325"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;then test the endpoint.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vkara4xdsxuzvbdkfxz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3vkara4xdsxuzvbdkfxz.png" alt="Image description" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Test the endpoint with Postman
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3-1. Copy REST endpoint and Primary key
&lt;/h3&gt;

&lt;p&gt;Copy the REST endpoint and Primary key from the Consume tab.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobqbvfgb11x0yj0tth1k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fobqbvfgb11x0yj0tth1k.png" alt="Image description" width="800" height="669"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3-2. Configure on Postman
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;POST : REST endpoint&lt;/li&gt;
&lt;li&gt;Headers&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;KEY&lt;/th&gt;
&lt;th&gt;VALUE&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;Authorization&lt;/td&gt;
&lt;td&gt;Bearer {Primary key}&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;Content-Type&lt;/td&gt;
&lt;td&gt;application/json&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;ul&gt;
&lt;li&gt;Body : raw (json)
&lt;/li&gt;
&lt;/ul&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="nl"&gt;"question"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="s2"&gt;"who are you"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nl"&gt;"chat_history"&lt;/span&gt;&lt;span class="p"&gt;:[]}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
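&lt;p&gt;Equivalently, the same request can be sent from a short script. Below is a minimal Python sketch using only the standard library; the endpoint URL and key are placeholders to replace with the values from the Consume tab, and the exact shape of the JSON response depends on your flow's output name.&lt;/p&gt;

```python
import json
import urllib.request

def build_request(endpoint, primary_key, question, chat_history=None):
    """Build the POST request for the deployed Prompt Flow endpoint."""
    body = json.dumps({
        "question": question,
        "chat_history": chat_history or [],
    }).encode("utf-8")
    headers = {
        "Authorization": "Bearer " + primary_key,  # Primary key from the Consume tab
        "Content-Type": "application/json",
    }
    return urllib.request.Request(endpoint, data=body, headers=headers, method="POST")

# Placeholders: replace with the REST endpoint and Primary key you copied.
ENDPOINT = "https://YOUR-ENDPOINT.inference.ml.azure.com/score"
PRIMARY_KEY = "YOUR-PRIMARY-KEY"

req = build_request(ENDPOINT, PRIMARY_KEY, "who are you")
if "YOUR-ENDPOINT" not in ENDPOINT:  # only send once real values are filled in
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp))  # keyed by the flow's output name
```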



&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foco6665iznzmzxbvxox4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foco6665iznzmzxbvxox4.png" alt="Image description" width="800" height="562"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5pbhqy9hm8s9vqgiwso.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fd5pbhqy9hm8s9vqgiwso.png" alt="Image description" width="800" height="225"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3-3. Test the endpoint
&lt;/h3&gt;

&lt;p&gt;Click "Send" and get the response!&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw00er4yse4evn6znoduz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw00er4yse4evn6znoduz.png" alt="Image description" width="800" height="365"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog, we have explored how to deploy the chatbot, implemented using Azure AI Studio's Prompt Flow, as a REST endpoint and how to test it using Postman. Next, we plan to introduce a method for implementing a simple application that utilizes the REST endpoint we've just deployed. &lt;/p&gt;

</description>
      <category>azure</category>
      <category>openai</category>
      <category>rag</category>
    </item>
    <item>
      <title>How to Evaluate a PDF Chatbot Response with Prompt Flow</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Thu, 25 Jan 2024 00:38:37 +0000</pubDate>
      <link>https://dev.to/aykhara/how-to-evaluate-a-pdf-chatbot-response-with-prompt-flow-c56</link>
      <guid>https://dev.to/aykhara/how-to-evaluate-a-pdf-chatbot-response-with-prompt-flow-c56</guid>
      <description>&lt;p&gt;Using Azure AI Studio's Prompt Flow, we've managed to easily implement a chatbot that can answer questions about PDF documents. However, it's crucial to verify whether this chatbot is accurately extracting and providing answers from the PDFs. In this blog, as the next step, we'll delve into preparing test data and conducting an evaluation of the chatbot. This process will help us determine the accuracy of its responses, ensuring that the chatbot is effectively serving its intended purpose.&lt;/p&gt;




&lt;p&gt;This post is a part of a series that serves as a step-by-step guide to developing a chatbot with RAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-easily-build-a-pdf-chatbot-with-rag-retrieval-augmented-generation-using-azure-ai-studios-prompt-flow-boc"&gt;How to Easily Build a PDF Chatbot with RAG (Retrieval-Augmented Generation) Using Azure AI Studio's Prompt Flow&lt;/a&gt; &lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-evaluate-a-pdf-chatbot-response-with-prompt-flow-c56"&gt;How to Evaluate a PDF Chatbot Response with Prompt Flow&lt;/a&gt; &lt;strong&gt;← YOU ARE HERE!&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 3&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-deploy-a-pdf-chatbot-as-a-rest-endpoint-and-test-with-postman-4jp3"&gt;How to Deploy a PDF Chatbot as a REST Endpoint and Test with Postman&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Table of Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;1. Prepare test data&lt;/li&gt;
&lt;li&gt;2. Add test data on Azure AI Studio&lt;/li&gt;
&lt;li&gt;3. Evaluate a flow&lt;/li&gt;
&lt;li&gt;4. Check the evaluation result&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Complete creating a flow (see &lt;a href="https://dev.to/aykhara/how-to-easily-build-a-pdf-chatbot-with-rag-retrieval-augmented-generation-using-azure-ai-studios-prompt-flow-boc"&gt;How to Easily Build a PDF Chatbot with RAG (Retrieval-Augmented Generation) Using Azure Prompt Flow&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. Prepare test data
&lt;/h2&gt;

&lt;p&gt;Prepare test data based on the PDF data used to create the chatbot. Here, as an example, we will prepare a CSV file of questions about the &lt;a href="https://github.com/Azure-Samples/azure-search-openai-demo/tree/main/data" rel="noopener noreferrer"&gt;azure-search-openai-demo&lt;/a&gt; sample data together with their expected answers.&lt;/p&gt;

&lt;p&gt;Ideally, prepare between 50 and 100 test samples, or up to 200 if possible. For this trial, however, we will start with 5 samples.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: choose an appropriate encoding when saving the file (e.g. "CSV UTF-8"), as Japanese and other non-ASCII characters may otherwise be garbled.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0iw0ogefjcc885zlkak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv0iw0ogefjcc885zlkak.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;


&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;

&lt;p&gt;question,chat_history,answer,context&lt;br&gt;
What is PerksPlus?,[],It's the ultimate benefits program designed to support the health nd wellness of employees.,&lt;br&gt;
What are not covered under the PErksPlus?,[],"Non-fitness related expenses, Medical treatments and procedures, Travel expenses (unless related to a fitness program), and Food and supplements.",&lt;br&gt;
What is Contoso Electronics?,[],"It's a leader in the aerospace industry, providing advanced electronic components for both commercial and military aircraft.",&lt;br&gt;
What is Northwind Health Plus?,[],"It's a comprehensive plan that provides comprehensive coverage for medical, vision, and dental services.",&lt;br&gt;
How much does it cost for one employee to enroll in Northwind Health Plus?,[],$55.00,&lt;/p&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
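&lt;p&gt;If you generate the test file programmatically, the encoding concern from the note above can be handled directly. The following Python sketch writes a couple of the sample rows as "CSV UTF-8" (UTF-8 with a BOM, which spreadsheet tools recognize); the file name is arbitrary.&lt;/p&gt;

```python
import csv

# Two of the sample rows from above; chat_history is an empty list, context is blank.
rows = [
    ["What is PerksPlus?", "[]",
     "It's the ultimate benefits program designed to support the health and wellness of employees.", ""],
    ["How much does it cost for one employee to enroll in Northwind Health Plus?", "[]",
     "$55.00", ""],
]

# newline="" is required by the csv module; encoding="utf-8-sig" writes a BOM
# so spreadsheet tools treat the file as UTF-8 (the "CSV UTF-8" save format).
with open("test_data.csv", "w", newline="", encoding="utf-8-sig") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "chat_history", "answer", "context"])
    writer.writerows(rows)
```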
&lt;h2&gt;
  
  
  2. Add test data on Azure AI Studio
&lt;/h2&gt;


&lt;p&gt;Before starting the evaluation, test data should be registered on Azure AI Studio.&lt;/p&gt;

&lt;p&gt;Move to "Data" tab and click "New data".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kgyy2j0un5b3d4848tw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6kgyy2j0un5b3d4848tw.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, select "Upload files/folders" and upload the test data from your local machine.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjbmyil87khldcz0aamz.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjjbmyil87khldcz0aamz.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3thbpz79kk13ozsr91zh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3thbpz79kk13ozsr91zh.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, enter the data name and create your data.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx4iqxf5q6h33tuglx7n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdx4iqxf5q6h33tuglx7n.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once your test data is correctly uploaded, you can see the data details, as in the following screenshot.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fai537gtekz03b4fgjcj0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fai537gtekz03b4fgjcj0.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Evaluate a flow
&lt;/h2&gt;

&lt;p&gt;Now, we're ready to start evaluating a flow.&lt;/p&gt;

&lt;p&gt;Move to "Evaluation" tab and click "New evaluation".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3m1n3j66zcsb9f4ic94.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv3m1n3j66zcsb9f4ic94.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Enter an evaluation name and select "Question and answering pairs" as the evaluation scenario this time.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Floqawbkw5832rekux0jj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Floqawbkw5832rekux0jj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, select a flow which you want to evaluate. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far3zka6qijo15i35j2al.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far3zka6qijo15i35j2al.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select the metrics, for instance, Groundedness, Relevance, Coherence, and GPT similarity.&lt;br&gt;
For more details on the metrics, see &lt;a href="https://learn.microsoft.com/en-us/azure/machine-learning/prompt-flow/how-to-bulk-test-evaluate-flow?view=azureml-api-2#understand-the-built-in-evaluation-metrics" rel="noopener noreferrer"&gt;built-in evaluation metrics&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh31iuxfg0z78lorem72b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh31iuxfg0z78lorem72b.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, configure the test data to evaluate. Since we have already registered the test data, select "Use existing dataset".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmb9zipxxex7rhyg04d5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnmb9zipxxex7rhyg04d5.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Change the dataset mapping. "answer" should be what comes out of the flow, so configure answer and ground_truth as follows.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1e806rfb1noppp5ffhn.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fv1e806rfb1noppp5ffhn.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Lastly, review the evaluation configuration and submit it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprtpuelpu42kkdqp42ey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fprtpuelpu42kkdqp42ey.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Check the evaluation result
&lt;/h2&gt;

&lt;p&gt;Once the evaluation has finished, you will see a "Completed" status like the one below.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t6bto242hnhcex1sx2v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7t6bto242hnhcex1sx2v.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result includes overall metric scores and detailed per-row metric results.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Metrics scores

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Coherence&lt;/strong&gt; : The measure evaluates the coherence and naturalness of the generated text. It measures how well the language model can produce output that flows smoothly, reads naturally, and resembles human-like language.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Similarity&lt;/strong&gt; : Similarity is a measure that quantifies the similarity between a ground truth sentence (or document) and the prediction sentence generated by an AI model. It is calculated by first computing sentence-level embeddings using the embeddings API for both the ground truth and the model's prediction. These embeddings represent high-dimensional vector representations of the sentences, capturing their semantic meaning and context.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
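As a rough illustration of the embedding-based part of the Similarity metric, the sketch below computes the cosine similarity between two embedding vectors with NumPy. This is only a sketch: the vectors here are toy stand-ins for real embeddings-API output, and the metric reported by Azure AI Studio is ultimately expressed on a 1–5 scale.

```python
# Illustrative sketch only: cosine similarity between two embedding vectors,
# using toy vectors in place of real embeddings-API output.
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Identical directions score 1.0; orthogonal vectors score 0.0.
ground_truth_vec = [0.2, 0.1, 0.7]
prediction_vec = [0.2, 0.1, 0.7]
print(cosine_similarity(ground_truth_vec, prediction_vec))
```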

&lt;p&gt;The Similarity bar chart shows that one of the five test cases received the lowest possible score, 1.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19lv0zgo9sa2397t1bi4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F19lv0zgo9sa2397t1bi4.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;A closer look at the detailed metrics results shows that the ground-truth answer to the question "How much does it cost for one employee to enroll in Northwind Health Plus?" contains the price "$55.00". However, the chatbot's response did not include this price.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eg3l3qgrwbfmj9q8fwi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1eg3l3qgrwbfmj9q8fwi.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Checking the actual PDF revealed that this price appears inside a table, which the chatbot failed to extract into its response. (Ref: &lt;a href="https://github.com/Azure-Samples/azure-search-openai-demo/blob/main/data/Benefit_Options.pdf" rel="noopener noreferrer"&gt;Benefit_Options.pdf&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx7xedzg3z1x51e9gf9by.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fx7xedzg3z1x51e9gf9by.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
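One way to sanity-check this kind of failure is to look at what plain text extraction actually recovers from the PDF. The sketch below is an assumption-laden diagnostic, not part of the original workflow: it assumes the `pypdf` package is installed and that Benefit_Options.pdf has been downloaded locally, then searches the extracted page text for the expected value.

```python
# Diagnostic sketch: does the expected table value survive plain text extraction?
def pages_containing(pages_text, value):
    """Return the indexes of pages whose extracted text contains `value`."""
    return [i for i, text in enumerate(pages_text) if value in (text or "")]

if __name__ == "__main__":
    # Assumes `pypdf` is installed and the PDF has been downloaded locally.
    from pypdf import PdfReader
    reader = PdfReader("Benefit_Options.pdf")
    pages = [page.extract_text() for page in reader.pages]
    # An empty list suggests the table text was lost during extraction.
    print(pages_containing(pages, "$55.00"))
```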

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog post, we've shared how to prepare test data and conduct an evaluation of the chatbot, a necessary step for confirming the accuracy of its answers. With that established, our next focus will be deploying the chatbot, developed using Azure AI Studio's Prompt Flow, as a REST endpoint. In the next post in our series, &lt;a href="https://dev.to/aykhara/how-to-deploy-a-pdf-chatbot-as-a-rest-endpoint-and-test-with-postman-4jp3"&gt;"How to Deploy a PDF Chatbot as a REST Endpoint and Test with Postman"&lt;/a&gt;, we will explore how to test this deployment using Postman.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>openai</category>
      <category>rag</category>
    </item>
    <item>
      <title>How to Easily Build a PDF Chatbot with RAG (Retrieval-Augmented Generation) Using Azure AI Studio's Prompt Flow</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Thu, 25 Jan 2024 00:38:20 +0000</pubDate>
      <link>https://dev.to/aykhara/how-to-easily-build-a-pdf-chatbot-with-rag-retrieval-augmented-generation-using-azure-ai-studios-prompt-flow-boc</link>
      <guid>https://dev.to/aykhara/how-to-easily-build-a-pdf-chatbot-with-rag-retrieval-augmented-generation-using-azure-ai-studios-prompt-flow-boc</guid>
      <description>&lt;p&gt;Developing a chatbot that can answer questions about PDF documents might seem like a task requiring extensive time and effort. However, with Azure AI Studio's Prompt Flow, the implementation becomes surprisingly straightforward. In this blog, we will delve into the process of creating a chatbot with RAG (Retrieval-Augmented Generation) that can respond to queries related to sample PDF data. We’ll also guide you through testing this chatbot using the chat feature in Azure AI Studio.&lt;/p&gt;




&lt;p&gt;This post is a part of a series that serves as a step-by-step guide to developing a chatbot with RAG:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Step 1&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-easily-build-a-pdf-chatbot-with-rag-retrieval-augmented-generation-using-azure-ai-studios-prompt-flow-boc"&gt;How to Easily Build a PDF Chatbot with RAG (Retrieval-Augmented Generation) Using Azure AI Studio's Prompt Flow&lt;/a&gt; &lt;strong&gt;← YOU ARE HERE!&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 2&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-evaluate-a-pdf-chatbot-response-with-prompt-flow-c56"&gt;How to Evaluate a PDF Chatbot Response with Prompt Flow&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Step 3&lt;/strong&gt; : &lt;a href="https://dev.to/aykhara/how-to-deploy-a-pdf-chatbot-as-a-rest-endpoint-and-test-with-postman-4jp3"&gt;How to Deploy a PDF Chatbot as a REST Endpoint and Test with Postman&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;




&lt;h3&gt;
  
  
  Table of Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;
1. Create a project on Azure AI Studio

&lt;ul&gt;
&lt;li&gt;1-1. Build your own copilot&lt;/li&gt;
&lt;li&gt;1-2. Configure project details&lt;/li&gt;
&lt;li&gt;1-3. Create an Azure AI resource for your projects&lt;/li&gt;
&lt;li&gt;1-4. Review and create a project&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
2. Deploy Azure OpenAI models

&lt;ul&gt;
&lt;li&gt;2-1. Create a new deployment&lt;/li&gt;
&lt;li&gt;2-2. Select a model&lt;/li&gt;
&lt;li&gt;2-3. Deploy model&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
3. Create an index on Azure AI Search

&lt;ul&gt;
&lt;li&gt;3-1. Select your dataset&lt;/li&gt;
&lt;li&gt;3-2. Configure index storage&lt;/li&gt;
&lt;li&gt;3-3. Configure search settings&lt;/li&gt;
&lt;li&gt;3-4. Configure index settings&lt;/li&gt;
&lt;li&gt;3-5. Review and create an index&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;
4. Configure Prompt Flow

&lt;ul&gt;
&lt;li&gt;4-1. Create runtime&lt;/li&gt;
&lt;li&gt;4-2. Configure parameters in each node&lt;/li&gt;
&lt;li&gt;4-3. Save all configuration changes&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;
&lt;li&gt;5. Try Chat!&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;Prepare PDF data. As a demo, I'm going to use the sample data from &lt;a href="https://github.com/Azure-Samples/azure-search-openai-demo/tree/main/data"&gt;azure-search-openai-demo&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Create Azure AI Search&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  1. Create a project on Azure AI Studio
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1-1. Build your own copilot
&lt;/h3&gt;

&lt;p&gt;Go to &lt;a href="//ai.azure.com"&gt;ai.azure.com&lt;/a&gt; and make sure that the right directory (subscription) is selected.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2c6j0xq3m9sawov79qla.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F2c6j0xq3m9sawov79qla.png" alt="Image description" width="472" height="288"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Select "Build your own copilot".&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg57n9vw4rnjqy6yayq8j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg57n9vw4rnjqy6yayq8j.png" alt="Image description" width="800" height="285"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then, click "Create a new project".&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtbwhdbhbtp6f7621443.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmtbwhdbhbtp6f7621443.png" alt="Image description" width="705" height="544"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1-2. Configure project details
&lt;/h3&gt;

&lt;p&gt;Enter a project name and select "Create a new resource", then click "Next".&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flh9fp5l8rgd0ga7oqp3a.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flh9fp5l8rgd0ga7oqp3a.png" alt="Image description" width="800" height="538"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1-3. Create an Azure AI resource for your projects
&lt;/h3&gt;

&lt;p&gt;Enter a resource name (your Azure AI resource name must be different from your project name), select an appropriate Azure location, and click "Next".&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4yf929uvefxf67srflci.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4yf929uvefxf67srflci.png" alt="Image description" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  1-4. Review and create a project
&lt;/h3&gt;

&lt;p&gt;Click "Create a project" in Review and finish pane&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jww37zmt0dhrikidh5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4jww37zmt0dhrikidh5b.png" alt="Image description" width="800" height="587"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will take around 2 minutes to deploy all required resources to Azure.&lt;/p&gt;

&lt;h2&gt;
  
  
  2. Deploy Azure OpenAI models
&lt;/h2&gt;

&lt;p&gt;Once all resources have been deployed to Azure, you will automatically be taken to the Playground in Azure AI Studio.&lt;br&gt;
At the same time, you will see the message "No deployment exists: You need a deployment to work in the playground. Navigate to the Deployment page to create a deployment." Click the "Deployment page" link.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01yk5du95lx3eghsw5ck.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F01yk5du95lx3eghsw5ck.png" alt="Image description" width="800" height="190"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will need at least two model deployments, including one for embeddings.&lt;/p&gt;
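Once both deployments exist, they can be called from the `openai` Python SDK (v1.x) by passing the deployment name as the `model` argument. This is only a hedged sketch: the environment variable names are placeholders, the API version is an example, and the deployment names assume you kept the model names as the deployment names.

```python
import os

def deployment_plan():
    # The two deployments created in this section; adjust these if you
    # chose different deployment names.
    return {"embedding": "text-embedding-ada-002", "chat": "gpt-35-turbo"}

if __name__ == "__main__":
    # Assumes the `openai` v1.x SDK and the two environment variables below.
    from openai import AzureOpenAI
    client = AzureOpenAI(
        azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
        api_key=os.environ["AZURE_OPENAI_API_KEY"],
        api_version="2024-02-01",
    )
    names = deployment_plan()
    # The deployment name is passed as `model` for Azure OpenAI.
    emb = client.embeddings.create(model=names["embedding"], input="hello")
    chat = client.chat.completions.create(
        model=names["chat"],
        messages=[{"role": "user", "content": "Say hello."}],
    )
    print(len(emb.data[0].embedding), chat.choices[0].message.content)
```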

&lt;h3&gt;
  
  
  2-1. Create a new deployment
&lt;/h3&gt;

&lt;p&gt;Move to the "Deployment" pane and click "Create".&lt;/p&gt;

&lt;h3&gt;
  
  
  2-2. Select a model
&lt;/h3&gt;

&lt;p&gt;Select a model (text-embedding-ada-002), then click "Confirm".&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foaxcxuygoxbvmo1397qw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foaxcxuygoxbvmo1397qw.png" alt="Image description" width="714" height="660"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  2-3. Deploy model
&lt;/h3&gt;

&lt;p&gt;Finally, deploy the selected model.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb80o8dptxm9tkuom4dfm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb80o8dptxm9tkuom4dfm.png" alt="Image description" width="671" height="499"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Repeat steps 2-1 to 2-3 above for the other model (e.g. gpt-35-turbo) as well.&lt;/p&gt;

&lt;p&gt;Once the models are deployed, they are listed in the "Deployment" pane.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzvbwt62xlo72qkhus4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwzvbwt62xlo72qkhus4b.png" alt="Image description" width="800" height="356"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  3. Create an index on Azure AI Search
&lt;/h2&gt;

&lt;h3&gt;
  
  
  3-1. Select your dataset
&lt;/h3&gt;

&lt;p&gt;Select "Upload files/folders" as Data source,&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpazmmozgfkf4srmr9vbt.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpazmmozgfkf4srmr9vbt.png" alt="Image description" width="800" height="533"&gt;&lt;/a&gt;&lt;br&gt;
and then click "Upload" &amp;gt; "Upload files"&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrgp0yb1z6ak3lgetooc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flrgp0yb1z6ak3lgetooc.png" alt="Image description" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3-2. Configure index storage
&lt;/h3&gt;

&lt;p&gt;Select the following settings:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;"Connect other Azure AI Search resource"&lt;/li&gt;
&lt;li&gt;The Azure subscription in which you deployed Azure AI Search&lt;/li&gt;
&lt;li&gt;The Azure AI Search service you already deployed.
&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffgnfwlfgq0pa1udwd6yj.png" alt="Image description" width="800" height="541"&gt; &lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  3-3. Configure search settings
&lt;/h3&gt;

&lt;p&gt;Confirm the acknowledgement&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30y9hum7moe0w4ljjaam.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F30y9hum7moe0w4ljjaam.png" alt="Image description" width="800" height="536"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3-4. Configure index settings
&lt;/h3&gt;

&lt;p&gt;Enter an index name and select a virtual machine* (e.g. auto select).&lt;br&gt;
*The selected virtual machine will be used to run the indexing jobs.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvn01ien74ynlyjszfzk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsvn01ien74ynlyjszfzk.png" alt="Image description" width="800" height="540"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  3-5. Review and create an index
&lt;/h3&gt;

&lt;p&gt;Finally, click "Create".&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6reux87275kd1ed7bnl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw6reux87275kd1ed7bnl.png" alt="Image description" width="800" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;It will take around 10 minutes for all the jobs to finish.&lt;br&gt;
If you want to know what is happening behind the scenes, "job details" takes you to Azure ML Studio for more information.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptq1fh8qw5h3t6u3qlo6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fptq1fh8qw5h3t6u3qlo6.png" alt="Image description" width="800" height="362"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsokz5qwungsy99luh2wy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsokz5qwungsy99luh2wy.png" alt="Image description" width="800" height="575"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once all jobs are completed, the index appears with a "Ready" status.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyztzltuacfosh7us920b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyztzltuacfosh7us920b.png" alt="Image description" width="800" height="379"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  4. Configure Prompt Flow
&lt;/h2&gt;

&lt;p&gt;If you move to the "Prompt flow" pane, you will find "(your-index-name)-sample-flow" in the flow list.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa04d8z4kcagcp32puqqf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa04d8z4kcagcp32puqqf.png" alt="Image description" width="800" height="375"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once you select the flow, you may notice that the basic flow is already prepared. However, some manual configuration is still required.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fod5j71cdv65d0cgka86x.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fod5j71cdv65d0cgka86x.png" alt="Image description" width="800" height="443"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4-1. Create runtime
&lt;/h3&gt;

&lt;p&gt;Create a runtime by simply selecting "automatic runtime start".&lt;br&gt;
This will take around 5 minutes.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71e9aru1x6ewwftfp8yo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F71e9aru1x6ewwftfp8yo.png" alt="Image description" width="800" height="238"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  4-2. Configure parameters in each node
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;modify_query_with_history&lt;/strong&gt; : Select a deployment name and set max tokens (e.g. 1000), then click "Validate and parse input"&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0i5a27llhk8j4h6pqew5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0i5a27llhk8j4h6pqew5.png" alt="Image description" width="800" height="435"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;embed_the_question&lt;/strong&gt; : Click "Validate and parse input"&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: The deployment name may be cleared for some reason after you click the validate button. Make sure that the right deployment name for embedding is still set.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fne03d3ixgba7887udxma.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fne03d3ixgba7887udxma.png" alt="Image description" width="800" height="206"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;search_question_from_indexed_docs&lt;/strong&gt; : Click "Validate and parse input"&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa75mh7vddr20jyx532ab.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa75mh7vddr20jyx532ab.png" alt="Image description" width="800" height="182"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;generate_prompt_context&lt;/strong&gt; : Click "Validate and parse input"&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55293mepvmj2i9ns96k7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F55293mepvmj2i9ns96k7.png" alt="Image description" width="800" height="517"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;Prompt_variants&lt;/strong&gt; : Click "Validate and parse input"&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fliz3cv92spr0i9l98gd0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fliz3cv92spr0i9l98gd0.png" alt="Image description" width="800" height="369"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;answer_the_question_with_context&lt;/strong&gt; : Select a deployment name and set max tokens (e.g. 1000), then click "Validate and parse input"&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zgoktaj712aoatu5mvk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9zgoktaj712aoatu5mvk.png" alt="Image description" width="800" height="321"&gt;&lt;/a&gt;&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
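To make it clearer what these nodes are doing, here is a rough stand-in sketch of the flow's logic in plain Python. The function names mirror the node names, but the bodies are simplifications of my own: the real nodes call Azure OpenAI and Azure AI Search, which are replaced here with simple string handling.

```python
def modify_query_with_history(chat_history, question):
    """Rewrite the question so it is self-contained given the chat history."""
    past = " ".join(turn["user"] for turn in chat_history)
    return f"{past} {question}".strip() if past else question

def generate_prompt_context(retrieved_chunks):
    """Concatenate retrieved chunks into one context string with sources."""
    return "\n\n".join(
        f"Content: {c['content']}\nSource: {c['source']}" for c in retrieved_chunks
    )

def build_prompt(context, question):
    """Roughly what the Prompt_variants node produces for the final LLM call."""
    return (
        "Answer the question using only the context below.\n\n"
        f"{context}\n\nQuestion: {question}\nAnswer:"
    )

# Wire the stand-in steps together like the flow does.
chunks = [{"content": "Northwind Health Plus costs $55.00.", "source": "Benefit_Options.pdf"}]
query = modify_query_with_history([], "How much does it cost?")
print(build_prompt(generate_prompt_context(chunks), query))
```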

&lt;h3&gt;
  
  
  4-3. Save all configuration changes
&lt;/h3&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2z1hjz69trqitpu0s85.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fp2z1hjz69trqitpu0s85.png" alt="Image description" width="800" height="541"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  5. Try Chat!
&lt;/h2&gt;

&lt;p&gt;Finally, it's time to chat with your PDFs.&lt;br&gt;
Click the "Chat" button,&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs4a7d0hwpvy2sa2nyfs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvs4a7d0hwpvy2sa2nyfs.png" alt="Image description" width="800" height="179"&gt;&lt;/a&gt;&lt;br&gt;
and then put your question in the chat!&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdjtrkl7i57k0ml2ejtm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhdjtrkl7i57k0ml2ejtm.png" alt="Image description" width="380" height="798"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The answer will be returned in the same language as your input.&lt;br&gt;
&lt;a href="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx0bwddaplffy7gndhd3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/cdn-cgi/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvx0bwddaplffy7gndhd3.png" alt="Image description" width="378" height="796"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;In this blog, we have introduced how to implement a chatbot with RAG (Retrieval-Augmented Generation) that answers questions about sample PDF data using Azure AI Studio's Prompt Flow, and how to practically test it within Azure AI Studio's chat feature. Next in our series, &lt;a href="https://dev.to/aykhara/how-to-evaluate-a-pdf-chatbot-response-with-prompt-flow-c56"&gt;"How to Evaluate a PDF Chatbot Response with Prompt Flow"&lt;/a&gt;, we will delve into methods for evaluating the performance of the chatbot we've just implemented.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>openai</category>
      <category>rag</category>
    </item>
    <item>
      <title>Optimizing satellite image processing with pyvips</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Mon, 27 Nov 2023 06:03:45 +0000</pubDate>
      <link>https://dev.to/aykhara/optimizing-satellite-image-processing-with-pyvips-4n3f</link>
      <guid>https://dev.to/aykhara/optimizing-satellite-image-processing-with-pyvips-4n3f</guid>
      <description>&lt;p&gt;Developing Payload Applications (PAs) for satellites involves a crucial step of preparing and processing test satellite images. In the wide array of image processing libraries available today, choosing the right one can significantly influence the efficiency and success of your project. &lt;/p&gt;

&lt;p&gt;In this blog, we will explore why pyvips emerged as the best fit for processing satellite images, particularly for our needs, and conclude with a practical example using pyvips to process a JPEG2000 (jp2) format image.&lt;/p&gt;

&lt;h2&gt;
  
  
  Obtaining test satellite images
&lt;/h2&gt;

&lt;p&gt;Before diving into image processing, let's briefly touch on how to acquire test satellite images. While this topic is extensively covered in &lt;a href="https://dev.to/aykhara/getting-free-satellite-images-for-your-own-payload-app-development-5h2f"&gt;another blog post&lt;/a&gt;, it's worth noting that there are several free resources for obtaining satellite imagery, which is crucial for the testing and development phases of PAs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Comparing image processing libraries
&lt;/h2&gt;

&lt;p&gt;When working with satellite images, it's important to choose an image processing library that fits your needs. Let's take a quick look at some popular libraries and see how they compare:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Library&lt;/th&gt;
&lt;th&gt;Pros&lt;/th&gt;
&lt;th&gt;Cons&lt;/th&gt;
&lt;th&gt;Suitable Use-case&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://opencv.org/"&gt;OpenCV&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Real-time processing, advanced tasks&lt;/td&gt;
&lt;td&gt;Can be too complex for simple tasks, uses more resources&lt;/td&gt;
&lt;td&gt;Image analysis and processing requiring real-time capabilities and advanced functionality&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://pillow.readthedocs.io/en/stable/"&gt;Pillow&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Easy to use, good for basic tasks&lt;/td&gt;
&lt;td&gt;Not great for very large or complex images&lt;/td&gt;
&lt;td&gt;Basic image manipulation and simple tasks where ease of use is important&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://scikit-image.org/"&gt;scikit-image&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Scientific analysis, more complex tasks&lt;/td&gt;
&lt;td&gt;Not as efficient for very large images&lt;/td&gt;
&lt;td&gt;Scientific image analysis and more complex image processing tasks&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;&lt;a href="https://github.com/libvips/pyvips"&gt;pyvips&lt;/a&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Large images, efficient processing&lt;/td&gt;
&lt;td&gt;Less widely known, smaller community&lt;/td&gt;
&lt;td&gt;Handling very large images efficiently, especially in scenarios where performance is critical&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;Each library has its strengths. OpenCV, for example, is a great choice when you need a broad range of advanced operations, but it can be overkill for simple tasks. pyvips excels with very large images such as satellite scenes because it keeps memory usage low and processes quickly. The right choice depends on your project's needs, such as how large your images are and what you want to do with them.&lt;/p&gt;

&lt;h2&gt;
  
  
  Why pyvips for satellite images?
&lt;/h2&gt;

&lt;p&gt;When processing satellite images, pyvips stands out as an excellent choice for a number of compelling reasons:&lt;/p&gt;

&lt;h4&gt;
  
  
  1. Efficient handling of large images
&lt;/h4&gt;

&lt;p&gt;Satellite images are often very large, posing a significant challenge for many image processing libraries. pyvips is designed for low memory usage, even with large files. It cleverly manages memory by loading images in smaller parts rather than all at once, making it much more efficient for handling these massive images.&lt;/p&gt;
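&lt;p&gt;As a rough back-of-envelope sketch (plain Python; the tile size and band count below are illustrative assumptions, not pyvips measurements), you can see why streaming an image in strips rather than loading it whole matters for satellite scenes:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;width, height = 10_980, 10_980   # e.g. one Sentinel-2 tile
bytes_per_pixel = 2              # uint16 samples
bands = 4

# Memory needed to hold the whole image at once
full = width * height * bands * bytes_per_pixel
print(f"whole image: {full / 1024**2:.0f} MiB")

# Memory for one 128-row strip, roughly what a streaming reader touches at a time
strip = width * 128 * bands * bytes_per_pixel
print(f"one 128-row strip: {strip / 1024**2:.1f} MiB")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;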

&lt;h4&gt;
  
  
  2. Speed
&lt;/h4&gt;

&lt;p&gt;In image processing, speed is crucial, especially when dealing with large datasets like satellite images. pyvips processes images faster than many other libraries, an advantage that saves considerable time in both development and execution phases.&lt;/p&gt;

&lt;h4&gt;
  
  
  3. Support for various formats
&lt;/h4&gt;

&lt;p&gt;pyvips supports a wide range of image formats, including JPEG2000, a popular choice in satellite imagery due to its efficient compression. This versatility is invaluable when working with different types of satellite data.&lt;/p&gt;

&lt;h4&gt;
  
  
  4. Low-memory reading
&lt;/h4&gt;

&lt;p&gt;One of pyvips' standout features is its ability to read images using minimal memory. This is particularly useful for satellite images, as it allows for the processing of high-resolution images on machines with limited memory resources.&lt;/p&gt;

&lt;p&gt;In summary, pyvips' ability to efficiently process large images, its speed, support for multiple formats, and low-memory image reading capabilities make it a top choice for satellite image processing. These features collectively ensure that pyvips is well-equipped to handle the challenges of satellite imagery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Implementing pyvips – Sample Code
&lt;/h2&gt;

&lt;p&gt;To work with JPEG2000 format images using pyvips, you need to install it by running the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$ &lt;/span&gt;conda &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;--channel&lt;/span&gt; conda-forge pyvips
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You can also refer to &lt;a href="https://github.com/libvips/pyvips#install"&gt;the README&lt;/a&gt; for more details.&lt;/p&gt;
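&lt;p&gt;If you prefer pip, pyvips is also published on PyPI. Note that the Python package is a binding: the underlying libvips shared library must be present on your system (the conda-forge package above pulls it in for you):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ pip install pyvips
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;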

&lt;p&gt;Now, let's dive into some practical applications of pyvips with satellite images. Here is sample code demonstrating various operations:&lt;/p&gt;

&lt;h3&gt;
  
  
  Sample Code 1: Resizing and color space conversion of images
&lt;/h3&gt;

&lt;p&gt;Satellite image preprocessing often involves resizing the captured image to a uniform size and converting its color space. In this example, we will demonstrate how to resize the image to 50% of its original size and perform a color space conversion from sRGB to CMYK.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;

&lt;span class="c1"&gt;# Load the image from a file
&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sequential&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Resize the image
&lt;/span&gt;&lt;span class="n"&gt;resized_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;resize&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mf"&gt;0.5&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# Resize to 50% of the original size
&lt;/span&gt;
&lt;span class="c1"&gt;# Perform a color space conversion (e.g., from sRGB to CMYK)
&lt;/span&gt;&lt;span class="n"&gt;cmyk_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;resized_image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;colourspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;cmyk&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save the resized and color-converted image in JPEG2000 format
&lt;/span&gt;&lt;span class="n"&gt;cmyk_image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jp2ksave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;resized_cmyk_image.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;lossless&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="bp"&gt;True&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;pyvips.Image.new_from_file()&lt;/code&gt;: Load an image from a file.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;image.resize()&lt;/code&gt;: Resize the image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;image.colourspace()&lt;/code&gt;: Perform a color space conversion.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;image.jp2ksave()&lt;/code&gt;: Save the image in JPEG2000 format.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sample Code 2: Rotation and monochrome conversion of images
&lt;/h3&gt;

&lt;p&gt;Depending on the angle and orientation of satellite imagery, it might be necessary to rotate the image or convert it to monochrome for processing in PAs. This example will introduce how to rotate the image by 90 degrees and convert the rotated image to monochrome (black and white).&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;

&lt;span class="c1"&gt;# Load the image from a file
&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sequential&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Rotate the image by 90 degrees
&lt;/span&gt;&lt;span class="n"&gt;rotated_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;rot90&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Convert the image to monochrome
&lt;/span&gt;&lt;span class="n"&gt;monochrome_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;rotated_image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;colourspace&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;b-w&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Save the rotated and monochrome image in JPEG2000 format
&lt;/span&gt;&lt;span class="n"&gt;monochrome_image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jp2ksave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;rotated_monochrome_image.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;image.rot90()&lt;/code&gt;: Rotate the image by 90 degrees.&lt;/li&gt;
&lt;/ul&gt;

&lt;h3&gt;
  
  
  Sample Code 3: Image splitting
&lt;/h3&gt;

&lt;p&gt;By splitting the image into smaller segments, we can process large images more effectively in machine learning models, potentially improving accuracy in tasks like image recognition. In this example, we will split the loaded image into a specified number of segments, in this case a 7 x 7 grid, and save each segment.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;

&lt;span class="c1"&gt;# Load the image from a file
&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sequential&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Define the number of segments per axis (7x7 grid)
&lt;/span&gt;&lt;span class="n"&gt;num_segments&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;7&lt;/span&gt;

&lt;span class="c1"&gt;# Calculate segment dimensions
&lt;/span&gt;&lt;span class="n"&gt;segment_width&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;width&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;num_segments&lt;/span&gt;
&lt;span class="n"&gt;segment_height&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;height&lt;/span&gt; &lt;span class="o"&gt;//&lt;/span&gt; &lt;span class="n"&gt;num_segments&lt;/span&gt;

&lt;span class="c1"&gt;# Loop through the grid and save each segment
&lt;/span&gt;&lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_segments&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
    &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="nf"&gt;range&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;num_segments&lt;/span&gt;&lt;span class="p"&gt;):&lt;/span&gt;
        &lt;span class="c1"&gt;# Calculate the position of the current segment
&lt;/span&gt;        &lt;span class="n"&gt;left&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;col&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;segment_width&lt;/span&gt;
        &lt;span class="n"&gt;top&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;row&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="n"&gt;segment_height&lt;/span&gt;

        &lt;span class="c1"&gt;# Extract the segment
&lt;/span&gt;        &lt;span class="n"&gt;segment&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;crop&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;left&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;top&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;segment_width&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;segment_height&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

        &lt;span class="c1"&gt;# Save the segment
&lt;/span&gt;        &lt;span class="n"&gt;segment&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jp2ksave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;segment_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;row&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;_&lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;col&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="s"&gt;.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;image.width&lt;/code&gt; and &lt;code&gt;image.height&lt;/code&gt; (Properties of pyvips Image Object): Retrieve the width and height of the loaded image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;image.crop()&lt;/code&gt;: Extract a specified portion of the image based on given coordinates and dimensions.&lt;/li&gt;
&lt;/ul&gt;
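&lt;p&gt;One caveat with the sample above: integer division drops any remainder, so an image whose width or height is not divisible by the grid size loses a few edge pixels. A small helper (plain Python, independent of pyvips) can compute crop boxes that cover the full image by letting the last row and column absorb the remainder:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;def grid_boxes(width, height, n):
    """Return (left, top, w, h) boxes for an n x n grid covering the whole image."""
    base_w, base_h = width // n, height // n
    boxes = []
    for row in range(n):
        for col in range(n):
            left, top = col * base_w, row * base_h
            # The last column and row absorb the remainder pixels
            w = width - left if col == n - 1 else base_w
            h = height - top if row == n - 1 else base_h
            boxes.append((left, top, w, h))
    return boxes

boxes = grid_boxes(100, 100, 7)
print(len(boxes))    # 49 boxes
print(boxes[-1])     # the last box picks up the 2-pixel remainder
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Each box can then be passed to &lt;code&gt;image.crop(left, top, w, h)&lt;/code&gt; in place of the fixed-size crop in the loop above.&lt;/p&gt;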

&lt;h3&gt;
  
  
  Sample Code 4: Adding an alpha channel
&lt;/h3&gt;

&lt;p&gt;Due to the hardware of some satellites, the captured images may include a fourth band, such as an alpha channel. This example will show how to add an alpha channel to the loaded image and verify the change in the number of channels before and after processing.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="kn"&gt;import&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;

&lt;span class="c1"&gt;# Load the image from a file
&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;input.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sequential&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Print the current number of bands in the image
&lt;/span&gt;&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Original number of bands: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bands&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Add an alpha channel to the image
&lt;/span&gt;&lt;span class="n"&gt;image_with_alpha&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;addalpha&lt;/span&gt;&lt;span class="p"&gt;()&lt;/span&gt;

&lt;span class="c1"&gt;# Save the image with the alpha channel in JPEG2000 format
&lt;/span&gt;&lt;span class="n"&gt;image_with_alpha&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;jp2ksave&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;image_with_alpha.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Reload the saved image to verify the band count
&lt;/span&gt;&lt;span class="n"&gt;pyvips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;image_with_alpha.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="c1"&gt;# Print the number of bands in the modified image
&lt;/span&gt;&lt;span class="n"&gt;modified_image&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;pyvips&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;Image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;new_from_file&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;image_with_alpha.jp2&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;access&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="s"&gt;sequential&lt;/span&gt;&lt;span class="sh"&gt;'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;print&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sa"&gt;f&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Modified number of bands: &lt;/span&gt;&lt;span class="si"&gt;{&lt;/span&gt;&lt;span class="n"&gt;modified_image&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;bands&lt;/span&gt;&lt;span class="si"&gt;}&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;image.bands&lt;/code&gt; (Property of pyvips Image Object): Returns the number of bands (channels) in the image.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;image.addalpha()&lt;/code&gt;: Adds an alpha channel to the image for transparency.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These examples showcase the capability of pyvips to handle complex image processing tasks efficiently, making it a valuable tool for working with high-resolution satellite imagery.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;When building payload applications for satellites, choosing the right image processing tool really matters. pyvips is a valuable option for anyone working with satellite images, whether you're just starting out or have years of experience.&lt;/p&gt;

</description>
    </item>
    <item>
      <title>Getting free satellite images for your own payload app development</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Mon, 27 Nov 2023 06:01:58 +0000</pubDate>
      <link>https://dev.to/aykhara/getting-free-satellite-images-for-your-own-payload-app-development-5h2f</link>
      <guid>https://dev.to/aykhara/getting-free-satellite-images-for-your-own-payload-app-development-5h2f</guid>
      <description>&lt;p&gt;Creating payload applications for satellites means you need test images from space. But, getting these images can be really expensive. They can cost a few thousand dollars each, and even more if you need very detailed or special kinds of images – sometimes more than tens of thousands of dollars. This makes it hard for many people to buy these images, especially when they're just starting to develop their payload applications. &lt;/p&gt;

&lt;p&gt;Also, these satellite images come in different formats, like JPEG 2000, PNG, and GeoTIFF, depending on who's giving them and what you need them for. &lt;/p&gt;

&lt;p&gt;In this blog, I'm going to show you how to get JPEG 2000 and PNG images for free, which you can use for testing your own payload applications.&lt;/p&gt;

&lt;h2&gt;
  
  
  Copernicus Browser
&lt;/h2&gt;

&lt;p&gt;If you're looking for JPEG2000 data, you might find it on &lt;a href="https://dataspace.copernicus.eu/browser/"&gt;Copernicus Data Space Ecosystem - Copernicus Browser&lt;/a&gt;.&lt;br&gt;
Previously, it was possible to search via the &lt;a href="https://scihub.copernicus.eu/"&gt;Copernicus Open Access Hub&lt;/a&gt;, but its operations ended at the end of October 2023. Copernicus Sentinel data are now fully available in the &lt;a href="https://dataspace.copernicus.eu/"&gt;Copernicus Data Space Ecosystem&lt;/a&gt;. This new service provides not only access to a wide range of Earth observation data and services but also offers new tools, a GUI, and APIs to support the exploration and analysis of satellite imagery.&lt;/p&gt;

&lt;p&gt;The primary goal of the service is to ensure instant data availability to users. The full data archive acquired by the Copernicus Sentinel satellites will be available online and can be accessed in real-time.&lt;/p&gt;

&lt;h3&gt;
  
  
  License
&lt;/h3&gt;

&lt;p&gt;The data is available free of charge via designated quotas for &lt;strong&gt;individual use&lt;/strong&gt;. Users who wish to build large-scale operations can use practically unlimited resources available under commercial terms. (Ref - &lt;a href="https://dataspace.copernicus.eu/about"&gt;About the Copernicus Data Space Ecosystem&lt;/a&gt;)&lt;/p&gt;

&lt;h3&gt;
  
  
  How to download data
&lt;/h3&gt;

&lt;p&gt;1. Open &lt;a href="https://dataspace.copernicus.eu/browser/"&gt;Copernicus Browser&lt;/a&gt; in a new window or tab. &lt;/p&gt;

&lt;p&gt;2. Register or log in. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--P2MMIc-3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dfdo05n9aqmktye8qy8q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--P2MMIc-3--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/dfdo05n9aqmktye8qy8q.png" alt="Image description" width="800" height="398"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--JKnA9nU1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/owun2e5ozwa57mtmlvag.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--JKnA9nU1--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/owun2e5ozwa57mtmlvag.png" alt="Image description" width="800" height="505"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;3. Zoom in on an area you like on the map with the scroll wheel of your mouse.&lt;br&gt;
4. Specify the data criteria via the Search tab: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the satellite (as of Nov 2023, Sentinel-1, Sentinel-2, Sentinel-3, Sentinel-5P and Sentinel-6 are supported. Please refer to &lt;a href="https://documentation.dataspace.copernicus.eu/Data.html"&gt;the documentation about data&lt;/a&gt;)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;For instance, if you select Sentinel-2, &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Select the MultiSpectral Instrument (MSI) product level: L1C (&lt;a href="https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types/level-1c"&gt;Level-1C&lt;/a&gt;) or L2A (&lt;a href="https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types/level-2a"&gt;Level-2A&lt;/a&gt;). (Ref - &lt;a href="https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types"&gt;Sentinel-2 MSI User Guide - Product Types&lt;/a&gt;)&lt;/li&gt;
&lt;li&gt;Specify cloud cover percentage (e.g., 100% for all images, 50% for images with less than 50% cloud cover) &lt;/li&gt;
&lt;li&gt;Select the Auxiliary Data File: AUX_GNSSRD, AUX_PROQUA and AUX_POEORB. Please refer to &lt;a href="https://documentation.dataspace.copernicus.eu/Data/Sentinel2.html#sentinel-2-precise-orbit-determination-pod-products"&gt;Sentinel-2 Precise Orbit Determination (POD) products&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Observation date/time range&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Here is a comparison of L1C and L2A (Ref - &lt;a href="https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types"&gt;Sentinel-2 MSI User Guide - Product Types&lt;/a&gt;):&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Type&lt;/th&gt;
&lt;th&gt;Code&lt;/th&gt;
&lt;th&gt;Description&lt;/th&gt;
&lt;th&gt;Users&lt;/th&gt;
&lt;th&gt;Production &amp;amp; Distribution&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;User Product&lt;/td&gt;
&lt;td&gt;Level-1C&lt;/td&gt;
&lt;td&gt;Top-of-atmosphere reflectances in cartographic geometry&lt;/td&gt;
&lt;td&gt;All Users&lt;/td&gt;
&lt;td&gt;Systematic generation and online distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;User Product&lt;/td&gt;
&lt;td&gt;Level-2A&lt;/td&gt;
&lt;td&gt;Atmospherically corrected Surface Reflectances in cartographic geometry&lt;/td&gt;
&lt;td&gt;All Users&lt;/td&gt;
&lt;td&gt;Systematic generation and online distribution&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--d6I_IgSe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/24f1suvf7chex6vn7z1m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--d6I_IgSe--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/24f1suvf7chex6vn7z1m.png" alt="Image description" width="800" height="535"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;5. Select an image from the list or the map. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--_JBeXllB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w7fgmfyz2s4bvbp9r78p.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--_JBeXllB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/w7fgmfyz2s4bvbp9r78p.png" alt="Image description" width="800" height="409"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;6. Click on the download icon for the desired image.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://res.cloudinary.com/practicaldev/image/fetch/s--FMvuyEoa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rmg2ei4i0mmhx8vvfeuh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--FMvuyEoa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_800/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rmg2ei4i0mmhx8vvfeuh.png" alt="Image description" width="800" height="456"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you would like to try more advanced search, please refer to &lt;a href="https://documentation.dataspace.copernicus.eu/Applications/Browser.html#product-search"&gt;the Product Search&lt;/a&gt; page. &lt;/p&gt;

&lt;h2&gt;
  
  
  Kaggle
&lt;/h2&gt;

&lt;p&gt;If you're looking for PNG data, you might find it on &lt;a href="https://www.kaggle.com/"&gt;Kaggle&lt;/a&gt;. &lt;br&gt;
Kaggle is a platform primarily known for hosting data science competitions but offers much more, including a vast repository of datasets, a public data platform, and an online community for data scientists and machine learning practitioners. It was acquired by Google in 2017 and is part of the Google Cloud suite of services.&lt;/p&gt;

&lt;h3&gt;
  
  
  License
&lt;/h3&gt;

&lt;p&gt;Regarding copyright of the data on Kaggle, it varies depending on the dataset. Each dataset on Kaggle comes with its own licensing terms, set by the dataset provider. These terms specify how the data can be used, and they can range from completely open and free for any use (like those under the Public Domain or Creative Commons licenses) to more restrictive terms that might limit use to non-commercial purposes or require attribution. It's important to note that just because data is accessible on Kaggle, it does not automatically mean it's free of copyright or restrictions on use.&lt;/p&gt;

&lt;h3&gt;
  
  
  How to download data
&lt;/h3&gt;

&lt;p&gt;Here are some examples of Kaggle datasets you might want to check:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Dataset&lt;/th&gt;
&lt;th&gt;Image Format&lt;/th&gt;
&lt;th&gt;Number of Data&lt;/th&gt;
&lt;th&gt;License&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.kaggle.com/datasets/andrewmvd/ship-detection"&gt;Ship Detection from Aerial Images&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;PNG&lt;/td&gt;
&lt;td&gt;621 (with ship)&lt;/td&gt;
&lt;td&gt;CC0: Public Domain - No Copyright&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;a href="https://www.kaggle.com/datasets/rhammell/ships-in-satellite-imagery"&gt;Ships in Satellite Imagery&lt;/a&gt;&lt;/td&gt;
&lt;td&gt;PNG (80x80 RGB images)&lt;/td&gt;
&lt;td&gt;1000 (with ship), 3000 (without ship)&lt;/td&gt;
&lt;td&gt;&lt;a href="https://creativecommons.org/licenses/by-sa/4.0/"&gt;CC BY-SA 4.0&lt;/a&gt;&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
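&lt;p&gt;For example, assuming you have the official Kaggle CLI installed and an API token configured, a dataset from the table above can be downloaded from the command line (you can also download it manually from the dataset page):&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;$ pip install kaggle
$ kaggle datasets download -d rhammell/ships-in-satellite-imagery
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;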

&lt;h2&gt;
  
  
  Need to purchase satellite images for testing?
&lt;/h2&gt;

&lt;p&gt;Some might wonder if free satellite images are sufficient for testing purposes. However, upon conducting actual searches, you'll realize that finding images that meet specific and detailed criteria can be extremely challenging. For instance, let's say you need satellite images taken near a port, where several ships are visible on the sea, with a cloud cover of around 30%, and even airplanes in the sky. Searching for such images using the tools mentioned above can be a time-consuming task, or you might not find them at all.&lt;/p&gt;

&lt;p&gt;As the time to deploy your payload application to the satellite approaches and you need to test it with more detailed test cases, you may need to consider purchasing expensive satellite images.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;To wrap up, making payload applications for satellites doesn't have to be expensive when it comes to getting test images. This blog has shown you how you can easily get JPEG 2000 and PNG images without spending a lot of money. You can use the Copernicus Data Space Ecosystem to get a wide variety of free Sentinel satellite images, which is great for anyone needing fresh and different types of images. Also, Kaggle is a good place to find different datasets, including PNG images, where you often don't have to worry much about copyright because of flexible licensing.&lt;/p&gt;

&lt;p&gt;These resources are a big help in cutting down costs for satellite images. They open up more chances for creative work in making payload applications, whether you're experienced or just beginning. With these platforms, you have access to lots of data for your projects, making it easier to work on satellite-based applications without worrying about high costs.&lt;/p&gt;

&lt;h2&gt;
  
  
  Reference
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://documentation.dataspace.copernicus.eu/Applications/Browser.html#product-search"&gt;About the browser - Product Search&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi"&gt;Sentinel-2 MSI User Guide&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types"&gt;Sentinel-2 MSI User Guide - Product Types&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types/level-1c"&gt;Level-1C&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://sentinel.esa.int/web/sentinel/user-guides/sentinel-2-msi/product-types/level-2a"&gt;Level-2A&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://documentation.dataspace.copernicus.eu/Data/Sentinel2.html#sentinel-2-precise-orbit-determination-pod-products"&gt;Sentinel-2 Precise Orbit Determination (POD) products&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/"&gt;Kaggle&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/datasets/andrewmvd/ship-detection"&gt;Ship Detection from Aerial Images&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://www.kaggle.com/datasets/rhammell/ships-in-satellite-imagery"&gt;Ships in Satellite Imagery&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
    </item>
    <item>
      <title>Deploying Minimum Viable Dataspace to Azure: A Step-by-Step Guide</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Wed, 19 Apr 2023 04:39:30 +0000</pubDate>
      <link>https://dev.to/aykhara/deploying-minimum-viable-dataspace-to-azure-a-step-by-step-guide-51j7</link>
      <guid>https://dev.to/aykhara/deploying-minimum-viable-dataspace-to-azure-a-step-by-step-guide-51j7</guid>
      <description>&lt;h2&gt;
  
  
  What is Minimum Viable Dataspace (MVD)?
&lt;/h2&gt;

&lt;p&gt;The Minimum Viable Dataspace (MVD) is a sample implementation of a dataspace that leverages the Eclipse Dataspace Components (EDC). The main purpose is to demonstrate the capabilities of the EDC, make dataspace concepts tangible based on a specific implementation, and to serve as a starting point to implement a custom dataspace. &lt;/p&gt;

&lt;p&gt;I will not explain what EDC is in this article, so if you want to learn about EDC first, I recommend referring to &lt;a href="https://projects.eclipse.org/projects/technology.edc" rel="noopener noreferrer"&gt;the official Eclipse Foundation documents&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Azure deployment for MVD
&lt;/h2&gt;

&lt;p&gt;In the past, the MVD repository contained an Azure deployment workflow. However, it no longer exists on the repo.&lt;/p&gt;

&lt;p&gt;There were several reasons for removing the Azure deployment. Please refer to &lt;a href="https://github.com/eclipse-edc/MinimumViableDataspace/issues/128" rel="noopener noreferrer"&gt;this GitHub issue&lt;/a&gt; if you want to know more details.&lt;/p&gt;

&lt;p&gt;Still, some of you may want to deploy MVD to Azure, like me. This article illustrates how to achieve an Azure deployment using the previous workflow.&lt;/p&gt;

&lt;h3&gt;
  
  
  Table of Contents
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Prerequisites&lt;/li&gt;
&lt;li&gt;
Initializing an Azure environment for CD

&lt;ul&gt;
&lt;li&gt;
1. Create a service identity for GitHub Actions

&lt;ul&gt;
&lt;li&gt;1-1. Create App Registration for GitHub Actions&lt;/li&gt;
&lt;li&gt;1-2. Configure main Branch Credentials&lt;/li&gt;
&lt;li&gt;1-3. Configure Pull Request Credentials&lt;/li&gt;
&lt;li&gt;1-4. Create a new application secret&lt;/li&gt;
&lt;li&gt;1-5. Grant Permissions for Azure Subscription&lt;/li&gt;
&lt;li&gt;1-6. Configure GitHub Secrets for GitHub Actions&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

2. Create Service Identity for MVD Runtimes

&lt;ul&gt;
&lt;li&gt;2-1. Create App Registration for MVD Runtimes&lt;/li&gt;
&lt;li&gt;2-2. Create Client Secret for MVD Runtimes&lt;/li&gt;
&lt;li&gt;2-3. Get Application Object ID&lt;/li&gt;
&lt;li&gt;2-4. Configure GitHub Secrets for MVD Runtimes&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;3. Configure CD Settings&lt;/li&gt;

&lt;li&gt;4. Deploying CD resources&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;Create a new dataspace instance&lt;/li&gt;

&lt;li&gt;Ready to use MVD!&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;/ul&gt;

&lt;h2&gt;
  
  
  Prerequisites
&lt;/h2&gt;

&lt;p&gt;Before proceeding to the actual preparation of the environment, we must first revert to &lt;a href="https://github.com/eclipse-edc/MinimumViableDataspace/commit/9362f36f09123d7ceca67b8239cae2b82bacca3a" rel="noopener noreferrer"&gt;the old commit where the Deploy pipeline for Azure existed&lt;/a&gt;.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;A fork of the &lt;a href="https://github.com/eclipse-dataspaceconnector/MinimumViableDataspace" rel="noopener noreferrer"&gt;MVD repository&lt;/a&gt; &lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;A reset to a previous commit&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please run the following commands on your local machine to reset the code in your fork to &lt;a href="https://github.com/eclipse-edc/MinimumViableDataspace/commit/9362f36f09123d7ceca67b8239cae2b82bacca3a" rel="noopener noreferrer"&gt;a previous commit where the Deploy pipeline exists&lt;/a&gt;. The commit id should be &lt;code&gt;9362f36f09123d7ceca67b8239cae2b82bacca3a&lt;/code&gt;.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight console"&gt;&lt;code&gt;&lt;span class="go"&gt;git log
git reset --hard (COMMIT-ID)
git push REMOTE-NAME LOCAL-BRANCH-NAME:REMOTE-BRANCH-NAME
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the reset to the previous commit succeeded, you will find six YAML files (cd.yaml, check.yaml, cloud-cd.yaml, &lt;strong&gt;deploy.yaml&lt;/strong&gt;, destroy.yaml, and initialize.yaml) in the workflow folder. Note that &lt;code&gt;git push --force&lt;/code&gt; is needed because the reset rewinds your branch to an older commit.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Prepare an Azure subscription&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;A GitHub workflow then needs to be run to provision the Azure resources used for CD.&lt;/p&gt;

&lt;h2&gt;
  
  
  Initializing an Azure environment for CD
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Create a service identity for GitHub Actions
&lt;/h3&gt;

&lt;p&gt;Further documentation for the following steps can be found in the Microsoft Docs to &lt;a href="https://docs.microsoft.com/azure/active-directory/develop/workload-identity-federation-create-trust-github" rel="noopener noreferrer"&gt;Configure an app to trust a GitHub repo&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  1-1. Create App Registration for GitHub Actions
&lt;/h4&gt;

&lt;p&gt;Sign in to the &lt;a href="https://portal.azure.com/" rel="noopener noreferrer"&gt;Azure Portal&lt;/a&gt;, navigate to &lt;em&gt;App registrations&lt;/em&gt; and select &lt;em&gt;New registration&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Register a new application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name&lt;/strong&gt;: Provide a display name for the application, e.g. &lt;em&gt;"MVD GitHub Actions App"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;Supported Account Types&lt;/strong&gt;, select &lt;strong&gt;Accounts in this organizational directory only&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Don't enter anything for &lt;strong&gt;Redirect URI (optional)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Select &lt;strong&gt;Register&lt;/strong&gt; to create the app registration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F822e77i1sc5az6b15mt7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F822e77i1sc5az6b15mt7.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take note of the &lt;em&gt;Application (client) ID&lt;/em&gt; (will be required to configure a GitHub secret below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhpfydbt1igna9du9f1y.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvhpfydbt1igna9du9f1y.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, we create two types of credentials: two federated credentials to authenticate GitHub Actions, and a client secret for Terraform (required as Terraform does not yet support Azure CLI login with a service principal).&lt;br&gt;
Follow the instructions to &lt;a href="https://docs.microsoft.com/azure/active-directory/develop/workload-identity-federation-create-trust-github?tabs=azure-portal#configure-a-federated-identity-credential" rel="noopener noreferrer"&gt;Configure a federated identity credential&lt;/a&gt; for the &lt;code&gt;main&lt;/code&gt; branch and Pull requests.&lt;/p&gt;

&lt;h4&gt;
  
  
  1-2. Configure main Branch Credentials
&lt;/h4&gt;

&lt;p&gt;Select the previously created application (e.g. &lt;em&gt;"MVD GitHub Actions App"&lt;/em&gt;) and navigate to &lt;strong&gt;Certificates &amp;amp; secrets&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Select &lt;strong&gt;Federated credentials&lt;/strong&gt;, click &lt;strong&gt;Add credential&lt;/strong&gt; and define the following values.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk75fvah1iofqyhezvk11.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk75fvah1iofqyhezvk11.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;Federated credential scenario&lt;/strong&gt;, select &lt;strong&gt;GitHub actions deploying Azure resources&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Organization&lt;/strong&gt;, add your GitHub organization. For example, your organization is &lt;em&gt;YourGitHubOrg&lt;/em&gt; if the URL of your GitHub repository is &lt;a href="https://github.com/YourGitHubOrg/MinimumViableDataspace" rel="noopener noreferrer"&gt;https://github.com/YourGitHubOrg/MinimumViableDataspace&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Entity Type&lt;/strong&gt;, select &lt;strong&gt;Branch&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;GitHub branch name&lt;/strong&gt;, enter &lt;code&gt;main&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Name&lt;/strong&gt;, enter a name for the credential, e.g. &lt;em&gt;"mvd-main-branch"&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Add&lt;/strong&gt; to create the credential.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsoykq748er0h761xbvh.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Frsoykq748er0h761xbvh.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: You can add additional credentials to deploy from other branches.&lt;/p&gt;
&lt;/blockquote&gt;
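&lt;p&gt;For context, the portal form above boils down to an OIDC &lt;em&gt;subject&lt;/em&gt; claim that GitHub Actions presents when requesting a token; Azure AD only issues a token when the claim matches a configured credential. A minimal sketch of the claim format, using the example organization name from above:&lt;/p&gt;

```shell
# A branch credential matches tokens whose subject claim has the form
# repo:ORG/REPO:ref:refs/heads/BRANCH; a pull-request credential matches
# repo:ORG/REPO:pull_request instead.
ORG="YourGitHubOrg"              # example organization from the steps above
REPO="MinimumViableDataspace"
BRANCH="main"

SUBJECT="repo:${ORG}/${REPO}:ref:refs/heads/${BRANCH}"
echo "${SUBJECT}"
```

&lt;p&gt;If the subject does not match exactly (wrong branch name, wrong repo), the GitHub Actions login step fails with a token exchange error.&lt;/p&gt;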

&lt;h4&gt;
  
  
  1-3. Configure Pull Request Credentials
&lt;/h4&gt;

&lt;p&gt;Now, we set up a federated credential for pull requests, which allows running a cloud deployment to validate pull requests.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Note: This step is only required if you plan to create pull requests for the MVD.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Click &lt;strong&gt;Add credential&lt;/strong&gt; to add another credential (for the previously created application (e.g. &lt;em&gt;"MVD GitHub Actions App"&lt;/em&gt;), under &lt;strong&gt;Certificates &amp;amp; secrets&lt;/strong&gt;, &lt;strong&gt;Federated credentials&lt;/strong&gt;).&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;For &lt;strong&gt;Federated credential scenario&lt;/strong&gt;, select &lt;strong&gt;GitHub actions deploying Azure resources&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Organization&lt;/strong&gt;, add your GitHub organization. For example, your organization is &lt;em&gt;YourGitHubOrg&lt;/em&gt; if the URL of your GitHub repository is &lt;a href="https://github.com/YourGitHubOrg/MinimumViableDataspace" rel="noopener noreferrer"&gt;https://github.com/YourGitHubOrg/MinimumViableDataspace&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Entity Type&lt;/strong&gt;, select &lt;strong&gt;Pull Request&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;For &lt;strong&gt;Name&lt;/strong&gt;, enter a name for the credential, e.g. &lt;em&gt;"mvd-pull-requests"&lt;/em&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Click &lt;strong&gt;Add&lt;/strong&gt; to create the credential.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuje19s09w0mv9s3linl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyuje19s09w0mv9s3linl.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  1-4. Create a new application secret
&lt;/h4&gt;

&lt;p&gt;Create a client secret by following the section "Create a new application secret" in the page on &lt;a href="https://docs.microsoft.com/en-us/azure/active-directory/develop/howto-create-service-principal-portal#option-2-create-a-new-application-secret" rel="noopener noreferrer"&gt;Creating an Azure AD application to access resources&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6i3nxnlgi9fmfp99kn21.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6i3nxnlgi9fmfp99kn21.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take note of the client secret and keep it safe.&lt;/p&gt;

&lt;h4&gt;
  
  
  1-5. Grant Permissions for Azure Subscription
&lt;/h4&gt;

&lt;p&gt;To allow GitHub Actions to deploy resources to your Azure subscription, grant the application created above Owner permissions on your Azure subscription.&lt;br&gt;
Further documentation for the following steps can be found under &lt;a href="https://docs.microsoft.com/azure/role-based-access-control/role-assignments-portal" rel="noopener noreferrer"&gt;Assign Azure roles using the Azure portal&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Navigate to your &lt;strong&gt;subscription&lt;/strong&gt; you want to deploy the MVD resources to, select &lt;strong&gt;Access control (IAM)&lt;/strong&gt; and click on &lt;strong&gt;Add role assignment&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftl8rtldd8oel0ymvx6jv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftl8rtldd8oel0ymvx6jv.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the &lt;strong&gt;Add role assignment&lt;/strong&gt; page, select &lt;strong&gt;Owner&lt;/strong&gt; and click &lt;strong&gt;Next&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcei27w4wogdmipgr7ts.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdcei27w4wogdmipgr7ts.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now click on &lt;strong&gt;Select members&lt;/strong&gt;, search for the application created above (e.g. &lt;em&gt;"MVD GitHub Actions App"&lt;/em&gt;), click &lt;strong&gt;Select&lt;/strong&gt; and then click on &lt;strong&gt;Next&lt;/strong&gt; and then on &lt;strong&gt;Review + assign&lt;/strong&gt; to assign the application Owner permissions on your Azure subscription.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dmqxlibqzt8jipaorv2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6dmqxlibqzt8jipaorv2.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You need to enter the full application name when searching for the application; it will not show up if you enter only a partial name (e.g. &lt;em&gt;"MVD GitHub Act"&lt;/em&gt; in the example above).&lt;/p&gt;
&lt;/blockquote&gt;
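&lt;p&gt;If you prefer the terminal, the same role assignment can be sketched with the Azure CLI; the values below are placeholders for your application (client) ID and subscription ID, and the command assumes you are logged in with &lt;code&gt;az login&lt;/code&gt;:&lt;/p&gt;

```shell
# Placeholder values -- substitute your own IDs
APP_ID="APPLICATION-CLIENT-ID"
SUBSCRIPTION_ID="YOUR-SUBSCRIPTION-ID"

# Grant the GitHub Actions application Owner rights on the subscription
az role assignment create \
  --assignee "${APP_ID}" \
  --role "Owner" \
  --scope "/subscriptions/${SUBSCRIPTION_ID}"
```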

&lt;h4&gt;
  
  
  1-6. Configure GitHub Secrets for GitHub Actions
&lt;/h4&gt;

&lt;p&gt;Finally, the application (client) ID needs to be made available to your GitHub repository using &lt;a href="https://docs.github.com/en/actions/security-guides/encrypted-secrets" rel="noopener noreferrer"&gt;GitHub secrets&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;To configure GitHub Secrets, navigate to your &lt;code&gt;MinimumViableDataspace&lt;/code&gt; repository, select &lt;strong&gt;Settings&lt;/strong&gt;, navigate to &lt;strong&gt;Secrets&lt;/strong&gt; and then &lt;strong&gt;Actions&lt;/strong&gt;, and click &lt;strong&gt;New repository secret&lt;/strong&gt; to create a new secret.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdyxrjkyr19e9sv6omxq.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxdyxrjkyr19e9sv6omxq.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Configure the following GitHub secrets with the value from the steps above:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Secret name&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ARM_CLIENT_ID&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The application (client) ID of the application created above (e.g. &lt;em&gt;"MVD GitHub Actions App"&lt;/em&gt;).&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ARM_CLIENT_SECRET&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The application client secret.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
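&lt;p&gt;As an alternative to the web UI, the same secrets can be set from a terminal with the GitHub CLI, assuming &lt;code&gt;gh&lt;/code&gt; is authenticated and run inside your fork's working directory:&lt;/p&gt;

```shell
# Placeholders -- substitute the values noted in the steps above
gh secret set ARM_CLIENT_ID --body "APPLICATION-CLIENT-ID"
gh secret set ARM_CLIENT_SECRET --body "APPLICATION-CLIENT-SECRET"
```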

&lt;h3&gt;
  
  
  2. Create Service Identity for MVD Runtimes
&lt;/h3&gt;

&lt;p&gt;Further documentation for the following steps can be found in the Microsoft Docs to &lt;a href="https://docs.microsoft.com/azure/active-directory/develop/workload-identity-federation-create-trust-github" rel="noopener noreferrer"&gt;Create and configure an Azure AD application for the application runtimes&lt;/a&gt;.&lt;/p&gt;

&lt;h4&gt;
  
  
  2-1. Create App Registration for MVD Runtimes
&lt;/h4&gt;

&lt;p&gt;Sign in to the &lt;a href="https://portal.azure.com/" rel="noopener noreferrer"&gt;Azure Portal&lt;/a&gt;, navigate to &lt;em&gt;App registrations&lt;/em&gt; and select &lt;em&gt;New registration&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Register a new application:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Name&lt;/strong&gt;: Provide a display name for the application, e.g. &lt;em&gt;"MVD Runtimes App"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;In &lt;strong&gt;Supported Account Types&lt;/strong&gt;, select &lt;strong&gt;Accounts in this organizational directory only&lt;/strong&gt;.&lt;/li&gt;
&lt;li&gt;Don't enter anything for &lt;strong&gt;Redirect URI (optional)&lt;/strong&gt;.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Select &lt;strong&gt;Register&lt;/strong&gt; to create the app registration.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F829u4wq3enuyqx30tcm0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F829u4wq3enuyqx30tcm0.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Take note of the &lt;em&gt;Application (client) ID&lt;/em&gt; (will be required to configure a GitHub secret below).&lt;/p&gt;

&lt;h4&gt;
  
  
  2-2. Create Client Secret for MVD Runtimes
&lt;/h4&gt;

&lt;p&gt;For the previously created application (e.g. &lt;em&gt;"MVD Runtimes App"&lt;/em&gt;), navigate to &lt;strong&gt;Certificates &amp;amp; secrets&lt;/strong&gt;, open the &lt;strong&gt;Client secrets&lt;/strong&gt; tab, and select &lt;strong&gt;New client secret&lt;/strong&gt;. Create a new client secret by entering a &lt;strong&gt;Description&lt;/strong&gt; (e.g. &lt;em&gt;"mvd-runtimes-app-client-secret"&lt;/em&gt;) and clicking &lt;strong&gt;Add&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Take note of the client secret (&lt;strong&gt;Value&lt;/strong&gt;) and keep it safe (will be required to configure a GitHub secret below).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4dpctad0cfmydayodea.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fh4dpctad0cfmydayodea.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  2-3. Get Application Object ID
&lt;/h4&gt;

&lt;p&gt;Navigate to &lt;strong&gt;Azure Active Directory&lt;/strong&gt;, select &lt;strong&gt;Enterprise Applications&lt;/strong&gt;, and find the application created above (e.g. &lt;em&gt;"MVD Runtimes App"&lt;/em&gt;).&lt;/p&gt;

&lt;p&gt;Take note of the enterprise application &lt;strong&gt;Object ID&lt;/strong&gt;.&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Make sure you use the Object ID of the &lt;em&gt;Enterprise application&lt;/em&gt;, and not the Object ID of the &lt;em&gt;App Registration&lt;/em&gt;!&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r8yf3kpr0xb3rvu5yzd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9r8yf3kpr0xb3rvu5yzd.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
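&lt;p&gt;On recent Azure CLI versions, the same Object ID can be looked up from the terminal by resolving the service principal behind the app registration; &lt;code&gt;APPLICATION-CLIENT-ID&lt;/code&gt; is a placeholder:&lt;/p&gt;

```shell
# Resolve the service principal (enterprise application) for the app
# registration and print its object ID
az ad sp show --id "APPLICATION-CLIENT-ID" --query id --output tsv
```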

&lt;h4&gt;
  
  
  2-4. Configure GitHub Secrets for MVD Runtimes
&lt;/h4&gt;

&lt;p&gt;Configure the following GitHub secrets with the values from the steps above:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Secret name&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;APP_CLIENT_ID&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The application (client) ID.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;APP_CLIENT_SECRET&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The application client secret.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;APP_OBJECT_ID&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The ID of the service principal object associated with this application.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;See instructions under Configure GitHub Secrets for GitHub Actions on how to configure GitHub secrets.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Configure CD Settings
&lt;/h3&gt;

&lt;p&gt;Configure the following GitHub secrets which are required by the CD pipeline:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Secret name&lt;/th&gt;
&lt;th&gt;Value&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ARM_TENANT_ID&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The Azure Active Directory &lt;strong&gt;Tenant ID&lt;/strong&gt;. Navigate to Azure Active Directory and copy the Tenant ID from the &lt;em&gt;Overview&lt;/em&gt; page.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ARM_SUBSCRIPTION_ID&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The Azure &lt;strong&gt;Subscription ID&lt;/strong&gt; to deploy resources to. Navigate to Subscriptions and copy the &lt;em&gt;Subscription ID&lt;/em&gt; of your subscription.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;COMMON_RESOURCE_GROUP&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The Azure resource group name to deploy common resources to, such as Azure Container Registry. Choose any valid resource group name, e.g. &lt;em&gt;rg-mvd-common&lt;/em&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;COMMON_RESOURCE_GROUP_LOCATION&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The location where common resources should be deployed to, e.g. &lt;em&gt;eastus&lt;/em&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;ACR_NAME&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The name of the Azure Container Registry to deploy. Use only lowercase letters and numbers.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TERRAFORM_STATE_STORAGE_ACCOUNT&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The name of the storage account used to store the Terraform state container, e.g. &lt;em&gt;mvdterraformstates&lt;/em&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;TERRAFORM_STATE_CONTAINER&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The name of the container used to store the Terraform state blob, e.g. &lt;em&gt;mvdterraformstates&lt;/em&gt;.&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
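&lt;p&gt;The first two values can also be read with the Azure CLI rather than the portal, assuming you are logged in and the right subscription is selected:&lt;/p&gt;

```shell
# Print the tenant ID and subscription ID of the currently selected
# subscription (use "az account set -s NAME" to switch first if needed)
az account show --query "{tenantId: tenantId, subscriptionId: id}" --output json
```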

&lt;h3&gt;
  
  
  4. Deploying CD resources
&lt;/h3&gt;

&lt;p&gt;Common resources need to be deployed only once; they will be used by all CD pipelines.&lt;/p&gt;

&lt;p&gt;Manually run the &lt;code&gt;Initialize CD&lt;/code&gt; GitHub Actions workflow and make sure that it passes successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq2huo8zakugufweltz2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxq2huo8zakugufweltz2.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;Since &lt;code&gt;workflow_dispatch&lt;/code&gt; is already defined in the deploy pipeline, a ‘Run workflow’ button should appear on the Actions tab, enabling you to easily trigger a run (&lt;a href="https://github.blog/changelog/2020-07-06-github-actions-manual-triggers-with-workflow_dispatch/" rel="noopener noreferrer"&gt;GitHub Actions: Manual triggers with workflow_dispatch&lt;/a&gt;). If you cannot find the button shown in the screenshot below, I suggest changing the file extension from &lt;em&gt;deploy.yaml&lt;/em&gt; to &lt;em&gt;deploy.yml&lt;/em&gt;.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw774tvaciqjfwmwrebv0.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw774tvaciqjfwmwrebv0.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
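&lt;p&gt;The workflow can also be triggered from a terminal with the GitHub CLI; the file name &lt;code&gt;initialize.yaml&lt;/code&gt; matches the workflow list from the prerequisites, and &lt;code&gt;gh&lt;/code&gt; is assumed to be authenticated inside your fork's working directory:&lt;/p&gt;

```shell
# Trigger the Initialize CD workflow without the web UI
gh workflow run initialize.yaml

# Check that the latest run completed successfully
gh run list --workflow=initialize.yaml --limit 1
```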

&lt;p&gt;Your infrastructure is now set up to run deployments; in the next step, you can run the &lt;code&gt;Deploy&lt;/code&gt; GitHub Actions workflow.&lt;/p&gt;

&lt;h2&gt;
  
  
  Create a new dataspace instance
&lt;/h2&gt;

&lt;p&gt;Once your environment is set up, follow these steps to create a new dataspace instance:&lt;/p&gt;

&lt;p&gt;Select the &lt;code&gt;Deploy&lt;/code&gt; GitHub Actions workflow and provide your own resource name prefix. Please use at most 3 characters, composed of lowercase letters and numbers.&lt;/p&gt;

&lt;p&gt;Click on &lt;code&gt;Run workflow&lt;/code&gt; to manually run the &lt;code&gt;Deploy&lt;/code&gt; GitHub Actions workflow.&lt;/p&gt;
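&lt;p&gt;This, too, can be sketched with the GitHub CLI. Note that the input name &lt;code&gt;resources_name&lt;/code&gt; is an assumption here; check the &lt;code&gt;workflow_dispatch&lt;/code&gt; inputs in &lt;em&gt;deploy.yaml&lt;/em&gt; for the actual name used by your commit:&lt;/p&gt;

```shell
# Trigger the Deploy workflow with a resource name prefix of at most
# 3 lowercase letters and numbers. The input name "resources_name" is
# an assumption -- verify it against deploy.yaml's workflow_dispatch inputs.
gh workflow run deploy.yaml -f resources_name=zx1
```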

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi859gk54cv5d9gyt3kpk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fi859gk54cv5d9gyt3kpk.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Make sure that it passes successfully.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnsduo8x3yjnankkcq7j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftnsduo8x3yjnankkcq7j.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Ready to use MVD!
&lt;/h2&gt;

&lt;p&gt;Once the Deploy pipeline passes successfully, you can access the MVD that has just been deployed with your own resource name prefix.&lt;/p&gt;

&lt;p&gt;The Data Dashboard is a web application (development UI) on top of EDC's DataManagementAPI and is deployed for each participant. It can be accessed at the URLs provided on the GitHub workflow run page, as shown below. &lt;br&gt;
For instance, if the name prefix was &lt;code&gt;dragon&lt;/code&gt;:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4jpjjw25gqxwulz0on7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fr4jpjjw25gqxwulz0on7.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;You will have three Data Dashboards, one for each participant: company 1, company 2, and company 3, as shown below.&lt;/p&gt;

&lt;p&gt;&lt;em&gt;company 1&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vjr80027bwkabvj35ql.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8vjr80027bwkabvj35ql.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;company 2&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmddfxqxc0kvcx6cfarjg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmddfxqxc0kvcx6cfarjg.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;em&gt;company 3&lt;/em&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlb1g7jjkzuzu66x0rgv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvlb1g7jjkzuzu66x0rgv.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you want to demonstrate the possibilities of Dataspaces, please refer to the &lt;a href="https://github.com/eclipse-edc/MinimumViableDataspace/blob/main/Vision%20Demonstrator/Vision%20Demonstrator%20Introduction.md" rel="noopener noreferrer"&gt;Vision Demonstrator document&lt;/a&gt;.&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion
&lt;/h2&gt;

&lt;p&gt;The Azure deployment workflow has been removed from the MVD repo, but MVD can still be deployed from the previous commit. Please be aware that it will no longer be updated.&lt;br&gt;
I hope this step-by-step guide is helpful for those of you who want to try MVD on Azure.&lt;/p&gt;

</description>
      <category>azure</category>
      <category>edc</category>
    </item>
    <item>
      <title>How to configure Azure SQL Always Encrypted for Mac users</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Thu, 18 Aug 2022 00:26:00 +0000</pubDate>
      <link>https://dev.to/aykhara/how-to-configure-azure-sql-always-encrypted-for-mac-users-1n0k</link>
      <guid>https://dev.to/aykhara/how-to-configure-azure-sql-always-encrypted-for-mac-users-1n0k</guid>
      <description>&lt;p&gt;Always Encrypted is a feature included in Azure SQL Server. Data is encrypted all the time, not only at rest but also in motion. Furthermore, the encryption keys which are essential for both encrypting and decrypting are not stored in the database.&lt;br&gt;
For more information on Always Encrypted, please refer to &lt;a href="https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-database-engine?view=sql-server-ver16" rel="noopener noreferrer"&gt;the official documentation&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;There are multiple ways to configure Always Encrypted.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/configure-always-encrypted-using-powershell?view=azuresqldb-current" rel="noopener noreferrer"&gt;Configure Always Encrypted using PowerShell&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/configure-always-encrypted-using-sql-server-management-studio?view=azuresqldb-current" rel="noopener noreferrer"&gt;Configure Always Encrypted using SQL Server Management Studio (SSMS)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/kenakamu/visual-studio-database-project-and-always-encrypted-n4p"&gt;Configure Always Encrypted using Visual Studio Database Project&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Of the methods listed above, SSMS and the Visual Studio Database Project are available only on Windows.&lt;br&gt;
If you need to run on macOS or Linux, &lt;a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/download-azure-data-studio?view=sql-server-ver16" rel="noopener noreferrer"&gt;Azure Data Studio&lt;/a&gt; or Visual Studio Code is for you.&lt;br&gt;
In this article, I'm going to explain how to configure Always Encrypted with Azure Data Studio on macOS.&lt;/p&gt;
&lt;h3&gt;
  
  
  TOC
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pre-requisites&lt;/li&gt;
&lt;li&gt;1. Install the SQL Database Projects extension&lt;/li&gt;
&lt;li&gt;2. Create new database project&lt;/li&gt;
&lt;li&gt;
3. Create keys

&lt;ul&gt;
&lt;li&gt;3.1 Create Column Master Key (CMK)&lt;/li&gt;
&lt;li&gt;3.2 Create Column Encryption Key (CEK)&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;4. Create table with encrypted columns&lt;/li&gt;
&lt;li&gt;5. Build/Publish with Data Studio&lt;/li&gt;
&lt;li&gt;6. Confirm the result&lt;/li&gt;
&lt;li&gt;What the DACPAC looks like&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;li&gt;References&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  Pre-requisites &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;There are several possible combinations, but for this example we will use a SQL database together with Key Vault to store the master key. The following Azure resources will be used here.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SQL Server&lt;/li&gt;
&lt;li&gt;SQL Database&lt;/li&gt;
&lt;li&gt;Key Vault&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition, if you haven't installed yet:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/download-azure-data-studio?view=sql-server-ver16" rel="noopener noreferrer"&gt;Download and install Azure Data Studio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/powershell/scripting/install/installing-powershell-on-macos?view=powershell-7.2" rel="noopener noreferrer"&gt;Installing PowerShell on macOS&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;
&lt;h2&gt;
  
  
  1. Install the SQL Database Projects extension&lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Firstly, install &lt;a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/extensions/sql-database-project-extension?view=sql-server-ver16" rel="noopener noreferrer"&gt;the SQL Database Projects extension&lt;/a&gt; in Azure Data Studio.&lt;br&gt;
It is an Azure Data Studio and VS Code extension for developing SQL databases, including SQL Server, Azure SQL Database, and Azure SQL Managed Instance, in a project-based development environment. &lt;br&gt;
*This extension is still in preview (as of 18 Aug 2022).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbf5pkfy9to737ts938t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmbf5pkfy9to737ts938t.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  2. Create new database project &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Now, let's create a database project using the extension we have just installed.&lt;br&gt;
Click &lt;code&gt;Create new&lt;/code&gt; in the Database Projects pane,&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhsw816s86pgeh445uhs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbhsw816s86pgeh445uhs.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;then select &lt;code&gt;Azure SQL Database&lt;/code&gt; as type and give it a project name such as DB.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6sm8k1oyi9abj1sk3jf.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6sm8k1oyi9abj1sk3jf.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h2&gt;
  
  
  3. Create keys&lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Two keys are required for Always Encrypted.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Column Master Key (stored in Key Vault)&lt;/li&gt;
&lt;li&gt;Column Encryption Key (stored in the SQL database)&lt;/li&gt;
&lt;/ul&gt;
&lt;h3&gt;
  
  
  3.1 Create Column Master Key (CMK) &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;
&lt;h4&gt;
  
  
  3.1.1 Create Column Master Key in Azure Key Vault
&lt;/h4&gt;

&lt;p&gt;We can manually create the CMK in Azure Key Vault via the Azure portal or by running the following PowerShell script. (Ref: &lt;a href="https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/configure-always-encrypted-keys-using-powershell?view=azuresqldb-current#azure-key-vault-without-role-separation-example" rel="noopener noreferrer"&gt;Azure Key Vault without Role Separation (Example)&lt;/a&gt;)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="c"&gt;# Create a column master key in Azure Key Vault.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Import-Module&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;Az&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Connect-AzAccount&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$SubscriptionId&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&amp;lt;Azure SubscriptionId&amp;gt;"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$resourceGroup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"rg-ayhara-playground"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$azureLocation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"japaneast"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$akvName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"kv-ayhara-sample"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$akvKeyName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"CMKAuto1"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$azureCtx&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Set-AzConteXt&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-SubscriptionId&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$SubscriptionId&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;#Sets the context for the below cmdlets to the specified subscription.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;New-AzResourceGroup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$resourceGroup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Location&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$azureLocation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# Creates a new resource group - skip, if your desired group already exists.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;New-AzKeyVault&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-VaultName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$akvName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ResourceGroupName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$resourceGroup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Location&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$azureLocation&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="c"&gt;# Creates a new key vault - skip if your vault already exists.&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="n"&gt;Set-AzKeyVaultAccessPolicy&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-VaultName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$akvName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-ResourceGroupName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$resourceGroup&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-PermissionsToKeys&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;get&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;create&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;delete&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;list&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;wrapKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="nx"&gt;unwrapKey&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;sign&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nx"&gt;verify&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-UserPrincipalName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$azureCtx&lt;/span&gt;&lt;span class="o"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;Account&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$akvKey&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Add-AzKeyVaultKey&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-VaultName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$akvName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Name&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$akvKeyName&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-Destination&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Software"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Please confirm that the CMK has been created in the Key Vault as expected and copy the Key Identifier for use in the next step.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpprngtmd3xwpzy893g50.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpprngtmd3xwpzy893g50.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We also need to make sure that the required permissions, i.e. &lt;code&gt;get, create, delete, list, wrapKey, unwrapKey, sign, verify&lt;/code&gt;, are granted. &lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8iqixowc1zno1mmvzhze.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8iqixowc1zno1mmvzhze.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h4&gt;
  
  
  3.1.2 Set Key Vault information to master key
&lt;/h4&gt;

&lt;p&gt;Let's go back to Data Studio.&lt;br&gt;
Unfortunately, the CMK and CEK templates available in Visual Studio are not available in Data Studio, so we need to add the scripts as items instead.&lt;/p&gt;

&lt;p&gt;Click &lt;code&gt;Add new item&lt;/code&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F939pznn3dvghgpr67og3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F939pznn3dvghgpr67og3.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then put &lt;code&gt;ColumnMasterKey&lt;/code&gt; in the field and press &lt;code&gt;Enter&lt;/code&gt; to confirm.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrqr0ye4bjl1x5zfefej.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmrqr0ye4bjl1x5zfefej.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The following Transact-SQL should be added.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;MASTER&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CMK_Auto1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="n"&gt;KEY_STORE_PROVIDER_NAME&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="s1"&gt;'AZURE_KEY_VAULT'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;KEY_PATH&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;N&lt;/span&gt;&lt;span class="s1"&gt;'https://kv-ayhara-sample-ado.vault.azure.net/keys/CMKAuto1/ecffa3fdcb2f432b9b0b8474770ade38'&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
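&lt;p&gt;For reference, the &lt;code&gt;KEY_PATH&lt;/code&gt; is simply the Key Identifier copied from the portal, which follows the pattern &lt;code&gt;https://{vault-name}.vault.azure.net/keys/{key-name}/{key-version}&lt;/code&gt;. As a small sketch (using a placeholder identifier, not a real key), the individual parts can be pulled out with standard shell string handling:&lt;/p&gt;

```shell
# Split a Key Vault Key Identifier into vault name, key name, and key version.
# The identifier below is a placeholder, not a real key.
key_id="https://kv-ayhara-sample.vault.azure.net/keys/CMKAuto1/0123456789abcdef"
vault=$(printf '%s' "$key_id" | cut -d/ -f3 | cut -d. -f1)   # kv-ayhara-sample
key_name=$(printf '%s' "$key_id" | cut -d/ -f5)              # CMKAuto1
key_version=$(printf '%s' "$key_id" | cut -d/ -f6)           # 0123456789abcdef
echo "$vault $key_name $key_version"
```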



&lt;h3&gt;
  
  
  3.2 Create Column Encryption Key (CEK) &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;Next, the Column Encryption Key (CEK).&lt;br&gt;
Let's create the encrypted value of a column encryption key with &lt;a href="https://docs.microsoft.com/en-us/powershell/module/sqlserver/new-sqlcolumnencryptionkeyencryptedvalue?view=sqlserver-ps" rel="noopener noreferrer"&gt;New-SqlColumnEncryptionKeyEncryptedValue&lt;/a&gt;.&lt;br&gt;
(Unfortunately, the &lt;code&gt;New-SqlColumnEncryptionKeyEncryptedValue&lt;/code&gt; cmdlet is only available in PowerShell 5, which runs on Windows only. Please run the following commands to create the encrypted value of a column encryption key using &lt;strong&gt;PowerShell 5 on Windows&lt;/strong&gt;. - As of Aug 2022)&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight powershell"&gt;&lt;code&gt;&lt;span class="nv"&gt;$cmkSettings&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-SqlAzureKeyVaultColumnMasterKeySettings&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-KeyUrl&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"https://kv-ayhara-sample-ado.vault.azure.net/keys/CMKAuto1/ecffa3fdcb2f432b9b0b8474770ade38"&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="nv"&gt;$encryptedValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;New-SqlColumnEncryptionKeyEncryptedValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nt"&gt;-TargetColumnMasterKeySettings&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="nv"&gt;$cmkSettings&lt;/span&gt;&lt;span class="w"&gt; 
&lt;/span&gt;&lt;span class="nv"&gt;$encryptedValue&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="o"&gt;|&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="n"&gt;Set-Clipboard&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As with the CMK, we need to add the script as an item.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlccul0vffl8nk2b13f2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fnlccul0vffl8nk2b13f2.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then use the following Transact-SQL with the encrypted value that we just copied above.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;COLUMN&lt;/span&gt; &lt;span class="n"&gt;ENCRYPTION&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CEK_Auto1&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
     &lt;span class="n"&gt;COLUMN_MASTER_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;CMK_Auto1&lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;
     &lt;span class="n"&gt;ALGORITHM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'RSA_OAEP'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
     &lt;span class="n"&gt;ENCRYPTED_VALUE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="n"&gt;x01B6000001&lt;/span&gt;&lt;span class="p"&gt;........&lt;/span&gt;
&lt;span class="p"&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  4. Create table with encrypted columns &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Let's say we create a table named &lt;code&gt;User&lt;/code&gt; with columns named &lt;code&gt;Name&lt;/code&gt; and &lt;code&gt;Password&lt;/code&gt;, where the &lt;code&gt;Password&lt;/code&gt; column is &lt;strong&gt;encrypted&lt;/strong&gt;.&lt;/p&gt;

&lt;p&gt;Click &lt;code&gt;Add table&lt;/code&gt;&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ms9teg9w72xeqndfg33.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F8ms9teg9w72xeqndfg33.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then put &lt;code&gt;User&lt;/code&gt; in the field and press &lt;code&gt;Enter&lt;/code&gt; to confirm.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs383e0n47j4zjqsff9k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxs383e0n47j4zjqsff9k.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Paste the following Transact-SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;CREATE&lt;/span&gt; &lt;span class="k"&gt;TABLE&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;dbo&lt;/span&gt;&lt;span class="p"&gt;].[&lt;/span&gt;&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt; &lt;span class="k"&gt;PRIMARY&lt;/span&gt; &lt;span class="k"&gt;KEY&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;Password&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; 
        &lt;span class="k"&gt;ENCRYPTED&lt;/span&gt; &lt;span class="k"&gt;WITH&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;COLUMN_ENCRYPTION_KEY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;CEK_Auto1&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
        &lt;span class="n"&gt;ENCRYPTION_TYPE&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;RANDOMIZED&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;  
        &lt;span class="n"&gt;ALGORITHM&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'AEAD_AES_256_CBC_HMAC_SHA_256'&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;NOT&lt;/span&gt; &lt;span class="k"&gt;NULL&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Build/Publish with Data Studio &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;After creating the database project, the first step is to build it.&lt;br&gt;
Right-click on &lt;code&gt;DB&lt;/code&gt; and select &lt;code&gt;Build&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febl43a41q1vuse7gnx67.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Febl43a41q1vuse7gnx67.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the build has succeeded, we're ready to publish it to the database.&lt;br&gt;
If the build fails, saving all files and reopening the sqlproj file in Data Studio may help.&lt;/p&gt;

&lt;p&gt;Click &lt;code&gt;Publish&lt;/code&gt;,&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg4k3gfxxnxqqeofgjew.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fwg4k3gfxxnxqqeofgjew.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;and select connection.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb75ma0dchr90wyv1rdnu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fb75ma0dchr90wyv1rdnu.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We need to put all required connection details here and click &lt;code&gt;Connect&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznd4peq0bx2tetdv657k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fznd4peq0bx2tetdv657k.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If Server and Database are set as expected, then publish it.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvwuj5pdfxrf0fppkx6h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fpvwuj5pdfxrf0fppkx6h.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Please confirm that 'Deploy dacpac succeeded' is displayed in the output window.&lt;/p&gt;
&lt;h1&gt;
  
  
  6. Confirm the result&lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Let's see if the database is configured as expected.&lt;br&gt;
Go to the Connections pane in Azure Data Studio and click New Query.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4frl83lsz1w7bjxkzwv2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4frl83lsz1w7bjxkzwv2.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;We can confirm it by running the following Transact-SQL.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;column_master_keys&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;column_master_key_definitions&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;column_encryption_keys&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;column_encryption_key_values&lt;/span&gt;
&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="n"&gt;encryption_type_desc&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="n"&gt;sys&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;all_columns&lt;/span&gt; &lt;span class="k"&gt;WHERE&lt;/span&gt; &lt;span class="n"&gt;name&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'Password'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Of course, we can also add data and retrieve the plaintext values stored in encrypted columns.&lt;br&gt;
First, enable Always Encrypted for the database connection.&lt;br&gt;
Right-click on the server name and select &lt;code&gt;Edit Connection&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx1nbkoontjp2g11x7oj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Flx1nbkoontjp2g11x7oj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click &lt;code&gt;Advanced Properties&lt;/code&gt; on the lower right,&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy08ff02iebpzy10fduqu.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy08ff02iebpzy10fduqu.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Change Always Encrypted in the Security section to &lt;code&gt;Enabled&lt;/code&gt; and press OK.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrxnv9sgljk56nqi8s3l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fhrxnv9sgljk56nqi8s3l.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Next, enable Parameterization for Always Encrypted, which is disabled by default (Ref - &lt;a href="https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-query-columns-ads?view=sql-server-ver16#parameterization-for-always-encrypted" rel="noopener noreferrer"&gt;Parameterization for Always Encrypted&lt;/a&gt;).&lt;br&gt;
Click on the Manage icon and open &lt;code&gt;Settings&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbr71xcihfhg8bjjndr0o.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbr71xcihfhg8bjjndr0o.png" alt="Image description"&gt;&lt;/a&gt;&lt;br&gt;
Use the search box to find and enable the Parameterization for Always Encrypted setting.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5pt2foxm5qynuttsii5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fw5pt2foxm5qynuttsii5.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Finally, let's add the actual data and see the result. For instance, add sample data as follows.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight sql"&gt;&lt;code&gt;&lt;span class="k"&gt;DECLARE&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt; &lt;span class="n"&gt;NVARCHAR&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="mi"&gt;50&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="s1"&gt;'password'&lt;/span&gt;
&lt;span class="k"&gt;INSERT&lt;/span&gt; &lt;span class="k"&gt;INTO&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;Name&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;Password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="k"&gt;VALUES&lt;/span&gt; &lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="s1"&gt;'Ayaka'&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="o"&gt;@&lt;/span&gt;&lt;span class="n"&gt;password&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;

&lt;span class="k"&gt;SELECT&lt;/span&gt; &lt;span class="o"&gt;*&lt;/span&gt; &lt;span class="k"&gt;FROM&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="k"&gt;User&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ccixm2hzqcs32do9nkc.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F3ccixm2hzqcs32do9nkc.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  What does a DACPAC look like? &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;A data-tier application (DAC) is a logical database entity that defines all of the SQL Server objects - such as tables, views, and instance objects, including logins - associated with a user's database. A DAC is a self-contained unit of the entire database model and is portable in an artifact known as a DAC package, or .dacpac. Please refer to &lt;a href="https://docs.microsoft.com/en-us/sql/relational-databases/data-tier-applications/data-tier-applications?view=sql-server-ver16" rel="noopener noreferrer"&gt;official document&lt;/a&gt; for more details.&lt;/p&gt;

&lt;p&gt;We can export DACPAC with Data Studio.&lt;br&gt;
First, the &lt;a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/extensions/sql-server-dacpac-extension?view=sql-server-ver16" rel="noopener noreferrer"&gt;SQL Server dacpac extension&lt;/a&gt; should be installed.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far1ju9xbv3i2vkojt94l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Far1ju9xbv3i2vkojt94l.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Go to &lt;code&gt;Connections&lt;/code&gt; tab and click &lt;code&gt;Data-tier Application Wizard&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq930zwbs1z680o3d7u29.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fq930zwbs1z680o3d7u29.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 1 : Select &lt;code&gt;Extract a data-tier application&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8e46ql9r5cxvbugmcjd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff8e46ql9r5cxvbugmcjd.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 2 : Select extract DACPAC settings.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5qdy78ebp1tlgw6tmfuj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F5qdy78ebp1tlgw6tmfuj.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Step 3 : If the contents of the Summary are as expected, click &lt;code&gt;Extract&lt;/code&gt;.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu263sao6npr433snxxcp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu263sao6npr433snxxcp.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;When the dacpac file is created at the specified location, append .zip after .dacpac so that it can be opened as an archive.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F710uoo7duqfx8suw9vec.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F710uoo7duqfx8suw9vec.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Extract the zip file and open it in an editor such as Visual Studio Code. We can see that the DACPAC consists of four files: &lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/visualstudio/extensibility/the-structure-of-the-content-types-dot-xml-file?view=vs-2022" rel="noopener noreferrer"&gt;[Content_Types].xml&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;DacMetadata.xml&lt;/li&gt;
&lt;li&gt;model.xml&lt;/li&gt;
&lt;li&gt;Origin.xml&lt;/li&gt;
&lt;/ul&gt;
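&lt;p&gt;Since a .dacpac is itself an ordinary ZIP container, it can also be inspected programmatically. Here is a minimal Python sketch (the file name in the example is a placeholder):&lt;/p&gt;

```python
import zipfile

def list_dacpac_entries(path):
    """Return the file entries inside a DACPAC.

    A .dacpac is an ordinary ZIP container, so the standard zipfile
    module can open it directly, with or without the .zip rename.
    """
    with zipfile.ZipFile(path) as dacpac:
        return dacpac.namelist()

# Example (the path is a placeholder):
#   list_dacpac_entries("MyDatabase.dacpac")
# typically returns entries such as [Content_Types].xml,
# DacMetadata.xml, model.xml and Origin.xml.
```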

&lt;p&gt;model.xml contains the information on the CMK, CEK, and encrypted columns that we just configured.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbjpckaotnu1kul0uojv.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbbjpckaotnu1kul0uojv.png" alt="Image description"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Conclusion &lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;p&gt;Always Encrypted is a great feature when using Azure SQL. As introduced above, it can be configured with the Azure Data Studio extension, which is currently in preview, even on macOS.&lt;/p&gt;

&lt;h2&gt;
  
  
  References&lt;a&gt;&lt;/a&gt;
&lt;/h2&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/download-azure-data-studio?view=sql-server-ver16" rel="noopener noreferrer"&gt;Download and install Azure Data Studio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/what-is-azure-data-studio?view=sql-server-ver16#feature-comparison-with-sql-server-management-studio-ssms" rel="noopener noreferrer"&gt;Feature comparison with SQL Server Management Studio (SSMS)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-query-columns-ads?view=sql-server-ver16" rel="noopener noreferrer"&gt;Query columns using Always Encrypted with Azure Data Studio&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/extensions/sql-database-project-extension?view=sql-server-ver16" rel="noopener noreferrer"&gt;SQL Database Projects extension (Preview)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/azure-data-studio/extensions/sql-server-dacpac-extension?view=sql-server-ver16" rel="noopener noreferrer"&gt;SQL Server dacpac extension&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/sql/relational-databases/data-tier-applications/data-tier-applications?view=sql-server-ver16" rel="noopener noreferrer"&gt;Data-tier applications (DAC)&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>azure</category>
      <category>sql</category>
      <category>security</category>
    </item>
    <item>
      <title>Query table data in Azure Data Explorer with Kusto to analyse load test results</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Thu, 19 Aug 2021 03:11:19 +0000</pubDate>
      <link>https://dev.to/aykhara/query-table-data-in-azure-data-explorer-with-kusto-to-analyse-load-test-results-1b93</link>
      <guid>https://dev.to/aykhara/query-table-data-in-azure-data-explorer-with-kusto-to-analyse-load-test-results-1b93</guid>
      <description>&lt;p&gt;There are situations where we want to query Table data, such as analysing load test results.&lt;br&gt;
Since the maximum number of entities* that are returned in a single query with LINQ Take operator is 1,000 (&lt;a href="https://docs.microsoft.com/en-us/rest/api/storageservices/writing-linq-queries-against-the-table-service#returning-the-top-n-entities" rel="noopener noreferrer"&gt;Ref - Returning the Top n Entities&lt;/a&gt;), you may need to code more to retrieve what you want.&lt;/p&gt;

&lt;p&gt;*Entities are sets of properties and can be thought of like rows in a database. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsangh2tby4gf8q30918h.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsangh2tby4gf8q30918h.png" alt="architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;LINQ has an upper limit of 1,000, while Data Explorer with Kusto allows querying large numbers of entities.&lt;/p&gt;

&lt;p&gt;In particular, this article explains how to query with Kusto in Azure Data Explorer (ADX).&lt;/p&gt;
&lt;h3&gt;
  
  
  TOC
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Pre-requisites&lt;/li&gt;
&lt;li&gt;Objective&lt;/li&gt;
&lt;li&gt;Example table data&lt;/li&gt;
&lt;li&gt;Kusto queries to calculate processing time&lt;/li&gt;
&lt;li&gt;Result&lt;/li&gt;
&lt;li&gt;Conclusion&lt;/li&gt;
&lt;li&gt;Reference&lt;/li&gt;
&lt;/ul&gt;
&lt;h1&gt;
  
  
  Pre-requisites &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;First of all, you need to complete the steps to ingest data from Table Storage into Data Explorer via Data Factory to prepare to query large numbers of entities with Kusto.&lt;br&gt;
Please refer to &lt;a href="https://dev.to/aykhara/ingest-data-from-azure-table-storage-into-data-explorer-133c"&gt;the post "Ingest data from Azure Table Storage into Data Explorer"&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Once done with the steps above, you're ready to query data ingested from Azure Table Storage with Kusto.&lt;/p&gt;
&lt;h1&gt;
  
  
  Objective &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;In this article, I will use an example from &lt;a href="https://dev.to/aykhara/cost-comparison-between-azure-services-to-determine-architecture-579o"&gt;the post "Cost comparison between Azure services to determine architecture"&lt;/a&gt; to illustrate how to query Table data with Kusto. &lt;/p&gt;

&lt;p&gt;The purpose of querying the table data here is to make sure that one of the requirements, &lt;strong&gt;the processing time between device and storage is less than 10 seconds&lt;/strong&gt;, is met.&lt;/p&gt;
&lt;h1&gt;
  
  
  Example table data &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Let's see what example table data looks like. It is assumed that around 1.2 million entities are added to a single table.&lt;/p&gt;

&lt;p&gt;Table Name : telemetry202108180820&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;PartitionKey&lt;/th&gt;
&lt;th&gt;RowKey&lt;/th&gt;
&lt;th&gt;Timestamp&lt;/th&gt;
&lt;th&gt;Data&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;992c9af9-b490-44cd-bf95-d9fa61bc3aa4&lt;/td&gt;
&lt;td&gt;abcdefghigklmn&lt;/td&gt;
&lt;td&gt;2021-08-18T10:33:25.355Z&lt;/td&gt;
&lt;td&gt;{&lt;br&gt;"deviceId": "992c9af9-b490-44cd-bf95-d9fa61bc3aa4",&lt;br&gt;"connectivity": "Online",&lt;br&gt;"eventType": "Telemetry",&lt;br&gt;"timestamp": "2021-08-18T19:12:08.1844379+09:00",&lt;br&gt;"telemetry": {&lt;br&gt;"6E8E2CE5-3A7D-4997-9056-297BAD62C601":&lt;br&gt;"12345678901234567890123456789",&lt;br&gt;"1023EF00-093C-4702-886F-6C9C8B4D3102":&lt;br&gt;"12345678901234567890123456789",&lt;br&gt;...&lt;br&gt;}&lt;br&gt;}&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;c06145f6-7843-420c-ae80-fc52710198b5&lt;/td&gt;
&lt;td&gt;abcdefghigklmn&lt;/td&gt;
&lt;td&gt;2021-08-18T10:33:25.433Z&lt;/td&gt;
&lt;td&gt;{&lt;br&gt;"deviceId": "c06145f6-7843-420c-ae80-fc52710198b5",&lt;br&gt;"connectivity": "Online",&lt;br&gt;"eventType": "Telemetry",&lt;br&gt;"timestamp": "2021-08-18T19:12:08.1933468+09:00",&lt;br&gt;"telemetry": {&lt;br&gt;"6E8E2CE5-3A7D-4997-9056-297BAD62C601":&lt;br&gt;"12345678901234567890123456789",&lt;br&gt;"1023EF00-093C-4702-886F-6C9C8B4D3102":&lt;br&gt;"12345678901234567890123456789",&lt;br&gt;...&lt;br&gt;}&lt;br&gt;}&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;strong&gt;timestamp in column Data&lt;/strong&gt; is the time when a telemetry message is sent from each device. &lt;strong&gt;Column Timestamp&lt;/strong&gt; is the time when the telemetry message is ingested into Table Storage after being processed by the Function App.&lt;br&gt;
These timestamps will be used to calculate the processing time between device and Table Storage.&lt;/p&gt;
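&lt;p&gt;For intuition, the calculation can be sketched in plain Python using the two timestamp formats from the sample rows; the actual calculation in this article is done with Kusto.&lt;/p&gt;

```python
import re
from datetime import datetime

def parse_iso(ts):
    """Parse an ISO 8601 timestamp string.

    The device timestamps carry 7 fractional digits, so trim the
    fraction to microsecond precision and normalize a trailing "Z"
    before handing the string to datetime.fromisoformat.
    """
    ts = ts.replace("Z", "+00:00")
    ts = re.sub(r"(\.\d{6})\d+", r"\1", ts)  # keep at most 6 fractional digits
    return datetime.fromisoformat(ts)

def processing_time_ms(generated, ingested):
    """Milliseconds between the device send time (timestamp inside
    column Data) and the Table Storage ingest time (column Timestamp)."""
    return (parse_iso(ingested) - parse_iso(generated)).total_seconds() * 1000.0

# e.g. processing_time_ms("2021-08-18T19:12:08.1844379+09:00",
#                         "2021-08-18T10:33:25.355Z")
```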

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86rf9ya01i1n0nl2azln.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F86rf9ya01i1n0nl2azln.png" alt="Data_Timestamp"&gt;&lt;/a&gt;&lt;/p&gt;
&lt;h1&gt;
  
  
  Kusto queries to calculate processing time &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Here is an example of Kusto queries to calculate processing time between device and table storage.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;telemetry202108180820
| project data = parse_json(Data), ingestedTime = Timestamp
| project generatedTime = todatetime(data.timestamp), ingestedTime
| project diff = datetime_diff("Millisecond", ingestedTime, generatedTime)
| summarize avg(diff), max(diff), min(diff), percentiles(diff, 5, 90, 99)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;What each step is doing will be explained as the following.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Reference to a table
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;telemetry202108180820
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;The statement starts with a reference to a table. In this article, the table 'telemetry202108180820' is used, as shown in the example table data section above.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Interpret column Data as a JSON
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| project 
    data = parse_json(Data), // Interpret column Data as a JSON and rename it to data
    ingestedTime = Timestamp // Rename column Timestamp to ingestedTime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;'project' is an operator to select the columns to include, rename or drop, and insert new computed columns.&lt;/p&gt;

&lt;p&gt;In the example here, Data is a string and needs to be interpreted as a JSON value so that individual properties can be extracted from it later.&lt;br&gt;
Also, column Timestamp is renamed to ingestedTime to distinguish it from the other timestamp.&lt;/p&gt;
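&lt;p&gt;As a plain-Python analogue, json.loads plays the role of parse_json here; the sample value below is trimmed from the example table data above.&lt;/p&gt;

```python
import json

# Column Data arrives as a string; parsing it yields a dict whose
# properties (such as the device-side timestamp) can be extracted.
row_data = (
    '{"deviceId": "992c9af9-b490-44cd-bf95-d9fa61bc3aa4",'
    ' "eventType": "Telemetry",'
    ' "timestamp": "2021-08-18T19:12:08.1844379+09:00"}'
)
data = json.loads(row_data)
generated_time = data["timestamp"]  # the device-side send time
```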

&lt;h3&gt;
  
  
  3. Convert timestamp in column Data to datetime
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| project 
    generatedTime = todatetime(data.timestamp), // Convert the timestamp in column data to a datetime scalar and rename it to generatedTime
    ingestedTime // Include ingestedTime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;'todatetime' is a function to convert its input to a datetime scalar. The timestamp in column data is converted with the 'todatetime' function and named generatedTime.&lt;br&gt;
ingestedTime, which was renamed above, is also included again.&lt;/p&gt;

&lt;h3&gt;
  
  
  4. Calculate the calendarian difference between two datetime values
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| project 
    diff = datetime_diff("Millisecond", ingestedTime, generatedTime) // Calculate the calendarian difference (milliseconds) between ingestedTime and generatedTime, and name it diff
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;'datetime_diff' is a function to calculate the calendarian difference between two datetime values.&lt;br&gt;
By calculating the difference in milliseconds between the ingestedTime (when the telemetry message is ingested into Table Storage) and the generatedTime (when the telemetry message is sent from each device), the processing time can be obtained. The result is named diff.&lt;/p&gt;

&lt;h3&gt;
  
  
  5. Produce a table with aggregation functions
&lt;/h3&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;| summarize 
    avg(diff), 
    max(diff), 
    min(diff), 
    percentiles(diff, 5, 90, 99) // Produce a table including the average, maximum, minimum, and approximate percentiles of diff, the difference (milliseconds) between ingestedTime and generatedTime
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;'summarize' is an operator to produce a table that aggregates the content of the input table.&lt;br&gt;
As an example, the following four aggregation functions are used.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;avg : Returns an average value across the group&lt;/li&gt;
&lt;li&gt;max : Returns the maximum value across the group&lt;/li&gt;
&lt;li&gt;min : Returns the minimum value across the group&lt;/li&gt;
&lt;li&gt;percentiles : Returns the percentile approximate of the group&lt;/li&gt;
&lt;/ul&gt;
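&lt;p&gt;The same aggregation can be sketched in plain Python with the statistics module; note that Kusto's percentiles() is approximate, while the values below are exact, and the input list here is made up for illustration.&lt;/p&gt;

```python
import statistics

def summarize(diffs):
    """Mirror the Kusto summarize step: average, maximum, minimum,
    and the 5th/90th/99th percentiles of the diff values (ms)."""
    # quantiles with n=100 returns the 1st..99th percentile cut points.
    q = statistics.quantiles(diffs, n=100)
    return {
        "avg_diff": statistics.mean(diffs),
        "max_diff": max(diffs),
        "min_diff": min(diffs),
        "percentile_diff_5": q[4],
        "percentile_diff_90": q[89],
        "percentile_diff_99": q[98],
    }
```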

&lt;h1&gt;
  
  
  Result &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Here is the result of the Kusto queries explained earlier. The average is about 5,023 milliseconds, which means 5.023 seconds. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm1azdj7a06sn31ukxi4.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffm1azdj7a06sn31ukxi4.png" alt="ADX_ProcessingTime_Average_result"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result shows that the requirement that the processing time between device and storage be less than 10 seconds is met.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;This is just one example of how to query table data in Azure Data Explorer with Kusto to analyse load test results.&lt;br&gt;
If you want to retrieve or query large numbers of entities from Table Storage (far more than 1,000), one way to do it is to use Kusto instead of LINQ.&lt;/p&gt;

&lt;h1&gt;
  
  
  Reference &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/rest/api/storageservices/writing-linq-queries-against-the-table-service#returning-the-top-n-entities" rel="noopener noreferrer"&gt;Writing LINQ queries against the Table service&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/" rel="noopener noreferrer"&gt;Overview - Kusto&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/projectoperator" rel="noopener noreferrer"&gt;project operator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/parsejsonfunction" rel="noopener noreferrer"&gt;parse_json()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/todatetimefunction" rel="noopener noreferrer"&gt;todatetime()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/datetime-difffunction" rel="noopener noreferrer"&gt;datetime_diff()&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/data-explorer/kusto/query/summarizeoperator" rel="noopener noreferrer"&gt;summarize operator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/aykhara/ingest-data-from-azure-table-storage-into-data-explorer-133c"&gt;Ingest data from Azure Table Storage into Data Explorer&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dev.to/aykhara/cost-comparison-between-azure-services-to-determine-architecture-579o"&gt;Cost comparison between Azure services to determine architecture&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>azure</category>
    </item>
    <item>
      <title>Cost comparison between Azure services to determine architecture</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Tue, 10 Aug 2021 05:54:32 +0000</pubDate>
      <link>https://dev.to/aykhara/cost-comparison-between-azure-services-to-determine-architecture-579o</link>
      <guid>https://dev.to/aykhara/cost-comparison-between-azure-services-to-determine-architecture-579o</guid>
      <description>&lt;p&gt;Cost is an important factor to consider when developing a cloud-based solution. The main purpose of cost estimation is to determine the architecture and to predict the future costs.&lt;/p&gt;

&lt;p&gt;First of all, let's take a look at the architecture finalized from the cost estimation results and the project's specific requirements, in the case we are going to use as an example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mzmyuoznxoqsfh7lq1v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mzmyuoznxoqsfh7lq1v.png" alt="Determined architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In this article, I will explain how the cost estimation was done to determine this architecture, with actual data and requirements from a real project.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; : The content of this article is as of August 2021. The cost was calculated using the "Pay as you go" rate.&lt;/p&gt;

&lt;h3&gt;
  
  
  TOC
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;
Requirements for cost estimation

&lt;ul&gt;
&lt;li&gt;1. Projected amount of data in the future&lt;/li&gt;
&lt;li&gt;2. Data storage&lt;/li&gt;
&lt;li&gt;3. Budget&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Architecture options&lt;/li&gt;

&lt;li&gt;

Cost comparison

&lt;ul&gt;
&lt;li&gt;
1. Where to send : IoT Hub vs Event Hubs

&lt;ul&gt;
&lt;li&gt;Selection criteria&lt;/li&gt;
&lt;li&gt;How to calculate&lt;/li&gt;
&lt;li&gt;Tips&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

2. Where to process : Stream Analytics vs Functions

&lt;ul&gt;
&lt;li&gt;Selection criteria&lt;/li&gt;
&lt;li&gt;How to calculate&lt;/li&gt;
&lt;li&gt;Tips&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

3. Where to persist : CosmosDB vs Table Storage

&lt;ul&gt;
&lt;li&gt;Selection criteria&lt;/li&gt;
&lt;li&gt;How to calculate&lt;/li&gt;
&lt;li&gt;Tips&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;Other option : Data Explorer&lt;/li&gt;

&lt;/ul&gt;

&lt;/li&gt;

&lt;li&gt;Result summary&lt;/li&gt;

&lt;li&gt;Importance of load testing&lt;/li&gt;

&lt;li&gt;Conclusion&lt;/li&gt;

&lt;li&gt;References&lt;/li&gt;

&lt;/ul&gt;

&lt;h1&gt;
  
  
  Requirements for cost estimation &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;h3&gt;
  
  
  1. Projected amount of data in the future &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;p&gt;We perform cost estimation using the scale projected for several years into the future. If we estimated from current data alone, scalability might not be accounted for; as the amount of data grows, the cost could greatly exceed the budget and force the architecture to be reexamined.&lt;/p&gt;

&lt;p&gt;In this article, this projected data will be used as the example for cost estimation.&lt;/p&gt;

&lt;p&gt;The assumption in this example is that telemetry messages from 10 devices are consolidated into one array by a connector, which then sends it to Azure.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy5p3s775fhf7cuf90go.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyy5p3s775fhf7cuf90go.png" alt="Telemetry messages from 10 devices are sent to one connector and then go to Azure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Actual telemetry data
&lt;/h4&gt;

&lt;p&gt;Here is an example of what telemetry messages sent to Azure look like:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;device&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"deviceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"ffee4208-eaca-4f7b-8882-fee956b3776a"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"connectivity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Online"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"eventType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Telemetry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"deviceTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2021-07-16T13:34:00.000Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"connectorTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2021-07-16T13:34:00.000Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"telemetryData"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"6E8E2CE5-3A7D-4997-9056-297BAD62C617"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;point&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;max&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;could&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;be&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="err"&gt;KB&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"1023EF00-093C-4702-886F-6C9C8B4D3169"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"46B219AF-E355-479D-B02C-274E09A38BDC"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.234"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;device&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;2&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"deviceId"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"44B5EF8A-0F54-4DBC-A343-58828892E2D2"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"connectivity"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Online"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"eventType"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"Telemetry"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"deviceTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2021-07-16T13:34:00.000Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"connectorTime"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"2021-07-16T13:34:00.000Z"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"telemetryData"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt; 
            &lt;/span&gt;&lt;span class="nl"&gt;"2F91F10F-63BC-4E84-A72D-95BABB37C155"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;false&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;//&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;point&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;-&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;the&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;max&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;data&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;size&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;could&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="err"&gt;be&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mf"&gt;0.05&lt;/span&gt;&lt;span class="err"&gt;KB&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"CB7C127D-D3EB-475F-9FCB-0F721B582C58"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;123&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="nl"&gt;"4ADE8439-961D-493D-B306-D2567F87429A"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"1.234"&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
            &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="err"&gt;...&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;As noted in the inline comments above, each data point can be up to 0.05KB (= 50 bytes) in size and each device sends up to 50 data points. In addition, the messages from 10 devices are consolidated into one array.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45fp9zak4z847upczhqb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F45fp9zak4z847upczhqb.png" alt="Formula to calculate 1 message size"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This maximum size should be applied when cost is estimated.&lt;/p&gt;
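&lt;p&gt;As a quick sanity check, the per-connector message size in the formula above can be reproduced with a few lines of Python (the 0.05KB data-point size, 50 data points per device, and 10 devices per connector are the figures from this example):&lt;/p&gt;

```python
# Maximum size of one consolidated telemetry message sent by a connector.
DATA_POINT_BYTES = 50        # max size of one data point (0.05KB)
DATA_POINTS_PER_DEVICE = 50  # max data points per device
DEVICES_PER_CONNECTOR = 10   # devices consolidated into one array

message_size_kb = (DATA_POINT_BYTES * DATA_POINTS_PER_DEVICE
                   * DEVICES_PER_CONNECTOR) / 1000
print(message_size_kb)  # 25.0 KB per connector message
```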

&lt;h4&gt;
  
  
  Number of devices
&lt;/h4&gt;

&lt;p&gt;10 devices are connected to each connector. The total number of devices is expected to be 2,000, which means there will be 200 connectors.&lt;/p&gt;

&lt;h4&gt;
  
  
  Frequency
&lt;/h4&gt;

&lt;p&gt;Telemetry messages from 10 devices are bundled together per connector and sent every second.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Data storage &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Keep data for a certain period of time
&lt;/h4&gt;

&lt;p&gt;Telemetry messages (raw data) must be kept in hot/warm storage for a certain period of time, without deletion, because we plan to create summaries from them and query them frequently.&lt;/p&gt;

&lt;p&gt;In this article, the cost will be calculated assuming that the data is retained for 48 hours as an example.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Budget &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Compared with competitors
&lt;/h4&gt;

&lt;p&gt;It is important to consider differences in both cost and functionality when comparing services. We modeled different usage scenarios representing differences in data volumes, frequency and data retention. Competing services' end-user pricing was used as a guide to determine the budget for our solution.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxozwokyug45ffbz34gi3.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxozwokyug45ffbz34gi3.png" alt="Budget for Azure"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h1&gt;
  
  
  Architecture options &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;Now it's time to go through the cost comparison and decide on the architecture.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlfpb1ar8gv4g1wicjzy.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdlfpb1ar8gv4g1wicjzy.png" alt="Architecture options"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;For comparison, the architecture of the Azure part will be divided into three major categories.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Where to send&lt;br&gt;
Compare IoT Hub and Event Hubs to see where to receive per-connector telemetry messages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Where to process&lt;br&gt;
Compare Stream Analytics and Functions to see where telemetry messages sent in bulk per connector should be decomposed into per-device messages.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Where to persist&lt;br&gt;
Compare Cosmos DB and Table Storage to see where to store per-device telemetry messages.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The architecture decision was not based solely on cost but also on the project's specific requirements, which are explained in the selection criteria below.&lt;/p&gt;

&lt;h1&gt;
  
  
  Cost comparison &lt;a&gt;&lt;/a&gt;
&lt;/h1&gt;

&lt;p&gt;This estimation will be based on the projected data volume and number of devices.&lt;/p&gt;

&lt;h3&gt;
  
  
  1. Where to send : IoT Hub vs Event Hubs &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Selection criteria &lt;a&gt;&lt;/a&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Message size : 25KB&lt;/li&gt;
&lt;li&gt;Frequency : telemetry message from 200 connectors sent every second&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  How to calculate &lt;a&gt;&lt;/a&gt;
&lt;/h4&gt;

&lt;h5&gt;
  
  
  IoT Hub
&lt;/h5&gt;

&lt;h6&gt;
  
  
  Step 1 : Basic or Standard tier
&lt;/h6&gt;

&lt;p&gt;First, you need to decide which tier to use.&lt;br&gt;
In this project, Cloud-to-Device messaging is planned for future use. Since the Basic tier does not support that feature, the Standard tier is selected.&lt;/p&gt;

&lt;p&gt;Please refer to &lt;a href="https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-scaling?branch=release-iotbasic" rel="noopener noreferrer"&gt;the document - Choose the right IoT Hub tier for your solution&lt;/a&gt; for more detailed information.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 2 : Number of messages per day
&lt;/h6&gt;

&lt;p&gt;Next, the number of messages sent to IoT Hub per day should be calculated.&lt;br&gt;
As mentioned earlier, a message is sent every second per connector, which means 200 messages are sent every second. The formula is as shown below.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdh6ic86hkvl4sgb4w6eg.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdh6ic86hkvl4sgb4w6eg.png" alt="Formula to calculate number of messages per day"&gt;&lt;/a&gt;&lt;/p&gt;
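&lt;p&gt;The same formula can be written as a short Python sketch (200 connectors, one message per second each, as assumed above):&lt;/p&gt;

```python
# Number of telemetry messages received by IoT Hub per day.
CONNECTORS = 200
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

messages_per_day = CONNECTORS * SECONDS_PER_DAY
print(messages_per_day)  # 17280000 messages per day
```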

&lt;h6&gt;
  
  
  Step 3 : Number of billing messages per day
&lt;/h6&gt;

&lt;p&gt;Then the number of &lt;strong&gt;billing&lt;/strong&gt; messages sent to IoT Hub per day should be calculated.&lt;br&gt;
The message meter size for both the Standard and Basic tiers is 4KB. (&lt;a href="https://azure.microsoft.com/en-us/pricing/details/iot-hub/" rel="noopener noreferrer"&gt;Ref - IoT Hub pricing&lt;/a&gt;)&lt;br&gt;
As mentioned above, since messages from 10 devices are consolidated into one array, the message size per connector is 25KB.&lt;br&gt;
Based on the above information, the formula is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4ckoy1krrpnfk3sx2lo.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fk4ckoy1krrpnfk3sx2lo.png" alt="Formula to calculate number of billing messages per day"&gt;&lt;/a&gt;&lt;/p&gt;
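&lt;p&gt;Following this article's arithmetic, the billable message count divides the actual 25KB message size by the 4KB meter size (25/4 = 6.25 metered messages per connector message). A sketch in Python:&lt;/p&gt;

```python
# Number of billable (4KB-metered) messages per day,
# following the article's formula.
MESSAGE_KB = 25                  # per-connector message size
METER_KB = 4                     # IoT Hub message meter size (Basic/Standard)
MESSAGES_PER_DAY = 200 * 86_400  # from the previous step

billing_messages_per_day = MESSAGES_PER_DAY * MESSAGE_KB // METER_KB
print(billing_messages_per_day)  # 108000000 billable messages per day
```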

&lt;h6&gt;
  
  
  Step 4 : Edition Type, Number of units
&lt;/h6&gt;

&lt;p&gt;There are three edition types under the Standard tier : S1, S2, and S3. Each edition has a limit on the total number of messages per day per IoT Hub unit. (&lt;a href="https://azure.microsoft.com/en-us/pricing/details/iot-hub/" rel="noopener noreferrer"&gt;Ref - IoT Hub pricing&lt;/a&gt;)  &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihhcqakjplfqj4jy225n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fihhcqakjplfqj4jy225n.png" alt="List of total number limitation of messages per day per IoT Hub unit"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Based on the above information, the formula to calculate the number of units required for each edition is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnl33uvpph2m1vdtea4b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fqnl33uvpph2m1vdtea4b.png" alt="Formula to calculate the number of units required and the cost of IoT Hub"&gt;&lt;/a&gt;&lt;/p&gt;
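&lt;p&gt;Using the per-unit daily message limits from the pricing table above (400,000 for S1, 6,000,000 for S2, and 300,000,000 for S3), the number of units per edition is the billable message count divided by the limit, rounded up:&lt;/p&gt;

```python
import math

BILLING_MESSAGES_PER_DAY = 108_000_000  # from the previous step

# Daily message limit per IoT Hub unit for each Standard-tier edition.
LIMITS = {"S1": 400_000, "S2": 6_000_000, "S3": 300_000_000}

units = {edition: math.ceil(BILLING_MESSAGES_PER_DAY / limit)
         for edition, limit in LIMITS.items()}
print(units)  # {'S1': 270, 'S2': 18, 'S3': 1}
```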

&lt;h5&gt;
  
  
  Event Hubs
&lt;/h5&gt;

&lt;h6&gt;
  
  
  Step 1 : Basic, Standard or Dedicated tier
&lt;/h6&gt;

&lt;p&gt;Since the max retention period for the Basic tier is 1 day, Standard tier needs to be selected to retain data for 48 hours.&lt;/p&gt;

&lt;p&gt;If events need to be retained for up to 90 days, the Dedicated tier should be selected.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 2 : Ingress events
&lt;/h6&gt;

&lt;p&gt;Since 200 connectors send telemetry messages every second, the number of ingress events per month can be calculated by multiplying the number of connectors by the number of seconds per month.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcipvs305rrd6bpicy6t.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fzcipvs305rrd6bpicy6t.png" alt="Formula to calculate the number of ingress events per month"&gt;&lt;/a&gt;&lt;/p&gt;
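&lt;p&gt;In Python, assuming a 30-day month:&lt;/p&gt;

```python
# Number of Event Hubs ingress events per month (billed per million events).
CONNECTORS = 200
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # 2,592,000 for a 30-day month

ingress_events_per_month = CONNECTORS * SECONDS_PER_MONTH
print(ingress_events_per_month)  # 518400000 events per month
```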

&lt;h6&gt;
  
  
  Step 3 : Throughput units
&lt;/h6&gt;

&lt;p&gt;Then, the message size is multiplied by the number of connectors to calculate the ingress data size per second. In practice, in addition to the 25KB message size, the system message size and other factors need to be taken into account. Therefore, the total comes to 6 TUs, since 1 TU is required for every 1,000KB per second.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiseizzh1rg1xnu8adgc6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fiseizzh1rg1xnu8adgc6.png" alt="Formula to calculate the ingress data size per second and throughput units required"&gt;&lt;/a&gt;&lt;/p&gt;
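&lt;p&gt;The throughput unit calculation can be sketched as follows; note that the extra unit added for system-message overhead reflects this project's allowance, not a fixed rule:&lt;/p&gt;

```python
import math

MESSAGE_KB = 25    # per-connector message size
CONNECTORS = 200
KB_PER_TU = 1000   # 1 TU covers up to 1MB/s (~1,000KB/s) of ingress

ingress_kb_per_second = MESSAGE_KB * CONNECTORS             # 5,000 KB/s
payload_tus = math.ceil(ingress_kb_per_second / KB_PER_TU)  # 5 TUs for payload
total_tus = payload_tus + 1                                 # +1 for system messages etc.
print(total_tus)  # 6 TUs
```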

&lt;h6&gt;
  
  
  Step 4 (Optional) : Capture feature
&lt;/h6&gt;

&lt;p&gt;The Azure Event Hubs Capture feature automatically processes and stores event data in your Azure storage account. The price is based on the number of throughput units selected for the Event Hubs. The Capture feature is not used in this case.&lt;br&gt;
For more information, please see &lt;a href="https://azure.microsoft.com/en-us/pricing/details/event-hubs/" rel="noopener noreferrer"&gt;the pricing details page&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F141v5mfxx83j7gc40tey.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F141v5mfxx83j7gc40tey.png" alt="Formula to calculate the cost of Event Hubs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Tips &lt;a&gt;&lt;/a&gt;
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Tips 1 - IoT Hub : Increase in cost
&lt;/h5&gt;

&lt;p&gt;The cost does not increase in proportion to the amount of data, but rather in a staircase pattern.&lt;/p&gt;

&lt;h5&gt;
  
  
  Tips 2 - IoT Hub: Number of connectors allowed in S3 and calculation method
&lt;/h5&gt;

&lt;p&gt;The number of Connectors allowed in 1 unit of S3 is 555. The formula is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fan5h0zlryqktalee8kr7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fan5h0zlryqktalee8kr7.png" alt="Formula to calculate the number of Connectors allowed in 1 unit of S3"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This means, of course, that the cost per connector will be lower when 555 connectors are used than when 200 connectors are used.&lt;/p&gt;
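&lt;p&gt;The 555 figure can be reproduced by dividing one S3 unit's daily limit by the billable messages a single connector generates per day:&lt;/p&gt;

```python
# How many connectors fit within one S3 unit (300M billable messages/day).
S3_LIMIT_PER_UNIT = 300_000_000
BILLING_PER_CONNECTOR_PER_DAY = 86_400 * 25 // 4  # 540,000 metered messages

connectors_per_s3_unit = S3_LIMIT_PER_UNIT // BILLING_PER_CONNECTOR_PER_DAY
print(connectors_per_s3_unit)  # 555
```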

&lt;h5&gt;
  
  
  Tips 3 - Event Hubs: Consider the Auto-Inflate setting
&lt;/h5&gt;

&lt;p&gt;Event Hubs traffic is controlled by TUs (standard tier). For the limits such as ingress and egress rates per TU, see &lt;a href="https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-quotas" rel="noopener noreferrer"&gt;Event Hubs quotas and limits&lt;/a&gt;.&lt;br&gt;
Auto-inflate enables you to start small with the minimum required TUs you choose. The feature then scales automatically to the maximum limit of TUs you need, depending on the increase in your traffic. Auto-inflate provides the following benefits:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;An efficient scaling mechanism to start small and scale up as you grow.&lt;/li&gt;
&lt;li&gt;Automatically scale to the specified upper limit without throttling issues.&lt;/li&gt;
&lt;li&gt;More control over scaling, because you control when and how much to scale.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;strong&gt;Note&lt;/strong&gt; : Auto-Inflate is a &lt;strong&gt;scale-up only&lt;/strong&gt; feature. It will not automatically scale down.&lt;/p&gt;

&lt;p&gt;More information is available &lt;a href="https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-auto-inflate#enable-auto-inflate-through-the-portal" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;h5&gt;
  
  
  Tips 4 - Event Hubs: Increase in cost
&lt;/h5&gt;

&lt;p&gt;As with IoT Hub, the cost does not increase in proportion to the amount of data, but rather in a staircase pattern.&lt;/p&gt;

&lt;h5&gt;
  
  
  Tips 5 - Compare IoT Hub and Event Hubs
&lt;/h5&gt;

&lt;p&gt;While IoT Hub can manage devices and provides two-way communication (C2D, D2C), Event Hubs provides only one-way communication. Thus, when choosing Event Hubs, one option is to use it together with an S1 IoT Hub, and security should also be considered.&lt;/p&gt;

&lt;p&gt;Please refer to &lt;a href="https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-compare-event-hubs" rel="noopener noreferrer"&gt;Connecting IoT Devices to Azure: IoT Hub and Event Hubs&lt;/a&gt;.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Where to process : Stream Analytics vs Functions &lt;a&gt;&lt;/a&gt;
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Selection criteria &lt;a&gt;&lt;/a&gt;
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Message size : 25KB&lt;/li&gt;
&lt;li&gt;Frequency : telemetry message from 200 connectors sent every second&lt;/li&gt;
&lt;li&gt;Processing details :

&lt;ul&gt;
&lt;li&gt;Decompose telemetry messages from per connector to per device (i.e. 200 messages from connectors to 2000 messages from devices)&lt;/li&gt;
&lt;li&gt;Save telemetry messages to specified storage dynamically&lt;/li&gt;
&lt;li&gt;The processing time between device and storage should be within 10 seconds&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
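&lt;p&gt;The decomposition step itself is simple; a minimal Python sketch of splitting one per-connector array into per-device messages might look like this (the field names follow the telemetry sample shown earlier, and the function name is illustrative):&lt;/p&gt;

```python
import json

def decompose(connector_payload: str) -> list:
    """Split one per-connector telemetry array into per-device messages."""
    return list(json.loads(connector_payload))

# A shortened two-device example of the per-connector array.
payload = json.dumps([
    {"deviceId": "device-1", "eventType": "Telemetry", "telemetryData": {"a": 1}},
    {"deviceId": "device-2", "eventType": "Telemetry", "telemetryData": {"b": 2}},
])
messages = decompose(payload)
print(len(messages))  # 2 per-device messages
```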

&lt;h4&gt;
  
  
  How to calculate &lt;a&gt;&lt;/a&gt;
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Stream Analytics
&lt;/h5&gt;

&lt;h6&gt;
  
  
  Step 1 : Standard or Dedicated plan
&lt;/h6&gt;

&lt;p&gt;There are two plans: Standard and Dedicated.&lt;br&gt;
The Dedicated plan requires at least 36 streaming units (SUs), so the Standard plan is selected.&lt;/p&gt;

&lt;p&gt;Please refer to &lt;a href="https://azure.microsoft.com/en-us/pricing/details/stream-analytics/" rel="noopener noreferrer"&gt;Standard streaming unit section in the pricing page&lt;/a&gt;.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 2 : Number of streaming units (SUs)
&lt;/h6&gt;

&lt;p&gt;Choosing how many SUs are required for a particular job depends on the partition configuration for the inputs and on the query defined for the job. You can select up to your quota in SUs for a job. By default, each Azure subscription has a quota of up to 500 SUs for all the analytics jobs in a specific region. &lt;br&gt;
Valid values for SUs per job are 1, 3, 6, and up in increments of 6.&lt;/p&gt;

&lt;p&gt;The keys to determining the appropriate number of SUs from load testing are:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;SU% utilization should not be greater than 80% (it must stay below 80%)&lt;/li&gt;
&lt;li&gt;There should be no backlogged input events (non-zero or slowly increasing). If a backlog occurs, the workload requires more computing resources and the number of SUs needs to be increased.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Please refer to &lt;a href="https://docs.microsoft.com/en-us/azure/stream-analytics/stream-analytics-streaming-unit-consumption" rel="noopener noreferrer"&gt;Understand and adjust Streaming Units&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Load testing should be conducted to determine how many SUs are needed.&lt;/p&gt;

&lt;p&gt;Let's see how we checked the metrics to determine the number of SUs required when we conducted our load tests. We performed load tests on 1, 3, 6, 12, and 24 SUs respectively.&lt;/p&gt;

&lt;p&gt;Here are the metrics for the 3 SUs as an example.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4aaxw2z4nm6yhwpjr70.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn4aaxw2z4nm6yhwpjr70.png" alt="Metrics for 3 SUs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As shown in the figure above, SU % utilization is 52%, which meets the criterion of less than 80%.&lt;br&gt;
Watermark delay is the time at which an event leaves the stream minus the time it entered; with 3 SUs the maximum delay was 3.18 min.&lt;br&gt;
Backlogged input events should be as close to 0 as possible; a non-zero value means the number of SUs is not enough for the job. With 3 SUs the backlog reached 5.95k events, indicating that processing could not keep up.&lt;/p&gt;

&lt;p&gt;The table below summarizes the results for the other patterns.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtjje0b3kow3gf0cnq26.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbtjje0b3kow3gf0cnq26.png" alt="Comparison of SUs"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As a result of the load testing in our case, we found out that 6 SUs are required for the Standard plan.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwrmqcvuf1e9iulyj8pk.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdwrmqcvuf1e9iulyj8pk.png" alt="Formula to calculate the cost of Stream Analytics"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h6&gt;
  
  
  Additional required step : Consider combining services to save to a specified table name dynamically
&lt;/h6&gt;

&lt;p&gt;Stream Analytics has limited flexibility in its output destinations and cannot save to a dynamically specified table. Therefore, it must be combined with another service to achieve this. In our project, we selected Functions and conducted load testing in combination with Stream Analytics.&lt;/p&gt;

&lt;p&gt;The cost of the Functions required to dynamically save to the specified table is shown in the figure below (The detailed costing method is described in the next section).&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3zd7m9qth1dw9zhruhs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff3zd7m9qth1dw9zhruhs.png" alt="Formula to calculate the cost of Functions (App Service plan)"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As explained above, if you choose Stream Analytics you will need to combine it with Functions, so the total cost is the sum of both services.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsjfxbfufyxxkaw4drx43.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fsjfxbfufyxxkaw4drx43.png" alt="Formula to calculate the cost of Stream Analytics and Functions"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Functions
&lt;/h5&gt;

&lt;h6&gt;
  
  
  Step 1 : Consumption, Premium or App Service Plan
&lt;/h6&gt;

&lt;p&gt;In our case, the system may be scaled down or switched off periodically to reduce cost. Therefore, the Consumption plan is not an option because it cannot be set to &lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/dedicated-plan#always-on" rel="noopener noreferrer"&gt;Always on&lt;/a&gt; and will suffer from cold starts.&lt;br&gt;
A cold start is the phenomenon in which an application that hasn't been used recently takes longer to start up; in other words, it is an increase in latency for Functions that haven't been called recently. The Always on setting is available only on an App Service plan, where cold starts are therefore not an issue.&lt;/p&gt;

&lt;p&gt;As for the Premium plan, it can avoid cold starts with perpetually warm instances. (&lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-premium-plan?tabs=portal" rel="noopener noreferrer"&gt;Ref: Azure Functions Premium plan&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;In our case, the cold start issue needs to be avoided since the system may be scaled down or switched off periodically to reduce cost.&lt;/p&gt;

&lt;p&gt;Based on the above, we will compare the Premium plan and the App Service plan in the next step.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 2 : Instance, Number of instances
&lt;/h6&gt;

&lt;p&gt;Load testing should be conducted to determine which instance to use and how many instances are needed.&lt;/p&gt;

&lt;p&gt;The following are example checkpoints for the Functions metrics during load test execution.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Check that all the messages you sent are processed&lt;/li&gt;
&lt;li&gt;Check that CPU usage does not exceed 80%&lt;/li&gt;
&lt;li&gt;Check that memory usage is stable&lt;/li&gt;
&lt;li&gt;Check that the total execution count matches the number of inputs to the functions as expected&lt;/li&gt;
&lt;li&gt;Check that the average processing duration in the functions is not too long&lt;/li&gt;
&lt;/ul&gt;
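&lt;p&gt;The checklist above can be sketched as a set of pass/fail checks over the collected metrics. The threshold values, metric names, and sample numbers here are illustrative assumptions, not values from our actual load test:&lt;/p&gt;

```python
# Minimal sketch of evaluating load-test metrics against the checklist.
# All metric names, thresholds, and sample values are assumptions.
def evaluate_load_test(metrics: dict, sent: int) -> dict:
    return {
        "all_messages_processed": metrics["processed_count"] == sent,
        "cpu_below_80pct": metrics["max_cpu_percent"] <= 80,
        "memory_stable": metrics["memory_growth_mb_per_hour"] < 1.0,
        "execution_count_matches": metrics["total_executions"] == sent,
        "avg_duration_ok": metrics["avg_duration_ms"] <= 1000,  # assumed budget
    }

sample = {
    "processed_count": 864_000,
    "max_cpu_percent": 72,
    "memory_growth_mb_per_hour": 0.2,
    "total_executions": 864_000,
    "avg_duration_ms": 640,
}
results = evaluate_load_test(sample, sent=864_000)
print(results)
```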

&lt;p&gt;As a result of the load test, we found that the Premium plan requires 6 EP2 instances, while the App Service plan requires 4 P1v3 instances.&lt;br&gt;
The results of the cost estimation are as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzbcgat06vuhxa0p3wsw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffzbcgat06vuhxa0p3wsw.png" alt="Formula to calculate the cost of Functions (Premium plan and App Service plan）"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Aside from the cost, the Premium Plan was not able to process as stably as the App Service Plan even when it scaled out sufficiently.&lt;/p&gt;

&lt;p&gt;Therefore, when selecting Functions, setting up 4 P1v3 instances of the App Service Plan was the optimal option for us.&lt;/p&gt;

&lt;h4&gt;
  
  
  Tips
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Tips 1 - Stream Analytics : Limited flexibility for output destinations
&lt;/h5&gt;

&lt;p&gt;For example, if you want to store your telemetry messages in tables created in 10-minute increments based on the timestamps contained in the messages, Stream Analytics alone cannot accomplish this.&lt;br&gt;
If, as in our case, you have a requirement to specify the destination table dynamically, you will need to combine it with another service, which increases the cost.&lt;/p&gt;

&lt;h3&gt;
  
  
  3. Where to persist : CosmosDB vs Table Storage
&lt;/h3&gt;

&lt;h4&gt;
  
  
  Selection criteria
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;Keep data for 48 hours&lt;/li&gt;
&lt;li&gt;Retrieve the latest n data&lt;/li&gt;
&lt;li&gt;2 regions (Japan East / Japan West) for redundant failure&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  How to calculate
&lt;/h4&gt;

&lt;h5&gt;
  
  
  CosmosDB
&lt;/h5&gt;

&lt;p&gt;I recommend using &lt;a href="https://cosmos.azure.com/capacitycalculator/" rel="noopener noreferrer"&gt;capacity planner&lt;/a&gt; to calculate the cost of Cosmos DB.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 1 : API
&lt;/h6&gt;

&lt;p&gt;There are multiple API choices: SQL API, Cassandra API, Gremlin API, Table API, and Azure Cosmos DB API for MongoDB.&lt;/p&gt;

&lt;p&gt;In this example, SQL API is selected.&lt;/p&gt;

&lt;p&gt;If you are using the API for MongoDB, see the article on how to &lt;a href="https://docs.microsoft.com/en-us/azure/cosmos-db/mongodb/estimate-ru-capacity-planner" rel="noopener noreferrer"&gt;use the capacity calculator with the API for MongoDB&lt;/a&gt;.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 2 : Number of regions
&lt;/h6&gt;

&lt;p&gt;Azure Cosmos DB is available in all Azure regions. Select the number of regions required for your workload.&lt;/p&gt;

&lt;p&gt;In this example, the requirement is for two regions, Japan East and Japan West, so enter 2 as the number of regions. This keeps the conditions aligned with the GRS redundancy option described later.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 3 : Total data stored in transactional store
&lt;/h6&gt;

&lt;p&gt;This is the total projected data stored (GB) in the transactional store in a single region.&lt;/p&gt;

&lt;p&gt;The data will be stored for 48 hours and then deleted, which means that 48 hours of data will always be stored in the storage. Thus, the calculation formula is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno9tl8erj2w1dhxbi3z6.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fno9tl8erj2w1dhxbi3z6.png" alt="Formula to calculate the total data stored in transactional store in GB per month"&gt;&lt;/a&gt;&lt;/p&gt;
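&lt;p&gt;The formula above can be sketched numerically using figures from this article: 2,160,000 KB of telemetry collected per day, with 48 hours of data always retained (this sketch uses decimal units, matching the 0.00216 TB/day figure quoted later):&lt;/p&gt;

```python
# Sketch of the "always 48 hours of data on hand" storage size, using the
# article's figure of 2,160,000 KB collected per day (decimal units).
kb_per_day = 2_160_000
hours_retained = 48

stored_kb = kb_per_day * hours_retained / 24  # two days of data retained
stored_gb = stored_kb / 1_000_000             # KB -> GB (decimal)
print(f"{stored_gb:.2f} GB retained")
```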

&lt;h6&gt;
  
  
  Step 4 : Expected size of items/documents
&lt;/h6&gt;

&lt;p&gt;The expected size of the data item (for example, document), ranging from 1 KB to 2 MB.&lt;/p&gt;

&lt;p&gt;As mentioned above, the data size of a per-device telemetry message is 2.5KB.&lt;/p&gt;

&lt;p&gt;However, you can only input data in units of 1KB into the capacity planner. Therefore, in this example, we will use 2KB.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Tiny advice&lt;/strong&gt; : When setting the expected size of items/documents in the capacity planner, you can use the arrow keys to fine-tune small values.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 5 : Number of Point reads/Creates/Updates/Deletes operations expected per second per region to calculate RU (Request Unit)
&lt;/h6&gt;

&lt;p&gt;The calculation of RU (Request Unit) for Azure Cosmos DB is not as simple as 2 KB docs × 1,000 ops = 2,000 RU.&lt;/p&gt;

&lt;p&gt;I highly recommend using the &lt;a href="https://cosmos.azure.com/capacitycalculator/" rel="noopener noreferrer"&gt;capacity planner&lt;/a&gt; to calculate RU by inputting the number of point reads, creates, updates, and deletes expected per second per region.&lt;/p&gt;

&lt;p&gt;The following figure shows the result of the cost estimation after inputting the above information into the capacity planner.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz94sdmqap9hsvdbyp4u.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftz94sdmqap9hsvdbyp4u.png" alt="Screenshot of capacity planner"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h5&gt;
  
  
  Table Storage
&lt;/h5&gt;

&lt;h6&gt;
  
  
  Step 1 : Redundancy
&lt;/h6&gt;

&lt;p&gt;First of all, you need to select the best redundancy option.&lt;br&gt;
There are 6 options :&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Locally redundant storage (LRS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Within a single physical location in the primary region, the data is copied three times synchronously.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Zone-redundant storage (ZRS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data is copied synchronously across three Azure availability zones in the primary region.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Geo-redundant storage (GRS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data is copied synchronously three times within a single physical location in the primary region using locally redundant storage (LRS), then copied asynchronously to the secondary region.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read-access geo-redundant storage (RA-GRS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to geo-redundant storage (GRS), you have read access to data located in a secondary region. If the primary becomes unavailable, you can read the data from the secondary.&lt;br&gt;
RA-GRS is more expensive to use than GRS, but avoids data read downtime while the primary region is unavailable and a failover to the secondary region is performed.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Geo-zone-redundant storage (GZRS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Data is copied synchronously across three Azure availability zones in the primary region using zone-redundant storage (ZRS), then copied asynchronously to the secondary region.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Read-access geo-zone-redundant storage (RA-GZRS)&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In addition to geo-zone-redundant storage (GZRS), you have read access to data located in a secondary region. If the primary becomes unavailable, you can read the data from the secondary.&lt;br&gt;
Although the cost of using RA-GZRS is higher than GZRS, it is recommended to use RA-GZRS when even a small amount of downtime due to failover is not acceptable.&lt;/p&gt;

&lt;p&gt;You can find out more about each redundancy option &lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;In our case, we chose &lt;strong&gt;GRS&lt;/strong&gt;, which replicates data to another region hundreds of kilometers away and offers higher availability and durability than LRS or ZRS.&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 2 : Storage capacity in GB per month
&lt;/h6&gt;

&lt;p&gt;The data will be stored for 48 hours and then deleted, which means that 48 hours of data will always be stored in the storage. Thus, the calculation formula is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfav8itc2jaj19qj7c3d.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fvfav8itc2jaj19qj7c3d.png" alt="Formula to calculate storage capacity in GB per month"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h6&gt;
  
  
  Step 3 : Storage transactions
&lt;/h6&gt;

&lt;p&gt;Tables are charged at $0.00036 per 10,000 transactions. (&lt;a href="https://azure.microsoft.com/en-us/pricing/details/storage/tables/" rel="noopener noreferrer"&gt;Ref - Table Storage pricing&lt;/a&gt;)&lt;br&gt;
Every operation against the storage counts as a transaction, including reads, writes, and deletes.&lt;/p&gt;

&lt;p&gt;Since we plan to delete the entire table at once, the number of delete operations is negligible. Therefore, only write operations are counted here.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxf8y57uskqc3b9zfirl.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyxf8y57uskqc3b9zfirl.png" alt="Formula to calculate storage transactions per month"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxo6vuyebvxapa2d1cihb.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fxo6vuyebvxapa2d1cihb.png" alt="Formula to calculate the cost of Table Storage"&gt;&lt;/a&gt;&lt;/p&gt;
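&lt;p&gt;The transaction cost above can be sketched numerically using the article's figures: $0.00036 per 10,000 transactions, 2,160,000 KB collected per day at 2.5 KB per message, counting only writes (a 30-day month is an assumption of this sketch):&lt;/p&gt;

```python
# Sketch of the Table Storage transaction cost, counting only write operations.
# Figures from the article: $0.00036 / 10,000 transactions, 2,160,000 KB/day,
# 2.5 KB per telemetry message. A 30-day month is assumed.
price_per_10k_tx = 0.00036
writes_per_day = int(2_160_000 / 2.5)    # 864,000 messages written per day
writes_per_month = writes_per_day * 30

tx_cost = writes_per_month / 10_000 * price_per_10k_tx
print(f"~${tx_cost:.2f}/month in write transactions")
```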

&lt;h4&gt;
  
  
  Tips
&lt;/h4&gt;

&lt;h5&gt;
  
  
  Tips 1 - Table Storage : Easy to retrieve the n entities most recently added
&lt;/h5&gt;

&lt;p&gt;If you consider only the cost, Blob Storage is cheaper than Table Storage. The cost of Blob Storage (Standard/48 hours in hot/GRS) is as follows.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fst1c389uzneia8dduyzw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fst1c389uzneia8dduyzw.png" alt="Formula to calculate the cost of Blob Storage"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;However, as mentioned in the selection criteria, there is a requirement to retrieve the latest n records, and Blob Storage, which cannot be queried, is not suitable for this requirement.&lt;/p&gt;

&lt;p&gt;See the &lt;a href="https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-design-patterns#log-tail-pattern" rel="noopener noreferrer"&gt;log tail pattern and its solution&lt;/a&gt; for more details.&lt;/p&gt;
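&lt;p&gt;The core of the log tail pattern is an inverted, zero-padded RowKey so that the most recently added entities sort first, making "latest n" queries cheap. A minimal Python sketch of the tick arithmetic (mirroring the .NET-style ticks used in the design-patterns doc):&lt;/p&gt;

```python
# Sketch of the log tail pattern's reverse-chronological RowKey.
from datetime import datetime, timezone

# Ticks of .NET DateTime.MaxValue (100-ns intervals since 0001-01-01).
MAX_TICKS = 3_155_378_975_999_999_999
EPOCH = datetime(1, 1, 1, tzinfo=timezone.utc)

def inverted_row_key(ts: datetime) -> str:
    """Zero-padded inverted ticks: newer timestamps sort first."""
    td = ts - EPOCH
    ticks = (td.days * 86_400 + td.seconds) * 10_000_000 + td.microseconds * 10
    return f"{MAX_TICKS - ticks:019d}"

earlier = inverted_row_key(datetime(2021, 1, 1, tzinfo=timezone.utc))
later = inverted_row_key(datetime(2021, 1, 2, tzinfo=timezone.utc))
print(later < earlier)  # the newer entity sorts first lexicographically
```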

&lt;h5&gt;
  
  
  Tips 2 - Table Storage : Easy to delete a table instead of entities
&lt;/h5&gt;

&lt;p&gt;As mentioned in the selection criteria, telemetry data will be stored for 48 hours and then deleted. &lt;/p&gt;

&lt;p&gt;Another advantage of using Table Storage is that you can delete an entire table instead of deleting entity by entity. Since the data can be removed with a single table deletion, the operation cost is lower than deleting the same data one entity at a time.&lt;/p&gt;

&lt;p&gt;Please refer to &lt;a href="https://docs.microsoft.com/en-us/rest/api/storageservices/Delete-Table?redirectedfrom=MSDN" rel="noopener noreferrer"&gt;the Delete Table documentation&lt;/a&gt;.&lt;/p&gt;
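&lt;p&gt;To illustrate the scale of the difference: deleting table by table needs one operation per retention window, while entity-by-entity deletion needs roughly one operation per entity (batching reduces the request count but still scales with entity count). Using the article's figure of 864,000 messages per day over 48 hours:&lt;/p&gt;

```python
# Rough sketch of Delete Table vs entity-by-entity deletion, using the
# article's message rate (864,000/day) over the 48-hour retention window.
entities_48h = 864_000 * 2   # two days of telemetry entities

per_entity_tx = entities_48h  # ~one delete transaction per entity
whole_table_tx = 1            # a single Delete Table call
print(f"{per_entity_tx:,} entity deletes vs {whole_table_tx} table delete")
```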

&lt;h3&gt;
  
  
  Other option : Data Explorer
&lt;/h3&gt;

&lt;p&gt;Data Explorer was also considered as an option that combines both where to process and where to persist.&lt;/p&gt;

&lt;p&gt;I recommend using the &lt;a href="https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html" rel="noopener noreferrer"&gt;Azure Data Explorer (Kusto) Cost Estimator&lt;/a&gt; to calculate the cost of Data Explorer.&lt;br&gt;
The data collected per day is 2,160,000 KB (i.e. 0.00216 TB). However, since the estimator does not accept values below 0.01 TB, 0.01 TB was entered.&lt;/p&gt;
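&lt;p&gt;The unit conversion above, and the estimator's input floor, can be sketched as:&lt;/p&gt;

```python
# Sketch of converting the article's daily data volume to the estimator's
# TB input, clamped to its 0.01 TB minimum (decimal units, as in the article).
kb_per_day = 2_160_000
tb_per_day = kb_per_day / 1_000_000_000   # KB -> TB
estimator_input = max(tb_per_day, 0.01)   # estimator's minimum input

print(f"{tb_per_day:.5f} TB/day, entered as {estimator_input} TB")
```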

&lt;p&gt;The result of the cost estimation shown below is the minimum cost of Data Explorer (when data is retained in hot for 48 hours) without any load testing.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffr3rimqi9oa1ve7yys3k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffr3rimqi9oa1ve7yys3k.png" alt="Screenshot of Azure Data Explorer Cost Estimator"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;At first glance, Data Explorer may seem low cost, but again, this is the minimum cost of provisioning two E2a_v4 instances, the smallest machine available. (&lt;a href="https://azure.microsoft.com/en-us/pricing/details/data-explorer/" rel="noopener noreferrer"&gt;Ref - Azure Data Explorer pricing&lt;/a&gt;)&lt;/p&gt;

&lt;p&gt;Moreover, in our case, the cost estimate was made with future scalability in mind. However, note that this cost is incurred even if the amount of data does not grow much beyond this estimate.&lt;/p&gt;

&lt;p&gt;With Data Explorer, the more data you have, the greater the cost benefit. Since its powerful analytics capabilities are among its most attractive features, Data Explorer is worth considering when the requirements include analyzing the stored data.&lt;/p&gt;

&lt;h1&gt;
  
  
  Result summary
&lt;/h1&gt;

&lt;p&gt;So far we have detailed the cost estimation for each service to determine the architecture. &lt;br&gt;
The data used in the cost estimation was the projected data.&lt;/p&gt;

&lt;p&gt;Let's go over the requirements and potential services for each part of the overall architecture again.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcej7iqcvinytbtv4n5ea.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcej7iqcvinytbtv4n5ea.png" alt="Architecture options with result"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmc0w68y8eaiz0r4a934k.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmc0w68y8eaiz0r4a934k.png" alt="Target in several years"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to send
&lt;/h3&gt;

&lt;p&gt;Event Hubs resulted in significantly lower costs than IoT Hub.&lt;br&gt;
However, as mentioned earlier, while IoT Hub can manage devices and provides two-way communication (C2D, D2C), Event Hubs provides only one-way communication. Thus, when choosing Event Hubs, one option is to use it together with the IoT Hub S1 tier; security should also be considered.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbabea0evuld2gfzmcezj.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fbabea0evuld2gfzmcezj.png" alt="Where to send"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to process
&lt;/h3&gt;

&lt;p&gt;Since Stream Analytics has limited flexibility in output destinations, it must be combined with other services such as Functions to store data dynamically in a specified table, which costs more than Functions alone.&lt;/p&gt;

&lt;p&gt;In addition, there is a requirement that the processing time between device and storage should be less than 10 seconds, so performance must be checked in the load test to determine the appropriate number of throughput units and instances.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9pnc6pbhjqwqes535ag7.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F9pnc6pbhjqwqes535ag7.png" alt="Where to process"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Where to persist
&lt;/h3&gt;

&lt;p&gt;Table Storage results in significantly lower cost than Cosmos DB. &lt;br&gt;
The reason for this is that with Cosmos DB, as the number of operations (Point reads, Creates, Updates, Deletes) increases, the RU increases and the cost becomes higher. &lt;/p&gt;

&lt;p&gt;Also, two regions were one of the selection criteria this time, which raised the cost. If only one region is used, the cost of Cosmos DB is simply halved, but Table Storage is still cheaper.&lt;/p&gt;

&lt;p&gt;Furthermore, the fact that Table Storage makes it easy to retrieve the latest n records was also a big advantage in this scenario.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg68taepdqy8p2cy05vop.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fg68taepdqy8p2cy05vop.png" alt="Where to persist"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h3&gt;
  
  
  Estimated cost of the determined architecture
&lt;/h3&gt;

&lt;p&gt;Based on the above cost comparison as well as project's specific requirements, the final architecture we decided on is shown in the figure below.&lt;br&gt;
&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mzmyuoznxoqsfh7lq1v.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F6mzmyuoznxoqsfh7lq1v.png" alt="Determined architecture"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Here is the total cost of Azure for this architecture. &lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkylkcywhtnsf0x01flfd.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkylkcywhtnsf0x01flfd.png" alt="Final result of cost comparison"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture fits neatly into the budget of $8 per connector.&lt;br&gt;
The processing time between device and Table Storage also met the requirement of 10 seconds or less: querying 20 minutes of data showed an average processing time of 3.934 seconds. How the processing time was calculated will be covered in another article.&lt;/p&gt;

&lt;h1&gt;
  
  
  Importance of load testing
&lt;/h1&gt;

&lt;p&gt;While some calculations can be done theoretically based on the amount of data and so on, load testing was necessary to estimate the cost of Stream Analytics and Functions and to measure the processing time between device and storage.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fne6i5ppjk3vzlub3911q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fne6i5ppjk3vzlub3911q.png" alt="Overview of load testing"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In our case, we conducted load tests using &lt;a href="https://github.com/Azure-Samples/Iot-Telemetry-Simulator" rel="noopener noreferrer"&gt;the IoT telemetry simulator&lt;/a&gt;. Just as the cost estimation assumed the maximum size of the telemetry messages sent by 10 devices, the simulator messages were also set to that maximum size (i.e. 25KB) to apply the expected load.&lt;/p&gt;

&lt;h1&gt;
  
  
  Conclusion
&lt;/h1&gt;

&lt;p&gt;Cost is an unavoidable issue in developing a cloud-based solution.&lt;/p&gt;

&lt;p&gt;Cost estimation has advantages beyond simply knowing the cost: finding the best approach within a limited budget, gaining an edge over the competition, and designing the architecture with future scalability in mind.&lt;/p&gt;

&lt;p&gt;Although this can be a somewhat complicated task, it is worth doing a cost estimation when considering the architecture.&lt;br&gt;
I hope this article helps you understand how to determine an architecture from a cost estimation.&lt;/p&gt;

&lt;h1&gt;
  
  
  References
&lt;/h1&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/iot-hub/" rel="noopener noreferrer"&gt;IoT Hub pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/event-hubs/" rel="noopener noreferrer"&gt;Event Hubs pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-quotas" rel="noopener noreferrer"&gt;Azure Event Hubs quotas and limits&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/event-hubs/event-hubs-auto-inflate#enable-auto-inflate-through-the-portal" rel="noopener noreferrer"&gt;Automatically scale up Azure Event Hubs throughput units (standard tier)&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/iot-hub/iot-hub-compare-event-hubs" rel="noopener noreferrer"&gt;Connecting IoT Devices to Azure: IoT Hub and Event Hubs&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/stream-analytics/" rel="noopener noreferrer"&gt;Stream Analytics pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/functions/" rel="noopener noreferrer"&gt;Functions pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/azure-functions/functions-premium-plan?tabs=portal" rel="noopener noreferrer"&gt;Azure Functions Premium plan&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/app-service/windows/" rel="noopener noreferrer"&gt;App Service pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/cosmos-db/" rel="noopener noreferrer"&gt;Cosmos DB pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://cosmos.azure.com/capacitycalculator/" rel="noopener noreferrer"&gt;Cosmos DB capacity planner&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/cosmos-db/estimate-ru-with-capacity-planner" rel="noopener noreferrer"&gt;Estimate RU/s using the Azure Cosmos DB capacity planner - SQL API&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/cosmos-db/distribute-data-globally" rel="noopener noreferrer"&gt;Distribute your data globally with Azure Cosmos DB&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/storage/tables/" rel="noopener noreferrer"&gt;Table Storage pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/storage/common/storage-redundancy" rel="noopener noreferrer"&gt;Azure Storage redundancy&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/azure/storage/tables/table-storage-design-patterns" rel="noopener noreferrer"&gt;Table design patterns&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/rest/api/storageservices/Delete-Table?redirectedfrom=MSDN" rel="noopener noreferrer"&gt;Delete Table&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://docs.microsoft.com/en-us/rest/api/storageservices/delete-entity1" rel="noopener noreferrer"&gt;Delete Entity (Azure Storage）&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://dataexplorer.azure.com/AzureDataExplorerCostEstimator.html" rel="noopener noreferrer"&gt;Azure Data Explorer (Kusto) Cost Estimator&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://azure.microsoft.com/en-us/pricing/details/data-explorer/" rel="noopener noreferrer"&gt;Azure Data Explorer pricing&lt;/a&gt;&lt;/li&gt;
&lt;li&gt;&lt;a href="https://github.com/Azure-Samples/Iot-Telemetry-Simulator" rel="noopener noreferrer"&gt;IoT telemetry simulator&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

</description>
      <category>azure</category>
    </item>
    <item>
      <title>Ingest data from Azure Table Storage into Data Explorer</title>
      <dc:creator>Ayaka Hara</dc:creator>
      <pubDate>Fri, 23 Apr 2021 12:16:22 +0000</pubDate>
      <link>https://dev.to/aykhara/ingest-data-from-azure-table-storage-into-data-explorer-133c</link>
      <guid>https://dev.to/aykhara/ingest-data-from-azure-table-storage-into-data-explorer-133c</guid>
      <description>&lt;p&gt;LINQ allows us to query multiple entities from Azure Table Storage. However, the maximum number of entities that are returned in a single query with LINQ Take operator is 1,000 (&lt;a href="https://docs.microsoft.com/en-us/rest/api/storageservices/writing-linq-queries-against-the-table-service#returning-the-top-n-entities"&gt;MS doc&lt;/a&gt;) and you may need to code more to retrieve what you want.&lt;/p&gt;

&lt;p&gt;This blog illustrates how to ingest data from Table Storage into Data Explorer via Data Factory to prepare to query large numbers of entities with Kusto.&lt;/p&gt;

&lt;h1&gt;
  
  
  Step 1: Create Azure Data Explorer
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Create Azure Data Explorer 
Here is an example setting:
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--8aX9TiZ7--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/8ezoixzweuz4yui13ff3.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Go to the resource and click "Create database"&lt;/li&gt;
&lt;li&gt;Create Azure Data Explorer Database (e.g. Database name: loadtest)&lt;/li&gt;
&lt;li&gt;Click the database (e.g. "loadtest") and select "Query"&lt;/li&gt;
&lt;li&gt;Click "Ingest new data"&lt;/li&gt;
&lt;li&gt;Create table in the database and ingest data 
&lt;strong&gt;Note: the table name must not include a dash ("-")&lt;/strong&gt;
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--hMDkFWG---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/hyxd22d1lndvgxpk3etu.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Click "Edit schema"&lt;/li&gt;
&lt;li&gt;Select "Ignore the first record"&lt;/li&gt;
&lt;li&gt;Make sure all data types are correct
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Q9IqJFUB--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/ao3tx6b9ghxq8xo5loab.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Click "Start ingestion"
&lt;strong&gt;Note: Please copy the mapping name&lt;/strong&gt;
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--dkWNtE3W--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gmo28fqy5gju3exq5803.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;/ol&gt;
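
&lt;p&gt;The "no dash" rule from step 6 is easy to check up front. A minimal sketch (deliberately conservative; the full set of names Kusto accepts is broader than this):&lt;/p&gt;

```python
import re

def is_safe_table_name(name: str) -> bool:
    """Conservative check: a leading letter or underscore, then letters,
    digits, or underscores only. In particular this rejects dashes,
    which the note in step 6 above warns about."""
    return bool(re.fullmatch(r"[A-Za-z_][A-Za-z0-9_]*", name))

print(is_safe_table_name("loadtest_results"))  # True
print(is_safe_table_name("load-test"))         # False: contains a dash
```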

&lt;h1&gt;
  
  
  Step 2: Create Azure Data Factory
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Create Azure Data Factory&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Step 3: Prepare Azure Active Directory
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Go to Azure Active Directory&lt;/li&gt;
&lt;li&gt;Click "App registrations" and register an application
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--adHpRHTo--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/7a7iwq5ntyymndyq8m10.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Go to "Certificates &amp;amp; secrets" and add a client secret
&lt;strong&gt;Note: Please don't forget to copy the secret&lt;/strong&gt;
&lt;/li&gt;
&lt;li&gt;Go to Azure Data Explorer and click "Permissions"&lt;/li&gt;
&lt;li&gt;Add the service principal you just created
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--PhwWhv70--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rz2cs0g7bfl9w5v7yl7v.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h1&gt;
  
  
  Step 4: Set Azure Data Factory to copy data from Azure Table Storage
&lt;/h1&gt;

&lt;h2&gt;
  
  
  Step 4-1: Create a base pipeline on Azure Data Factory
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Create Data Factory (You can select "Configure Git later")&lt;/li&gt;
&lt;li&gt;Go to resource&lt;/li&gt;
&lt;li&gt;Click "Author &amp;amp; Monitor"&lt;/li&gt;
&lt;li&gt;Click "Author (pencil icon)"&lt;/li&gt;
&lt;li&gt;Click "Add new resource (plus icon)" and select "Pipeline"&lt;/li&gt;
&lt;li&gt;Click "Move &amp;amp; transform" and drag "Copy data" to the right pane&lt;/li&gt;
&lt;li&gt;Set General in the bottom pane
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--Khh-Ur2H--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/xqvazm7gb2b8eyl8ppiv.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4-2: Set up input data from Table Storage (Source)
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Click "Add new resource (plus icon)" again and select "Dataset"&lt;/li&gt;
&lt;li&gt;Search "Azure Table Storage" and click "Continue"&lt;/li&gt;
&lt;li&gt;Click "New" in the bottom pane and set the linked service (Table Storage)

&lt;ul&gt;
&lt;li&gt;Select the Table Storage account from your Azure subscription or enter it manually&lt;/li&gt;
&lt;li&gt;Click "Test connection"&lt;/li&gt;
&lt;li&gt;Click "Create"
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--kSiNULQa--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/o8kkzooya09lazxw44fz.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Select the table you want to copy from the pull-down list and update the dataset name in the right pane
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--yMe96gPx--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/mkuqczf1z9ui7gk7ttac.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Go back to the pipeline settings and select the dataset in the "Source" section in the bottom pane&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4-3: Set output data (Sink) to Data Explorer
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Click "Add new resource (plus icon)" again and select "Dataset"&lt;/li&gt;
&lt;li&gt;Search "Azure Data Explorer" and click "Continue"&lt;/li&gt;
&lt;li&gt;Click "New" in the bottom pane and set the linked service (Data Explorer)

&lt;ul&gt;
&lt;li&gt;Select the Data Explorer cluster from your Azure subscription or enter it manually&lt;/li&gt;
&lt;li&gt;Enter your service principal ID (e.g. the Application (client) ID of "sp-adf-ayhara-loadtest") and the client secret you copied earlier&lt;/li&gt;
&lt;li&gt;Select the Data Explorer database&lt;/li&gt;
&lt;li&gt;Click "Test connection"&lt;/li&gt;
&lt;li&gt;Click "Create"
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--AvpH4_ES--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/yyocbwzuxu639nzzncp9.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;li&gt;Select the destination table from the pull-down list and update the dataset name in the right pane
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--6J34nUti--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/y2qu19y2tf86v3ru6ekt.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Go back to the pipeline settings and select the dataset, table, and ingestion mapping name in the "Sink" section in the bottom pane
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--wfqvzzg---/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/untb273v2btug5zai4xj.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;h2&gt;
  
  
  Step 4-4: Set mapping
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Click "Import schemas"&lt;/li&gt;
&lt;li&gt;Check that the mapping is correct
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--iywr7kRl--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rd1tqyua3j5itqvp07s9.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;/ol&gt;
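
&lt;p&gt;Before running the pipeline, it can also help to sanity-check that every source column has an entry in the sink mapping. A minimal sketch over hypothetical column lists (plain Python, not the Data Factory API):&lt;/p&gt;

```python
def check_mapping(source_columns, mapping):
    """Return the source columns that have no entry in the sink mapping."""
    mapped = {m["source"] for m in mapping}
    return [c for c in source_columns if c not in mapped]

# Hypothetical schema for a load-test table
source = ["PartitionKey", "RowKey", "Timestamp", "ResponseTimeMs"]
mapping = [
    {"source": "PartitionKey", "sink": "PartitionKey"},
    {"source": "RowKey", "sink": "RowKey"},
    {"source": "Timestamp", "sink": "Timestamp"},
]

print(check_mapping(source, mapping))  # ['ResponseTimeMs'] is unmapped
```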

&lt;h1&gt;
  
  
  Step 5: Ingest data from Table Storage into Data Explorer database
&lt;/h1&gt;

&lt;ol&gt;
&lt;li&gt;Click "Debug"
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--EtvzQGeO--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/2vbeaklzfnuini16uts7.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Click "Details (glasses icon)"if you want to see the progress&lt;/li&gt;
&lt;li&gt;Once the data is successfully copied:
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--tzUuHITv--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/rn8dvtdhtuqff0iy5cn4.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Go to Data Explorer and try querying the ingested data
&lt;img src="https://res.cloudinary.com/practicaldev/image/fetch/s--txDa67CD--/c_limit%2Cf_auto%2Cfl_progressive%2Cq_auto%2Cw_880/https://dev-to-uploads.s3.amazonaws.com/uploads/articles/gx5v1z8hm6av5udt4x7r.png" alt="Alt Text"&gt;
&lt;/li&gt;
&lt;li&gt;Go back to Data Factory and click "Validate All"&lt;/li&gt;
&lt;li&gt;Publish all if there are no errors!&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Now, you're ready to query data ingested from Azure Table Storage with Kusto.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://dev.to/aykhara/query-table-data-in-azure-data-explorer-with-kusto-to-analyse-load-test-results-1b93"&gt;Next step - Query table data in Azure Data Explorer with Kusto to analyse load test results&lt;/a&gt;&lt;/p&gt;

</description>
      <category>azure</category>
    </item>
  </channel>
</rss>
