DEV Community

Mehran Davoudi

Managing Datasets with Azure ML Studio

If you are working on a Machine Learning project, you know how important it is to explore and transform your datasets. You want a good sense of what your data looks like and how you can improve it for your model. Azure Machine Learning Studio helps with this: it lets you register datasets, keep track of their versions, and inspect each version in one place. In this post, I will show you some of the Azure Machine Learning Studio features that can help with your data analysis and preparation.

The screenshot below shows where you find or create your datasets and how you manage their different versions.

Azure Machine Learning Studio Datasets

When you select a dataset, you can see its details and some useful features:

  • Tags: You can use tags to organize and categorize your datasets according to your team’s needs. For example, I created a MelkRadar State tag with the value Final so that I can filter on it later.
  • Markdown Description: The description field accepts Markdown. This is very handy for documenting your data profiling and analysis in a structured and clear way. You can also agree on a standard description format for your team to follow.
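To make the tagging idea concrete, here is a minimal sketch in plain Python of filtering dataset versions by a tag. It does not use the Azure ML SDK: the `DatasetVersion` class, the `filter_by_tag` helper, the dataset names, and the `Draft` value are all hypothetical stand-ins. Only the `MelkRadar State` tag with the value `Final` comes from the example above.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetVersion:
    """Hypothetical local stand-in for one registered dataset version."""
    name: str
    version: str
    tags: dict = field(default_factory=dict)
    description: str = ""  # Markdown text, as in the Studio description field


def filter_by_tag(versions, key, value):
    """Return the versions whose tag `key` is exactly `value`."""
    return [v for v in versions if v.tags.get(key) == value]


# Hypothetical catalog; names and the "Draft" value are made up for illustration.
catalog = [
    DatasetVersion("listings", "1", {"MelkRadar State": "Draft"}),
    DatasetVersion(
        "listings", "2", {"MelkRadar State": "Final"},
        description="## Profiling notes\n- reviewed by the team",
    ),
    DatasetVersion("prices", "1", {"MelkRadar State": "Final"}),
]

final = filter_by_tag(catalog, "MelkRadar State", "Final")
print([(v.name, v.version) for v in final])
```

Agreeing on a small, fixed set of tag keys and values within the team keeps this kind of filter predictable.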

Azure Machine Learning Studio Dataset details

The Models and Jobs tabs show which models and jobs have used this dataset.

Azure Machine Learning Studio Dataset Action Bar

You can also create a new version of the dataset or archive it from this page.

But the feature that I find most impressive is that you can profile the dataset and see the results attached to that version right here.

The profile's diagrams summarize the data visually, so you can spot problems such as missing values or unusual distributions before you train a model.

Azure Machine Learning Studio Dataset Profile
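As a rough illustration of what a profile summarizes, here is a minimal sketch in plain Python that computes a few per-column statistics. It is not how Azure Machine Learning Studio builds its profile. The statistics chosen (count, missing, distinct, min, max, mean), the function names, and the sample rows are assumptions made for this example.

```python
from statistics import mean


def profile_column(values):
    """Summarize one column: row count, missing count, distinct count,
    plus min/max/mean when every present value is numeric."""
    present = [v for v in values if v is not None]
    numeric = [
        v for v in present
        if isinstance(v, (int, float)) and not isinstance(v, bool)
    ]
    summary = {
        "count": len(values),
        "missing": len(values) - len(present),
        "distinct": len(set(present)),
    }
    if present and len(numeric) == len(present):
        summary.update(min=min(numeric), max=max(numeric), mean=mean(numeric))
    return summary


def profile_dataset(rows):
    """Profile every column of a list of dict rows; absent keys count as missing."""
    columns = []
    for row in rows:
        for key in row:
            if key not in columns:
                columns.append(key)
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample rows, made up for illustration.
rows = [
    {"city": "A", "rooms": 2, "price": 100.0},
    {"city": "B", "rooms": 3, "price": None},
    {"city": "A", "rooms": None, "price": 140.0},
]

profile = profile_dataset(rows)
for column, stats in profile.items():
    print(column, stats)
```

Keeping a summary like this next to each dataset version makes it easier to compare versions and notice when a new one introduces more missing values.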

Conclusion

Azure Machine Learning Studio makes working with datasets easier: you can tag and document them, track their versions, see which models and jobs use them, and profile each version. Our MelkRadar AI team uses it to manage our datasets and collaborate more effectively.
