Justin Goldstein

Machine Learning for Beginners

Machine Learning covers computer programs that use a dataset to improve a prediction function (called a model), to cluster data points, or to detect anomalies. For each of these use cases, you choose which algorithm to implement. If you have chosen a prediction model, you must decide whether it will have many inputs or few, whether each output is a discrete value such as 0 or 1 or any value between 0 and 1, and how many outputs it has. If you are clustering data points or detecting anomalies, there are several other ways to adjust your model. All of this is to say that although your machine is "learning," you have to do a lot of the work yourself. This article walks through the basic types of Machine Learning and the common algorithms associated with them, and concludes by describing the tools developers and data scientists use to implement them.
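
To make the difference between the two kinds of output concrete, here is a minimal sketch of a prediction function in plain Python. It assumes a logistic-style model, which is only one of many possible choices; the function names, weights, and bias are hypothetical values picked for illustration, not something learned from real data.

```python
import math

def predict_probability(inputs, weights, bias):
    """Return a value between 0 and 1 (the output of a logistic-style model)."""
    score = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-score))

def predict_label(inputs, weights, bias, threshold=0.5):
    """Return a hard 0-or-1 output by thresholding the probability."""
    return 1 if predict_probability(inputs, weights, bias) >= threshold else 0

# Hypothetical parameters; in practice an algorithm learns these from a dataset.
weights = [0.8, -0.4]
bias = -0.1

p = predict_probability([2.0, 1.0], weights, bias)  # a value between 0 and 1
label = predict_label([2.0, 1.0], weights, bias)    # either 0 or 1
print(p, label)
```

The same two inputs produce either a graded value or a yes/no answer, depending on which kind of output you decided the model should have.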

Machine Learning can be broken down into supervised learning and unsupervised learning. There is also reinforcement learning, which I will not touch on here because it is likely too complex to use when you are getting started. Supervised learning enables you to create a model from labeled data, which means that the inputs you use to train your model have been assigned the correct outputs beforehand, typically by humans. Examples of supervised learning include binary or multi-class classification and regression. Neural networks are a family of models that are most often trained with supervised learning, and they can perform classification as well as regression. They are essentially very complex models that can be represented as a network of connected nodes.
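
As a rough illustration of learning from labeled data, the sketch below fits a straight line to a handful of input/output pairs using ordinary least squares, one of the simplest forms of regression. The dataset (hours studied and exam scores) is made up for this example, and `fit_line` is a hypothetical helper rather than a function from any library.

```python
def fit_line(xs, ys):
    """Fit y = slope * x + intercept to labeled pairs by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up labeled data: each input (hours studied) has a known output (exam score).
hours = [1.0, 2.0, 3.0, 4.0]
scores = [52.0, 54.0, 56.0, 58.0]

slope, intercept = fit_line(hours, scores)

# Use the fitted model to predict the output for an input it has not seen.
predicted = slope * 5.0 + intercept
print(slope, intercept, predicted)
```

The labels are what make this supervised: the algorithm adjusts the model until its predictions match the outputs it was given.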

Unsupervised learning enables you to create a model from unlabeled data. Examples include K-means clustering and principal component analysis. K-means clustering can be used for tasks such as customer segmentation, while principal component analysis can reduce the number of input features, which speeds up fitting a model while largely maintaining its performance. Unsupervised learning typically finds the relationships between data points and categorizes them. In my experience, supervised learning is more common because it is more versatile and can often do the same things as unsupervised learning. However, unsupervised learning does not require labeled data, which can be difficult to acquire.
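
Here is a minimal sketch of K-means clustering in plain Python, assuming two clusters and fixed starting centroids so the result is repeatable. The customer data (spend and visits, in arbitrary units) is made up, and the `kmeans` function is a simplified illustration rather than the implementation you would get from a library.

```python
def kmeans(points, centroids, iterations=10):
    """Group points into clusters by repeatedly updating the centroids."""
    clusters = [[] for _ in centroids]
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            distances = [sum((a - b) ** 2 for a, b in zip(p, c)) for c in centroids]
            clusters[distances.index(min(distances))].append(p)
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for cluster, old in zip(clusters, centroids):
            if cluster:
                new_centroids.append(tuple(sum(dim) / len(cluster) for dim in zip(*cluster)))
            else:
                new_centroids.append(old)  # keep a centroid that attracted no points
        if new_centroids == centroids:
            break  # assignments have stopped changing
        centroids = new_centroids
    return centroids, clusters

# Made-up, unlabeled customer data: (spend, visits) in arbitrary units.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 1.5), (8.0, 8.0), (9.0, 8.5), (8.5, 9.0)]

# Fixed starting centroids keep this example deterministic.
centroids, clusters = kmeans(customers, [customers[0], customers[3]])
print(centroids)
print(clusters)
```

Notice that no labels are supplied anywhere: the two customer segments emerge only from how close the points are to each other.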

There are various tools that developers and data scientists use to implement Machine Learning algorithms. Often, engineers will use Python with libraries such as scikit-learn, TensorFlow, and Keras to build their models, and rely on pandas and NumPy to work with their data. There is also a suite of automated Machine Learning tools that further abstract away the nitty-gritty. The most popular of these are Google Cloud AutoML, Amazon SageMaker, and Microsoft Azure AutoML. Almost every tool you will find requires you to have some intuition for how Machine Learning works. Tools like SageMaker can be really difficult to use because they require you to understand how to improve your model on your own, which takes time. If you are looking for an end-to-end platform that takes away this hassle, check out Telepath AI’s AutoML tools. In the field of data science, tools such as JupyterLab, Jupyter Notebook, and Anaconda are widely used, and they are often prerequisites for online tutorials. If your aim is to build models yourself, you might also consider using a high-level language such as Octave/MATLAB because it allows you to get started implementing algorithms much more quickly.

If your goal is to learn about Machine Learning in considerable depth, I strongly recommend taking an online class on Coursera. Machine Learning can seem like a daunting field to explore, and much of the literature online is uninformed. If you are at the beginning of your Machine Learning journey, I hope this article acquainted you with the areas of interest within this field and exposed you to some of the tools that you will be using along the way.
