DEV Community: Ogunbiyi Ibrahim

Feature Engineering: The Transformation phase (How to Encode Categorical Features)

Ogunbiyi Ibrahim — Sun, 15 May 2022 11:12:01 +0000

Introduction

When developing a predictive model, one of the most important phases is feature engineering. Its major objective is to help our model understand some of the underlying features in our datasets so that our model can predict on those features. Remember that machine learning models operate best with numerical values because they are based on mathematical equations, therefore feeding the model with values other than numerical values may be daunting for our model. So, through feature engineering, we can assist our model in understanding specific features in our datasets.

I'll be writing a series of posts in which I'll discuss several types of feature engineering. This is the first post of the series.

Here is a quick rundown of what I'll be covering in this post:

What is Feature Engineering?
The Categories of Feature Engineering
What is a Categorical Feature?
What are the Types of Categorical Features?
How do you encode a categorical feature, and which use cases call for a particular encoding technique?

What is Feature Engineering?

To clarify what feature engineering entails, I'd like to start with the Wikipedia definition, because it gave me an intuition for what feature engineering is all about.

According to Wikipedia, feature engineering is the process of extracting features from raw data using domain knowledge. The purpose of this is to leverage the extra features to improve the quality of a machine learning process, rather than providing merely raw data to the machine learning process.

As we all know, for ML models to perform well, the data must be in an appropriate format (mostly numerical values), and in order for us to do that, there must be domain knowledge or expertise that will guide us as to how we want to engineer these features so that after we are finished with the engineering, the new features will still portray what the older ones are all about.

Consider the following scenario: you have a categorical feature that you want to encode in your dataset. As a data scientist, you must have domain knowledge (or seek out expertise) so that, before you encode a variable using a specific technique, you can answer two questions: how will it affect the data, and will the encoded feature still convey what the data is all about?

So my take on feature engineering will be that: In layman's words, feature engineering is the process of selecting, extracting, manipulating, and transforming data using domain expertise so that the resulting features still reflect what the data is all about and are easily understood by the model.

Feature engineering can be a difficult process because it requires expert knowledge and takes a long time. However, it is critical to get right. When I think of feature engineering, I always think of these two quotes from two of the data space's most famous figures.

Coming up with features is difficult, time-consuming, requires expert knowledge. Applied machine learning is basically feature engineering. - Prof. Andrew Ng

The features you use influence more than everything else the result. No algorithm alone, to my knowledge, can supplement the information gain given by correct feature engineering. - Luca Massaron

Categories of Feature Engineering

Feature Engineering is divided into 4 different categories which include:

Feature Transformation.
Feature Extraction.
Feature Creation.
Feature Selection.

As our topic suggests in this post, we will discuss feature transformation; other categories will be discussed in later series.

Feature Transformation

The process of turning raw features into a format suitable for model construction is known as feature transformation. Our model will be able to understand what each feature represents thanks to feature transformation. Feature transformation includes things like numerical transformation and scaling, categorical variable encoding, and so on. We will talk about numerical transformation and scaling in the next series.

What are Categorical Variables/Features?

Categorical features represent data that can be classified into groups rather than measured on a numeric scale. Examples include race, color, animal species, and so on.

Types of Categorical Features

Categorical variables can be divided into 3 categories which include:

Binary Values: They take exactly two possible values and are regarded as either/or values. Examples include yes or no, True or False.
Nominal Values: In the real world, this type of categorical feature has no natural order. When you try to order the values, nothing makes sense. Examples include (Colors: black, blue, and red), (Animal: cat, dog, fox), etc. Each element represents a separate unit, so attempting to rank them makes no sense.
Ordinal Values: In the real world, this type of data can be ordered. For example, (cold, hot, very hot), (low, medium, high), grade score (A, B, C), and so on.

How to Encode Categorical Features

We can encode our categorical features using a variety of encoding techniques. We'll go through the most common ones you'll come across on a daily basis. Encoding binary and ordinal categorical features is simple and does not need much effort, whereas encoding nominal data can be difficult and sometimes requires domain expertise. We'll go over how to encode all of these categorical features.

Note: For encoding categorical features I will be using a Python library called category_encoders. If you don't have it installed, you can use one of the commands below.

#For Pip install
pip install category_encoders

#For conda environment 
conda install -c conda-forge category_encoders

Encoding Binary Values

It is not difficult to encode binary categorical variables, because values like True or False map naturally onto the 0s and 1s of a logic gate. We can accomplish this with a simple Python function or method. True values will be treated as 1s, while False values will be treated as 0s. The same works for Yes and No. We can use either of these snippets to achieve this.

#converting the first feature win1 using the replace method
df["win1"] = df["win1"].replace({"N":0, "Y":1})

#converting the second feature win2 using an anonymous function
df["win2"] = df["win2"].apply(lambda x: 1 if x=="T" else (0 if x == "F" else None))
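To see both conversions on actual values, here is a minimal, self-contained sketch. The tiny dataframe and its values are made up for illustration, and it uses pandas' map method rather than replace or apply, so any label missing from the dictionary becomes NaN and is easy to spot.

```python
import pandas as pd

# Hypothetical data: the column names mirror the snippets above, the values are made up.
df = pd.DataFrame({"win1": ["Y", "N", "Y"], "win2": ["T", "F", "maybe"]})

# map() looks each label up in the dictionary; labels that are not listed become NaN.
df["win1"] = df["win1"].map({"N": 0, "Y": 1})
df["win2"] = df["win2"].map({"F": 0, "T": 1})

print(df)
```

The unexpected label "maybe" ends up as a missing value instead of being silently kept as text, which is usually what you want before modelling.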

Encoding Ordinal Values

Encoding ordinal values is also a pretty easy task: since those values are ordered (i.e. they can be ranked), the encoding is straightforward. Consider the following scenario. You have an ordinal feature that ranges from extremely weak to weak to strong. Because extremely weak is the lowest value and strong is the highest, we can encode this feature in the logical order 1, 2, 3, where 1 represents extremely weak, 2 represents weak, and 3 represents strong. To do this in Python we type:

from category_encoders import OrdinalEncoder
transformer = OrdinalEncoder(mapping= [{'col': 'ordinal_data',
                            'mapping': {"strong": 3, 'weak': 2, 'extremely weak': 1}}])

df["ordinal_data"] = transformer.fit_transform(df["ordinal_data"])

You must name the column that you want to encode in the mapping parameter (i.e. the one under the col key in the code above); otherwise the encoder will encode every categorical feature it sees. The mapping key tells the encoder how to rank the values. If you don't supply it, the encoder has no way of knowing the true order and simply hands out integers on its own. For example, in our scenario, without a mapping, whichever value the encoder meets first would typically be encoded as 1, and the other values would be numbered as they occur.
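The same explicit ranking can be sketched with plain pandas, which is handy for checking what the encoder should produce. This is an illustrative alternative, not the category_encoders API; the sample values and the new column name ordinal_encoded are assumptions.

```python
import pandas as pd

# Hypothetical sample of the ordinal column used above.
df = pd.DataFrame({"ordinal_data": ["weak", "strong", "extremely weak", "weak"]})

# State the order explicitly, from lowest to highest.
order = ["extremely weak", "weak", "strong"]
ranked = pd.Categorical(df["ordinal_data"], categories=order, ordered=True)

# Category codes start at 0, so add 1 to get the 1, 2, 3 scheme described above.
df["ordinal_encoded"] = ranked.codes + 1

print(df)
```

Because the order is written down once in the list, adding a new level later (say, "very strong") only means extending that list.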

Encoding Nominal Data

As I previously stated, encoding nominal values can be tricky because it necessitates knowledge of how to encode them. Some of the most common methods of encoding nominal data are as follows.

One-Hot Encoding

To better understand one-hot encoding, consider a column with three values: black, red, and blue. One-hot encoding takes each categorical value in that column and turns it into a new column, filling it with 1s and 0s. Let's look at an example to better grasp what's going on.

from category_encoders import OneHotEncoder
transformer = OneHotEncoder(cols="one_hot_data", use_cat_names=True)
df = transformer.fit_transform(df)

As seen in the preceding figure, new columns were constructed from those values, and whenever a value occurs in a row, its column is assigned 1 while the others are assigned 0. In the original data, black was the value in the first row, so in the encoded data the black column holds 1 for that row while the other columns hold 0. Similarly, blue occurred in the second row, so only the blue column was assigned 1 there, and the black and red columns were assigned 0.
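Since the description above relies on figures, here is a small self-contained sketch of the same idea using pandas' get_dummies instead of category_encoders. The four sample rows are made up; the column name one_hot_data follows the snippet above.

```python
import pandas as pd

# Hypothetical column with the three colours discussed above.
df = pd.DataFrame({"one_hot_data": ["black", "blue", "red", "black"]})

# One new 0/1 column per distinct value; dtype=int gives 0/1 instead of False/True.
encoded = pd.get_dummies(df, columns=["one_hot_data"], dtype=int)

print(encoded)
```

Each row has exactly one 1, sitting in the column that matches the row's original value.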

Frequency Encoding

It is a much simpler technique. It counts the number of times each nominal value occurs and then maps that frequency back onto the column. Let's go over an example so we understand what's going on.

from category_encoders import CountEncoder
transformer = CountEncoder(cols="one_hot_data")
df = transformer.fit_transform(df)

We can see that in the previous figure black occurred 2 times and red occurred 3 times. So the frequency encoder replaces each value in the column with the number of times that value occurs. This technique is very efficient. You can also add a normalize parameter if you wish to have the relative frequency of each value. Something like this:

from category_encoders import CountEncoder
transformer = CountEncoder(cols="one_hot_data", normalize=True)
df = transformer.fit_transform(df)

Note: You might be wondering when to use the relative frequency and when to use the raw count. Suppose you have a column with high cardinality. In that case it is more convenient to use the relative frequency, since it keeps every encoded value between 0 and 1 regardless of how many rows there are.
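Both variants can be reproduced by hand with pandas, which makes it clear what the encoder computes. This is a sketch on made-up data (black twice, red three times, as in the example above); the new column names count_encoded and freq_encoded are assumptions.

```python
import pandas as pd

# Hypothetical column: black occurs 2 times, red occurs 3 times.
df = pd.DataFrame({"one_hot_data": ["black", "red", "red", "black", "red"]})

# Raw counts: how many times each value occurs, mapped back onto the rows.
counts = df["one_hot_data"].value_counts()
df["count_encoded"] = df["one_hot_data"].map(counts)

# Relative frequency: the same counts divided by the number of rows.
shares = df["one_hot_data"].value_counts(normalize=True)
df["freq_encoded"] = df["one_hot_data"].map(shares)

print(df)
```

One thing to keep in mind: two different categories that happen to occur equally often receive the same encoded value and become indistinguishable to the model.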

When To Use OneHot Or Frequency Encoding.

If the number of unique nominal values in your column is small, you can use one-hot encoding. If they are numerous, i.e. there is high cardinality, use frequency or relative frequency encoding instead. Personally, I prefer the relative frequency😀.

Conclusion and learning More.

Please keep in mind that there are other encoding techniques available, each with its own set of applications. If you want to learn more about them, I implore you to investigate further online. However, the encodings listed above are some of the more typical ones you will see on a daily basis. Thank you for taking the time to read this. I'll be releasing the second post of the series soon, which will cover numerical transformation and scaling. Watch out for it 🚀🚀.

Intuitive Introduction to Logistic Regression (Understanding the mathematics behind the model)

Ogunbiyi Ibrahim — Sun, 24 Apr 2022 08:42:50 +0000

Introduction

As a beginner, I've been using Logistic Regression to fit my data and make good predictions. But as soon as I was done, I felt empty, simply because I was performing the same task over and over (i.e. just fitting and predicting, which gets boring when you don't understand what's going on behind the scenes). I've always wondered how the model is able to make its predictions.

Then one day I sat down and studied how Logistic Regression makes its predictions. Funnily enough, I came away with a little understanding of it, so I will be sharing what I've learned so far. Relax, because it's going to be interesting. Thanks in advance.

What is Logistic Regression?

I would like to use the definition Aurélien Géron gives in his book:

Logistic Regression (also called Logit Regression) is commonly used to estimate the probability that an instance belongs to a particular class.

To better understand the above definition, I love to use this analogy. Suppose a doctor is studying a patient's records to find out whether the patient has a headache or not. Given those records, the doctor can predict that the patient has a headache (this is usually referred to as the positive class and is labeled 1) or that the patient does not have a headache (the negative class, usually labeled 0). This is how Logistic Regression works as well. If the estimated probability (i.e. the chance of the headache occurring) is greater than 50%, or 0.5, the model predicts 1 (the positive class); otherwise it predicts 0 (the negative class).

Please note that Logistic Regression is not limited to predicting one of two labels (i.e. 0 and 1, or whether something is this or not), which is usually referred to as binary classification. You can also use it to predict more than 2 labels. For instance, suppose you want to build a model that can predict whether an image is a dog, cat, or wolf. Logistic Regression can do that too. This is referred to as Multinomial Classification.

How Does Logistic Regression Make its Prediction?

Hmm, now we've gotten to the part that I and most people are curious about. To understand how it makes its prediction, I will go through the formula step by step. I will use binary classification with a single input variable (or predictor variable) in this article; with that, we can grasp how it extends to more input variables and to Multinomial Classification.

Consider, for instance, we want to predict whether an individual will go bankrupt given his credit card balance.

In simple mathematical terms, this is how Logistic Regression does its prediction.

Pr(individual=bankruptcy|balance)
The above expression can be interpreted as the probability of an individual going into bankruptcy given his balance. The pipe symbol | is used to denote given.

However, it doesn't stop there. The above expression is just how a lazy statistician would write the formula for Logistic Regression.

Going deeper, the above expression miraculously transforms into this:
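Assuming the standard single-predictor logistic function, with x standing for the credit card balance and the symbols \beta_0 (intercept) and \beta_1 (slope) chosen here as notation for the two parameters, the expanded form can be written as:

```latex
p(x) = \Pr(\text{bankruptcy} \mid \text{balance} = x)
     = \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
```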

Now you might be thinking what the heck is this. Let's unpack what we have above.

To better understand the above equation let's look at another equation that we might be familiar with.

The above equation is the most popular linear regression formula. More conventionally it is written as
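Assuming the familiar equation referred to here is the slope-intercept line y = mx + c, the conventional statistical notation (again using \beta_0 for the intercept and \beta_1 for the slope, which is a notational assumption) would be:

```latex
y = \beta_0 + \beta_1 x
```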

Looking at the above equation, we can see that it looks somewhat like the logistic regression formula, if not entirely. Linear regression is used to find the best-fit line that can be used to predict the value of y. Check the image below.

Let's say we want to predict the value of y where the value of x (input variable) is 5

Now we might be curious about the two parameters.
These two parameters are what the model needs in order to pick the best-fit line (i.e. we need them to accurately predict the values of y). To get the values of these parameters, there are mathematical approaches we can use, such as the normal equation, gradient descent, SVD, etc. But since our main focus is not on linear regression, we will head back to how the parameters are calculated for Logistic Regression.
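As a quick illustration of one of those approaches, here is a minimal sketch of the normal equation with NumPy, followed by the prediction at x = 5 mentioned above. The data points are made up so that they lie exactly on the line y = 2x + 1; the variable names are illustrative.

```python
import numpy as np

# Hypothetical training data lying exactly on the line y = 2x + 1.
x = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * x + 1.0

# Design matrix: a column of ones (for the intercept) next to the x values.
X = np.column_stack([np.ones_like(x), x])

# Normal equation: solve (X^T X) beta = X^T y for the two parameters.
beta = np.linalg.solve(X.T @ X, X.T @ y)
intercept, slope = beta

# Predict y where x is 5.
y_at_5 = intercept + slope * 5.0
print(intercept, slope, y_at_5)
```

With noisy real data the recovered parameters would only approximate the underlying line; the data here is exact so the result is easy to check by hand.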

Note that the formula is used for predicting whether an individual will go bankrupt given his balance. On the other hand, if we want to calculate whether an individual will not go bankrupt given his credit card balance we can easily do that using the beautiful law of probability. The formula is stated later in the article.

To calculate the parameters of a logistic function we use a technique called maximum likelihood estimation. Before I go into the mathematical details of this technique, I would like to point out two ideas from computer science called abstraction and the leap of faith. They mean that you don't necessarily need to go deep into the details of how things work (i.e. if you don't know how something works for now, that's okay; as you advance in your learning you will surely find out). Thanks to the mathematicians and statisticians who came up with this formula. Therefore I won't be talking about how the parameters are calculated using maximum likelihood. However, if you wish to know more about it you can check out this link.

Another way of looking at Logistic Regression is in terms of the log odds. Before we do that, let's define the odds.

Odds can be defined as the probability that an event will occur divided by the probability that an event will not occur.

From the logistic function. The probability that an individual will go bankrupt is

The probability that the individual will not go bankrupt is:

Now for us to find the odds, we divide the probability of the individual going bankrupt by the probability of not going bankrupt

When you simplify the fraction (the common denominator cancels out), your result should be this:

Now when we take the natural log of both sides we have:

Remember that the natural log of e^a = a
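The steps above can be written out as follows. This is a reconstruction assuming the standard single-predictor logistic function, with x as the balance and \beta_0, \beta_1 as the two parameters:

```latex
\begin{aligned}
p(x) &= \frac{e^{\beta_0 + \beta_1 x}}{1 + e^{\beta_0 + \beta_1 x}}
  && \text{(probability of going bankrupt)} \\
1 - p(x) &= \frac{1}{1 + e^{\beta_0 + \beta_1 x}}
  && \text{(probability of not going bankrupt)} \\
\frac{p(x)}{1 - p(x)} &= e^{\beta_0 + \beta_1 x}
  && \text{(odds)} \\
\ln\!\left(\frac{p(x)}{1 - p(x)}\right) &= \beta_0 + \beta_1 x
  && \text{(log odds)}
\end{aligned}
```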

We can see that the Logistic regression formula now looks like that of linear regression. That means we can arrive at logistic regression in two ways.

By finding the probability estimate of the logistic function based on x
By finding the log odds of the linear function based on x

Now that we've understood that, let's look at the reason why we can't solve classification problems using linear regression.

Why Linear Regression Won't work

The reason why linear regression won't work is simple. Looking at the image below, we can see that with linear regression the model can predict values that are less than 0 (i.e. negative values) as well as values greater than 1; imagine our model predicting a value such as 5. Such numbers cannot be read as probabilities. The goal of classification here is to have values between 0 and 1.

(c) Author: Saishruthi Swaminathan, Linear Regression
The logistic function solves this problem. Its numerator ensures that the values are greater than or equal to 0, and its denominator ensures that the values are less than or equal to 1. With this, the Logistic Regression graph forms an S-shaped curve, unlike linear regression, which is just a straight line.

(c) Author: z_ai towardsdatascience
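To make the contrast concrete, here is a small sketch comparing a straight line with the logistic function on the same inputs, and applying the 0.5 threshold from earlier. The coefficients b0 and b1 and the balances are made-up numbers for illustration, not fitted values.

```python
import math

def logistic(x, b0, b1):
    """Logistic function: e^(b0 + b1*x) / (1 + e^(b0 + b1*x))."""
    z = b0 + b1 * x
    return math.exp(z) / (1.0 + math.exp(z))

# Hypothetical parameters and credit card balances (illustrative only).
b0, b1 = -10.0, 0.005
balances = [0, 1000, 2000, 3000]

# The raw linear score can fall outside the range [0, 1] ...
linear_scores = [b0 + b1 * x for x in balances]

# ... while the logistic function always stays strictly between 0 and 1.
probabilities = [logistic(x, b0, b1) for x in balances]

# Predict the positive class (1) when the probability is greater than 0.5.
labels = [1 if p > 0.5 else 0 for p in probabilities]

print(linear_scores)
print(probabilities)
print(labels)
```

The linear scores here run from -10 to 5, which makes no sense as probabilities, while the logistic outputs trace the bottom, middle, and top of the S-shaped curve.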

Conclusion

Thank you for taking the time to read this article. Logistic Regression is one of the most popular estimators for classification problems. This article is just a beginner's guide to how it makes its predictions. You can check books and blog posts if you wish to know more about the mathematical concepts behind it. Thank you.

4 Core things to always do when cleaning your data for predictive models.

Ogunbiyi Ibrahim — Mon, 28 Mar 2022 13:08:46 +0000

Introduction

It's always vital to know what to do while cleaning data so that you can get correct insights from it. Data that isn't accurately cleansed for analysis can produce false results, as well as prevent your model from generalizing properly when tested with new data.

In this article, we'll go through the four things you should do every time you clean data so it's ready for analysis and predictions.

Let's get this party started.

Prerequisite

Make sure you have your environment set up (i.e. you must have a jupyter notebook)
Have pandas library installed on your notebook.
Have Seaborn library installed on your notebook.
Have Numpy library installed on your notebook.

What is Data Cleansing?

The process of finding and fixing inaccurate data from a dataset or data source is known as data cleaning. In simple terms, it refers to the process of preparing your data for exploratory data analysis (EDA).

Now that we've understood what data cleaning entails, we can begin working on the four items necessary to prepare the data for EDA.

So, due to proprietary issues, I won't be able to disclose the datasets that we'll be utilizing in this article, but I'll provide screenshots so we can see what's going on.

These are the four things to do when you want to clean your data for analysis and predictions.

Note: make sure you have imported these libraries into your environment.

import pandas as pd
import seaborn as sns
import numpy as np

1. Dealing with missing values

The first thing to do as a data scientist/analyst is to check the info of your data for missing values. To do this, type the following code:

df = pd.read_csv(file_path) #Input the filepath

df.info()

In the case of my file, this is my screenshot.

We can see from the above image (the annotation part) that we have 8606 entries. Now looking at the columns/features in the dataframe we can see obviously that we have missing values. The next thing for us is to deal with those missing values.

Now this is where the question comes

How can we deal with missing values?

When it comes to data that isn't time series (I will talk about that in another article), there are two ways to go about it.

Dropping the values.
Imputation.

Dropping values

The only time you should drop a whole column from a dataframe is if more than 60% of its observations are missing. So, if you review the dataframe and discover that more than 60% of a column is missing, it may be wise to drop that column if you think it is unnecessary. To do so, pass the column names to df.drop(columns=[...]).
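The 60% rule can be sketched as follows on a small made-up dataframe. The column names and values are illustrative, and the 0.6 cut-off is simply the threshold suggested above.

```python
import numpy as np
import pandas as pd

# Hypothetical dataframe: "floor" is 80% missing, "rooms" is 20% missing.
df = pd.DataFrame({
    "floor": [np.nan, np.nan, np.nan, 2.0, np.nan],
    "rooms": [3.0, np.nan, 2.0, 4.0, 1.0],
    "price": [100.0, 120.0, 90.0, 150.0, 80.0],
})

# Share of missing values per column (the mean of a True/False mask).
missing_share = df.isna().mean()

# Columns above the 60% threshold are candidates for dropping.
to_drop = missing_share[missing_share > 0.6].index.tolist()
df = df.drop(columns=to_drop)

print(missing_share)
print(df.columns.tolist())
```

Computing the share first, rather than eyeballing df.info(), makes the decision repeatable when the dataset changes.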

Another thing to consider is if you're building a model to predict something and you discover that the target column (i.e. this is the column you want to forecast) has missing values. You must remove all missing values from the target column, which will have an impact on other columns as well.

So in our case now our target column("price_usd_per_m2") has missing values so we have to drop those values. To do that we have to type:

df.dropna(subset = ["price_usd_per_m2"], inplace=True)

Now let's check the info of our data frame.

df.info()

We can now observe that our dataframe has shrunk from 8606 to 4895 entries. However, as you can see, there are still some missing values. Now let's go back to the two ways of dealing with missing data. The first one, dropping values, applies to two of our columns, floor and expenses, so the best thing to do is to drop those columns (although it is not really advisable to drop the floor column, for the sake of this article we'll drop it).

df.drop(columns = ["floor", "expenses"], inplace=True)

We can see that we still have some missing values, but they are not up to 60% so we move to the other types of dealing with missing values.

Imputation

When it comes to the imputation of missing data, the most common method is to fill the gaps with the mean of the non-missing values in the column. However, there are a couple of guidelines for when you should not use the mean, which I will outline below. They are as follows:
-If your column contains values that vary widely, such as a mix of positive and negative numbers, you should not use the mean.
-If your column contains just binary values of 1s and 0s, you should avoid using the mean approach.

If your column has either of these characteristics, it is advisable to drop the affected observations instead.

Note: don't confuse an observation with a column. An observation is a row, so if you want to drop rows, use the following code, with the name of the affected column inside the list:

df.dropna(subset=[""], inplace=True)

As previously stated, this approach does not work with time-series data. It has a different approach to dealing with missing values, which I'll cover in a later article.

Also note that filling with the mean only makes sense for numeric columns, such as floats and integers.

Now back to our dataset. Since none of the criteria above apply to it, I'm going to use the mean approach. I have only 1 numeric column with missing values (rooms), so I'm going to fill it with the mean of its values. To do that, type the code below.

# Fill the missing values in "rooms" with the column mean (missing values are skipped when averaging)
df["rooms"] = df["rooms"].fillna(df["rooms"].mean())

Now that we don't have numerical columns that have missing values we are good to go.
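Here is the same mean imputation on a tiny made-up column, so the effect is easy to verify by hand. The values are illustrative only.

```python
import numpy as np
import pandas as pd

# Hypothetical "rooms" column with one missing value.
df = pd.DataFrame({"rooms": [2.0, np.nan, 4.0, 3.0]})

# mean() skips missing values by default: (2 + 4 + 3) / 3 = 3.0
rooms_mean = df["rooms"].mean()

# Assign the result back to the column instead of filling the whole dataframe.
df["rooms"] = df["rooms"].fillna(rooms_mean)

print(df)
```

Assigning back to a single column avoids accidentally filling other columns with the mean of rooms.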

2. Checking for Multicollinearity

Multicollinearity is a statistical concept in which independent variables in a feature matrix (which includes all variables in the dataframe except the target variable) are correlated with one another. Now that we've understood what multicollinearity is, let's check for it.

Back to our dataframe: since price_aprox_usd is our target, we are going to drop it without modifying the original (that is, I'm going to call the drop method without the inplace parameter), so it returns a copy. To see the multicollinearity, type this:

# numeric_only=True skips text columns (needed on pandas 2.0 and later)
df.drop(columns = "price_aprox_usd").corr(numeric_only=True)

Notice that I didn't pass in inplace, so it returns a copy of the dataframe without price_aprox_usd. Also notice the .corr() method, which computes the pairwise correlation between columns.

After you run the above you should have something like this.

Now that we've seen the correlation numerically it will be nice to visualize so we can know what's going on. So we will be using seaborn. Just save the above code into a variable vis_corr. Check below to see what I mean.

# numeric_only=True skips text columns (needed on pandas 2.0 and later)
vis_corr = df.drop(columns = "price_aprox_usd").corr(numeric_only=True)

Now to visualize it we will use seaborn just type the following code to see the visualization.

sns.heatmap(vis_corr);

So let's unpack what's going on here. The diagram above shows the level of correlation among the different features in the dataframe. As a rule of thumb, if two features have a correlation value greater than 0.5, we treat them as highly correlated with one another.

Now that we've understood that we can see that price, price_aprox_local_currency are correlated with one another. Also surface_total_in_m2 and surface_covered_in_m2 are also very correlated. So we have to drop one of those features.
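Reading pairs off a heatmap gets tedious with many columns, so here is a sketch that lists the highly correlated pairs directly. The data is synthetic (generated with a fixed seed), the column names are borrowed from the article for readability, and the 0.5 threshold is the rule of thumb used above.

```python
import numpy as np
import pandas as pd

# Synthetic features: the two surface columns are built to be strongly related.
rng = np.random.default_rng(0)
base = rng.normal(size=200)
features = pd.DataFrame({
    "surface_total_in_m2": base,
    "surface_covered_in_m2": base * 0.9 + rng.normal(scale=0.1, size=200),
    "rooms": rng.normal(size=200),
})

# Absolute correlation, so strong negative relationships are caught too.
corr = features.corr().abs()

# Visit each pair once (upper triangle only) and keep those above the threshold.
cols = corr.columns
high_pairs = [
    (cols[i], cols[j])
    for i in range(len(cols))
    for j in range(i + 1, len(cols))
    if corr.iloc[i, j] > 0.5
]

print(high_pairs)
```

Each pair in the result is a candidate for dropping one of its two features.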

The issue now is "how do we know which feature to drop now that we've noticed the concerns with multicollinearity in our dataframe?"

To figure out which feature to eliminate, we'll need to look at the correlation between each of these features and our target variable. After we've done that, the feature with the highest correlation value will be the one to drop. Please see the code below to understand what I'm saying; I've included the correlation value for each variable as a comment.

df["price"].corr(df["price_aprox_usd"]) #corr_value = 0.68

df["price_aprox_local_currency"].corr(df["price_aprox_usd"]) #corr_value = 1.0

df["surface_covered_in_m2"].corr(df["price_aprox_usd"]) #corr_value = 0.13

df["surface_total_in_m2"].corr(df["price_aprox_usd"]) #corr_value=0.21

Okay, now that we've seen the correlation values of all the features, the features to drop are price_aprox_local_currency and surface_covered_in_m2.
I will be dropping surface_covered_in_m2 rather than surface_total_in_m2, even though it has the lower value. I feel that the word "total" means that column carries more information.
In other words, there may be times when you go for the one with the lowest value instead. Based on my preference, I choose to drop surface_covered_in_m2.
To do that, we type:

df.drop(columns = ["price_aprox_local_currency", "surface_covered_in_m2"], inplace=True)

Okay so having done that we are good to go. Let us look at another core thing we need to look at.

3. Dealing with leaky features

To understand what leaky features are, consider the following scenario: you were scheduled to take an exam for which you had not prepared, but you were fortunate enough to see the answers to the exam questions before entering the room. How will you perform compared to not having seen the answers at all? You will pass the exam, right? That is what leaky features are like. They're features that fool the model into looking like a genius that can predict anything, while, when it comes to predicting data that hasn't been seen before, the model will fail completely.

So back to our dataframe: let's check whether we have any leaky features in it.

df.info()

So in our dataframe we can see that we have leaky features. Our leaky features are price, price_usd_per_m2, and price_per_m2. These features give the answer away, considering that we want to build a model that predicts house prices. It makes no sense to keep other price columns in our dataframe, because they are derived from the same information as our target variable, price_aprox_usd. We have to drop them from our dataframe.

df.drop(columns = ["price_usd_per_m2", "price_per_m2"], inplace=True)

Now that we are done we are good to go. So let's look at the last core thing we need to focus on when building a predictive model.

4. Dropping high and low cardinality categorical variables.

A categorical variable (e.g. places, names, emails, etc.) is said to have low or high cardinality when it has very few or very many unique values.

So, to check the cardinality of our categorical variables, which are mostly of the object datatype, we will use the following code.

df.select_dtypes("O").nunique()

After running the code we can see that we have one low-cardinality variable (operation) and one high-cardinality variable (properati_url), so we have to drop them. A column whose values barely vary gives the model nothing to learn from, while a column that is different for almost every row (like a URL) adds noise and blows up the number of features once encoded, which we don't want.

df.drop(columns = ["operation", "properati_url"], inplace=True)
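The check can also be automated. The sketch below flags columns with a single unique value (low cardinality) and columns whose values are all different (high cardinality). The dataframe is made up, the property_type column is hypothetical, and both cut-offs are assumptions you would tune for a real dataset.

```python
import pandas as pd

# Hypothetical categorical columns (names borrowed from the article where possible).
df = pd.DataFrame({
    "operation": ["sell", "sell", "sell", "sell"],
    "property_type": ["house", "apartment", "house", "apartment"],
    "properati_url": ["url_1", "url_2", "url_3", "url_4"],
})

# Number of distinct values in each column.
n_unique = df.nunique()

# Assumed cut-offs: one unique value is "low", one unique value per row is "high".
low = n_unique[n_unique <= 1].index.tolist()
high = n_unique[n_unique == len(df)].index.tolist()

to_drop = low + high
df = df.drop(columns=to_drop)

print(n_unique)
print(df.columns.tolist())
```

Only property_type survives here, since it repeats often enough to carry signal without being constant.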

Conclusion

So we have gotten to the end of our journey. There are still other core things to look at when cleaning data, like dealing with outliers. Thank you for reading. I'm open to suggestions and feedback on where I can improve the article. You can follow me for more articles. Thanks.

How to Control your Home Device (DVD) Via tuya Link SDK and ESP 32

Ogunbiyi Ibrahim — Fri, 24 Dec 2021 23:59:31 +0000

Introduction

Controlling your home devices via the cloud can be cool and stress-free. Being able to pick up your phone and control a home device is something worth knowing how to do. In this tutorial, we are going to learn how to control a DVD player or home theater (both work the same way) via Tuya Link SDK and ESP32.

What we will be building

In this tutorial, we will learn how to control our DVD/ Home theater via Tuya link SDK and ESP32.

Hardware component

ESP32
Infrared led transmitter
Transistor Bc457
Ac/dc 5v power supply

Software Component

Tuya IoT platform
Tuya smart App
Arduino IDE
Command Prompt

GitHub Link: https://github.com/Folksconnect123/DVD-CONTROL

What is Tuya
Tuya Smart (NYSE: TUYA) is a global IoT development platform that creates interconnectivity standards to connect brands, OEMs, developers, and retailers across a variety of smart devices and industries. Tuya integrates many intelligent scenarios and smart devices by providing hardware development tools, integrating public cloud services, and giving an intelligent business development platform, all of which are based on the global public cloud.

What is Tuya link Sdk?
Tuya link SDK makes life a lot easier. It facilitates the creation of cloud development projects involving OpenAPI or message subscription capabilities.

Tuya Link SDK is one such SDK that encapsulates fundamental services like device activation, DP (Data-Points) upstream and downstream, and firmware OTA (Over-The-Air) upgrading into an interface. It is appropriate for connecting the logic services of a self-developed gadget to the cloud by developers.

What is ESP32?
The ESP32 is a low-cost Wi-Fi microchip by Espressif Systems that has built-in TCP/IP networking software and a microprocessor/ Microcontroller capability.

Setting up the environment

Before we get started, a heads up: if you haven't created a Tuya account before, do create one, because you will be given a free device license for a hands-on project.

To use Tuya Link SDK, we will need to create a product that interacts with and connects to Tuya IoT Cloud. Go to your Tuya IoT account and log in. If you haven't registered for an account, register one here.

Step 1: Create a product by clicking the Create button on the home page.

Step 2: A page will appear asking you to select a category for your product. Since we are using the ESP32 as a custom device, click “Can’t find the category” and enter your product details.

Step 3: We need standard functions for the power state of the DVD (on/off), open/close, and play/pause, plus a custom function for controls such as skip.

To create the power function, click Add, search for Switch, and click OK. Edit the name for clarity, for example DVD power_state. Click Add again, search for Switch 2, click OK, and rename it to Open/close. Add one more switch in the same way and rename it to Play/Pause.

Skip can also be created as a standard function, but here we use a custom one. On the custom function tab, click Add, then set the data point name to SkipFor, the identifier to skipfor, the data type to Bool, and the data transfer type to Send only. Repeat this for any other functions you need. Keep a note of the PID (product ID); we will need it later.

Step 4: To interact with the device we need a panel with all the required functions. You can build a drag-and-drop UI in the Tuya design panel, but since this is a prototype and I want to debug quickly, I use the Tuya DIY Style Panel. You can change the panel to your preference later.

Step 5: Go to the Hardware Development section of your product and click Link SDK. You need a license to use Link SDK, but don't worry: Tuya issues free licenses to new hobbyists.

For the ESP32 we use General CPU as the hardware type. If you are new to the platform, you will also see an option to generate a free license. Place the order and download the license list. The license consists of a Tuya UUID and an AuthKey, which authorize the device to send data to and receive data from the Tuya Cloud.

That completes most of the setup on the Tuya IoT platform. Next we set up the hardware and implement the code.

Installing Tuya Link SDK

Now it is time to install Tuya Link SDK so that we can connect the product we just created to the ESP32. We do that from the command prompt by cloning the GitHub repository and installing the SDK from it.

Note: make sure you have Git installed on your PC. If you don't, download and install it first.

The first step is to clone the repository:

    git clone https://github.com/tuya/tuyaos-link-sdk-python.git

The second step is to install the SDK. It needs the wheel package, so install that first if you don't already have it:

    pip install wheel

Then install Tuya Link SDK itself:

    python -m pip install ./tuyaos-link-sdk-python

That completes the installation of Tuya Link SDK. If you followed the steps correctly, the command prompt should report that the package was installed successfully.
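If you want to double-check the installation from Python itself, a minimal sketch like the one below can help. It only asks Python whether a module can be located; the import name `tuyalinksdk` is an assumption about what the cloned repository installs, so adjust it if your copy of the SDK uses a different name.

```python
import importlib.util


def is_installed(module_name):
    """Return True if Python can locate the named top-level module."""
    return importlib.util.find_spec(module_name) is not None


# "tuyalinksdk" is assumed to be the import name provided by the cloned repository.
print("Tuya Link SDK found:", is_installed("tuyalinksdk"))
```

If this prints False, re-run the pip command from inside the folder that contains the cloned repository.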

Hardware setup:

Grab your ESP32 and the other required hardware; we are ready to get our hands dirty.

Make sure you have all the required hardware items. Follow the schematics and make sure everything is connected properly.

For receiving the address, use this schematic:

Connect the signal/data pin of the infrared receiver to D15 on the ESP32, the anode of the receiver to 5 V, and the cathode to ground.

For sending the address to the DVD, use this schematic:

Connect the base pin of the transistor to pin D4 on the ESP32 and the collector pin to ground. Connect the emitter to the cathode of the LED, and connect the anode of the LED through a 220 ohm resistor to Vin.

Implementing your Program

Getting the address from the remote.

Now that the hardware is wired up, the next step is the code. Download the IRremote_decode file from the GitHub repository.

Run the IRremote_decode file and open the Serial Monitor to read the address that your remote sends. You can then put that address into the DVD_control file.

This is the hardware schematic for the remote address decoder.

For the ESP32, connect the signal pin of the IR receiver to pin 15, the positive pin to 5 V, and the last pin to ground.

This is the address from my remote controller, which I then inserted into the DVD code.

Modifying the DVD_Control file

After you've gotten the address, replace the address in the DVD_control file from GitHub with your own, like this.

Run the file with your DVD player switched on and test it through the Serial Monitor to check that everything works.

Finalizing our work

After you've successfully uploaded and run the code above, it is time to control the microcontroller through the cloud. Download the TuyaLinkSDK file from GitHub and edit it to use your PID, AuthKey, UUID, and the COM port of your microcontroller. Then run it from the command prompt, using cd to change to the folder that contains the file. In my case I copied the file to the desktop, so I typed this:

    cd desktop

Then type the name of the file, in my case TuyaLinkSDK.py.

Once the script runs successfully, you should see a code to scan. Now download the Tuya Smart app from the Play Store, tap Add Device, and scan the code to add the device.
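To make the role of the data points more concrete, here is a minimal sketch of the idea behind the bridge script: each data point update that arrives from the cloud is translated into a command for the ESP32. The DP ids and the one-letter commands below are hypothetical placeholders, not the values used in the TuyaLinkSDK file on GitHub; use the ids shown in your product's function list and whatever commands your ESP32 sketch expects.

```python
# Hypothetical DP ids and one-letter serial commands (placeholders only).
DP_COMMANDS = {
    "1": ("P", "p"),     # DVD power_state: on / off
    "2": ("O", "o"),     # Open/close
    "3": ("S", "s"),     # Play/Pause
    "101": ("K", None),  # SkipFor: only acts when set to True
}


def dps_to_commands(dps):
    """Translate a {dp_id: bool} update into the list of commands to send."""
    commands = []
    for dp_id, value in dps.items():
        if dp_id not in DP_COMMANDS:
            continue  # ignore data points this device does not define
        on_cmd, off_cmd = DP_COMMANDS[dp_id]
        cmd = on_cmd if value else off_cmd
        if cmd is not None:
            commands.append(cmd)
    return commands


print(dps_to_commands({"1": True, "101": True}))  # ['P', 'K']
```

Keeping the mapping in one dictionary means that adding a new button in the Tuya panel only needs one new entry here.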

Here is a video of the result: https://youtu.be/VcfcegE9fgk. Thank you for reading.

Creating an Image Sketch with OpenCV (10 Lines of Code).

Ogunbiyi Ibrahim — Wed, 10 Nov 2021 13:42:10 +0000

Introduction

Have you ever considered using a programming language to create a rough sketch of an image?
In this article I'll show you how to make a sketch of an image even if you have no programming experience. Hold on to your seat and take out your pen, because this is going to be enlightening and comprehensive.

Note: To follow along with this guide, you only need a basic understanding of how computers work.

If you are curious to try this out, here are the 10 lines of code 👇

    import cv2
    path = r'C:\Users\folksconnect\Pictures\2020-07'
    image_path = cv2.imread(path)
    grey_image = cv2.cvtColor(image_path, cv2.COLOR_BGR2GRAY)
    cv2.imshow('Image', grey_image)
    invert = cv2.bitwise_not(grey_image)
    blur = cv2.GaussianBlur(invert, (31, 31), 0, borderType=cv2.BORDER_DEFAULT)
    invertedblur = cv2.bitwise_not(blur)
    sketch = cv2.divide(grey_image, invertedblur, scale=256.0)
    cv2.imwrite('sketch.png', sketch)

But if you are not in a hurry, this article explains everything you need to know about creating an Image Sketch with OpenCV.

What is OpenCV?

OpenCV is a large open-source library for computer vision, machine learning, and image processing, and it is widely used in real-time applications. It can be used to recognize objects, people, and even human handwriting in photographs and videos.
In this guide, we will use OpenCV to process a photo and turn it into an image sketch.

What is Python?

Python is a programming language that is widely used to construct websites and apps, automate tasks, and analyze data. Python is a general-purpose programming language, which means it can be used to create a large variety of applications and is not specialized for any specific problem.

To use OpenCV for image processing we have to import it in Python, but before we start coding, let’s set up our environment so we can work efficiently.

Setting up your Python Environment

The first thing to do is to make sure you have a Python interpreter on your computer. If you don't, install Python for Windows or macOS before continuing.

Setting up the OpenCV Environment

The next step, once Python is installed, is to install the OpenCV library.

To install OpenCV, open the command prompt on your computer and run this command:

    pip install opencv-python

Make sure you have an internet connection when you run this command, because the package is downloaded from the internet. Once it has installed successfully, launch the Python IDLE editor installed on your PC by typing IDLE into your computer's search bar.

Implementing our Code

Note: The name on the left-hand side of the assignment symbol ( = ) is a variable used to store information. The result of the code on the right-hand side is stored in the variable on the left.

The first step after opening IDLE is to create a new Python script file. Press Ctrl + N to create a new file.

The second step is to write the statement

    import cv2

This line of code imports the OpenCV library into your Python code so that you can use the functions it provides.

The next step is to assign the path of the image to the variable path, adding r in front of the string so that Python treats the backslashes in a Windows path literally. Here is an example:

    path = r'C:\Users\folksconnect\Pictures\2020-07'

The next statement to type is:

    image_path = cv2.imread(path)

This line reads the image file at the path you entered and stores its pixel data in the variable image_path.

Note: if the image cannot be read (maybe the file does not exist or there is an error in the path), cv2.imread does not raise an error; in Python it returns None.
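Because a bad path fails quietly, it can help to check the path before handing it to OpenCV. The sketch below is one possible pre-flight check that uses only the standard library; the function name `check_image_path` and the list of extensions are illustrative choices, not part of OpenCV.

```python
import os

# Common image extensions; extend this set if you use other formats.
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def check_image_path(path):
    """Return a short problem description, or None if the path looks usable."""
    if not os.path.isfile(path):
        return "not a file (missing, or a folder)"
    if os.path.splitext(path)[1].lower() not in IMAGE_EXTENSIONS:
        return "unexpected file extension"
    return None


problem = check_image_path(r'C:\Users\folksconnect\Pictures\2020-07')
if problem:
    print("Check your path:", problem)
```

A path that points at a folder rather than at a file inside it is a common cause of an unreadable image, and this check reports that case.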

The next statement to type is:

    grey_image = cv2.cvtColor(image_path, cv2.COLOR_BGR2GRAY)

This line converts the colour image stored in image_path into a greyscale image.

To show this image you can type the code:

    cv2.imshow('Image', grey_image)

This shows the image in a window titled Image. The window only stays open and responsive if the script then calls cv2.waitKey(0), which waits for a key press.

Now moving on to the next line of code:

    invert = cv2.bitwise_not(grey_image)

This line of code inverts the image. For an 8-bit greyscale image, each pixel value v becomes 255 - v, so white becomes black, black becomes white, and light greys become dark greys.
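To see what inversion does to individual pixels, here is a small plain-Python sketch of the same arithmetic on a handful of 8-bit grey values. It does not use OpenCV; the helper name `invert_pixels` is only for illustration.

```python
def invert_pixels(pixels):
    """Invert 8-bit grey values: each value v becomes 255 - v."""
    return [255 - v for v in pixels]


print(invert_pixels([0, 64, 200, 255]))  # [255, 191, 55, 0]
```

Inverting twice gives back the original values, which is why the second inversion later in this guide simply flips the blurred image back to its normal brightness.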

Working incrementally, you can also add this line of code to see what the inverted image looks like.

    cv2.imshow('image', invert)

The next step is to blur the image, and we do that with:

    blur = cv2.GaussianBlur(invert, (31, 31), 0, borderType=cv2.BORDER_DEFAULT)

The syntax for the GaussianBlur() method is:

    cv2.GaussianBlur(src, ksize, sigmaX, sigmaY, borderType)

You must pass src, ksize, and sigmaX every time; sigmaY and borderType are optional.

src is the source image, which here is invert because that is the image we want to blur.
ksize is the kernel size, given as a tuple (values enclosed in parentheses) of width and height. Both values must be positive and odd, which is why we use (31, 31) rather than (30, 30). Larger values give a stronger blur.

sigmaX is the standard deviation of the blur. Passing 0 tells OpenCV to calculate it from ksize, and sigmaY then defaults to the same value as sigmaX.
borderType controls how the edges of the image are handled. I love using the default type, so we pass borderType=cv2.BORDER_DEFAULT.
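OpenCV only accepts Gaussian kernel dimensions that are positive and odd (or zero, in which case the size is derived from sigma), so an even size such as (30, 30) is rejected. If you want to experiment with different blur strengths, a small helper like this hypothetical `nearest_odd_ksize` can round any size you pick up to a valid one.

```python
def nearest_odd_ksize(width, height):
    """Round each kernel dimension up to the next odd number, with a minimum of 1."""
    def to_odd(n):
        n = max(1, int(n))
        return n if n % 2 == 1 else n + 1
    return (to_odd(width), to_odd(height))


print(nearest_odd_ksize(30, 30))  # (31, 31)
```

The result can be passed directly as the ksize argument of cv2.GaussianBlur.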

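The next step is to invert the blurred image again, which we can do using

    invertedblur = cv2.bitwise_not(blur)

The next line of code divides the grey image by the inverted blurred image, pixel by pixel, and multiplies the result by the scale value. Flat areas come out white, while pixels that are darker than their blurred surroundings stay dark, which produces the pencil-sketch look.

    sketch = cv2.divide(grey_image, invertedblur, scale=256.0)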
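The arithmetic behind this division step can be sketched in plain Python on a few pixel values. This is an approximation of what cv2.divide does for 8-bit images (scale, divide, cap at 255, and return 0 when dividing by zero), not OpenCV's actual implementation, and the name `dodge` is only illustrative.

```python
def dodge(grey, inverted_blur, scale=256.0):
    """Approximate cv2.divide for 8-bit pixels: grey * scale / inverted_blur, capped at 255."""
    out = []
    for g, b in zip(grey, inverted_blur):
        if b == 0:
            out.append(0)  # integer division by zero yields 0 in OpenCV
        else:
            out.append(min(255, round(g * scale / b)))
    return out


# A flat light area, a dark stroke on a light background, and a flat mid-grey area.
print(dodge([200, 50, 120], [200, 200, 120]))  # [255, 64, 255]
```

Wherever a pixel matches its blurred surroundings the result saturates to white, and only the dark stroke survives, which is the effect that makes the output look like a sketch.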

Our code is almost done. The last line writes the image to a Portable Network Graphics (PNG) file, which we can achieve by typing

    cv2.imwrite('sketch.png', sketch)

Congratulations, we have successfully created a sketch of an image. This is an example of the image we sketched using OpenCV.

This is the full implementation of the code for easy access/use.

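    import cv2

    # Full path to the image file on your own PC.
    path = r'C:\Users\folksconnect\Pictures\2020-07'
    image_path = cv2.imread(path)  # pixel data, or None if the file cannot be read
    grey_image = cv2.cvtColor(image_path, cv2.COLOR_BGR2GRAY)  # colour to greyscale
    cv2.imshow('Image', grey_image)
    invert = cv2.bitwise_not(grey_image)  # negative of the grey image
    # The kernel size must be odd; sigmaX=0 lets OpenCV derive it from the kernel size.
    blur = cv2.GaussianBlur(invert, (31, 31), 0, borderType=cv2.BORDER_DEFAULT)
    invertedblur = cv2.bitwise_not(blur)
    sketch = cv2.divide(grey_image, invertedblur, scale=256.0)  # blend into a sketch
    cv2.imwrite('sketch.png', sketch)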

Conclusion

Note: Make sure you change the image path to a path on your own PC, because the path above points to an image on my PC.

In this guide, we built a sketch of an image using OpenCV in Python. If you followed along, you should be able to set up your own version of this project and go on to explore other cool features of this awesome library, such as face recognition.

You can also improve on it by adding other features and functionality.


Happy Coding!

Beginners path-way on how to get started with Robotics

Ogunbiyi Ibrahim — Thu, 29 Apr 2021 11:56:18 +0000

Get ready to roll up your sleeves, because I've got you covered.

Firstly: what is robotics?

Robotics, in layman's terms, is using machines to perform tasks that can be done by human beings. Your phone is in a sense a robot, because it can perform calculations, such as adding or subtracting two numbers, that you could do yourself.

Robotics is about making machines that think like you. Robots are widely used in industry to perform simple, repetitive tasks. Tesla cars are a typical example of robotic automobiles.

Do you think robotics is worth learning in 2021?

If you want to dive into robotics for the money, this is for you: according to Glassdoor, the average salary is ..... Whoa! That's a lot of money.

What programming language should I learn for robotics?

When I wanted to start my robotics software engineering career, I kept pondering which programming language to learn, because there are lots of them.

But heads up! I've got you covered. Here are the 3 core programming languages you need to know:

Python

C

C++

1. Python:

Why Python? As far as I'm concerned, Python is one of the simplest languages to learn. It has a lot of libraries for robotics and needs less code than C and C++. Python is intuitive and easy for newbies starting their career in robotics. It is used on the Raspberry Pi and can also be used with ROS (Robot Operating System).

2. C language:

Why C? A lot of the hardware libraries used in robotics are written in C. These libraries allow interaction with low-level hardware and real-time performance.

3. C++:

Why C++? The main reasons are:

Performance
Scalability
Memory management

I will share resources, mostly free, for the programming languages listed above.

Do I need to get stuck on mathematics?

Mathematics is one of the main reasons people don't want to dive into robotics. You don't need much mathematics to get started: high school or basic algebra is enough. After that, advancing in robotics will require you to learn:

Calculus
Linear algebra
Statistics

This is the step-by-step roadmap I wish I had followed to build a career in robotics:

1. Learn 3 core programming languages

Python
C language
C++

You can start with the first two so that you can do hands-on projects with Arduino and Raspberry Pi.

2. Learn mechatronics

Mechatronics is a great place to start with hands-on projects. I know you might be thinking, "I don't know anything about mechatronics." But folks, it's better to start with something than nothing, and to make life easier for you, start with Arduino and Raspberry Pi.

1. Arduino is an open-source electronics prototyping platform that lets users create interactive electronic projects. You will need basic knowledge of C to get started with Arduino. You can buy an Arduino kit online from Amazon or other e-commerce websites.

2. Raspberry Pi is a tiny and affordable computer that you can use to learn programming through practical projects. You will need Python or C++ to get started with Raspberry Pi.

This article will be continued.
If you found it useful, don't forget to like and share across all platforms.
You can reach me on Twitter.

Thanks🥳🥳

Five free ebooks I wish I had earlier when I was a beginner Python programmer. Thank me later

Ogunbiyi Ibrahim — Sat, 13 Feb 2021 04:44:50 +0000

Five free Python ebooks I wish I had earlier, when I was a beginner. Folks, they are very easy to understand.
1) Programming in Python 3, Mark Summerfield
2) Begin to Code with Python, Rob Miles
3) Learn Python the Hard Way
4) A Practical Introduction to Python Programming, Brian Heinold
5) Fundamentals of Python: First Programs, Kenneth Lambert

For advanced programmers, I recommend you try:
Python Tricks, Dan Bader

Thank me later 😀🥰🥰🥳🥳


A quick introduction of who I am

Ogunbiyi Ibrahim — Fri, 12 Feb 2021 16:34:30 +0000

I'm a 300-level computer science student at the Olusegun Agagu University of Science and Technology in Ondo. I'm currently a Python wizard, but I aspire to become a data scientist so that I can achieve my purpose in robotics AI.