DEV Community: Penumudi Varun

Non-numeric data for ML (Encoding Data)

Penumudi Varun — Fri, 24 Jan 2025 01:57:43 +0000

When working with datasets for machine learning, some of the data will be in text format. For example, consider a medical dataset containing patient details, or a dataset containing details of individuals in an organization. Such datasets usually have a gender column that tells whether the person is male or female.

These non-numeric columns might contain important information needed for training the model. The problem is that ML and deep learning models operate on numbers and cannot take raw text as input. To use these features as inputs, we have to encode the text data into numbers before giving it to the model.

This process of converting text data into numbers is called feature encoding. There are different types of feature encoding:

Label Encoding
Ordinal Encoding
One Hot Encoding

Label Encoding

In label encoding, we label each unique value in the column with a unique number. For example, the gender column has only two unique values, male and female, so we can label male as 1 and female as 2 in another column, as shown.

As you can see in the above dataset, we created another column for the label-encoded values of the gender column. In the label-encoded column, male is represented by 1 and female by 2.
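The labeling described above can be sketched in a few lines of plain Python. This is only a minimal illustration: the sample values and the `label_encode` helper are made up for this example, and labels are handed out in order of first appearance, starting from 1 to match the male = 1, female = 2 convention used here.

```python
# Hypothetical sample column; not taken from a real dataset.
genders = ["male", "female", "female", "male"]


def label_encode(values, start=1):
    """Give each unique value an integer label, in order of first appearance."""
    mapping = {}
    for value in values:
        if value not in mapping:
            # The next free label is `start` plus the number of labels used so far.
            mapping[value] = start + len(mapping)
    encoded = [mapping[value] for value in values]
    return encoded, mapping


gender_encoded, gender_mapping = label_encode(genders)
print(gender_mapping)  # which number each gender received
print(gender_encoded)  # the new label-encoded column
```

In practice, libraries such as scikit-learn and pandas provide their own encoders, so a hand-written helper like this is mainly useful for seeing what happens underneath.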

Ordinal Encoding

If the column values have a natural order, use ordinal encoding. Consider a customer reviews dataset containing a rating column.

The rating column here has the values good, fine, and bad. When we encode them as numbers, we usually want to give a higher number to a good rating and a lower number to a bad rating. This is called ordinal encoding. The ordinal-encoded values for the rating column are also shown above.
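A minimal sketch of ordinal encoding in plain Python follows. The exact numbers (1 for bad up to 3 for good) and the sample ratings are assumptions chosen for illustration. The only thing that matters is that the order of the numbers matches the order of the ratings.

```python
# The order is written out by hand, from worst to best.
RATING_ORDER = ["bad", "fine", "good"]

# Map each rating to its rank: bad -> 1, fine -> 2, good -> 3.
rating_to_rank = {rating: rank for rank, rating in enumerate(RATING_ORDER, start=1)}

# Hypothetical sample column.
ratings = ["good", "bad", "fine", "good"]
ratings_encoded = [rating_to_rank[rating] for rating in ratings]
print(ratings_encoded)
```

Unlike label encoding, the mapping here cannot be derived from the data alone. Someone has to state which value ranks higher.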

One Hot Encoding (Categorical Encoding)

Let's return to the gender column from the earlier example. Instead of converting male to 1 and female to 2, we can create two separate columns for these two unique values, as shown.

Here, you can see that the is_male and is_female columns are added. These columns act as better indicators of gender for the ML model than labeling each gender with a number, because a numeric label can suggest an order or magnitude that does not exist in the data. This type of encoding is called categorical encoding or one-hot encoding.
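The is_male and is_female columns can be built with a short plain-Python sketch like the one below. The `one_hot_encode` helper and the sample values are hypothetical, and each new column is stored as a list of 0s and 1s under its column name.

```python
def one_hot_encode(values, prefix="is_"):
    """Create one 0/1 indicator column per unique value."""
    categories = sorted(set(values))
    return {
        f"{prefix}{category}": [1 if value == category else 0 for value in values]
        for category in categories
    }


# Hypothetical sample column.
genders = ["male", "female", "female", "male"]
one_hot = one_hot_encode(genders)

for column_name, column in one_hot.items():
    print(column_name, column)
```

Each row has exactly one 1 across the new columns, which is where the name "one-hot" comes from.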

The is_male column acts as a Boolean feature for the ML model instead of being just a labeled number in the dataset. So, overall, this is a good approach. However, it has a disadvantage related to the cardinality of the column.

Cardinality: This is a measure of the number of unique values in a column. E.g., the cardinality of the gender column is just 2.

Problem

Now consider a column with 100 unique values. One-hot encoding it adds 100 more columns to the dataset, which increases memory use and might make training our machine learning model slower. So, we should prefer low-cardinality columns for one-hot encoding.
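One way to act on this is to check the cardinality of a column before deciding how to encode it. The sketch below is an assumption-laden illustration: the cutoff of 10 is an arbitrary number picked for this example, not a standard value, and the column contents are made up.

```python
# Hypothetical cutoff; a sensible value depends on the dataset and the model.
MAX_ONE_HOT_CARDINALITY = 10


def cardinality(values):
    """Number of unique values in a column."""
    return len(set(values))


def is_one_hot_friendly(values, limit=MAX_ONE_HOT_CARDINALITY):
    """True if one-hot encoding would add at most `limit` new columns."""
    return cardinality(values) <= limit


genders = ["male", "female", "female", "male"]
product_codes = [f"code_{i}" for i in range(100)]  # 100 unique values

for name, column in [("gender", genders), ("product_code", product_codes)]:
    print(name, cardinality(column), is_one_hot_friendly(column))
```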

So, these are the most important encoding types you need to know in Machine learning.

Wanna Know Big O Basics!

Penumudi Varun — Tue, 14 Jan 2025 09:21:17 +0000

Time Complexity, Big-O for Beginners

Penumudi Varun — Tue, 14 Jan 2025 09:10:14 +0000

Often, when dealing with algorithms, we want to know which algorithm takes more time and which one takes less. To know this, one first needs a way to measure the time taken by an algorithm. The thing is, seconds are not a reliable way to measure it, because the code for an algorithm can be written in different languages.

If you code a faster algorithm in Python and a slower one in C++, both of them might take the same amount of time. Even comparing their times by coding both in a single language would still be a bad idea, because an algorithm running on a faster computer will take fewer nanoseconds than the same algorithm on a slower computer. Hence, seconds are not a useful unit for measuring the time taken by an algorithm.

Better Way (Time Complexity)

So, what could be a better way? One way to solve this problem is, instead of asking "How many seconds will this algorithm take?", to ask a different question: "How many operations does this algorithm take?"

Example:

Let's consider two simple algorithms:

Algorithm to pop last element of an array/list.
Algorithm to pop element at index i in the list.

The first algorithm is simple. You just take out the last element of the array, and there is no need to do anything else.

This takes just a single operation. Hence, the time complexity in Big-O notation is O(1).

Whereas the second algorithm is different. To remove an element at a specific index, you have to shift all the elements after this index to the left.

In the worst case, the index is i = 0, and we have to shift the whole rest of the array to the left. So if there are n elements in the array, the n - 1 elements after index 0 are all shifted left.
We have to perform about n shifts in the worst case, i.e. on the order of n operations. This makes the time complexity of this algorithm in Big-O notation O(n), where n is the number of elements in the array.
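The difference between the two algorithms can be made visible by counting the shifts by hand. The sketch below is an illustration only: `pop_last` and `pop_at` are hypothetical helpers written for this example (Python's built-in `list.pop` does this work internally), and they return the number of element shifts alongside the removed value.

```python
def pop_last(items):
    """Remove the last element. Nothing after it needs to move."""
    value = items[-1]
    del items[-1]
    return value, 0  # zero shifts


def pop_at(items, i):
    """Remove the element at index i by shifting later elements one step left."""
    value = items[i]
    shifts = 0
    for j in range(i, len(items) - 1):
        items[j] = items[j + 1]  # move the next element into the gap
        shifts += 1
    del items[-1]  # the last slot is now a leftover duplicate
    return value, shifts


data = list(range(10))
print(pop_last(data))   # no shifts, regardless of list length
print(pop_at(data, 0))  # worst case: every remaining element shifts
```

With index 0 on a list of n elements, the loop runs n - 1 times, which is why the cost grows in step with n.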

Big-O notation is the standard way to represent the time complexity of an algorithm. It tells us how the number of operations an algorithm takes grows for an input of size n.

Important Time Complexities

Some important time complexities that you will see while solving DSA questions are:
O(1): Takes a constant amount of time
O(log(n)): Time increases logarithmically with the number of inputs
O(n): Time increases linearly with the number of inputs
O(n^2): Time increases quadratically with the number of inputs
O(2^n): Time increases exponentially with the number of inputs
O(n!): Time increases factorially with the number of inputs

The rate at which the time increases for each of these types of algorithms is shown visually in the graph below.

As you can see, the number of operations for an O(n!) algorithm increases rapidly, while it stays constant for O(1), as the number of inputs increases. Hence, of the complexities listed here, O(n!) is the worst and O(1) is the best.
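The same growth rates can also be compared numerically. The sketch below simply evaluates each formula for a few small values of n. The `operation_counts` helper and the chosen values of n are assumptions for illustration, and the numbers are idealized counts, not measurements of real code.

```python
import math


def operation_counts(n):
    """Idealized operation counts for each complexity class at input size n."""
    return {
        "O(1)": 1,
        "O(log n)": math.log2(n),
        "O(n)": n,
        "O(n^2)": n ** 2,
        "O(2^n)": 2 ** n,
        "O(n!)": math.factorial(n),
    }


for n in (4, 8, 16):
    print(f"n = {n}")
    for name, count in operation_counts(n).items():
        print(f"  {name}: {count}")
```

Even at n = 16 the factorial count is already in the trillions, while the O(1) count has not moved.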

Note

When we say time complexity, we don't actually mean the exact number of operations that an algorithm will take. Instead, it means how the number of operations increases as the number of inputs increases.

Let's say we consider popping two numbers at specific indices. You might think the time complexity would now be O(2n), but that's not the case: it would still be O(n). This is because we don't care whether it takes n, 2n, 3n, or some other constant multiple of n operations. We only see that the number of operations the algorithm takes is on the order of n. Hence, the time complexity is O(n), or linear.
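A small experiment makes this concrete. The sketch below counts the shifts needed to pop the front element once or twice. The `shifts_to_pop_front` helper is hypothetical, and the count assumes each pop from index 0 shifts every remaining element one step left.

```python
def shifts_to_pop_front(n, times=1):
    """Count element shifts when popping index 0 `times` times from n items."""
    items = list(range(n))
    shifts = 0
    for _ in range(times):
        shifts += len(items) - 1  # every element after index 0 moves left
        items.pop(0)
    return shifts


for n in (1000, 2000, 4000):
    one_pop = shifts_to_pop_front(n, times=1)
    two_pops = shifts_to_pop_front(n, times=2)
    print(n, one_pop, two_pops)
```

Popping twice costs roughly 2n shifts instead of n, but in both cases doubling n doubles the work. That shared growth pattern is what O(n) describes.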

What is the Architecture of Django?

Penumudi Varun — Sun, 12 Jan 2025 07:30:15 +0000

The first time I was asked this question was not when I was learning Django, but after I had learned it and applied for an internship. During the internship, I was asked this question. Unfortunately, I didn't know the answer at that time, but now I do.

Every Django project you create follows an architecture called MVT. MVT stands for Model View Template. These three things are the main parts of any Django project. Let's look at them in detail.

Model

A model is a class-based representation of a table in a database. Django applications use Python classes to represent the tables in the website's database, and these classes are called models in Django. All the model classes you create in Django should inherit from the "django.db.models.Model" class. Each model class has attributes that represent the fields of the database table.

View

The view is the function or class that contains the logic needed to take in an HTTP request coming from the client and send the appropriate HTML, JSON, or other response back to the user. In Django, views can be class-based or function-based.

The view takes in the URL path and query parameters, along with the request body sent by the user/client, then uses this data to perform CRUD operations if needed, and sends back the response.

Template

A template in Django is simply an HTML file that defines the layout and contains the body of the webpage, along with code written in a special templating syntax supported by Django.

Using this special templating syntax, we can show dynamic data on the website. This dynamic data is passed by a Django view to the template, and it usually contains information about the models of the Django project.
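To see how the three parts divide the work, here is a framework-free analogy in plain Python. It is not Django code and does not use Django's API: the `Patient` class, the `PATIENTS` list, the `PAGE` template, and `patient_list_view` are all hypothetical stand-ins, and the standard library's `string.Template` plays the role of Django's templating syntax.

```python
from dataclasses import dataclass
from string import Template


# "Model": a class describing one row of a hypothetical patient table.
@dataclass
class Patient:
    name: str
    gender: str


# Stand-in for the database table.
PATIENTS = [Patient("Asha", "female"), Patient("Ravi", "male")]

# "Template": the page layout, with a placeholder for dynamic data.
PAGE = Template("<h1>Patients</h1><ul>$rows</ul>")


# "View": takes request data, reads the models, and fills in the template.
def patient_list_view(query_params):
    wanted = query_params.get("gender")
    patients = [p for p in PATIENTS if wanted is None or p.gender == wanted]
    rows = "".join(f"<li>{p.name}</li>" for p in patients)
    return PAGE.substitute(rows=rows)


print(patient_list_view({"gender": "female"}))
```

In a real Django project, the model would inherit from django.db.models.Model, the template would live in its own HTML file, and the view would receive a request object and return a response object. The split of responsibilities is the part this sketch tries to show.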

MVT and MVC?

The MVT architecture used by Django is a slightly different version of another popular architecture called MVC, which stands for Model View Controller. Here, Model, View, and Controller stand for:

Model: The model represents the data and business logic of the application, similar to the model in Django's MVT.
View: The view here is different. It doesn't contain the logic for handling HTTP requests; instead, it represents UI elements. So, the view in MVC is more similar to the template in MVT.
Controller: The controller in this architecture is responsible for the logic that handles requests and user inputs. So, the controller here is more like the view in MVT.

So, this is all you should know about Django's architecture. If you have any questions, feel free to ask them in the comments.