Introduction to Hashing

#programming #java #security #beginners

Before reading this article please make sure that you have a basic understanding of data structures and algorithms. Thanks!!😉

Suppose we have to create a system where we have to store a person's phone number along with their information using any data structure. The most obvious data structure which comes to our mind is as follows:

Arrays - They are the most common data structure but when we would try to store that data into it, the most common fallback which comes is while searching the data. It takes O(n) time for searching. And even if we used a sorted array to search it would still take O(log(n)) time and then insertion and deletion would also get affected by it.
Linked List - It is also one of the widely used data structures but is somewhat similar to an array as searching in it is also in a linear fashion. So, it would also not be an efficient approach as well to store our data.
Binary Search Trees - It is also a great one made up of nodes and linked lists. As we will use this data structure to store our data so on average it will take O(log(n)) time in order to perform insert, search and delete operations. It provides a moderately good choice as compared to arrays and linked lists.
Direct Access Table(DAT) - It is one of the solutions to this problem where we use phone numbers as our index in a big array. Entry in this array consists of a pointer to records corresponding to the phone number. For example, to insert a phone number, we create a record with details of the given phone number, use the phone number as an index, and store the pointer to the created record in the table. All the major operations take place in O(1) time. Isn't that amazing!!

But DAT has practical limitations as firstly a huge space is required to store them. For example, if the phone number is n digits, we need O(m * 10^n) space for a table where m is the size of a pointer to record. Another problem is an integer in a programming language may not store n digits.

So, this is when the Hashing comes into the picture. It is basically an improvement to DAT. The basic idea behind it is to convert the phone number into a small key and use that key as an index in a Hash Table. Not only that, Hashing performs way better than the above-listed data structures like we get O(1) as average time complexity for searching the element.

So, what is Hashing?
Bookish Definition - Hashing is the process of converting a given key into another smaller value for O(1) retrieval time. This is done by a function called a hash function to map data to some encrypted or simplified representative value which is termed as “hash code” or “hash”.

In Simple Words - We have a function called as Hash Function through which we pass a key or phone number in this case and we get the corresponding hash or hash code which is usually a small number act as an index of an array and is used to store and retrieve data.

So, this was all about Hashing Introduction. I tried to kept it short and simple. Soon, I would be discussing Hash Functions and Collision Handling in Hash Functions so stay tuned. Hope you all like my little effort towards this concept and I would be glad if I was able to add something to your knowledge.

Please give your suggestions and feedback in the comments :)