DEV Community

Kiran U Kamath
Kiran U Kamath

Posted on • Originally published at kiranukamath.substack.com

Understanding hashCode() and equals() in Java

Java’s hashCode() and equals() methods are fundamental to the functioning of many core Java classes, particularly those in the Collections framework, such as HashMap, HashSet, and Hashtable. These methods define how Java objects behave when stored in collections that use hashing for efficient retrieval.

In this blog, we’ll dive deep into how these methods work, the rationale behind them, and their impact on performance. We’ll also look at the relationship between hashCode() and equals(), explore best practices, and investigate a real-world example with Java's String class.

Overview of hashCode() and equals()

In Java, objects are often stored in collections such as HashMap or HashSet, which use hashing for efficient access and storage. For these collections to work as expected, the objects need to adhere to certain rules regarding equality (equals()) and hashing (hashCode()).

  • hashCode(): This method returns an integer hash code, which is used by hash-based collections to determine the "bucket" where an object should be placed.
  • equals(): This method checks if two objects are meaningfully equal. It compares their internal state to determine if they represent the same logical entity.

The correct implementation of these methods ensures that objects can be retrieved efficiently from collections and behave correctly when compared.

Deep Dive into hashCode()

What is a Hash Code?

A hash code is a numerical value generated for an object. It serves as a compact representation of the object’s contents. Hash codes are particularly useful for storing objects in hash-based collections (like HashMap, HashSet, and Hashtable) because they allow these collections to quickly find an object in a large dataset by using the hash code to determine the storage location (bucket).

How hashCode() Works

The hashCode() method is defined in Java’s Object class and can be overridden to provide custom hash codes for objects. Here’s the method signature:

public int hashCode();

When you insert an object into a hash-based collection, the collection first calls the object’s hashCode() method to get its hash code. This hash code is then used to find the correct bucket where the object should be stored.

But what happens if two objects have the same hash code? This is called a hash collision, and it’s something we’ll explore in more depth later. For now, the important takeaway is that two objects with the same hash code may be stored in the same bucket, and the collection will use equals() to differentiate between them.

Overriding hashCode()

Here’s an example of how you can override


public class Person {
    private String name;
    private int age;

    @Override
    public int hashCode() {
        int result = 17;  // Start with a non-zero constant
        result = 31 * result + name.hashCode();  // Combine fields
        result = 31 * result + age;
        return result;
    }
}

Enter fullscreen mode Exit fullscreen mode

Example of Hash Code Calculation:

Let's consider the string

String str = "abc";
System.out.println(str.hashCode());

The hash code is calculated as follows:

  • For the first character 'a' (ASCII 97): h = 31 * 0 + 97 = 97
  • For the second character 'b' (ASCII 98): h = 31 * 97 + 98 = 3105
  • For the third character 'c' (ASCII 99): h = 31 * 3105 + 99 = 96354

Thus, the hash code for "abc" is 96354.

The Role of Prime Numbers in hashCode()

You might have seen prime numbers like 31 commonly used in hash code calculations (e.g., 31 * h + val[i]). This is not arbitrary—prime numbers help reduce the likelihood of hash collisions. By multiplying intermediate hash values by a prime number, you distribute the hash values more evenly across possible buckets, ensuring better performance for hash-based collections.

Prime numbers ensure that hash codes are distributed more uniformly because their factors are less predictable, reducing the likelihood that different combinations of object fields will produce the same hash code.

Hash Collisions and Buckets

A hash collision occurs when two different objects return the same hash code. When this happens, both objects are stored in the same bucket, but they still need to be distinguishable. This is where the equals() method comes into play.

In a HashMap, for instance, if two objects have the same hash code, they will be placed in the same bucket. However, the collection will then use equals() to check if the objects are truly equal. If they are, the collection considers them duplicates; if not, both objects are stored in the same bucket but treated as distinct elements.

Deep Dive into equals()

What is Equality?

In Java, the equals() method defines logical equality between two objects. It is used to determine whether two objects represent the same logical entity, even if they are different instances.

Here’s the method signature of equals():

public boolean equals(Object obj);

The default implementation of equals() in the Object class compares object references using ==, which checks for reference equality—i.e., whether two objects point to the same memory location. However, this default behavior is not suitable for most applications, where you want to compare the actual contents or values of objects, not just their memory addresses.

Overriding equals()

Here’s an example of how you might override equals() in a custom class:


public class Person {
    private String name;
    private int age;

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Person person = (Person) obj;
        return age == person.age && name.equals(person.name);
    }
}
Enter fullscreen mode Exit fullscreen mode

The Contract Between equals() and hashCode()

There is an important contract between

  • If two objects are equal according to equals(), they must have the same hash code.
  • If two objects are not equal, they can have the same or different hash codes (hash collisions).

Hash-based collections rely on this contract to function properly. If two equal objects have different hash codes, they might be stored in different buckets, causing the collection to behave incorrectly (e.g., failing to retrieve an object or incorrectly treating two objects as distinct).

  • Reflexive: For any non-null object x, x.equals(x) must return true.
  • Symmetric: For any non-null objects x and y, if x.equals(y) returns true, then y.equals(x) must also return true.
  • Transitive: For any non-null objects x, y, and z, if x.equals(y) returns true and y.equals(z) returns true, then x.equals(z) must also return true.
  • Consistent: For any non-null objects x and y, repeated calls to x.equals(y) must consistently return true or false, provided no information used in equals() comparisons is modified.

The Importance of hashCode() and equals() in Collections

In hash-based collections like HashMap, HashSet, and Hashtable, the hashCode() and equals() methods are used together to store and retrieve elements.

Insertion in a HashMap:

Hashing: The hashCode() method is called on the key to determine the bucket (array index) where the object should be placed.

Equality Check: If another object already exists in the bucket (due to a hash collision), the equals() method is used to check if the two objects are equal. If they are equal, the existing value is replaced with the new one; otherwise, the new object is stored in the same bucket using techniques like chaining or probing.

Retrieval from a HashMap:

When retrieving a value from a HashMap using a key:

Hashing: The hashCode() of the key is used to find the correct bucket.

Equality Check: The equals() method is used to identify the correct key-value pair within the bucket (if multiple objects are stored due to hash collisions).

Impact of Incorrect Implementation:

If equals() is overridden without hashCode(), hash-based collections might not work correctly. Two objects that are equal according to equals() could end up in different buckets because their hash codes differ.

If hashCode() is poorly implemented, you might experience frequent hash collisions, leading to performance degradation in collections like HashMap.

Common Mistakes and Best Practices:

Always Override hashCode() When Overriding equals():

Ensure that equal objects have the same hash code.

Failing to override hashCode() when overriding equals() violates the contract between the two methods and can lead to bugs in collections.

Use Immutable Fields

It’s a good practice to use immutable fields (e.g., final fields) in hashCode() and equals() to prevent the state of an object from changing after it’s been inserted into a collection.

If the fields that equals() and hashCode() rely on can be modified, the object’s behavior in hash-based collections may become unpredictable.

Use Prime Numbers for Hashing

Prime numbers like 31 are commonly used in hash functions because they help distribute hash values more uniformly across the hash table, reducing collisions.

Avoid Using Floating-Point Numbers

Using float or double in hashCode() can be tricky due to precision issues. If you must include them, consider converting them to int using Float.floatToIntBits() or Double.doubleToLongBits().

Caching Hash Codes for Immutable Objects

If an object is immutable (like String), you can cache its hash code to avoid recomputation, improving performance. This is done in the String class, where the hashCode is computed once and stored for future use.

String Class implementation

In Java, String is a class that represents a sequence of characters.

Implementation of equals() in String

Here’s the implementation of the equals() method in the String class:


public boolean equals(Object anObject) {
    if (this == anObject) {
        return true;
    }
    if (anObject instanceof String) {
        String anotherString = (String) anObject;
        int n = value.length;
        if (n == anotherString.value.length) {
            char v1[] = value;
            char v2[] = anotherString.value;
            int i = 0;
            while (n-- != 0) {
                if (v1[i] != v2[i])
                    return false;
                i++;
            }
            return true;
        }
    }
    return false;
}

Enter fullscreen mode Exit fullscreen mode

Breakdown of equals() Logic:

Reference Check: The method first checks if the two references (this and anObject) point to the same object (this == anObject). If true, they are considered equal, and the method returns true.

Type Check: If the two objects are not the same reference, the method checks whether anObject is an instance of String. If not, false is returned.

Length Check: If both objects are strings, their lengths are compared. If the lengths differ, the strings cannot be equal, and the method returns false.

Content Comparison: If the lengths are the same, the method compares the individual characters of the two strings. If any character differs, false is returned. If all characters match, true is returned, indicating that the strings are equal.

This implementation ensures that two strings are considered equal if and only if they contain the exact same sequence of characters in the same order.

Implementation of hashCode() in String

Here’s the actual implementation of


public int hashCode() {
    int h = hash;
    if (h == 0 && value.length > 0) {
        char val[] = value;

        for (int i = 0; i < value.length; i++) {
            h = 31 * h + val[i];
        }
        hash = h;
    }
    return h;
}

Enter fullscreen mode Exit fullscreen mode

Breakdown of hashCode() Logic:

Cached Hash Code: The String class caches the hash code once it has been calculated. The variable hash is used to store the hash code, and the method first checks if hash is already set (h == 0). If it’s non-zero, the method returns the cached hash code.

Hash Calculation: If the hash code hasn’t been calculated yet, the method iterates over the characters in the string (char val[] = value). For each character, the current hash is multiplied by 31 and added to the character’s value (h = 31 * h + val[i]). This results in a final hash code that represents the string.

Return Value: Once the hash code is computed, it is cached in the hash variable and returned.

Thanks for reading Kiran’s Blog! Subscribe for free to receive new posts and support my work.

Top comments (0)