DEV Community

Cover image for Sets - HashSet, LinkedHashSet, and TreeSet.
Paul Ngugi
Paul Ngugi

Posted on

Sets - HashSet, LinkedHashSet, and TreeSet.

You can create a set using one of its three concrete classes: HashSet, LinkedHashSet, or TreeSet. The Set interface extends the Collection interface, as shown in Figure below. It does not introduce new methods or constants, but it stipulates that an instance of Set contains no duplicate elements. The concrete classes that implement Set must ensure that no duplicate elements can be added to the set. That is, no two elements e1 and e2 can be in the set such that e1.equals(e2) is true.

Image description

The AbstractSet class extends AbstractCollection and partially implements Set. The AbstractSet class provides concrete implementations for the equals method and the hashCode method. The hash code of a set is the sum of the hash codes of all the elements in the set. Since the size method and iterator method are not implemented in the AbstractSet class, AbstractSet is an abstract class.

Three concrete classes of Set are HashSet, LinkedHashSet, and TreeSet, as shown in Figure below.

Image description

HashSet

The HashSet class is a concrete class that implements Set. You can create an empty hash set using its no-arg constructor or create a hash set from an existing collection. By default, the initial capacity is 16 and the load factor is 0.75. If you know the size of your set, you can specify the initial capacity and load factor in the constructor. Otherwise, use the default setting. The load factor is a value between 0.0 and 1.0.

The load factor measures how full the set is allowed to be before its capacity is increased. When the number of elements exceeds the product of the capacity and load factor, the capacity is automatically doubled. For example, if the capacity is 16 and load factor is 0.75, the capacity will be doubled to 32 when the size reaches 12 (16*0.75 = 12). A higher load factor decreases the space costs but increases the search time. Generally, the default load factor 0.75 is a good tradeoff between time and space costs.

A HashSet can be used to store duplicate-free elements. For efficiency, objects added to a hash set need to implement the hashCode method in a manner that properly disperses the hash code. Recall that hashCode is defined in the Object class. The hash codes of two objects must be the same if the two objects are equal. Two unequal objects may have the same hash code, but you should implement the hashCode method to avoid too many such cases. Most of the classes in the Java API implement the hashCode method. For example, the hashCode in the Integer class returns its int value. The hashCode in the Character class returns the Unicode of the character. The hashCode in the String class returns s0 *31^(n - 1) + s1 *31^(n - 2) + c + sn - 1, where si is s.charAt(i).

The code below gives a program that creates a hash set to store strings and uses a foreach loop to traverse the elements in the set.

Image description

The strings are added to the set (lines 11–16). New York is added to the set more than once, but only one string is stored, because a set does not allow duplicates.

As shown in the output, the strings are not stored in the order in which they are inserted into the set. There is no particular order for the elements in a hash set. To impose an order on them, you need to use the LinkedHashSet class.

Recall that the Collection interface extends the Iterable interface, so the elements in a set are iterable. A foreach loop is used to traverse all the elements in the set (lines 21–23).

Since a set is an instance of Collection, all methods defined in Collection can be used for sets. The code below gives an example that applies the methods in the Collection interface on sets.

package demo;

public class TestMethodsInCollection {

    public static void main(String[] args) {
        // Create set1
        java.util.Set<String> set1 = new java.util.HashSet<>();

        // Add strings to set1
        set1.add("London");
        set1.add("Paris");
        set1.add("New York");
        set1.add("San Francisco");
        set1.add("Beijing");

        System.out.println("set1 is " + set1);
        System.out.println(set1.size() + " elements in set1");

        // Delete a string from set1
        set1.remove("London");
        System.out.println("\nset1 is " + set1);
        System.out.println(set1.size() + " elements in set1");

        // Create set2
        java.util.Set<String> set2 = new java.util.HashSet<>();

        // Add strings to set2
        set2.add("London");
        set2.add("Shanghai");
        set2.add("Paris");
        System.out.println("\nset2 is " + set2);
        System.out.println(set2.size() + " elements in set2");

        System.out.println("\nIs Taipei in set2? " + set2.contains("Taipei"));

        set1.addAll(set2);
        System.out.println("\nAfter adding set2 to set1, set1 is " + set1);

        set1.removeAll(set2);
        System.out.println("After removing set2 from set1, set1 is " + set1);

        set1.retainAll(set2);
        System.out.println("After removing common elements in set2 from set1, set1 is " + set1);
    }

}

Enter fullscreen mode Exit fullscreen mode

set1 is [San Francisco, New York, Paris, Beijing, London]
5 elements in set1
set1 is [San Francisco, New York, Paris, Beijing]
4 elements in set1
set2 is [Shanghai, Paris, London]
3 elements in set2
Is Taipei in set2? false
After adding set2 to set1, set1 is
[San Francisco, New York, Shanghai, Paris, Beijing, London]
After removing set2 from set1, set1 is
[San Francisco, New York, Beijing]
After removing common elements in set2 from set1, set1 is []

The program creates two sets (lines 7, 25). The size() method returns the number of the elements in a set (line 17). Line 20

set1.remove("London");

removes London from set1.

The contains method (line 34) checks whether an element is in the set. Line 36

set1.addAll(set2);

adds set2 to set1. Therefore, set1 becomes [San Francisco, New York, Shanghai, Paris, Beijing, London]. Line 39

set1.removeAll(set2);

removes set2 from set1. Thus, set1 becomes [San Francisco, New York, Beijing]. Line 42

set1.retainAll(set2);

retains the common elements in set1. Since set1 and set2 have no common elements, set1 becomes empty.

LinkedHashSet

LinkedHashSet extends HashSet with a linked-list implementation that supports an ordering of the elements in the set. The elements in a HashSet are not ordered, but the elements in a LinkedHashSet can be retrieved in the order in which they were inserted into the set. A
LinkedHashSet can be created by using one of its four constructors, as shown in Figure above. These constructors are similar to the constructors for HashSet.

The code below gives a test program for LinkedHashSet. The program simply replaces HashSet by LinkedHashSet in code above TestHashSet.java.

Image description

A LinkedHashSet is created in line 8. As shown in the output, the strings are stored in the order in which they are inserted. Since LinkedHashSet is a set, it does not store duplicate elements.

The LinkedHashSet maintains the order in which the elements are inserted. To impose a different order (e.g., increasing or decreasing order), you can use the TreeSet class.

If you don’t need to maintain the order in which the elements are inserted, use HashSet, which is more efficient than LinkedHashSet.

TreeSet

SortedSet is a subinterface of Set, which guarantees that the elements in the set are sorted. Additionally, it provides the methods first() and last() for returning the first and last elements in the set, and headSet(toElement) and tailSet(fromElement) for returning a portion of the set whose elements are less than toElement and greater than or equal to fromElement, respectively.

NavigableSet extends SortedSet to provide navigation methods lower(e), floor(e), ceiling(e), and higher(e) that return elements respectively less than, less than or equal, greater than or equal, and greater than a given element and return null if there is no such element. The pollFirst() and pollLast() methods remove and return the first and last element in the tree set, respectively.

TreeSet implements the SortedSet interface. To create a TreeSet, use a constructor, as shown in Figure above. You can add objects into a tree set as long as they can be compared with each other.

As discussed in Section, the elements can be compared in two ways: using the Comparable interface or the Comparator interface.

The code below gives an example of ordering elements using the Comparable interface. The preceding example in code above, TestLinkedHashSet.java displays all the strings in their insertion order. This example rewrites the preceding example to display the strings in alphabetical order using the TreeSet class.

Image description

Sorted tree set: [Beijing, London, New York, Paris, San Francisco]
first(): Beijing
last(): San Francisco
headSet("New York"): [Beijing, London]
tailSet("New York"): [New York, Paris, San Francisco]
lower("P"): New York
higher("P"): Paris
floor("P"): New York
ceiling("P"): Paris
pollFirst(): Beijing
pollLast(): San Francisco
New tree set: [London, New York, Paris]

The example creates a hash set filled with strings, then creates a tree set for the same strings. The strings are sorted in the tree set using the compareTo method in the Comparable interface.

The elements in the set are sorted once you create a TreeSet object from a HashSet object using new TreeSet(set) (line 18). You may rewrite the program to create an instance of TreeSet using its no-arg constructor, and add the strings into the TreeSet object.

treeSet.first() returns the first element in treeSet (line 22), and treeSet.last() returns the last element in treeSet (line 23). treeSet.headSet("New York") returns the elements in treeSet before New York (lines 24). treeSet.tailSet("New York") returns the elements in treeSet after New York, including New York (lines 25).

treeSet.lower("P") returns the largest element less than P in treeSet (line 28). treeSet.higher("P") returns the smallest element greater than P in treeSet (line 29). treeSet.floor("P") returns the largest element less than or equal to P in treeSet (line 30). treeSet.ceiling("P") returns the smallest element greater than or equal to P in treeSet (line 31). treeSet.pollFirst() removes the first element in treeSet and
returns the removed element (line 32). treeSet.pollLast() removes the last element in treeSet and returns the removed element (line 33).

All the concrete classes in Java Collections Framework have at least two constructors. One is the no-arg constructor that constructs an empty collection. The other constructs instances from a collection. Thus the TreeSet class has the constructor TreeSet(Collection c) for constructing a TreeSet from a collection c. In this example, new TreeSet<>(set) creates an instance of TreeSet from the collection set.

If you don’t need to maintain a sorted set when updating a set, you should use a hash set, because it takes less time to insert and remove elements in a hash set. When you need a sorted set, you can create a tree set from the hash set.

If you create a TreeSet using its no-arg constructor, the compareTo method is used to compare the elements in the set, assuming that the class of the elements implements the Comparable interface. To use a comparator, you have to use the constructor TreeSet(Comparator comparator) to create a sorted set that uses the compare method in the comparator to order the elements in the set.

The code below gives a program that demonstrates how to sort elements in a tree set using the Comparator interface.

Image description

The GeometricObjectComparator class is defined in GeometricObjectComparator.java. The program creates a tree set of geometric objects using the GeometricObjectComparator for comparing the elements in the set (lines 8).

The Circle and Rectangle classes were defined in Abstract Classes. They are all subclasses of GeometricObject. They are added to the set (lines 9–12).

Two circles of the same radius are added to the tree set (lines 10–11), but only one is stored, because the two circles are equal and the set does not allow duplicates.

Top comments (0)