Module 1: Sets

Module Overview

Learn about the Set interface and its implementations in Java, including HashSet, LinkedHashSet, and TreeSet.

Learning Objectives

  • Implement code to iterate over the elements of a Set to perform an operation on each one
  • Implement code to add an object to a Set
  • Implement code to remove an object from a Set
  • Explain what the contents of a Set are after adding a duplicate value
  • Explain why checking for the presence of an object in a HashSet or adding an object to a HashSet runs in constant O(1) time
  • Design and implement a class that uses a Set as a member to ensure uniqueness
  • Outline the purpose of a Set
  • Compare and contrast the Set and List data structures
  • Recall that HashSet is an implementation of Set

In Java, a Set is a data structure that stores a unique collection of objects. Each object in a Set occurs once and only once. This is unlike a List or array where you can store the same object in multiple positions. Another difference between Sets and Lists or arrays is that a Set does not necessarily store or return objects in any particular order. For Lists, the typical use case is to retrieve individual elements based on an index. For Sets, however, the more common use cases are to check for an element's existence and to operate on a set of unique values.

In this reading, we will explore the essential methods defined by the Set interface, look into one of the Set implementations called HashSet, and compare it to Java arrays and Lists.

What is a set?

Imagine you work for Amazon Fresh, and you need to track what fruit the business offers. Suppose you use an ArrayList to store each entry. Your list might look like this:

["banana", "grapes", "fig", "mango", "pear", "apple", "peach", "watermelon"]

To check if Amazon Fresh sells grapes, you would have to look at each item until you find grapes or until you reach the end of the list and find out you don't sell grapes. Not so bad if we are looking for grapes, but this would be slower for watermelon, which is at the end of the list. As our fruit catalog gets bigger, this operation will get even slower! The time complexity of searching for fruit in an ArrayList has linear time complexity, O(n). However, we have learned that if you know the index to look up a value in an ArrayList, accessing it is a constant time, O(1), operation. We can determine if a specific fruit is in our catalog in constant time, O(1), if we use a HashSet, which implements the Set interface.

A HashSet is a data structure that utilizes the concept of hash codes to assign locations of objects to allow for a constant time, O(1), lookup. A hash function is used to determine where to store and retrieve items.

A hash function might produce the following hash codes:

"banana" → 4
"grapes" → 1
"fig" → 3
"mango" → 7
"pear" → 5
"apple" → 5
"peach" → 8
"watermelon" → 6

Using the hash codes as the storage location results in the following storage:

Index	Value
0	<empty>
1	Collection { "grapes" }
2	<empty>
3	Collection { "fig" }
4	Collection { "banana" }
5	Collection { "apple", "pear" }
6	Collection { "watermelon" }
7	Collection { "mango" }
8	Collection { "peach" }
9	<empty>
...etc...

Notice the items are not stored in any particular order like they were above in the ArrayList. The elements are ordered by their hash code. This is why Sets do not guarantee any order.

In Java, Set is an interface, which means we cannot instantiate a Set. HashSet is one implementation of the Set interface. There are others, but we will not focus on any others in this lesson.

NOTE: In this reading, we will demonstrate how HashSet accomplishes constant time, O(1), operations. We will build our mental model around this by assuming HashSet uses an array as its underlying storage. The Java implementation of HashSet may utilize a different way of storing its elements but the process for where to store and retrieve them will hold true!

HashSet

Declaring a HashSet looks like this:

Set<E> hashSet = new HashSet<>();

We use the interface type on the left-hand side, declaring E as the generic type. E is a placeholder for the Java class type your Set will store. We've named the set hashSet. On the right-hand side of the equals sign, we call the no-args HashSet constructor.

For our Amazon Fresh example, our HashSet declaration looks like the following:

Set<FreshFruit> fruitsOffered = new HashSet<>();

We will declare our FreshFruit class to be:

public class FreshFruit {
    
    private String name;
    private boolean isSeedless;

    public FreshFruit (String name, boolean isSeedless) {
         this.name = name;
         this.isSeedless = isSeedless;
    }
    
   @Override
   public boolean equals(Object o) {
        // An object can't be equal to null.
        if (o == null) {
            return false;
        }

        // If two objects have the same reference, they should be equal.
        if (this == o) {
            return true;
        }
        
        // If the objects are of different types, they shouldn't be equal. 
        if (getClass() != o.getClass()) {
            return false;
        }
        
        // Now that we know they are of the same type, cast the object to a FreshFruit
        FreshFruit that = (FreshFruit) o;
        
        // Finally, implement the checks for equality between attributes that make the two objects equal.
        return Objects.equals(this.name, that.name) && Objects.equals(this.isSeedless, that.isSeedless);
   }

    @Override
    public int hashCode() {
        return Objects.hash(this.name, this.isSeedless);
    }
}

When using a HashSet, it's important to remember that there is no guarantee to the order of the elements in the Set. The order that you add elements to the Set has no impact on how they are stored, unlike an ArrayList that always adds to the end of the list. Instead, elements are stored based on their hash code value. So, if you iterate through the elements of a Set, they will not be in any specific order. In fact, after the Set has been modified and changed sizes, another iteration through the Set could return a completely different ordering.

Now that we know how to create a HashSet, let's dive into some commonly used methods. For a complete list, you can check out the HashSet Javadoc.

The add() method

The add() method allows us to add elements to a Set. If the element is not contained in the Set it will be added, and the method will return true. If the element is already included in the Set the element will not be added, and the method will return false.

For example, we want to add a pear to our Amazon Fresh catalog:

FreshFruit pear = new FreshFruit("pear", false);
boolean isAdded = fruitsOffered.add(pear);

Let's discuss in more detail how the pear is being added and stored. It will be stored at the array index that corresponds to the hash code value of the pear object, as shown in Figure 1. In this case, the add() method calls the FreshFruit hashCode() method to get the hash code for pear. Let's say it returns 1. We add pear and return true.

This whole process takes on average constant time, O(1)! Regardless of how big our Set is, we still only need to execute the hashCode() method to determine the storage location of the element.

Figure 1

Figure 1. A diagram that shows three indexes of our storage array: 0, 1, and 2. The pear element is stored in index 1. The add() method returns true.

Since pear is the first element added to fruitsOffered, we know that the index it's added to is empty, but that's not always the case. The storage array has a limited number of indexes, meaning that some elements are hashed to the same location. Let's add an apple to fruitsOffered:

FreshFruit apple = new FreshFruit("apple", false);
boolean isAdded = fruitsOffered.add(apple);

The add() method calls the FreshFruit hashCode() method to get the hash code for apple. Let's say the hashCode() method returns 1. We go to store our apple in index 1, but there is something already there! Since the add() method does not allow duplicate elements to be added, we first have to check if the existing element equals the one being added. The FreshFruit's equals() method is used to see if the new element is a duplicate. In this case, pear is compared to apple and equals() returns false. We have ruled out that this is a duplicate item; instead, we have what we call a collision. Remember, we are storing a Collection in each index of our storage array to add our new element to the list stored at index 1. The apple and pear objects are now both stored in the same list at index 1, as shown in Figure 2. The add() method once again returns true.

Figure 2

Figure 2. A diagram that shows three indexes of our storage array: 0, 1, and 2. The pear element is stored in a Collection at index 1. The apple element is also stored in the Collection stored at index 1. The add() method returns true.

Let's take a look at what happens when a duplicate is added:

FreshFruit duplicateApple = new FreshFruit("apple", false);
boolean isAdded = fruitsOffered.add(duplicateApple);

The add() method calls the FreshFruit hashCode() method to get the hash code for duplicateApple. The hashCode() method returns 1. Remember: the hashCode() method should always return the same value for equal objects. hashCode() should always return 1 for any seeded apple; otherwise, we will have some trouble knowing about duplicates! Next, the add() method will check the contents of the Collection in index 1 to check for duplicates. The equals() method is used to make the comparisons. The add() method finds that the first element is equal to duplicateApple. The add() method will return without editing the contents of the Collection and return false. Our Set remains unchanged in Figure 3.

Figure 3

Figure 3. A diagram that shows three indexes of our storage array: 0, 1, and 2. The pear element is stored in a Collection at index 1. The apple element is also stored in the Collection stored at index 1. The add() method returns false.

The contains() method

The contains() method allows us to check if a particular element is present in a Set. If the element is contained in the Set the method will return true. If the element is not included in the Set the method will return false.

We can use the following line of code to check if an apple is present in our Set:

FreshFruit appleCheck = new FreshFruit("apple", false);
boolean isPresent = fruitsOffered.contains(appleCheck);

The contains() method also utilizes the hashCode() method. The element's hash code is used to find out which index in the array storage to look for the element. Each element in the cell's Collection is inspected to see if it is the element the caller is looking for. The equals() method is used to compare the given element and the element in the Collection. If a match is found, true is returned. If none of the elements match, then the contains() method knows the given element doesn't exist in the Set and returns false.

For example, let's say our fruitsOffered currently looks like Figure 4.

Figure 4

Figure 4. A diagram that shows three indexes of our storage array: 0, 1, and 2. The pear element is stored in a Collection at index 1. The apple element is also stored in the Collection stored at index 1.

Let's walk through what happens when we execute the following line of code:

fruitsOffered.contains(appleCheck);

The contains() method calls the FreshFruit hashCode() method to get the hash code for appleCheck. The hashCode() method returns 1. The FreshFruit equals() method is used to compare the given element, appleCheck, with the first element in index 1, pear. They are not equal. The second element is checked, appleCheck = apple, and so a match is found and true is returned.

We say that HashSet's contains() method is also, on average, a constant time, O(1), operation. We need to call the hashCode() method to find the storage location of an element directly. The hashCode() operation will consistently take the same amount of time, regardless of the size of our Set. However, we just saw that once we get to the storage index, we may need to inspect a few entries to determine which value to return. When two objects map to the same location, we call this a collision. The worst-case scenario is when a hashCode() method maps ALL objects to the same place. This means we would have to look through every element on a call to contains(), making it take linear time, O(n). This is an easy to avoid worst-case though. Collisions can be minimized when the element's hashCode() method does an excellent job of providing unique hash codes for unique objects, and we have a large enough storage array to spread the elements across. As an engineer, you are responsible for writing the hashCode() method in classes that you write. At ATA, we recommend using the Objects.hash() method to implement your hashCode() method effectively. It does a great job at spreading hash codes across all the possible int values! The Java implementation of HashSet is smart and will resize the Set to ensure there are enough storage indexes to minimize collisions. This will result in an average constant runtime, O(1).

The remove() method

The remove() method allows us to remove a particular element from a Set if it is present. If the element is contained in the Set the method will return true. If the element is not included in the Set the method will return false.

We would use the following line of code to remove the apple element:

FreshFruit anotherApple = new FreshFruit("apple", false);
boolean wasRemoved = fruitsOffered.remove(anotherApple);

The above code functions very similarly to the contains() method, but in addition to returning whether the element is present, it will also delete it from the Set. Let's say fruitsOffered looks as it does above in Figure 4. The element is located in the same manner as above, but now the element is removed from the cell. The updated fruitsOffered is shown in Figure 5.

Figure 5

Figure 5. A diagram that shows three indexes of our storage array: 0, 1, and 2. After a call to remove an apple, the apple element is removed from index 1, and the method returns true.

Like contains(), remove() is also on average, a constant time, O(1), operation.

Searching for an element in a Set by an attribute

If we need to check if a Set contains an element that matches a specific attribute, and not on all the attributes that make two objects equal we can, but we won't be able to utilize our constant time lookup.

If we wanted to know if we have any grapes in our Amazon Fresh catalog, seeds or no seeds, we could no longer use the contains() method. HashSet doesn't offer a way to search by some partial hash code value. Instead, we can utilize linear search, which has an O(n) runtime, by iterating through our Set.

public boolean containsGrapes(Set<FreshFruit> fruitsOffered) {
    for (FreshFruit fruit : fruitsOffered) {
        if (fruit.getName().equals("grapes")) {
            return true;
        }
    }
    return false;
}

Sets provide a way to maintain a unique collection of elements. HashSet offers constant time, O(1), operations for adding, removing, and checking its contents for the existence of an element.

Code Examples

Here are some examples demonstrating key Set concepts:

Creating and Using a HashSet

import java.util.HashSet;
import java.util.Set;

public class SetExample {
    public static void main(String[] args) {
        // Creating a new HashSet
        Set<String> fruits = new HashSet<>();
        
        // Adding elements to the Set
        fruits.add("Apple");
        fruits.add("Banana");
        fruits.add("Orange");
        fruits.add("Apple");  // Duplicate - will not be added
        
        // Iterating over elements
        System.out.println("Fruits in the set:");
        for (String fruit : fruits) {
            System.out.println(fruit);
        }
        
        // Checking if an element exists
        System.out.println("Contains Apple? " + fruits.contains("Apple"));
        
        // Removing an element
        fruits.remove("Banana");
        
        // Size of the Set
        System.out.println("Number of fruits: " + fruits.size());
    }
}

Using Set Operations

import java.util.HashSet;
import java.util.Set;

public class SetOperationsExample {
    public static void main(String[] args) {
        // First set
        Set<Integer> set1 = new HashSet<>();
        set1.add(1);
        set1.add(2);
        set1.add(3);
        set1.add(4);
        
        // Second set
        Set<Integer> set2 = new HashSet<>();
        set2.add(3);
        set2.add(4);
        set2.add(5);
        set2.add(6);
        
        // Union (create a new set with all elements)
        Set<Integer> union = new HashSet<>(set1);
        union.addAll(set2);
        System.out.println("Union: " + union);
        
        // Intersection (elements present in both sets)
        Set<Integer> intersection = new HashSet<>(set1);
        intersection.retainAll(set2);
        System.out.println("Intersection: " + intersection);
        
        // Difference (elements in set1 but not in set2)
        Set<Integer> difference = new HashSet<>(set1);
        difference.removeAll(set2);
        System.out.println("Difference (set1 - set2): " + difference);
    }
}

Sets Practice

Delivery Drone Repo

Guided Projects

Resources