Return to Course Content Home

Collections

Optional Reading

Read Introduction, Interfaces, Implementations and Algorithms of The Java Tutorial on Collections. I don't expect you to be experts on collections. Instead, I want you to be familiar with some of the basics and remember they exist for later projects.

Why Collections?

If you need to store some information in some type of data structure, the Java Collections provide you with proven implementations so you don't have to write your own. Collections are used to store, retrieve, manipulate, and communicate aggregate similar types of data. The point is that if you find yourself needing some kind of data structure to hold data, look a the Collections API first because you have a lot of options.

It turns out, Sun (Oracle) created a Collections Framework to act as a standard library of Containers. From the Java Tutorial:

A collections framework is a unified architecture for representing and manipulating collections. All collections frameworks contain the following:

Note that what we have here is a set of Containers, with similar methods, and proven algorithms to manipulate your date. Typically, if you have need for a Container for data, the Java Collections Framework probably has something already designed, implemented, tested and published. There is no reason to "re-invent the wheel".

Collection Interfaces

It is mentioned above that the Collection Framework uses Interfaces to abstract data handling. Let us look at java.util.List and see what a sample Collections Framework Interface looks like

First, you'll notice the Interface definition uses Generics, which we will cover in the next module. Generics are a way to put type safety into Collections (they used to be just "Objects").

Interface List<E>

So, we see that the List Interface requires us to specify a type of List as part of its declaration, then if you notice, there are a lot of known implementations listed for the List Interface

AbstractList, AbstractSequentialList, ArrayList, AttributeList, CopyOnWriteArrayList, LinkedList, RoleList, RoleUnresolvedList, Stack, Vector

What is important here is that if you choose to have a list, you can choose "how" you want to implement that list, but it is still a List. You could start with a Vector implementation, then two months later switch to an ArrayList and your code would still work for the List interface methods

 

Some of the main methods in a Collection

Collections have matured quite a bit since their initial introduction. You can take a look at the Collection API and then lets cover some of the interesting methods in a bit more detail (we'll look at most of the methods listed in the API below, but not all of them).

Interestingly, some of the methods in the Collection interface are listed as optional. If you look at the first method below...add()...this doesn't make sense for a read-only implementation of a Collection. In this case two things may happen if you call this "optional" method in a class that doesn't support the operation (1) a UnsupportedOperationException *may* be thrown by the implementing class if the calling of the method will have no effect on the collection. (2) the method may not do anything to the collections, but will return its stated value.

I've taken the summaries of the methods and added a few words to each, note that you can click on any method name to go to the full API documentation of that method.

boolean add(E e)
Ensures that this collection contains the specified element (optional operation).
void clear()
Removes all of the elements from this collection (optional operation).
boolean contains(Object o)
Returns true if this collection contains the specified element. Note that this takes an Object argument, and not the actualy type of Object in the collection
boolean equals(Object o)
Compares the specified object with this collection for equality. Ah ha! Remember when we defined this for our basic Object definitions? The same logic applies for this interface (and implementing classes)
int hashCode()
Returns the hash code value for this collection. Same comment as the equals(object) method above...
boolean isEmpty()
Returns true if this collection contains no elements.
Iterator<E> iterator()
Returns an iterator over the elements in this collection. We'll cover the use of an Iterator in the next few sectons.
default Stream<E> parallelStream()
Returns a possibly parallel Stream with this collection as its source. We haven't (and wont') cover this, but you can get Parallel Streams that will spread tasks out across multiple CPUs/cores.
boolean remove(Object o)
Removes a single instance of the specified element from this collection, if it is present (optional operation).
default boolean removeIf(Predicate<? super E> filter)
Removes all of the elements of this collection that satisfy the given predicate.
boolean retainAll(Collection<?> c)
Retains only the elements in this collection that are contained in the specified collection (optional operation).
int size()
Returns the number of elements in this collection.
default Spliterator<E> spliterator()
Creates a Spliterator over the elements in this collection. What the heck is a Spliterator...right? a Spliterator is a special iterator that breaks a Collection into smaller chunks, that can be passed off to parallel processing code. We will not be covering this in class, but what a name...right?
default Stream<E> stream()
Returns a sequential Stream with this collection as its source. I showed this in the Iterator example above.
Object[] toArray()
Returns an array containing all of the elements in this collection. You can convert your Collection into a traditional array...but notice that it is an array of Objects. You will need to cast the result to the correct type.
<T> T[] toArray(T[] a)
Returns an array containing all of the elements in this collection; the runtime type of the returned array is that of the specified array. Similar to the no-argument version of the method, only this time you are passing a type to the toArray() call, and getting a list back of that type.
 

You can pretty much count on anything in the Collection Framework implementing all of these methods, so they are good to remember!

 

Why Multiple Implementations for the same Interface?

There are already several implementations of some basid interfacest:

General-purpose Implementations
Interfaces
Implementations
Hash table
Resizable Array
Tree
Linked list
Hash table + Linked List
Set HashSet, ConcurrentSkipListSet   TreeSet   LinkedHashSet
List   ArrayList, Vector, Stack   LinkedList  
Queue   ArrayDeque, ConcurrentLinkedQueue, DelayQueue, PriorityBlockingQueue, PriorityQueue, SynchronousQueue   LinkedList, LinkedBlockingDeque, LinkedBlockingQueue,  
Map Hashmap   TreeMap   LinkedHashMap
Deque   ArrayDeque, ConcurrentLinkedQueue, DelayQueue, PriorityBlockingQueue, PriorityQueue, SynchronousQueue  

LinkedList, LinkedBlockingDeque, LinkedBlockingQueue,

 
BlockingDeque       LinkedBlockingDeque,  
NavigableSet ConcurrentSkipListSet        
NavigableMap ConcurrentSkipListMap, TreeMap        

Take a look at the above table. Notice that for the List interface, there are four implementations, ArrayList, Vector, Stack and LinkedList. If they both implement the List interface, why have four?

You can determine which is better for your use by looking in the API. It all comes down to the implementation. ArrayList is implemented using an internal Array. Efficient for most simple list, but if you want to do a lot of dynamic manipulation of the list (adds, inserts, removes) the Array implementation is not the most efficient. LinkedList on the other hand is much better for dynamic lists. The API for LinkedList states that "In addition to implementing the List interface, the LinkedList class provides uniformly named methods to get, remove and insert an element at the beginning and end of the list. These operations allow linked lists to be used as a stack, queue, or double-ended queue."

So, they are both lists, but how the list will be used helps determine which implementation of the List interface to use. The API will tell you how the interface is implemented (HashMap, Array, LinkedList, etc...) so that you can choose the correct implementation.

What about Vector?

Many of you may remember java.util.Vector which was an early Collection-like component from the early days of Java 2 platform v1.2. It was "retrofitted" to implement the List interface and is now a member of the Java Collections (note it wasn't in the table from the required reading). Many people use ArrayList instead of Vector because Vector has built in synchronization (Thread safety), which has a performance cost. Unless you need a "synchronized" ArrayList, you don't need to use Vector.

New since JDK 1.6

Several additions were made to the Collections API that are not talked about in the Optional Reading above. They are:

New Interfaces

New Classes

Updated Classes in Java 6.0

New in Java 8

One of the things that was added to the Collections Framework in Java 8 was the stream() and parallelStream() methods. While we don't tend to cover the advanced topics like Streams in this course, this supports things like lambdas and other more specialized methods to handle processing of large Collections in efficient and even parallel implementations.

Iterating through a Collection

There are actually at least three different ways to iterate through a Collection.

The first way is to use something that is new in JDK 8, and something we really won't cover in this class...but I'll quickly mention it here. As of JDK 8, the recommended way to iterate through a Collection is to use something called a Stream and Lambda Functions. For example, let's say you have our Employee class, and you want to print out people in the Security Department. You get a Stream from the Collection, and then filter and process each element.

employees.stream()
   .filter(e -> getDepartment().equals("Security")
   .forEach(e -> System.out.println(e.getName());

The above code, gets the stream, filters the elements for those where getDepartment() is equal to "Security", and then prints out each one. Again, trying to cover Streams and Lambdas at this point is out of scope, but if you are interested, you can dig into it more.

The second way is to use the for-each Construct

for (Employee emp : employees) {
   if (emp.getDepartment().equals("Security") {
       System.out.println(e.getName());
   }
}

The third way is to use an Iterator to travese the elements of the Collection. An Iterator is actually an Interface that provides the following methods:

public Intgerface Iterator<E> {
  boolean hasNext();
  E next();
  void remove();  // this is an optional method and may not be in every implementation
}

And as is often the case, this example uses Generics, which we will cover in the next section. Let's just say the

You use an Iterator by getting the Iterator from the Collection (it supplies the implementation), and then seeing if there is anything next "in line" in the Collection to process with the Iterator. This is the hasNext() method, and it returns true if there is another element in the Collection. If there is another element, you can call the next() method to access the next Element.

So, lets use an ArrayList as an example. We'll show the code for building the 3 element ArrayList, and then how to use the Iterator.

        ArrayList<Employee> employees = new ArrayList<Employee>();
Employee e = new Employee();
e.setFirstName("Bob");
e.setLastName("Evans");
e.setDepartment("Security");
employees.add(e);
e = new Employee();
e.setFirstName("Robert");
e.setLastName("Mitchell");
e.setDepartment("Security");
employees.add(e);
e = new Employee();
e.setFirstName("Dana");
e.setLastName("Schlichting");
e.setDepartment("Help Desk");
employees.add(e);
Iterator i = employees.iterator();
while (i.hasNext()){
Employee emp = (Employee)i.next();
if (emp.getDepartment().equals("Security")) {
System.out.println(emp.getName());
}
}

There is something else interesting about the use of the iterator

You then call the next()

Basic Collections Types (Interfaces)

The following list describes the different types, or Interfaces that exist in the Collections Framework

Practice

Download the collections Eclipse project, unzip it and load it (or just view the source if you aren't using Eclipse).

In this example, I make some very simple collections using different implementations. Run the code and look at the output.

Try removing the equals and hashCode methods of Test and see what happens to the HashSet examples. Why do you think this is?

Also pay attention to the order of the elements vs. how they had been added. Why does the HashSet not come out in the order added?