Core Java

Java Best Practices – Vector vs ArrayList vs HashSet

Photo of Byron KiourtzoglouByron KiourtzoglouAugust 16th, 2010Last Updated: December 9th, 2019
6 7,316 9 minutes read

Continuing our series of articles concerning proposed practices while working with the Java programming language, we are going to perform a performance comparison between the three probably most usedCollection implementation classes. To make things more realistic we are going to test against a multi–threading environment so as to discuss and demonstrate how to utilizeVector,ArrayList and/orHashSet for high performance applications.

All discussed topics are based on use cases derived from the development of mission critical, ultra high performance production systems for the telecommunication industry.

Prior reading each section of this article it is highly recommended that you consult the relevant Java API documentation for detailed information and code samples.

All tests are performed against a Sony Vaio with the following characteristics :

  • System : openSUSE 11.1 (x86_64)
  • Processor (CPU) : Intel(R) Core(TM)2 Duo CPU T6670 @ 2.20GHz
  • Processor Speed : 1,200.00 MHz
  • Total memory (RAM) : 2.8 GB
  • Java : OpenJDK 1.6.0_0 64-Bit

The following test configuration is applied :

  • Concurrent worker Threads : 50
  • Test repeats per worker Thread : 100
  • Overall test runs : 100

Vector vs ArrayList vs HashSet

One of the most common tasks a Java developer has to implement is storing and retrieving objects fromCollections. The Java programming language provides a handful ofCollection implementation classes with both overlapping and unique characteristics.Vector,ArrayList andHashSet are probably the most frequently used.

Nevertheless working withCollection implementation classes, especially in a multi–threading environment, can by tricky. The majority of them does not provide synchronized access by default. As a consequence, altering the internal structure of aCollection (inserting or retracting elements) in a concurrent manner will certainly lead to errors.

The case scenario that will be discussed here is having multipleThreads inserting, retracting and iterating through elements of everyCollection implementation class mentioned above. We are going to demonstrate how to properly utilize the aforementioned collection implementation classes in a multi–threading environment and provide relevant performance comparison charts so as to show which one performs better in every test case.

To make a fare comparison we will assume that no NULL elements are allowed and that we do not mind the ordering of elements in theCollections. Furthermore sinceVector is the onlyCollection implementation class of our test group that provides synchronized access by default, synchronization forArrayList andHashSetCollection implementation classes will be achieved usingCollections.synchronizedList andCollections.synchronizedSet static methods. These methods provide a “wrapped” synchronized instance of a designatedCollection implementation class as shown below :

  • List syncList = Collections.synchronizedList(new ArrayList());
  • Set syncSet = Collections.synchronizedSet(new HashSet());

Test case #1 – Adding elements in a collection

For the first test case we are going to have multipleThreads adding String elements in eachCollection implementation class. In order to maintain uniqueness among String elements, we will construct them as shown below :

  • A static first part e.g. “helloWorld”
  • The worker Thread id, remember that we have 50 worker Threads running concurrently
  • The worker Thread test repeat number, remember that each worker thread performs 100 test repeats for each test run

For every test run each worker Thread will insert 100 String elements, as shown below :

  • For the first test repeat
    • Worker Thread #1 will insert the String element : “helloWorld-1-1”
    • Worker Thread #2 will insert the String element : “helloWorld-2-1”
    • Worker Thread #3 will insert the String element : “helloWorld-3-1” etc …
  • For the second test repeat
    • Worker Thread #1 will insert the String element : “helloWorld-1-2”
    • Worker Thread #2 will insert the String element : “helloWorld-2-2”
    • Worker Thread #3 will insert the String element : “helloWorld-3-2” etc …
  • etc …

At the end of each test run everyCollection implementation class will be populated with 5000 distinct String elements.

Below we present a performance comparison chart between the three aforementionedCollection implementation classes

The horizontal axis represents the number of test runs and the vertical axis the average transactions per second (TPS) for each test run. Thus higher values are better. As you can seeVector andArrayListCollection implementation classes performed almost identically when adding elements to them. On the other hand theHashSetCollection implementation class presented a slightly inferior performance mainly due to the more complex internal structure and the hash generation mechanism.

Test case #2 – Removing elements from a collection

For the second test case we are going to have multipleThreads removing String elements from eachCollection implementation class. All collection implementation classes will be pre–populated with the String elements from the previous test case. For removing elements we will be utilizing a sharedIterator instance among all workerThreads for eachCollection implementation class. Synchronized access to theIterator instance will also be implemented. Every workerThread will be removing the next available element of theCollection implementation class, issuing the “next()” and “remove()”Iterator operations (to avoidConcurrentModificationException). Below is the performance comparison chart for the aforementioned test case.

The horizontal axis represents the number of test runs and the vertical axis the average transactions per second (TPS) for each test run. Thus higher values are better. Again bothVector andArrayListCollection implementation classes performed almost identically when removing String elements from them. On the other hand theHashSetCollection implementation class outperformedVector andArrayList by far, resulting in 678000 TPS on average.

At this point we must pinpoint that by using the “remove(0)” method ofVector andArrayListCollection implementation classes to remove String elements, we have achieved slightly better performance results compared to utilizing a synchronized sharedIterator instance. The reason that we have demonstrated the synchronized sharedIterator instance “next()” and “remove()” operations approach is to maintain a fare comparison between the threeCollection implementation classes.

IMPORTANT NOTICE

Our test case scenario dictates that we remove the first element from eachCollection implementation class. Nevertheless we must pinpoint that this applies as the worst case scenario forVector andArrayListCollection implementation classes. As one of our readers, James Watson, successfully commented on a relevantpost fromTheServerSide

The reason is that on ArrayList and Vecrtor, the remove() method will result in a System.arraycopy() call if any element except the last element is removed (the last element being the element with index: size – 1). Removing the first element means the entire rest of the array is copied which is an O(n) operation. Since the test removes all the elements in the List, the full test becomes O(n^2) (slow.)

HashSet remove does not do any such array copies so it’s remove is O(1) or constant time. For the full test it then is O(n) (fast.). If the tests were rewritten to remove the last element from the ArrayList and Vector, you would likely similar performance to the HashSet.

For that reason we will conduct this test again, removing the last element fromVector andArrayListCollection implementation classes since we presume that the order of elements in theCollections is of no importance. The performance results are show below.

The horizontal axis represents the number of test runs and the vertical axis the average transactions per second (TPS) for each test run. Thus higher values are better. As expected allCollection implementation classes performed almost identically when removing String elements from them.

Test case #3 – Iterators

For the third test case we are going to have multiple workerThreads iterating over the elements of eachCollection implementation class. Every workerThread will be using theCollection “iterator()” operation to retrieve a reference to anIterator instance and iterate through all the availableCollection elements using theIterator “next()” operation. AllCollection implementation classes will be pre–populated with the String values from the first test case. Below is the performance comparison chart for the aforementioned test case.

The horizontal axis represents the number of test runs and the vertical axis the average transactions per second (TPS) for each test run. Thus higher values are better. BothVector andHashSetCollection implementation classes performed poorly compared to theArrayListCollection implementation class.Vector scored 68 TPS on average, whileHashSet scored 9200 TPS on average. On the other hand theArrayList outperformedVector andHashSet by far, resulting in 421000 TPS on average.

Test case #4 – Adding and removing elements

For our final test case we are going to implement the combination of test case #1 and test case #2 scenarios. One group of workerThreads is going to insert String elements to everyCollection implementation class, whereas another group of workerThreads is going to retract the String elements from them.

In the combined (adding and retracting elements) operation the synchronized sharedIterator instance approach for removing elements cannot be used. Adding and removing elements concurrently from a singleCollection eliminates the use of a sharedIterator instance due to the fact that the internal structure of theCollection is constantly changing. ForVector andArrayListCollection implementation classes the aforementioned limitation can be bypassed by using the “remove(0)” operation, which retracts the first element from aCollection internal storage. Unfortunately theHashSetCollection implementation class does not provide such functionality. Our proposed way for achieving maximum performance results for the combined operation using theHashSetCollection implementation class is the following :

  • Use two distinct HashSets, one for adding elements and the other for retracting
  • Implement a “controller”Thread that will swap the contents of the aforementionedHashSet classes when the “retracting”HashSet is empty. The “controller”Thread can be implemented as a regularTimerTask that can check the contents of the “retracting”HashSet at regular time intervals and perform the swap if needed
  • A sharedIterator instance for the “retracting”HashSet should be created upon swapping the two HashSets
  • All workerThreads that retract elements should wait for the “retracting”HashSet to be filled with elements and be notified after the swap

What follows is a code snippet displaying the proposed implementation :

Global Declarations :

Set<string> s = new HashSet<string>(); // The HashSet for retracting elementsIterator<string> sIt = s.iterator(); // The shared Iterator for retracting elementsSet<string> sAdd = new HashSet<string>(); // The HashSet for adding new elementsBoolean read = Boolean.FALSE; // Helper Object for external synchronization and wait – notify functionality when retracting elements from the “s” HashSetBoolean write = Boolean.FALSE; // Helper Object for external synchronization when writing elements to the “sAdd” HashSet

Code for adding elements to the “sAdd”HashSet :

synchronized(write) { sAdd.add(“helloWorld” + "-" + threadId + "-" + count);}

Code for retracting elements from the “s”HashSet :

while (true) { synchronized (read) {  try {   sIt.next();   sIt.remove();   break;  } catch (NoSuchElementException e) {   read.wait();  } }}

The “controller” class code :

public class Controller  extends TimerTask {  public void run() {  try {   performSwap();  } catch (Exception ex) {   ex.printStackTrace();  } }   private void performSwap() throws Exception {  synchronized(read) {   if(s.isEmpty()) {    synchronized(write) {     if(!sAdd.isEmpty()) {      Set<string> tmpSet;      tmpSet = s;      s = sAdd;      sAdd = tmpSet;      sIt = s.iterator();      read.notifyAll();     }    }   }  } }}

Finally, the code to schedule the “controller”TimerTask

Timer timer = new Timer();timer.scheduleAtFixedRate(new Controller(), 0, 1);

We should start the “controller” task prior starting the workerThreads that write to and read from the relevant HashSets.

Keep in mind that for theVector andArrayListCollection implementation classes we have used the “remove(0)” operation to retract elements. Below are the code snippets for the aforementionedCollection implementation classes :

while (true) { try {  vector.remove(0);  break; } catch (ArrayIndexOutOfBoundsException e) { }}while (true) { try {  syncList.remove(0);  break; } catch (IndexOutOfBoundsException e) { }}

Below is the performance comparison chart for the addition part of the aforementioned test case.

Following is the performance comparison chart for the retraction part of the aforementioned test case.

The horizontal axis represents the number of test runs and the vertical axis the average transactions per second (TPS) for each test run. Thus higher values are better. BothVector andArrayListCollection implementation classes performed almost identically when adding and retracting elements from them. On the other hand for the element addition test case our proposed implementation, with the “adding” and “retracting”HashSet pair, performed slightly inferiorly compared toVector andArrayList implementations. Nevertheless for the element retraction test case our “adding” and “retracting”HashSet pair implementation outperformed bothVector andArrayList implementations by far scoring 6000 TPS on average.

Happy coding

Justin

Related Articles :
Related Snippets :
Do you want to know how to develop your skillset to become aJava Rockstar?
Subscribe to our newsletter to start Rockingright now!
To get you started we give you our best selling eBooks forFREE!
1. JPA Mini Book
2. JVM Troubleshooting Guide
3. JUnit Tutorial for Unit Testing
4. Java Annotations Tutorial
5. Java Interview Questions
6. Spring Interview Questions
7. Android UI Design
and many more ....
I agree to theTerms andPrivacy Policy

Thank you!

We will contact you soon.

Photo of Byron KiourtzoglouByron KiourtzoglouAugust 16th, 2010Last Updated: December 9th, 2019
6 7,316 9 minutes read
Photo of Byron Kiourtzoglou

Byron Kiourtzoglou

Byron is a master software engineer working in the IT and Telecom domains. He is an applications developer in a wide variety of applications/services. He is currently acting as the team leader and technical architect for a proprietary service creation and integration platform for both the IT and Telecom industries in addition to a in-house big data real-time analytics solution. He is always fascinated by SOA, middleware services and mobile development. Byron is co-founder and Executive Editor atJava Code Geeks.
Subscribe
Notify of
guest
I agree to theTerms andPrivacy Policy
The comment form collects your name, email and content to allow us keep track of the comments placed on the website. Please read and accept our website Terms and Privacy Policy to post a comment.

I agree to theTerms andPrivacy Policy
The comment form collects your name, email and content to allow us keep track of the comments placed on the website. Please read and accept our website Terms and Privacy Policy to post a comment.