Collection types.
Rust’s standard collection library provides efficient implementations of the most common general purpose programming data structures. By using the standard implementations, it should be possible for two libraries to communicate without significant data conversion.
To get this out of the way: you should probably just use Vec or HashMap. These two collections cover most use cases for generic data storage and processing. They are exceptionally good at doing what they do. All the other collections in the standard library have specific use cases where they are the optimal choice, but these cases are borderline niche in comparison. Even when Vec and HashMap are technically suboptimal, they’re probably a good enough choice to get started.
Rust’s collections can be grouped into four major categories:
- Sequences: Vec, VecDeque, LinkedList
- Maps: HashMap, BTreeMap
- Sets: HashSet, BTreeSet
- Misc: BinaryHeap
§When Should You Use Which Collection?
These are fairly high-level and quick break-downs of when each collection should be considered. Detailed discussions of strengths and weaknesses of individual collections can be found on their own documentation pages.
§Use a Vec when:
- You want to collect items up to be processed or sent elsewhere later, and don’t care about any properties of the actual values being stored.
- You want a sequence of elements in a particular order, and will only be appending to (or near) the end.
- You want a stack.
- You want a resizable array.
- You want a heap-allocated array.
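As a rough illustration of the stack use case, here is a minimal sketch. The helper name reverse_with_stack is hypothetical and exists only for this example; push and pop are the standard Vec methods.

```rust
// Hypothetical helper: reverse a slice by pushing every item onto a Vec
// used as a stack, then popping them off again.
fn reverse_with_stack(items: &[i32]) -> Vec<i32> {
    let mut stack = Vec::with_capacity(items.len());
    for &item in items {
        stack.push(item); // appending at the end is the cheap case for Vec
    }

    let mut out = Vec::with_capacity(items.len());
    // pop removes from the end, so items come back in last-in, first-out order.
    while let Some(top) = stack.pop() {
        out.push(top);
    }
    out
}

fn main() {
    println!("{:?}", reverse_with_stack(&[1, 2, 3]));
}
```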
§Use a VecDeque when:
- You want a Vec that supports efficient insertion at both ends of the sequence.
- You want a queue.
- You want a double-ended queue (deque).
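The queue and deque cases might look like the following sketch. The process_order helper and the notion of "urgent" jobs are assumptions made up for this example; push_front, push_back and pop_front are the standard VecDeque methods.

```rust
use std::collections::VecDeque;

// Hypothetical helper: jobs are served first-in, first-out, except that
// jobs flagged as urgent are placed at the front of the queue.
fn process_order<'a>(jobs: &[(&'a str, bool)]) -> Vec<&'a str> {
    let mut queue = VecDeque::new();
    for &(name, urgent) in jobs {
        if urgent {
            queue.push_front(name); // insertion at the front
        } else {
            queue.push_back(name); // insertion at the back
        }
    }

    let mut order = Vec::new();
    while let Some(job) = queue.pop_front() {
        order.push(job);
    }
    order
}

fn main() {
    let order = process_order(&[("a", false), ("b", false), ("c", true)]);
    println!("{order:?}");
}
```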
§Use a LinkedList when:
- You want a Vec or VecDeque of unknown size, and can’t tolerate amortization.
- You want to efficiently split and append lists.
- You are absolutely certain you really, truly, want a doubly linked list.
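The split-and-append case can be sketched as follows. The rotate_at helper is hypothetical; it only combines the standard split_off and append methods, and it panics if at is larger than the list length because split_off does.

```rust
use std::collections::LinkedList;

// Hypothetical helper: move the first `at` elements to the end of the list.
fn rotate_at(list: &mut LinkedList<u32>, at: usize) {
    // `list` keeps the elements before `at`; `tail` receives the rest.
    let mut tail = list.split_off(at);
    // append moves every node of `list` to the end of `tail` without copying
    // the elements, leaving `list` empty.
    tail.append(list);
    *list = tail;
}

fn main() {
    let mut list: LinkedList<u32> = (1..=5).collect();
    rotate_at(&mut list, 2);
    println!("{list:?}");
}
```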
§Use a HashMap when:
- You want to associate arbitrary keys with an arbitrary value.
- You want a cache.
- You want a map, with no extra functionality.
§Use a BTreeMap when:
- You want a map sorted by its keys.
- You want to be able to get a range of entries on-demand.
- You’re interested in what the smallest or largest key-value pair is.
- You want to find the largest or smallest key that is smaller or larger than something.
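The ordered queries listed above can be sketched with range, first_key_value and last_key_value. The sample data and the helper names (sample, keys_in_range, largest_key_below) are assumptions for illustration only.

```rust
use std::collections::BTreeMap;

// Hypothetical sample data used only in this sketch.
fn sample() -> BTreeMap<u32, &'static str> {
    BTreeMap::from([(10, "ten"), (20, "twenty"), (30, "thirty"), (40, "forty")])
}

// Keys in the half-open interval [lo, hi), in sorted order.
// Note that range panics if lo > hi.
fn keys_in_range(map: &BTreeMap<u32, &'static str>, lo: u32, hi: u32) -> Vec<u32> {
    map.range(lo..hi).map(|(k, _)| *k).collect()
}

// The largest key strictly smaller than `bound`, if there is one.
fn largest_key_below(map: &BTreeMap<u32, &'static str>, bound: u32) -> Option<u32> {
    map.range(..bound).next_back().map(|(k, _)| *k)
}

fn main() {
    let map = sample();
    println!("smallest: {:?}", map.first_key_value());
    println!("largest: {:?}", map.last_key_value());
    println!("in [15, 35): {:?}", keys_in_range(&map, 15, 35));
    println!("below 30: {:?}", largest_key_below(&map, 30));
}
```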
§Use the Set variant of any of these Maps when:
- You just want to remember which keys you’ve seen.
- There is no meaningful value to associate with your keys.
- You just want a set.
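Remembering which keys have been seen might look like this sketch. The first_repeat helper is hypothetical; it relies on HashSet::insert returning false when the value was already present.

```rust
use std::collections::HashSet;

// Hypothetical helper: return the first word that occurs a second time.
fn first_repeat<'a>(words: &[&'a str]) -> Option<&'a str> {
    let mut seen = HashSet::new();
    for &word in words {
        // insert returns false if the set already contained the value.
        if !seen.insert(word) {
            return Some(word);
        }
    }
    None
}

fn main() {
    println!("{:?}", first_repeat(&["a", "b", "a", "c"]));
}
```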
§Use a BinaryHeap when:
- You want to store a bunch of elements, but only ever want to process the “biggest” or “most important” one at any given time.
- You want a priority queue.
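A priority queue can be sketched with push and pop. The drain_by_priority helper and the (priority, name) tuple layout are assumptions for this example; tuples compare by their first field first, so the highest priority is popped first.

```rust
use std::collections::BinaryHeap;

// Hypothetical helper: return task names from highest to lowest priority.
fn drain_by_priority<'a>(tasks: &[(u32, &'a str)]) -> Vec<&'a str> {
    let mut heap = BinaryHeap::new();
    for &task in tasks {
        heap.push(task);
    }

    let mut order = Vec::new();
    // BinaryHeap is a max-heap: pop always yields the greatest element.
    while let Some((_priority, name)) = heap.pop() {
        order.push(name);
    }
    order
}

fn main() {
    let order = drain_by_priority(&[(2, "write"), (5, "deploy"), (1, "sleep")]);
    println!("{order:?}");
}
```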
§Performance
Choosing the right collection for the job requires an understanding of what each collection is good at. Here we briefly summarize the performance of different collections for certain important operations. For further details, see each type’s documentation, and note that the names of actual methods may differ from the tables below on certain collections.
Throughout the documentation, we will adhere to the following conventions for operation notation:
- The collection’s size is denoted by n.
- If a second collection is involved, its size is denoted by m.
- Item indices are denoted by i.
- Operations which have an amortized cost are suffixed with a *.
- Operations with an expected cost are suffixed with a ~.
Calling operations that add to a collection will occasionally require a collection to be resized - an extra operation that takes O(n) time.
Amortized costs are calculated to account for the time cost of such resize operations over a sufficiently large series of operations. An individual operation may be slower or faster due to the sporadic nature of collection resizing, however the average cost per operation will approach the amortized cost.
Rust’s collections never automatically shrink, so removal operations aren’t amortized.
HashMap uses expected costs. It is theoretically possible, though very unlikely, for HashMap to experience significantly worse performance than the expected cost. This is due to the probabilistic nature of hashing - i.e. it is possible to generate a duplicate hash given some input key that will require extra computation to correct.
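To make the idea of amortized resizing concrete, the sketch below counts how often the capacity of a Vec changes while elements are pushed. The count_reallocations helper is hypothetical, and the exact growth strategy is an implementation detail, so only the rough shape of the result (far fewer resizes than pushes) should be relied on.

```rust
// Hypothetical helper: push `n` elements and count how many pushes
// changed the capacity, i.e. triggered a resize of the backing array.
fn count_reallocations(n: usize) -> usize {
    let mut v: Vec<usize> = Vec::new();
    let mut last_capacity = v.capacity();
    let mut changes = 0;

    for i in 0..n {
        v.push(i);
        if v.capacity() != last_capacity {
            changes += 1;
            last_capacity = v.capacity();
        }
    }
    changes
}

fn main() {
    // Most pushes are cheap; only a handful pay the O(n) resize cost.
    println!("resizes for 1000 pushes: {}", count_reallocations(1000));
}
```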
§Cost of Collection Operations
| | get(i) | insert(i) | remove(i) | append(Vec(m)) | split_off(i) | range | append |
|---|---|---|---|---|---|---|---|
| Vec | O(1) | O(n-i)* | O(n-i) | O(m)* | O(n-i) | N/A | N/A |
| VecDeque | O(1) | O(min(i, n-i))* | O(min(i, n-i)) | O(m)* | O(min(i, n-i)) | N/A | N/A |
| LinkedList | O(min(i, n-i)) | O(min(i, n-i)) | O(min(i, n-i)) | O(1) | O(min(i, n-i)) | N/A | N/A |
| HashMap | O(1)~ | O(1)~* | O(1)~ | N/A | N/A | N/A | N/A |
| BTreeMap | O(log(n)) | O(log(n)) | O(log(n)) | N/A | N/A | O(log(n)) | O(n+m) |
Note that where ties occur, Vec is generally going to be faster than VecDeque, and VecDeque is generally going to be faster than LinkedList.
For Sets, all operations have the cost of the equivalent Map operation.
§Correct and Efficient Usage of Collections
Of course, knowing which collection is the right one for the job doesn’t instantly permit you to use it correctly. Here are some quick tips for efficient and correct usage of the standard collections in general. If you’re interested in how to use a specific collection in particular, consult its documentation for detailed discussion and code examples.
§Capacity Management
Many collections provide several constructors and methods that refer to “capacity”. These collections are generally built on top of an array. Optimally, this array would be exactly the right size to fit only the elements stored in the collection, but for the collection to do this would be very inefficient. If the backing array was exactly the right size at all times, then every time an element is inserted, the collection would have to grow the array to fit it. Due to the way memory is allocated and managed on most computers, this would almost surely require allocating an entirely new array and copying every single element from the old one into the new one. Hopefully you can see that this wouldn’t be very efficient to do on every operation.
Most collections therefore use an amortized allocation strategy. They generally let themselves have a fair amount of unoccupied space so that they only have to grow on occasion. When they do grow, they allocate a substantially larger array to move the elements into so that it will take a while for another grow to be required. While this strategy is great in general, it would be even better if the collection never had to resize its backing array. Unfortunately, the collection doesn’t have enough information to do this by itself. Therefore, it is up to us programmers to give it hints.
Any with_capacity constructor will instruct the collection to allocate enough space for the specified number of elements. Ideally this will be for exactly that many elements, but some implementation details may prevent this. See collection-specific documentation for details. In general, use with_capacity when you know exactly how many elements will be inserted, or at least have a reasonable upper-bound on that number.
When anticipating a large influx of elements, the reserve family of methods can be used to hint to the collection how much room it should make for the coming items. As with with_capacity, the precise behavior of these methods will be specific to the collection of interest.
For optimal performance, collections will generally avoid shrinking themselves. If you believe that a collection will not soon contain any more elements, or just really need the memory, the shrink_to_fit method prompts the collection to shrink the backing array to the minimum size capable of holding its elements.
Finally, if ever you’re interested in what the actual capacity of the collection is, most collections provide a capacity method to query this information on demand. This can be useful for debugging purposes, or for use with the reserve methods.
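The capacity methods discussed above can be sketched on a Vec as follows. The build_squares helper is hypothetical, and the assertions only check lower bounds because the exact capacity a collection ends up with is implementation-specific.

```rust
// Hypothetical helper: fill a Vec whose final size is known up front.
fn build_squares(n: usize) -> Vec<u64> {
    // One allocation up front instead of repeated growth while pushing.
    let mut v = Vec::with_capacity(n);
    for i in 0..n as u64 {
        v.push(i * i);
    }
    v
}

fn main() {
    let mut v = build_squares(10);
    assert!(v.capacity() >= 10);

    // Hint that at least 100 more elements are coming.
    v.reserve(100);
    assert!(v.capacity() >= v.len() + 100);

    // Drop most elements, then hand unused memory back.
    v.truncate(3);
    v.shrink_to_fit();
    assert!(v.capacity() >= 3);

    println!("len = {}, capacity = {}", v.len(), v.capacity());
}
```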
§Iterators
Iterators are a powerful and robust mechanism used throughout Rust’s standard libraries. Iterators provide a sequence of values in a generic, safe, efficient and convenient way. The contents of an iterator are usually lazily evaluated, so that only the values that are actually needed are ever actually produced, and no allocation need be done to temporarily store them. Iterators are primarily consumed using a for loop, although many functions also take iterators where a collection or sequence of values is desired.
All of the standard collections provide several iterators for performing bulk manipulation of their contents. The three primary iterators almost every collection should provide are iter, iter_mut, and into_iter. Some of these are not provided on collections where it would be unsound or unreasonable to provide them.
iter provides an iterator of immutable references to all the contents of a collection in the most “natural” order. For sequence collections like Vec, this means the items will be yielded in increasing order of index starting at 0. For ordered collections like BTreeMap, this means that the items will be yielded in sorted order. For unordered collections like HashMap, the items will be yielded in whatever order the internal representation made most convenient. This is great for reading through all the contents of the collection.
iter_mut provides an iterator of mutable references in the same order as iter. This is great for mutating all the contents of the collection.
into_iter transforms the actual collection into an iterator over its contents by-value. This is great when the collection itself is no longer needed, and the values are needed elsewhere. Using extend with into_iter is the main way that contents of one collection are moved into another. extend automatically calls into_iter, and takes any T: IntoIterator. Calling collect on an iterator itself is also a great way to convert one collection into another. Both of these methods should internally use the capacity management tools discussed in the previous section to do this as efficiently as possible.
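The three primary iterators, together with extend and collect, can be sketched in one place. The iterate_three_ways helper and its sample values are assumptions made for illustration.

```rust
use std::collections::{BTreeSet, VecDeque};

// Hypothetical helper returning (sum, deque contents, sorted contents).
fn iterate_three_ways() -> (i32, Vec<i32>, Vec<i32>) {
    let mut v = vec![3, 1, 2];

    // iter: shared references; the collection is left untouched.
    let sum: i32 = v.iter().sum();

    // iter_mut: mutable references; elements are updated in place.
    for x in v.iter_mut() {
        *x *= 10;
    }

    // extend calls into_iter on its argument, so `v` is moved into the deque.
    let mut deque: VecDeque<i32> = VecDeque::new();
    deque.push_back(0);
    deque.extend(v);

    // collect converts one collection into another; a BTreeSet iterates in sorted order.
    let sorted: BTreeSet<i32> = deque.iter().copied().collect();

    (sum, deque.into_iter().collect(), sorted.into_iter().collect())
}

fn main() {
    let (sum, deque, sorted) = iterate_three_ways();
    println!("sum = {sum}, deque = {deque:?}, sorted = {sorted:?}");
}
```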
Iterators also provide a series of adapter methods for performing common operations on sequences. Among the adapters are functional favorites like map, fold, skip and take. Of particular interest to collections is the rev adapter, which reverses any iterator that supports this operation. Most collections provide reversible iterators as the way to iterate over them in reverse order.
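A brief sketch of the adapters named above. The helper names doubled_window, total and reversed are hypothetical; the adapter methods themselves are part of the standard Iterator trait.

```rust
// Skip the first element, take the next three, and double each of them.
fn doubled_window(values: &[i32]) -> Vec<i32> {
    values.iter().skip(1).take(3).map(|x| x * 2).collect()
}

// fold threads an accumulator through the whole sequence.
fn total(values: &[i32]) -> i32 {
    values.iter().fold(0, |acc, x| acc + x)
}

// rev walks a double-ended iterator from the back.
fn reversed(values: &[i32]) -> Vec<i32> {
    values.iter().rev().copied().collect()
}

fn main() {
    let v = [1, 2, 3, 4, 5, 6];
    println!("{:?}", doubled_window(&v));
    println!("{}", total(&v));
    println!("{:?}", reversed(&v));
}
```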
Several other collection methods also return iterators to yield a sequence of results but avoid allocating an entire collection to store the result in. This provides maximum flexibility as collect or extend can be called to “pipe” the sequence into any collection if desired. Otherwise, the sequence can be looped over with a for loop. The iterator can also be discarded after partial use, preventing the computation of the unused items.
§Entries
The entry API is intended to provide an efficient mechanism for manipulating the contents of a map conditionally on the presence of a key or not. The primary motivating use case for this is to provide efficient accumulator maps. For instance, if one wishes to maintain a count of the number of times each key has been seen, they will have to perform some conditional logic on whether this is the first time the key has been seen or not. Normally, this would require a find followed by an insert, effectively duplicating the search effort on each insertion.
When a user calls map.entry(key), the map will search for the key and then yield a variant of the Entry enum.
If a Vacant(entry) is yielded, then the key was not found. In this case the only valid operation is to insert a value into the entry. When this is done, the vacant entry is consumed and converted into a mutable reference to the value that was inserted. This allows for further manipulation of the value beyond the lifetime of the search itself. This is useful if complex logic needs to be performed on the value regardless of whether the value was just inserted.
If an Occupied(entry) is yielded, then the key was found. In this case, the user has several options: they can get, insert or remove the value of the occupied entry. Additionally, they can convert the occupied entry into a mutable reference to its value, providing symmetry to the vacant insert case.
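The two variants can also be handled explicitly by matching on the Entry enum. The following is a minimal sketch; the record_visit helper and its return value are assumptions for illustration, while Entry, VacantEntry::insert and OccupiedEntry::get_mut are the standard BTreeMap entry types and methods.

```rust
use std::collections::btree_map::{BTreeMap, Entry};

// Hypothetical helper: bump a visit counter with a single key search.
// Returns true if the key was seen for the first time.
fn record_visit(visits: &mut BTreeMap<String, u32>, name: &str) -> bool {
    match visits.entry(name.to_string()) {
        Entry::Vacant(slot) => {
            // insert consumes the vacant entry and yields a mutable
            // reference to the freshly inserted value.
            let count = slot.insert(0);
            *count += 1;
            true
        }
        Entry::Occupied(mut slot) => {
            // The key was found; update the existing value in place.
            *slot.get_mut() += 1;
            false
        }
    }
}

fn main() {
    let mut visits = BTreeMap::new();
    record_visit(&mut visits, "ann");
    record_visit(&mut visits, "ann");
    println!("{visits:?}");
}
```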
§Examples
Here are the two primary ways in which entry is used. First, a simple example where the logic performed on the values is trivial.
§Counting the number of times each character in a string occurs
    use std::collections::btree_map::BTreeMap;

    let mut count = BTreeMap::new();
    let message = "she sells sea shells by the sea shore";

    for c in message.chars() {
        *count.entry(c).or_insert(0) += 1;
    }

    assert_eq!(count.get(&'s'), Some(&8));

    println!("Number of occurrences of each character");
    for (char, count) in &count {
        println!("{char}: {count}");
    }

When the logic to be performed on the value is more complex, we may simply use the entry API to ensure that the value is initialized and perform the logic afterwards.
§Tracking the inebriation of customers at a bar
    use std::collections::btree_map::BTreeMap;

    // A client of the bar. They have a blood alcohol level.
    struct Person { blood_alcohol: f32 }

    // All the orders made to the bar, by client ID.
    let orders = vec![1, 2, 1, 2, 3, 4, 1, 2, 2, 3, 4, 1, 1, 1];

    // Our clients.
    let mut blood_alcohol = BTreeMap::new();

    for id in orders {
        // If this is the first time we've seen this customer, initialize them
        // with no blood alcohol. Otherwise, just retrieve them.
        let person = blood_alcohol.entry(id).or_insert(Person { blood_alcohol: 0.0 });

        // Reduce their blood alcohol level. It takes time to order and drink a beer!
        person.blood_alcohol *= 0.9;

        // Check if they're sober enough to have another beer.
        if person.blood_alcohol > 0.3 {
            // Too drunk... for now.
            println!("Sorry {id}, I have to cut you off");
        } else {
            // Have another!
            person.blood_alcohol += 0.1;
        }
    }

§Insert and complex keys
If we have a more complex key, calls to insert will not update the key itself; only the value associated with it is replaced. For example:
    use std::cmp::Ordering;
    use std::collections::BTreeMap;
    use std::hash::{Hash, Hasher};

    #[derive(Debug)]
    struct Foo {
        a: u32,
        b: &'static str,
    }

    // we will compare `Foo`s by their `a` value only.
    impl PartialEq for Foo {
        fn eq(&self, other: &Self) -> bool { self.a == other.a }
    }

    impl Eq for Foo {}

    // we will hash `Foo`s by their `a` value only.
    impl Hash for Foo {
        fn hash<H: Hasher>(&self, h: &mut H) { self.a.hash(h); }
    }

    impl PartialOrd for Foo {
        fn partial_cmp(&self, other: &Self) -> Option<Ordering> { self.a.partial_cmp(&other.a) }
    }

    impl Ord for Foo {
        fn cmp(&self, other: &Self) -> Ordering { self.a.cmp(&other.a) }
    }

    let mut map = BTreeMap::new();
    map.insert(Foo { a: 1, b: "baz" }, 99);

    // We already have a Foo with an a of 1, so this will be updating the value.
    map.insert(Foo { a: 1, b: "xyz" }, 100);

    // The value has been updated...
    assert_eq!(map.values().next().unwrap(), &100);

    // ...but the key hasn't changed. b is still "baz", not "xyz".
    assert_eq!(map.keys().next().unwrap().b, "baz");

Modules§
- binary_heap: A priority queue implemented with a binary heap.
- btree_map: An ordered map based on a B-Tree.
- btree_set: An ordered set based on a B-Tree.
- hash_map: A hash map implemented with quadratic probing and SIMD lookup.
- hash_set: A hash set implemented as a HashMap where the value is ().
- linked_list: A doubly-linked list with owned nodes.
- vec_deque: A double-ended queue (deque) implemented with a growable ring buffer.
Structs§
- BTreeMap: An ordered map based on a B-Tree.
- BTreeSet: An ordered set based on a B-Tree.
- BinaryHeap: A priority queue implemented with a binary heap.
- HashMap: A hash map implemented with quadratic probing and SIMD lookup.
- HashSet: A hash set implemented as a HashMap where the value is ().
- LinkedList: A doubly-linked list with owned nodes.
- TryReserveError: The error type for try_reserve methods.
- VecDeque: A double-ended queue implemented with a growable ring buffer.
Enums§
- TryReserveErrorKind (Experimental): Details of the allocation that caused a TryReserveError