The Secret Life of Cows

Created by and published in Pascal's Scribbles. Labeled as rust.

A lot of people at RustFest Paris mentioned Cows, which may be surprising if you’ve never seen std::borrow::Cow!

Cow in this context stands for “Clone on Write” and is a type that allows you to reuse data if it is not modified. Somehow, these bovine super powers of Rust’s standard library appear to be a well-kept secret even though they are not new. This post will dig into this very useful pointer type by explaining why you need such fine control in systems programming languages, describing Cows in detail, and comparing them to other ways of organizing your data.

Contents

  1. Organizing Data
    1. Where Does Our Data Live
    2. Structuring Data
    3. Dropping Data
    4. No Needless Copying
  2. What is a Cow Anyway
    1. A std Example
    2. A Beefy Definition
  3. Cows in the Wild
    1. Mixed Static and Dynamic Strings
    2. Benchmarks
    3. Serde

Organizing Data

This is what it all comes down to: People want to have a good, precise way to organize their data. And they want their programming language to support them. That’s why a lot of newer languages include a bunch of data structures optimized for different use cases, and that is also why software developers are dealing with API documentation so often. To ensure that your code has the performance characteristics you expect, it is essential to know which piece of data is represented in which way.

In systems programming languages, this is in some regards even more important. You want to know:

  1. exactly where your data lives,
  2. that it is efficiently stored,
  3. that it is removed as soon as you stop using it,
  4. and that you don’t copy it around needlessly.

Ensuring all these properties is a great way to write fast programs. Let’s look at how we can do this in Rust.

Where Does Our Data Live

In Rust, it is quite explicit where your data lives. By default, primitive types and structs containing primitive types are allocated on the stack, without any dynamic memory allocation. If you want to store data of a size only known at runtime (say, the text content of a file), you need to use a type that dynamically allocates memory (on the heap), for example String or Vec. You can explicitly allocate a data type on the heap by wrapping it in a Box.

(If you’re unfamiliar with the notion of “Stack and Heap”, you can find a good explanation in this chapter of the official Rust book.)

Note: Creating a new (non-empty) String means allocating memory, which is a somewhat costly operation. A language like Rust gives you quite a few options to skip some allocations, and doing so can speed up performance-critical parts of your code significantly. (Spoiler: Cow is one of these options.)
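To make the stack/heap distinction a bit more tangible, here is a small sketch. The helper heap_capacity_of is a made-up name for this example; everything else is plain standard library.

```rust
/// Hypothetical helper: how many bytes of heap buffer a String currently owns.
fn heap_capacity_of(text: &String) -> usize {
    text.capacity()
}

fn main() {
    // Primitives, tuples, and fixed-size arrays of primitives live on the stack.
    let point: (i32, i32) = (3, 4);
    let bytes: [u8; 4] = [1, 2, 3, 4];

    // An empty String owns no heap buffer yet...
    let empty = String::new();
    assert_eq!(heap_capacity_of(&empty), 0);

    // ...but a non-empty one has to allocate room for its contents.
    let text = String::from("hello");
    assert!(heap_capacity_of(&text) >= 5);

    // Vec also allocates on the heap; here it copies the four bytes of the array.
    let numbers: Vec<u8> = bytes.to_vec();
    assert_eq!(numbers.len(), 4);

    // Box moves a value that would otherwise sit on the stack onto the heap.
    let boxed: Box<(i32, i32)> = Box::new(point);
    assert_eq!(*boxed, (3, 4));
}
```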

Structuring Data

If you know what you will do with your data, you can probably figure out how to best store it. If you, for example, always iterate through a known list of values, an array (or a Vec) is the way to go. If you need to look up values by known keys, and don’t care about the order they are stored in, a hash map sounds good. If you need a stack to put data onto from different threads, you can use crossbeam-deque. This is just to give you a few examples; there are books on this topic and you should read them. A Cow doesn’t really help you here per se, but you can use it inside your data structures.

Dropping Data

Luckily, in Rust it is easy to make sure our data gets removed from memory as soon as possible (so we don’t use up too much memory and slow down the system). Rust uses the ownership model of automatically dropping resources when they go out of scope, so it doesn’t need to periodically run a garbage collector to free memory. You can still waste memory, of course, by allocating too much of it manually, or by building reference cycles and leaking it.
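Here is a minimal sketch of what “dropped when it goes out of scope” means in practice. Tracked and observe_drop are names invented for this example; the counter only exists so we can observe the moment of the drop.

```rust
use std::cell::Cell;

/// Hypothetical type that counts how often values of it have been dropped.
struct Tracked<'a> {
    drops: &'a Cell<u32>,
}

impl Drop for Tracked<'_> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

/// Returns the drop count seen inside the inner scope and the one seen after it.
fn observe_drop() -> (u32, u32) {
    let drops = Cell::new(0);
    let inside;
    {
        let _value = Tracked { drops: &drops };
        inside = drops.get(); // `_value` is still alive here
    } // `_value` goes out of scope and is dropped right here
    (inside, drops.get())
}

fn main() {
    assert_eq!(observe_drop(), (0, 1));
}
```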

No Needless Copying

One important step towards being a responsible citizen in regard to memory usage is to not copy data more than necessary. If you, for example, have a function that removes whitespace at the beginning of a string, you could create a new string that just contains the characters after the leading whitespace. (Remember: A new string means a new memory allocation.) Or, you could return a slice of the original string that starts after the leading whitespace. The second option requires that we keep the original data around, because our new slice is just referencing it internally. This means that instead of copying however many bytes your string contains, we just write two numbers: a pointer to the point in the original string after the leading whitespace, and the length of the remaining string that we care about. (Carrying the length with us is a convention in Rust.)
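The slice-returning variant could look roughly like this. The function name skip_leading_whitespace is made up for this sketch; it simply delegates to str::trim_start from the standard library.

```rust
/// Returns the part of `input` after any leading whitespace, without copying.
fn skip_leading_whitespace(input: &str) -> &str {
    input.trim_start()
}

fn main() {
    let original = String::from("   hello");
    let trimmed = skip_leading_whitespace(&original);
    assert_eq!(trimmed, "hello");

    // The slice is just a pointer into the original buffer (three bytes in)
    // plus a length; no bytes of the string were copied.
    assert_eq!(trimmed.as_ptr() as usize, original.as_ptr() as usize + 3);
    assert_eq!(trimmed.len(), 5);
}
```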

But what about a more complicated function? Let’s imagine we want to replace some characters in a string. Do we always need to copy it over with the characters swapped out? Or can we be clever and return some pointer to the original string if there was no replacement needed? Indeed, in Rust we can! This is what Cow is all about.
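A minimal sketch of such a function, assuming we want to replace spaces with underscores (replace_spaces is a hypothetical name, not something from the standard library):

```rust
use std::borrow::Cow;

/// Replaces every space with an underscore, allocating only when needed.
fn replace_spaces(input: &str) -> Cow<'_, str> {
    if input.contains(' ') {
        // Something has to change, so we build a new String.
        Cow::Owned(input.replace(' ', "_"))
    } else {
        // Nothing to do: hand back a reference to the original data.
        Cow::Borrowed(input)
    }
}

fn main() {
    assert!(matches!(replace_spaces("no_spaces_here"), Cow::Borrowed(_)));

    let replaced = replace_spaces("some spaces here");
    assert!(matches!(replaced, Cow::Owned(_)));
    assert_eq!(replaced, "some_spaces_here");
}
```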

What is a Cow Anyway

In Rust, the abbreviation “Cow” stands for “clone on write” [1]. It is an enum with two states: Borrowed and Owned. This means you can use it to abstract over whether you own the data or just have a reference to it. This is especially useful when you want to return a type from a function that may or may not need to allocate.
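The “clone on write” part is easiest to see with Cow::to_mut, which turns a Borrowed Cow into an Owned one the first time you ask for mutable access. The helper ensure_exclaimed below is invented for this sketch.

```rust
use std::borrow::Cow;

/// Hypothetical helper: makes sure `text` ends with an exclamation mark.
fn ensure_exclaimed(mut text: Cow<'_, str>) -> Cow<'_, str> {
    if !text.ends_with('!') {
        // `to_mut` clones borrowed data into an owned String on first write.
        text.to_mut().push('!');
    }
    text
}

fn main() {
    // A write was needed, so the data got cloned.
    let calm = ensure_exclaimed(Cow::Borrowed("moo"));
    assert!(matches!(calm, Cow::Owned(_)));
    assert_eq!(calm, "moo!");

    // No write was needed, so we still only hold a reference.
    let loud = ensure_exclaimed(Cow::Borrowed("moo!"));
    assert!(matches!(loud, Cow::Borrowed(_)));

    // `into_owned` gives you a String either way, cloning only if necessary.
    let owned: String = loud.into_owned();
    assert_eq!(owned, "moo!");
}
```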

A std Example

Let’s look at an example. Say you have a Path and want to convert it to a string. Sadly, not every filesystem path is valid UTF-8 (Rust strings are guaranteed to be UTF-8 encoded). Rust has a handy function to get a string regardless: Path::to_string_lossy. When the path is valid UTF-8 already, it will return a reference to the original data, otherwise it will create a new string where invalid characters are replaced with the U+FFFD replacement character (�).

    use std::borrow::Cow;
    use std::path::Path;

    fn main() {
        let path = Path::new("foo.txt");
        // to_string_lossy only allocates when it has to replace invalid bytes.
        match path.to_string_lossy() {
            Cow::Borrowed(_str_ref) => println!("path was valid UTF-8"),
            Cow::Owned(_new_string) => println!("path was not valid UTF-8"),
        }
    }
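Building a path that is not valid UTF-8 is platform-specific, so to see the Owned case in a portable way, this sketch uses a sibling function from the standard library instead: String::from_utf8_lossy, which works on raw bytes and also returns a Cow.

```rust
use std::borrow::Cow;

fn main() {
    // Valid UTF-8: we get a reference to the bytes we passed in.
    let valid = String::from_utf8_lossy(b"foo.txt");
    assert!(matches!(valid, Cow::Borrowed("foo.txt")));

    // The byte 0xFF can never appear in valid UTF-8, so a new String
    // is allocated with U+FFFD in its place.
    let invalid = String::from_utf8_lossy(b"foo\xFF.txt");
    assert!(matches!(invalid, Cow::Owned(_)));
    assert_eq!(invalid, "foo\u{FFFD}.txt");
}
```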

A Beefy Definition

With that in mind, let’s look at the actual definition of Cow:

    enum Cow<'a, B: ToOwned + ?Sized + 'a> {
        /// Borrowed data.
        Borrowed(&'a B),

        /// Owned data.
        Owned(<B as ToOwned>::Owned),
    }

As you can see, it takes some convincing to have Rust accept this type in a way we can work with it. Let’s go through it one by one.

Alright, so far so good! Let me just point out one thing though: If you want to store a &'input str in a Cow (using Cow::Borrowed(&'input str), for example), what is the concrete type of the Cow? (The generic one is Cow<'a, B>.)

Right! Cow<'input, str>! The type definition for the Borrowed variant contains &'a B, so our generic type is the type we refer to. This also means that ToOwned doesn’t need to be implemented for references, but for concrete types, like str and Path.
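To see how the type parameter and ToOwned fit together, here is a short sketch using the two types mentioned above. In the standard library, the owned counterpart of str is String and the owned counterpart of Path is PathBuf.

```rust
use std::borrow::Cow;
use std::path::{Path, PathBuf};

fn main() {
    // The type parameter is `str`: Borrowed holds a &str, Owned holds a String.
    let borrowed: Cow<'static, str> = Cow::Borrowed("hello");
    let owned: Cow<'static, str> = Cow::Owned(String::from("hello"));
    assert_eq!(borrowed, owned);

    // The type parameter is `Path`: Borrowed holds a &Path, Owned holds a PathBuf.
    let borrowed_path: Cow<'static, Path> = Cow::Borrowed(Path::new("foo.txt"));
    let owned_path: Cow<'static, Path> = Cow::Owned(PathBuf::from("foo.txt"));
    assert_eq!(borrowed_path, owned_path);

    // ToOwned is implemented for the unsized types themselves.
    let string: String = <str as ToOwned>::to_owned("hello");
    let path_buf: PathBuf = <Path as ToOwned>::to_owned(Path::new("foo.txt"));
    assert_eq!(string, "hello");
    assert_eq!(path_buf, PathBuf::from("foo.txt"));
}
```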

Let me note something about that lifetime the Cow carries with it real quick: If you want to replace the type of bar in struct Foo { bar: String } with a Cow, you’ll have to specify the lifetime of the reference the Cow can include: struct Foo<'a> { bar: Cow<'a, str> }. This means that every time you now use Foo that lifetime will be tracked, and every time you take or return Foo you might just need to annotate it.
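Spelled out, that could look like the sketch below. The methods borrowed and into_static are hypothetical helpers added for illustration; they show where the lifetime has to be written and one way to get rid of it again (by paying for an allocation).

```rust
use std::borrow::Cow;

#[derive(Debug)]
struct Foo<'a> {
    bar: Cow<'a, str>,
}

impl<'a> Foo<'a> {
    /// Builds a Foo that only borrows its input.
    fn borrowed(bar: &'a str) -> Self {
        Foo { bar: Cow::Borrowed(bar) }
    }

    /// Detaches the Foo from the borrowed input by cloning if necessary.
    fn into_static(self) -> Foo<'static> {
        Foo { bar: Cow::Owned(self.bar.into_owned()) }
    }
}

fn main() {
    let detached: Foo<'static>;
    {
        let input = String::from("moo");
        let foo = Foo::borrowed(&input);
        assert!(matches!(foo.bar, Cow::Borrowed(_)));
        detached = foo.into_static();
    } // `input` is gone here, but `detached` owns its own copy
    assert_eq!(detached.bar, "moo");
}
```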

One easy way around this is to use 'static: You can omit the lifetime annotation on your struct, but your Cow can only contain references to static memory. This might sound less useful than a generic lifetime (that’s because it is), but in case of functions and types that either contain or return new data or static defaults known at compile time, it can be enough.
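As a sketch of the 'static variant: the struct below needs no lifetime parameter, and can still hold either a compile-time default or a string built at run time. Label and its two constructors are made up for this example.

```rust
use std::borrow::Cow;

// No lifetime parameter needed: the Cow may only borrow 'static data.
struct Label {
    text: Cow<'static, str>,
}

impl Label {
    /// Uses a string literal, so nothing is allocated.
    fn default_label() -> Self {
        Label { text: Cow::Borrowed("untitled") }
    }

    /// Builds the text at run time, so the Cow has to own it.
    fn numbered(n: u32) -> Self {
        Label { text: Cow::Owned(format!("item {}", n)) }
    }
}

fn main() {
    assert!(matches!(Label::default_label().text, Cow::Borrowed(_)));
    assert_eq!(Label::numbered(7).text, "item 7");
}
```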

Cows in the Wild

Knowing Cows in theory is fine and dandy, but the examples we’ve seen so far only give you a small glance at when they can be used in practice. Sadly, as it turns out, not many Rust APIs expose Cows. Maybe because they are seen as a thing you can introduce when you have a performance bottleneck, or maybe it’s because people don’t want to add lifetime annotations to their structs (and they don’t want to or can’t use Cow<'static, T>).

Mixed Static and Dynamic Strings

One very cool use case for Cows is when dealing with functions that either return static strings (i.e., strings you literally write in your source code) or dynamic strings that get put together at run time. The Programming Rust book by Jim Blandy and Jason Orendorff contains an example like this:

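    use std::borrow::Cow;

    // Hypothetical error type; the original snippet does not show its definition.
    enum Error {
        NotFound,
        Custom(String),
    }

    fn describe(error: &Error) -> Cow<'static, str> {
        // Match on the reference itself so the String in Custom is borrowed, not moved.
        match error {
            Error::NotFound => "Error: Not found".into(),
            Error::Custom(e) => format!("Error: {}", e).into(),
        }
    }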

Small aside: See how we are using the Into trait here to make constructing cows super concise? Into is the inverse of From and is implemented for all types that implement From. So, the compiler knows that we want a Cow<'static, str>, and that we gave it a String or a &'static str. Lucky for us, impl<'a> From<&'a str> for Cow<'a, str> and impl<'a> From<String> for Cow<'a, str> are in the standard library, so rustc can find and call these!
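If you are curious which variant each conversion produces, this small sketch spells out the two From implementations mentioned above:

```rust
use std::borrow::Cow;

fn main() {
    // From<&'a str> keeps the reference and produces Cow::Borrowed.
    let from_literal: Cow<'static, str> = Cow::from("moo");
    assert!(matches!(from_literal, Cow::Borrowed(_)));

    // From<String> takes ownership and produces Cow::Owned.
    let from_string: Cow<'static, str> = Cow::from(String::from("moo"));
    assert!(matches!(from_string, Cow::Owned(_)));

    // `.into()` calls the same implementations; the target type comes
    // from the annotation (or from a function's return type).
    let via_into: Cow<'static, str> = "moo".into();
    assert_eq!(via_into, from_string);
}
```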

Why is this a very cool example? Reddit user 0x7CFE put it like this:

The most important thing is that since Cow<'static, str> derefs [2] to &str it may act as a drop-in replacement everywhere, where &str is expected. So, if last error variant was added to an already existing code base, all would just work without any major refactoring.

In other languages like C++ you’d probably have to decide, [either] to return allocating version like std::string everywhere or get rid of the details and suffer from poor ergonomics, where you’d need to use such [a] method. Even worse, error entry with extra details may be very rare and yet, you’d need to make everything allocate just to stick it all in.

Rust provides a solution that is zero cost for cases where extra details are not needed. It’s a brilliant example of “pay only for what you use” principle in action.
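The drop-in replacement point from the quote can be sketched like this. The function shout stands in for any existing code that expects a &str; it is made up for this example.

```rust
use std::borrow::Cow;

/// Stand-in for existing code that only knows about &str.
fn shout(text: &str) -> String {
    text.to_uppercase()
}

fn main() {
    let message: Cow<'static, str> = Cow::Borrowed("not found");

    // A &Cow<str> coerces to &str thanks to Deref, so no conversion is needed.
    assert_eq!(shout(&message), "NOT FOUND");

    // Methods of str are available on the Cow directly as well.
    assert_eq!(message.len(), 9);
    assert!(message.starts_with("not"));
}
```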

Benchmarks

One example for improving program performance by using a Cow is this part of the Regex Redux micro-benchmark. The trick is to store a reference to the data at first and replace it with owned data during the loop’s iterations.
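The pattern can be sketched with plain str::replace standing in for the regex replacements. This is not the benchmark's actual code: apply_substitutions is a hypothetical function, and the contains check is an addition of this sketch so that the input stays borrowed when nothing matches.

```rust
use std::borrow::Cow;

/// Applies each (from, to) substitution in order, starting from a borrowed input.
fn apply_substitutions<'a>(input: &'a str, substitutions: &[(&str, &str)]) -> Cow<'a, str> {
    // Start out with just a reference to the input.
    let mut current = Cow::Borrowed(input);
    for &(from, to) in substitutions {
        if current.contains(from) {
            // Replace the Cow's contents with freshly allocated, owned data.
            current = Cow::Owned(current.replace(from, to));
        }
    }
    current
}

fn main() {
    // No substitution applies: still borrowed, nothing was allocated.
    assert!(matches!(apply_substitutions("abc", &[("x", "y")]), Cow::Borrowed(_)));

    // Both substitutions apply, one after the other.
    let result = apply_substitutions("aXbX", &[("X", "-"), ("a", "A")]);
    assert_eq!(result, "A-b-");
}
```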

Serde

A great example for how you can use the super powers of Cows in your own structs to refer to input data instead of copying it over is Serde’s #[serde(borrow)] attribute. If you have a struct like

    use serde::Deserialize;
    use std::borrow::Cow;

    #[derive(Debug, Deserialize)]
    struct Foo<'input> {
        bar: Cow<'input, str>,
    }

Serde will by default fill that bar Cow with an owned String (playground). If you however write it like

    #[derive(Debug, Deserialize)]
    struct Foo<'input> {
        #[serde(borrow)]
        bar: Cow<'input, str>,
    }

Serde will try to create a borrowed version of the Cow (playground).

This will only work, however, when the input string doesn’t need to be adjusted. So, for example, when you deserialize a JSON string that has escaped quotes in it [3], Serde will have to allocate a new string to store the unescaped representation, and will thus give you a Cow::Owned (playground).


Thanks to Robert Balicki, Alex Kitchens, and Matt Brubeck for reviewing this post! And also thanks to Brad Gibson for asking about a better explanation on the ?Sized business, which took me less than two years to resolve!

  1. Yes, that’s right: Clone on write, not copy on write. That’s because in Rust, the Copy trait is guaranteed to be a simple memcpy operation, while Clone can also do custom logic (like recursively cloning a HashMap<String, String>).

  2. Thanks to an implementation of the Deref trait, you can use a reference to a Cow<'static, str> in place of a &str. That means a Cow<'static, str> can be seen as a reference to a string without having to convert it.

  3. "\"Escaped strings contain backslashes\", he said." 

