A string type for Rust that is not required to be valid UTF-8.

BurntSushi/bstr

This crate provides extension traits for &[u8] and Vec<u8> that enable their use as byte strings, where byte strings are conventionally UTF-8. This differs from the standard library's String and str types in that they are not required to be valid UTF-8, but may be fully or partially valid UTF-8.

Documentation

https://docs.rs/bstr

When should I use byte strings?

See this part of the documentation for more details: https://docs.rs/bstr/1.*/bstr/#when-should-i-use-byte-strings.

The short story is that byte strings are useful when it is inconvenient or incorrect to require valid UTF-8.
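To make that concrete, here is a minimal std-only sketch. The sample line and the helper names (`sample_line`, `contains_bytes`) are hypothetical and exist only for illustration; the search is a naive one and is not how bstr implements substring search. The point is that a single stray byte makes the whole buffer unusable as a `String`, even though searching it as bytes works fine.

```rust
/// Returns true if `haystack` contains `needle` as a contiguous run of bytes.
/// This is a naive search written for illustration only.
fn contains_bytes(haystack: &[u8], needle: &[u8]) -> bool {
    // `windows(0)` panics, so handle the empty needle up front.
    if needle.is_empty() {
        return true;
    }
    haystack.windows(needle.len()).any(|window| window == needle)
}

/// A hypothetical input line: mostly ASCII, plus one byte (0xFF) that can
/// never appear in valid UTF-8.
fn sample_line() -> Vec<u8> {
    b"Dimension: 3\xFFx4\n".to_vec()
}
```

Calling `String::from_utf8(sample_line())` returns an error, while `contains_bytes(&sample_line(), b"Dimension")` still finds the substring.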

Usage

cargo add bstr

Examples

The following examples exhibit both the API features of byte strings and the I/O convenience functions provided for reading line-by-line quickly.

This first example simply shows how to efficiently iterate over lines in stdin, and print out lines containing a particular substring:

    use std::{error::Error, io::{self, Write}};
    use bstr::{ByteSlice, io::BufReadExt};

    fn main() -> Result<(), Box<dyn Error>> {
        let stdin = io::stdin();
        let mut stdout = io::BufWriter::new(io::stdout());

        // Each `line` is a `&[u8]` that still includes its line terminator.
        stdin.lock().for_byte_line_with_terminator(|line| {
            if line.contains_str("Dimension") {
                stdout.write_all(line)?;
            }
            // Returning `Ok(true)` keeps the iteration going.
            Ok(true)
        })?;
        Ok(())
    }

This example shows how to count all of the words (Unicode-aware) in stdin, line-by-line:

    use std::{error::Error, io};
    use bstr::{ByteSlice, io::BufReadExt};

    fn main() -> Result<(), Box<dyn Error>> {
        let stdin = io::stdin();
        let mut words = 0;

        stdin.lock().for_byte_line_with_terminator(|line| {
            // `words()` segments the line on Unicode word boundaries.
            words += line.words().count();
            Ok(true)
        })?;
        println!("{}", words);
        Ok(())
    }

This example shows how to convert a stream on stdin to uppercase without performing UTF-8 validation, while also amortizing allocation. On standard ASCII text, this is quite a bit faster than what you can (easily) do with standard library APIs. (N.B. Any invalid UTF-8 bytes are passed through unchanged.)

    use std::{error::Error, io::{self, Write}};
    use bstr::{ByteSlice, io::BufReadExt};

    fn main() -> Result<(), Box<dyn Error>> {
        let stdin = io::stdin();
        let mut stdout = io::BufWriter::new(io::stdout());

        // Reuse one buffer across all lines so that allocation is amortized.
        let mut upper = vec![];
        stdin.lock().for_byte_line_with_terminator(|line| {
            upper.clear();
            line.to_uppercase_into(&mut upper);
            stdout.write_all(&upper)?;
            Ok(true)
        })?;
        Ok(())
    }

This example shows how to extract the first 10 visual characters (as grapheme clusters) from each line, where invalid UTF-8 sequences are generally treated as a single character and are passed through correctly:

    use std::{error::Error, io::{self, Write}};
    use bstr::{ByteSlice, io::BufReadExt};

    fn main() -> Result<(), Box<dyn Error>> {
        let stdin = io::stdin();
        let mut stdout = io::BufWriter::new(io::stdout());

        stdin.lock().for_byte_line_with_terminator(|line| {
            // Byte offset just past the 10th grapheme, or the end of the
            // line if it has no graphemes at all.
            let end = line
                .grapheme_indices()
                .map(|(_, end, _)| end)
                .take(10)
                .last()
                .unwrap_or(line.len());
            stdout.write_all(line[..end].trim_end())?;
            stdout.write_all(b"\n")?;
            Ok(true)
        })?;
        Ok(())
    }

Cargo features

This crate comes with a few features that control standard library, serde and Unicode support.

  • std - Enabled by default. This provides APIs that require the standard library, such as Vec<u8> and PathBuf. Enabling this feature also enables the alloc feature.
  • alloc - Enabled by default. This provides APIs that require allocations via the alloc crate, such as Vec<u8>.
  • unicode - Enabled by default. This provides APIs that require sizable Unicode data compiled into the binary. This includes, but is not limited to, grapheme/word/sentence segmenters. When this is disabled, basic support such as UTF-8 decoding is still included. Note that currently, enabling this feature also requires enabling the std feature. It is expected that this limitation will be lifted at some point.
  • serde - Enables implementations of serde traits for BStr, and also BString when alloc is enabled.
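As a sketch of how these features might be selected, a Cargo.toml entry that turns off the defaults and opts back in to alloc only could look like the following. The version requirement "1" is an assumption based on the 1.* documentation link above; adjust it to the release you actually depend on.

```toml
[dependencies]
# Turn off the features that are enabled by default (std, alloc, unicode),
# then re-enable only alloc. Unicode segmenters are not available in this setup.
bstr = { version = "1", default-features = false, features = ["alloc"] }
```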

Minimum Rust version policy

This crate's minimum supported rustc version (MSRV) is 1.73.

In general, this crate will be conservative with respect to the minimum supported version of Rust. MSRV may be bumped in minor version releases.

Future work

Since it is plausible that some of the types in this crate might end up in your public API (e.g., BStr and BString), we will commit to being very conservative with respect to new major version releases. It's difficult to say precisely how conservative, but unless there is a major issue with the 1.0 release, I wouldn't expect a 2.0 release to come out any sooner than some period of years.

A large part of the API surface area was taken from the standard library, so from an API design perspective, a good portion of this crate should be on solid ground. The main differences from the standard library are in how the various substring search routines work. The standard library provides generic infrastructure for supporting different types of searches with a single method, whereas this library prefers to define new methods for each type of search and drop the generic infrastructure.
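For readers unfamiliar with the standard library side of that comparison, here is a small std-only sketch. The function name `find_space_three_ways` is made up for illustration. It shows one generic method, `str::find`, accepting three different kinds of pattern. bstr instead exposes a separate method per kind of search (such as `find`, `find_byte` and `find_char` on `ByteSlice`; consult the crate documentation for the exact set).

```rust
/// Finds the first whitespace position using three different pattern types,
/// all through the single generic `str::find` method of the standard library.
fn find_space_three_ways(text: &str) -> [Option<usize>; 3] {
    [
        // A `char` pattern.
        text.find(' '),
        // A `&str` pattern.
        text.find(" "),
        // A closure pattern.
        text.find(|c: char| c.is_whitespace()),
    ]
}
```

All three calls return the same byte offset for an input whose first whitespace character is an ASCII space.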

Some probable future considerations for APIs include, but are not limited to:

  • Unicode normalization.
  • More sophisticated support for dealing with Unicode case, perhaps by combining the use cases supported by caseless and unicase.

Here are some examples that are probably out of scope for this crate:

  • Regular expressions.
  • Unicode collation.

The exact scope isn't quite clear, but I expect we can iterate on it.

In general, as stated below, this crate brings lots of related APIs together into a single crate while simultaneously attempting to keep the total number of dependencies low. Indeed, every dependency of bstr, except for memchr, is optional.

High level motivation

Strictly speaking, the bstr crate provides very little that can't already be achieved with the standard library Vec<u8>/&[u8] APIs and the ecosystem of library crates. For example:

  • The standard library's Utf8Error can be used for incremental lossy decoding of &[u8].
  • The unicode-segmentation crate can be used for iterating over graphemes (or words), but is only implemented for &str types. One could use Utf8Error above to implement grapheme iteration with the same semantics as what bstr provides (automatic Unicode replacement codepoint substitution).
  • The twoway crate can be used for fast substring searching on &[u8].
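The first bullet can be sketched with the standard library alone. The function name `decode_lossy` is hypothetical, and this is only an illustration of the `Utf8Error` technique, not bstr's implementation. It decodes as much valid UTF-8 as it can, emits one replacement codepoint (U+FFFD) for each invalid sequence, and then resumes after the bad bytes.

```rust
use std::str;

/// Lossily decodes `input`, replacing each invalid sequence with U+FFFD.
fn decode_lossy(mut input: &[u8]) -> String {
    let mut out = String::with_capacity(input.len());
    loop {
        match str::from_utf8(input) {
            Ok(valid) => {
                out.push_str(valid);
                return out;
            }
            Err(err) => {
                // Everything before `valid_up_to()` is guaranteed valid UTF-8.
                let (valid, rest) = input.split_at(err.valid_up_to());
                out.push_str(str::from_utf8(valid).expect("prefix is valid UTF-8"));
                out.push('\u{FFFD}');
                match err.error_len() {
                    // Skip the invalid sequence and keep decoding.
                    Some(len) => input = &rest[len..],
                    // The input ended in the middle of a sequence.
                    None => return out,
                }
            }
        }
    }
}
```

Even this small routine needs care around truncated sequences, which is the kind of glue code the following paragraph refers to.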

So why create bstr? Part of the point of the bstr crate is to provide a uniform API of coupled components instead of relying on users to piece together loosely coupled components from the crate ecosystem. For example, if you wanted to perform a search and replace in a Vec<u8>, then writing the code to do that with the twoway crate is not that difficult, but it's still additional glue code you have to write. This work adds up depending on what you're doing. Consider, for example, trimming and splitting, along with their different variants.
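To give a feel for that glue code, here is a minimal std-only sketch of search and replace on bytes. The names `find_bytes` and `replace_bytes` are hypothetical, and a naive window scan stands in for a fast searcher such as the one the twoway crate provides.

```rust
/// Returns the index of the first occurrence of `needle` in `haystack`.
/// Naive O(n * m) scan, standing in for a faster substring searcher.
fn find_bytes(haystack: &[u8], needle: &[u8]) -> Option<usize> {
    // `windows(0)` panics, so handle the empty needle up front.
    if needle.is_empty() {
        return Some(0);
    }
    haystack.windows(needle.len()).position(|window| window == needle)
}

/// Replaces every non-overlapping occurrence of `needle` with `replacement`.
fn replace_bytes(haystack: &[u8], needle: &[u8], replacement: &[u8]) -> Vec<u8> {
    // An empty needle would match everywhere; return the input unchanged.
    if needle.is_empty() {
        return haystack.to_vec();
    }
    let mut out = Vec::with_capacity(haystack.len());
    let mut rest = haystack;
    while let Some(i) = find_bytes(rest, needle) {
        out.extend_from_slice(&rest[..i]);
        out.extend_from_slice(replacement);
        rest = &rest[i + needle.len()..];
    }
    // Copy whatever follows the last match.
    out.extend_from_slice(rest);
    out
}
```

For example, `replace_bytes(b"foo bar foo", b"foo", b"quux")` yields the bytes of "quux bar quux", and the same code works on input that is not valid UTF-8.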

In other words, bstr is partially a way of pushing back against the micro-crate ecosystem that appears to be evolving. Namely, it is a goal of bstr to keep its dependency list lightweight. For example, serde is an optional dependency because there is no feasible alternative. In service of this philosophy, currently, the only required dependency of bstr is memchr.

License

This project is licensed under either of

  • Apache License, Version 2.0 (LICENSE-APACHE)
  • MIT license (LICENSE-MIT)

at your option.

The data in src/unicode/data/ is licensed under the Unicode License Agreement (LICENSE-UNICODE), although this data is only used in tests.
