karrick/gobls

go buffered line scanner

License

MIT license

gobls

Gobls is a buffered line scanner for Go.

Description

Similar to bufio.Scanner, but wraps bufio.Reader.ReadLine so lines of arbitrary length can be scanned. It uses a hybrid approach so that in most cases, when lines are not unusually long, the fast code path is taken. When lines are unusually long, it uses the per-scanner pre-allocated byte slice to reassemble the fragments into a single slice of bytes.
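The hybrid approach described above can be sketched with the standard library alone. This is a minimal illustration and not the gobls source: readLongLine and lineLengths are hypothetical helpers, and the 16-byte reader buffer in main is chosen only to force the slow path on a short input.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// readLongLine returns the next line from br without its line terminator,
// however long the line is. scratch is a caller-owned buffer reused to
// reassemble long lines; the possibly grown buffer is returned so the caller
// can pass it back in on the next call.
//
// Hypothetical helper for illustration; not part of gobls.
func readLongLine(br *bufio.Reader, scratch []byte) (line, newScratch []byte, err error) {
	frag, isPrefix, err := br.ReadLine()
	if err != nil {
		return nil, scratch, err
	}
	if !isPrefix {
		// Fast path: the whole line fit in the reader's buffer, so the
		// fragment is returned directly without copying.
		return frag, scratch, nil
	}
	// Slow path: the line is longer than the reader's buffer. Copy each
	// fragment into the scratch buffer until the final fragment arrives.
	scratch = append(scratch[:0], frag...)
	for isPrefix {
		frag, isPrefix, err = br.ReadLine()
		if err != nil {
			if err == io.EOF {
				break // input ended mid-line; return what was gathered
			}
			return nil, scratch, err
		}
		scratch = append(scratch, frag...)
	}
	return scratch, scratch, nil
}

// lineLengths reads every line of input through a reader with the given
// buffer size and reports the length of each line.
func lineLengths(input string, bufSize int) []int {
	br := bufio.NewReaderSize(strings.NewReader(input), bufSize)
	var scratch []byte
	var lengths []int
	for {
		line, s, err := readLongLine(br, scratch)
		if err != nil {
			break // this sketch treats any error, including io.EOF, as the end
		}
		scratch = s
		lengths = append(lengths, len(line))
	}
	return lengths
}

func main() {
	input := "short\n" + strings.Repeat("x", 100) + "\nend\n"
	fmt.Println(lineLengths(input, 16))
}
```

As with bufio.Scanner, the returned slice is only valid until the next call, because both the reader's buffer and the scratch buffer are reused.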

Example

Enumerating lines from an io.Reader (drop-in replacement for bufio.Scanner)

When you have an io.Reader that you want to enumerate, normally you wrap it in bufio.Scanner. This library is a drop-in replacement for this particular circumstance: change bufio.NewScanner(r) to gobls.NewScanner(r), and you no longer have to worry about token too long errors.

    var lines, characters int
    ls := gobls.NewScanner(os.Stdin)
    for ls.Scan() {
        lines++
        characters += len(ls.Bytes())
    }
    if err := ls.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "cannot scan:", err)
    }
    fmt.Println("Counted", lines, "lines and", characters, "characters.")
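For context, the token too long error mentioned above is what bufio.Scanner reports, with its default buffer, when a line exceeds bufio.MaxScanTokenSize. The sketch below reproduces it using only the standard library; scanWithBufio is a hypothetical helper name.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// scanWithBufio counts the lines bufio.Scanner yields from input and returns
// the error, if any, that stopped the scan.
func scanWithBufio(input string) (int, error) {
	s := bufio.NewScanner(strings.NewReader(input))
	var lines int
	for s.Scan() {
		lines++
	}
	return lines, s.Err()
}

func main() {
	// One ordinary line followed by one line longer than the default limit.
	input := "ok\n" + strings.Repeat("x", bufio.MaxScanTokenSize+1) + "\n"
	lines, err := scanWithBufio(input)
	fmt.Println("lines scanned:", lines)
	fmt.Println("error:", err)
	fmt.Println("is bufio.ErrTooLong:", err == bufio.ErrTooLong)
}
```

The scan stops at the oversized line, so any lines after it are never seen unless the caller handles the error.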

Enumerating lines from []byte

If you already have a slice of bytes whose lines you want to enumerate, it is much more performant to wrap that byte slice with gobls.NewBufferScanner(buf) than to wrap the slice in an io.Reader and call either gobls.NewScanner or bufio.NewScanner.

    var lines, characters int
    ls := gobls.NewBufferScanner(buf)
    for ls.Scan() {
        lines++
        characters += len(ls.Bytes())
    }
    if err := ls.Err(); err != nil {
        fmt.Fprintln(os.Stderr, "cannot scan:", err)
    }
    fmt.Println("Counted", lines, "lines and", characters, "characters.")
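One plausible reason scanning a byte slice directly is cheaper is that each line can be returned as a sub-slice of the original buffer, with no reader and no copying. The following is a sketch of that idea under that assumption, not the gobls implementation; countLines is a hypothetical function.

```go
package main

import (
	"bytes"
	"fmt"
)

// countLines walks buf one line at a time by slicing it at each '\n'.
// No bytes are copied and no io.Reader is involved.
func countLines(buf []byte) (lines, characters int) {
	for len(buf) > 0 {
		var line []byte
		if i := bytes.IndexByte(buf, '\n'); i >= 0 {
			line, buf = buf[:i], buf[i+1:]
		} else {
			line, buf = buf, nil // final line without a trailing newline
		}
		// Drop a trailing carriage return so "\r\n" endings are handled.
		line = bytes.TrimSuffix(line, []byte("\r"))
		lines++
		characters += len(line)
	}
	return lines, characters
}

func main() {
	lines, characters := countLines([]byte("alpha\nbeta\r\ngamma"))
	fmt.Println("Counted", lines, "lines and", characters, "characters.")
}
```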

Performance

The BufferScanner is faster than bufio.Scanner for all benchmarks. However, on my test system, the regular Scanner takes from 2% to nearly 40% longer than bufio.Scanner, depending on the length of the lines to be scanned. The 40% longer times were only observed when line lengths were bufio.MaxScanTokenSize bytes long. Usually the performance penalty is 2% to 15% of bufio measurements.

Run go test -bench=. -benchmem on your system for comparison. I'm sure the testing method could be improved. Suggestions are welcomed.

For circumstances where there is no concern about enumerating lines whose lengths are longer than the max token length from bufio, I recommend using the standard library.

On the other hand, if you already have a slice of bytes, this library is much more performant than the equivalent bufio.NewScanner(bytes.NewReader(buf)).

    $ go test -bench=. -benchmem
    goos: linux
    goarch: amd64
    pkg: github.com/karrick/gobls
    BenchmarkScanner/0064/bufio-8               30000000   43.7  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0064/reader-8              20000000   59.2  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0064/buffer-8              50000000   33.7  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0128/bufio-8               30000000   54.5  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0128/reader-8              20000000   70.5  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0128/buffer-8              30000000   38.9  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0256/bufio-8               20000000   79.8  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0256/reader-8              20000000   94.9  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0256/buffer-8              30000000   50.2  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0512/bufio-8               10000000    123  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0512/reader-8              10000000    144  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/0512/buffer-8              20000000   79.0  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/1024/bufio-8               10000000    210  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/1024/reader-8              10000000    227  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/1024/buffer-8              10000000    119  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/2048/bufio-8                5000000    382  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/2048/reader-8               3000000    413  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/2048/buffer-8               5000000    272  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/4096/bufio-8                2000000    701  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/4096/reader-8               2000000    733  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/4096/buffer-8               3000000    517  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/excessively_long/bufio-8     200000  11681  ns/op  0  B/op  0  allocs/op
    BenchmarkScanner/excessively_long/reader-8    100000  14464  ns/op  2  B/op  0  allocs/op
    BenchmarkScanner/excessively_long/buffer-8    200000   8688  ns/op  0  B/op  0  allocs/op
    PASS
    ok  github.com/karrick/gobls  256.191s
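To compare your own run against the figures above, the relative overhead of one scanner over another can be computed from the ns/op columns. The helper below is a hypothetical convenience, not part of gobls, and the sample values are copied from the 0064 and 4096 rows of the run shown here; results on other systems will differ.

```go
package main

import "fmt"

// overheadPercent reports how much longer candidate takes than baseline,
// as a percentage of baseline. A negative result means candidate is faster.
func overheadPercent(baseline, candidate float64) float64 {
	return (candidate - baseline) / baseline * 100
}

func main() {
	// ns/op values for bufio versus reader, taken from the output above.
	fmt.Printf("0064 reader vs bufio: %.1f%%\n", overheadPercent(43.7, 59.2))
	fmt.Printf("4096 reader vs bufio: %.1f%%\n", overheadPercent(701, 733))
	// ns/op values for bufio versus buffer.
	fmt.Printf("0064 buffer vs bufio: %.1f%%\n", overheadPercent(43.7, 33.7))
}
```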
