aptusproject/aptus-core


A utility library aiming to simplify the Scala coding experience.

License: Apache-2.0


Aptus

"Aptus" is Latin for suitable, appropriate, fitting. It is a utility library meant to improve the Scala experience for simple tasks, when performance isn't the top priority. It also helps you code defensively when representing errors in types isn't important (think assert).

Introduction

For a good introduction to the library, see my talk from the Functional Scala 2024 conference: video on YouTube.

In particular the talk discusses the next exciting development for Aptus: bringing quick and simple dynamic data manipulations to the library, for instance:

```scala
import aptus.dyn._

"/path/to/my.tsv" // eg: name,age,occupation,pets
  .dyns
    .rename   ("occupation" ~> "job")
    .increment("age")
    .remove   ("pets")
  .write("/path/to/my.jsonl") // one JSON doc per line
```

Not mentioned in the talk: the ability to convert to/from case classes, which wasn't implemented back then. Coming soon: the ability to interact with Python via ScalaPy (think pandas).

SBT

```scala
libraryDependencies += "io.github.aptusproject" %% "aptus-core" % "0.7.0"
```

Then import the following to test it out:

```scala
import aptus.all._
```

Though in general a more piecemeal approach is recommended:

```scala
import aptus.min._

// OR

package object someprojectpackage extends aptus.Minimal
```

Alongside some ad hoc imports where needed:

```scala
import aptus.Map_
import aptus.OutputFilePath
...
```

The library is available for Scala 3.4.0 and 2.13.

Dependency graph

Note: gson will soon be replaced with ujson.

Motivation

I created Aptus in bits over the past 10 years, as I struggled to get seemingly simple tasks done in Scala. It is not intended to be comprehensive, or particularly optimized. It should be seen more as a starting point for a project, where performance isn't most critical and compute resources aren't too limited. It can also serve as a reference, from which the basic use of underlying abstractions can be expanded upon as needed. It's also for people who enjoy Scala's type system and think types shouldn't be thrown out the window (hissing snake sound), yet don't feel the need to capture every possible error as types. Consider for instance Li Haoyi's post "Scala at Scale at Databricks", notably this passage:

Zero usage of "archetypical" Scala frameworks: Play, Akka, Scalaz, Cats, ZIO, etc.

This resonates well with Aptus' goals. I like using some of the tools he mentions, but I also want to make sure I have simpler solutions at hand too.

I included all the dependencies shown in the diagram above because I find that they are required for most non-trivial projects. For instance, what application nowadays does not need to handle JSON at some point? Or parse a CSV file? Or handle a bz2 file?

Note that Aptus is heavily used in my data transformation library, Gallia, as well as in most of my other projects (public and private).

Defensive coding

Consider the stdlib's .zip and .toMap methods on Seq, for instance. Both will silently discard elements in some situations, and this behavior will almost never be the desired/expected one (if only because it may not be obvious to another maintainer). .zip, for instance, will truncate the longer sequence if the two are not the same size. .toMap will discard entries with duplicate keys, keeping only the last one. In almost all real-life situations I encountered personally where either happened, it was the result of an upstream problem: either I meant for the two collections to be the same size for .zip, or I thought I wouldn't have duplicate keys when using .toMap. As a result I created two corresponding methods in Aptus, .zipSameSize and .force.map, which throw a requirement runtime error when either situation occurs. I have been using them exclusively for years now, and it has more than paid off in catching errors early.
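Both stdlib behaviors can be observed with plain Scala, no Aptus involved. This small sketch simply pins down what gets lost:

```scala
object StdlibSilentDrops {
  // zip truncates to the shorter sequence: the element 3 is silently dropped
  val zipped: Seq[(Int, String)] = Seq(1, 2, 3).zip(Seq("a", "b"))

  // toMap keeps the last value for a duplicated key: 2 -> "b" is silently dropped
  val mapped: Map[Int, String] = Seq(1 -> "a", 2 -> "b", 2 -> "c").toMap
}
```

Neither call reports anything: the result simply has fewer elements than the input.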

We'll see another example of defensive coding in the next section about succinctness: Java's .split and StringOps.split can also discard elements silently.

Succinctness

A good example of succinctness is a method like splitByWholeSeparatorPreserveAllTokens from Apache Commons' StringUtils, whose semantics feel more intuitive to me than those of Java's String.split. Meanwhile, using:

```scala
"foo|bar".splitBy("|")
```

is a lot more convenient than using:

```scala
import org.apache.commons.lang3.StringUtils

val str = "foo|bar"
if (str.isEmpty()) List(str)
else StringUtils.splitByWholeSeparatorPreserveAllTokens(str, "|").toList
```

It should be noted that both Java's String.split and the stdlib's StringOps.split have the very unintuitive behavior of dropping trailing empty elements, for instance:

```scala
println("1,2,3,,".split(',').toList) // List(1, 2, 3)
```
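For comparison, a stdlib-only way to keep trailing empty tokens is Java's two-argument split with a negative limit. Since its first argument is a regex, the separator is quoted so that "|" is taken literally. The helper name below is hypothetical and is not how Aptus implements splitBy:

```scala
import java.util.regex.Pattern

object SplitSketch {
  // Hypothetical helper: splits on a literal separator and, thanks to the
  // negative limit, keeps trailing empty tokens instead of dropping them.
  def splitKeepingEmpties(s: String, separator: String): List[String] =
    s.split(Pattern.quote(separator), -1).toList
}
```

Note that an empty input yields List(""), which matches the isEmpty branch of the StringUtils snippet above.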

I try to illustrate such differences in succinctness/consistency/defensiveness of behavior throughout the examples below.

Practicality

Another aspect of Aptus is practicality. For instance, I often find myself using expressions such as:

```scala
"foo=3"
  .splitBy("=")
  .force.tuple2
  .mapSecond(_.toInt)
```

The stdlib's counterpart would look something like:

```scala
"foo=3"
  .split('=') match {
    case Array(x, y: String) => (x, y.toInt)
  }
```

Which I argue is harder to read/write and less obvious to understand (albeit not a lot more verbose).

Examples

In-line assertions

Note: .ensuring from the stdlib does not offer a way to reference the checked value in the error message.

```scala
"hello".ensuring(_.size <= 5)                    .toUpperCase.p // prints "HELLO" - stdlib
"hello".assert  (_.size <= 5)                    .toUpperCase.p // prints "HELLO"
"hello".assert  (_.size <= 5, x => s"value=${x}").toUpperCase.p // prints "HELLO" - can't do that with `ensuring()`
"hello".require (_.size <= 5)                    .toUpperCase.p // prints "HELLO"
"hello".require (_.size <= 5, x => s"value=${x}").toUpperCase.p // prints "HELLO"

// these throw AssertionError
"hello".assert  (_.size > 5)                    .toUpperCase.p
"hello".assert  (_.size > 5, x => s"value=${x}").toUpperCase.p // "assertion failed: value=hello"
```

This is convenient for chaining; consider the pure stdlib alternative:

```scala
{
  import util.chaining._
  "hello"
    .ensuring(_.startsWith("h"))
    .toUpperCase
    .pipe(println)
}
```
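The in-line assertion idea can be sketched with an implicit value class that checks a predicate and hands the value back. The method name assertWith is hypothetical (chosen to avoid clashing with Predef.assert) and this is not Aptus' source:

```scala
object InlineAssertSketch {
  implicit class InlineAssertOps[A](private val value: A) extends AnyVal {
    // Returns the value unchanged when the predicate holds, so calls can be chained;
    // otherwise throws an AssertionError whose message is computed from the value.
    def assertWith(p: A => Boolean, msg: A => Any): A =
      if (p(value)) value
      else throw new AssertionError("assertion failed: " + msg(value))
  }
}
```

Usage: with import InlineAssertSketch._ in scope, "hello".assertWith(_.size <= 5, x => s"value=$x").toUpperCase evaluates to "HELLO".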

In-line printing

E.g. for quick debugging:

```scala
"hello".prt               // prints: "hello"
"hello".p                 // prints: "hello"
"hello".p.toUpperCase.p   // prints: "hello", then "HELLO"
"hello".inspect(_.size).p // prints: "5", then "hello"
"hello".i      (_.size).p // prints: "5", then "hello"
1.toString.p              // prints "1"
1.str     .p              // prints "1"
"hello".p__               // prints   "hello"   and exits program (code 0)
"hello".i__(_.quote)      // prints "\"hello\"" and exits program (code 0)
```
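The core trick behind .p and .inspect is "print something, then return the original value". A minimal sketch, assuming that is all they do (hypothetical code, not Aptus' source):

```scala
object InlinePrintSketch {
  implicit class InlinePrintOps[A](private val value: A) extends AnyVal {
    // Prints the value, then returns it unchanged so the chain can continue.
    def p: A = { println(value); value }

    // Prints a derived view of the value (e.g. its size), then returns the value itself.
    def inspect[B](f: A => B): A = { println(f(value)); value }
  }
}
```

With only the stdlib, scala.util.chaining's tap(println) comes close for the plain .p case.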

String operations

```scala
"hello". append(" you!")       .p // prints "hello you!"
"hello".prepend("well, ")      .p // prints "well, hello"
"hello". appendedAll(" you!")  .p // prints "hello you!"  - stdlib
"hello".prependedAll("well, ") .p // prints "well, hello" - stdlib

"hello".colon                  .p // prints "hello:"
"hello".tab                    .p // prints "hello<TAB>"
"hello".newline                .p // prints "hello<new-line>"
"hello".colon  ("human")       .p // prints "hello:human"
"hello".tab    ("human")       .p // prints "hello<TAB>human"
"hello".newline("human")       .p // prints "hello<new-line>human"
"hello".quote                  .p // prints "\"hello\""

"hello|world"  .splitBy("|").p // prints Seq(hello, world)
"hello|world||".splitBy("|").p // prints Seq(hello, world, , ) - won't unexpectedly ignore empty trailing elements
"a\tb\tc".splitXsv('\t')       // uses commons-csv under the hood to properly handle the split (eg escaping, ...)

"hello".padLeft (8, ' ').p // "   hello"
"hello".padRight(8, ' ').p // "hello   "
1.str  .padLeft (3, '0').p // "001"
1.str  .padRight(3, '0').p // "100"

"mykey".   contains("my").p // stdlib
"mykey".notContains("MY").p // negative counterpart

// .. many more, see String_, for instance:
// - strip{Prefix,Suffix}{Guaranteed,IfApplicable}
// - remove{Guaranteed,IfApplicable}
// - toBase64
// ...
```

Note: see corresponding tests.
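A sketch of two of the string helpers using only the stdlib. The semantics of the "Guaranteed" flavor are inferred from the method name (fail when the prefix is missing) and may not match Aptus exactly; the names are hypothetical stand-ins:

```scala
object StringSketch {
  // Assumed semantics: fail loudly when the prefix is absent. The stdlib's
  // stripPrefix behaves like an "IfApplicable" flavor and returns the input unchanged.
  def stripPrefixGuaranteed(s: String, prefix: String): String = {
    require(s.startsWith(prefix), s"expected '$s' to start with '$prefix'")
    s.stripPrefix(prefix)
  }

  // Left-pads s with c up to the given length; longer inputs are returned as is
  // (StringOps.* yields "" for a non-positive count).
  def padLeft(s: String, length: Int, c: Char): String =
    c.toString * (length - s.length) + s
}
```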

Number operations

```scala
3.1416.add       (1).p // 4.1416
3.1416.multiplyBy(2).p // 6.2832
...
3.1416.isInBetween(fromInclusive = 3.0, toExclusive = 4.0).p // true
// likewise for Int and Long
```

For Double:

```scala
3.1416      .maxDecimals   (2).p // 3.14 - still a Double (unlike formats below)
3.1416      .formatDecimals(2).p // 3.14
3.1416.exp  .formatDecimals(4).p // 23.1409
3.1416.log2 .formatDecimals(4).p // 1.6515
```

Personally, I always have to look up printf's "%" notation before using it, so a method like formatDecimals makes things a lot easier.
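For reference, here is roughly what such helpers replace, written with the stdlib only. The rounding mode (HALF_UP) and the fixed Locale are assumptions made for this sketch, not documented Aptus behavior:

```scala
import java.util.Locale
import scala.math.BigDecimal.RoundingMode

object DecimalsSketch {
  // Hypothetical stand-in for formatDecimals: a String with exactly n decimals.
  // Locale.ROOT keeps '.' as the decimal separator regardless of the JVM's locale.
  def formatDecimals(d: Double, n: Int): String =
    ("%." + n + "f").formatLocal(Locale.ROOT, d)

  // Hypothetical stand-in for maxDecimals: rounds to n decimals and stays a Double.
  def maxDecimals(d: Double, n: Int): Double =
    BigDecimal(d).setScale(n, RoundingMode.HALF_UP).toDouble
}
```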

Aptus also helps with collections of numbers:

```scala
Seq(3, 2, 1).mean  .p // 2.0
Seq(3, 2, 1).minMax.p // (1, 3)
// ... more: median, stdev, range, IQR, ... (see aptus.Seq_)
```
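A stdlib-only sketch of a few of these statistics, generic over numeric types. The names mirror the README but the code is hypothetical; failing on empty input is a choice made here in the same fail-early spirit:

```scala
object StatsSketch {
  def mean[N](xs: Seq[N])(implicit num: Numeric[N]): Double = {
    require(xs.nonEmpty, "mean of an empty sequence")
    num.toDouble(xs.sum) / xs.size
  }

  // Middle value, or the average of the two middle values for an even size.
  def median[N](xs: Seq[N])(implicit num: Numeric[N]): Double = {
    require(xs.nonEmpty, "median of an empty sequence")
    val sorted = xs.map(num.toDouble).sorted(Ordering.Double.TotalOrdering)
    val mid = sorted.size / 2
    if (sorted.size % 2 == 1) sorted(mid) else (sorted(mid - 1) + sorted(mid)) / 2
  }

  // min/max already throw on an empty sequence.
  def minMax[A](xs: Seq[A])(implicit ord: Ordering[A]): (A, A) = (xs.min, xs.max)
}
```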

Time operations

```scala
"2023-06-05".parseLocalDate.getYear.p // 2023

// also available:
//   parseLocalDateTime, parseLocalTime, parseInstant, parseOffsetDateTime and parseZonedDateTime
// and
//   parseLocalDateTime(pattern), ...
```
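For comparison, the plain java.time calls that such helpers save you from typing. This is JDK code only; the custom pattern is an illustrative example, not one taken from Aptus:

```scala
import java.time.{LocalDate, LocalDateTime}
import java.time.format.DateTimeFormatter

object TimeSketch {
  // ISO-8601 dates parse without any formatter
  val year: Int = LocalDate.parse("2023-06-05").getYear

  // A custom layout needs an explicit formatter (the pattern here is just an example)
  val dateTime: LocalDateTime =
    LocalDateTime.parse("2023/06/05 14:30", DateTimeFormatter.ofPattern("yyyy/MM/dd HH:mm"))
}
```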

Conditional piping (a.k.a conditional "thrush")

```scala
"hello"  .pipeIf(_.size <= 5)(_.toUpperCase).p // prints "HELLO"
"bonjour".pipeIf(_.size <= 5)(_.toUpperCase).p // prints unchanged
3        .pipeIf(_ % 2 == 0)(_ + 1).p          // prints 3 (unchanged)
4        .pipeIf(_ % 2 == 0)(_ + 1).p          // prints 5

val suffixOpt = Some("?")
"hello".pipeOpt(suffixOpt)(suffix => _ + suffix).p // prints "hello?"
"hello".pipeOpt(None)     (suffix => _ + suffix).p // prints unchanged
```

See discussion on Scala Users.

There is also a mapIf counterpart:

```scala
Seq(1, 2, 3).mapIf(true) (_ + 1).p // List(2, 3, 4)
Seq(1, 2, 3).mapIf(_ < 2)(_ + 1).p // List(2, 2, 3)
```
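The conditional-piping helpers are small enough to sketch in full. These are hypothetical re-creations of the predicate-based forms shown above, not Aptus' source (for instance, the Boolean overload of mapIf is left out):

```scala
object PipeIfSketch {
  implicit class PipeIfOps[A](private val a: A) extends AnyVal {
    // Applies f only when the predicate holds on the value; otherwise returns it unchanged.
    def pipeIf(test: A => Boolean)(f: A => A): A = if (test(a)) f(a) else a

    // Applies f only when the option is defined, passing its content to f.
    def pipeOpt[B](opt: Option[B])(f: B => A => A): A = opt.fold(a)(b => f(b)(a))
  }

  implicit class MapIfOps[A](private val xs: Seq[A]) extends AnyVal {
    // Transforms only the elements matching the predicate, keeping the others as is.
    def mapIf(test: A => Boolean)(f: A => A): Seq[A] = xs.map(x => if (test(x)) f(x) else x)
  }
}
```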

In-line "to Option"

```scala
"hello"  .in.someIf(_.size <= 5).p // prints Some("hello")
"bonjour".in.someIf(_.size <= 5).p // prints None
"hello"  .in.noneIf(_.size <= 5).p // prints None
"bonjour".in.noneIf(_.size <= 5).p // prints Some("bonjour")
// note: can also use shorthands: inNoneIf/inSomeIf
```

This is convenient for chaining; consider the pure stdlib alternative:

```scala
{
  val str = "hello"
  val opt = if (str.size <= 5) Some(str) else None
  println(opt)
}
```

Notes:

Option.when could also be used, but the test part isn't a predicate on the element (which would be much better).
Someone on the Scala users list also pointed out this alternative: Some("hello").filter(_.size <= 5). While clever, I'd argue the semantics are much less obvious than "hello".in.someIf(_.size <= 5).
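One way to get the predicate-on-the-element form is to wrap the stdlib's Option.when and Option.unless (both Scala 2.13+). The helper names are hypothetical, a sketch rather than Aptus' implementation:

```scala
object SomeIfSketch {
  // Some(a) when the predicate holds on a, None otherwise.
  def someIf[A](a: A)(p: A => Boolean): Option[A] = Option.when(p(a))(a)

  // The negative counterpart: None when the predicate holds, Some(a) otherwise.
  def noneIf[A](a: A)(p: A => Boolean): Option[A] = Option.unless(p(a))(a)
}
```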

"force" disambiguator (Option/Map)

.get is polysemic in the standard library: sometimes it "attempts" to get the result, as with Map (returns Option[T]), and sometimes it "forces" it, as with Option (returns T).

Aptus' .force conveys the semantics unambiguously:

```scala
val myOpt = Some("foo")
val myMap = Map("bar" -> "foo")

myOpt.force       .p // prints "foo"
myMap.force("bar").p // prints "foo"

// versus stdlib way:
myOpt.get       .p // prints      "foo"  -> forcing
myMap.get("bar").p // prints Some("foo") -> attempting
```

More forcing

```scala
Seq(1)      .force.one     .p // 1
Seq(1)      .force.option  .p // Some(1)
Seq( )      .force.option  .p // None
Seq(1, 2, 3).force.distinct.p // Seq(1, 2, 3)
Seq(1, 2, 3).force.set     .p // Set(1, 2, 3)

val (first, second)        = Seq("foo", "bar")       .force.tuple2
val (first, second, third) = Seq("foo", "bar", "baz").force.tuple3
// ... and so on up to 10
```

But:

```scala
Seq(1, 2)   .force.one      // runtime error
Seq(1, 2)   .force.option   // runtime error
Seq(1, 2, 1).force.distinct // runtime error
Seq(1, 2, 1).force.set      // runtime error
Seq(1, 2, 3).force.tuple2   // runtime error
// ... and so on
```

The .force.one mechanism is one of the most useful operations, and a much safer bet than simply calling .head, which silently ignores any extra elements.
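A minimal sketch of the "force" idea: each helper checks its expectation with require and fails fast instead of silently ignoring unexpected elements. These are hypothetical free functions, not Aptus' extension methods:

```scala
object ForceSketch {
  def one[A](xs: Seq[A]): A = {
    require(xs.size == 1, s"expected exactly one element, got ${xs.size}")
    xs.head
  }

  def option[A](xs: Seq[A]): Option[A] = {
    require(xs.size <= 1, s"expected at most one element, got ${xs.size}")
    xs.headOption
  }

  // Returns the input unchanged, but only if it contains no duplicates.
  def distinct[A](xs: Seq[A]): Seq[A] = {
    require(xs.distinct.size == xs.size, s"unexpected duplicates in $xs")
    xs
  }

  def tuple2[A](xs: Seq[A]): (A, A) = {
    require(xs.size == 2, s"expected exactly two elements, got ${xs.size}")
    (xs(0), xs(1))
  }
}
```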

Help with Options

To optional:

```scala
(None   , Some(2)).toOptionalTuple.p // None
(Some(1), None   ).toOptionalTuple.p // None
(Some(1), Some(2)).toOptionalTuple.p // Some((1, 2))

Seq(None   , None   , None   ).toOptionalSeq.p // None
Seq(Some(1), Some(2), None   ).toOptionalSeq.p // None
Seq(Some(1), Some(2), Some(3)).toOptionalSeq.p // Some(Seq(1, 2, 3))
```

Swapping:

```scala
// parameter for .swap is by-name
Some("foo").swap("bar").p // None
None       .swap("bar").p // Some("bar")
```
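A stdlib-only sketch of these Option helpers, written as hypothetical free functions. Note the all-or-nothing semantics of the "to optional" conversions, and the by-name parameter of swap, which is only evaluated when the option is empty:

```scala
object OptionSketch {
  // Some of all values if every element is defined, None otherwise.
  def toOptionalSeq[A](xs: Seq[Option[A]]): Option[Seq[A]] =
    if (xs.forall(_.isDefined)) Some(xs.flatten) else None

  def toOptionalTuple[A, B](t: (Option[A], Option[B])): Option[(A, B)] =
    for { a <- t._1; b <- t._2 } yield (a, b)

  // Defined becomes None; empty becomes Some(ifEmpty).
  def swap[A, B](opt: Option[A])(ifEmpty: => B): Option[B] =
    if (opt.isDefined) None else Some(ifEmpty)
}
```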

Help with Sequences

Quick sequence formatting:

```scala
Seq(1, 2, 3).@@.p     //    [1, 2, 3]
Seq(1, 2, 3).#@@.p    // #3:[1, 2, 3]
Seq(1, 2, 3).joinln   // one per line
Seq(1, 2, 3).joinlnln // one per line every other line

Seq(1, 2, 3).joinln.sectionAllOff("data:") // or equivalently below
Seq(1, 2, 3).section             ("data:")
// returns:
/*
  data:
      1
      2
      3
*/
```

Aptus also provides help with sorting for common cases, for instance:

```scala
Seq(
    Seq("d", "e", "f"),
    Seq("g", "h", "i"),
    Seq("a", "b", "c"))
  .sorted(aptus.seqOrdering[String])
/* returns:
Seq(
    Seq("a", "b", "c"),
    Seq("d", "e", "f"),
    Seq("g", "h", "i"))
*/
```

Zip operations

Most of the time, we want to zip collections of the same size, and we want to code it defensively:

```scala
Seq(1, 2, 3).zipSameSize(Seq(4, 5, 6)).p // Seq((1,4), (2,5), (3,6))
Seq(1, 2, 3).zipSameSize(Seq(4, 5))   .p // runtime error
```

Ask yourself: what are the legitimate use cases where we zip two collections of different sizes and are perfectly happy to have the longest one silently truncated?
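The defensive version is a one-line guard in front of the stdlib's zip. A sketch, with a hypothetical free function rather than Aptus' extension method:

```scala
object ZipSketch {
  // Same result as zip, but fails with a requirement error instead of truncating.
  def zipSameSize[A, B](xs: Seq[A], ys: Seq[B]): Seq[(A, B)] = {
    require(xs.size == ys.size, s"size mismatch: ${xs.size} vs ${ys.size}")
    xs.zip(ys)
  }
}
```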

Other useful zip-related operations are:

```scala
Seq("a", "b", "c").zipWithIsFirst.map { case (x, first /* for "a" here */) => if (first) ... else ... }
Seq("a", "b", "c").zipWithIsLast .map { case (x, last  /* for "c" here */) => if (last)  ... else ... }

Seq("a", "b", "c").zipWithIndex.p // List((a,0), (b,1), (c,2))
Seq("a", "b", "c").zipWithRank .p // List((a,1), (b,2), (c,3))
```
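All three can be derived from the stdlib's zipWithIndex. A hypothetical sketch, not Aptus' source:

```scala
object ZipWithSketch {
  // Pairs each element with whether it is the first one.
  def zipWithIsFirst[A](xs: Seq[A]): Seq[(A, Boolean)] =
    xs.zipWithIndex.map { case (x, i) => (x, i == 0) }

  // Pairs each element with whether it is the last one.
  def zipWithIsLast[A](xs: Seq[A]): Seq[(A, Boolean)] = {
    val lastIndex = xs.size - 1
    xs.zipWithIndex.map { case (x, i) => (x, i == lastIndex) }
  }

  // Rank is simply a 1-based index.
  def zipWithRank[A](xs: Seq[A]): Seq[(A, Int)] =
    xs.zipWithIndex.map { case (x, i) => (x, i + 1) }
}
```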

Splitting at head/last:

```scala
Seq(1, 2, 3).splitAtHead.p // (1,Seq(2, 3))
Seq(1, 2, 3).splitAtLast.p // (Seq(1, 2),3)
```

Contained:

```scala
1.   containedIn(Seq(1, 2, 3)).p // true
1.notContainedIn(Seq(1, 2, 3)).p // false
// also available for Set
```

Note: Why not use "contains" from the stdlib instead? Consider the following situation:

```scala
val ref = Seq("2", "4", "6")
Seq(1, 2, 3).map(ref.contains(_.toString))      // cannot do that
Seq(1, 2, 3).map(x => ref.contains(x.toString)) // we need an intermediate
Seq(1, 2, 3).map(_.toString.containedIn(ref))   // unless using containedIn
```

Ordering sequences of sequences (size prevails):

```scala
implicit val ord: Ordering[Seq[Int]] = aptus.seqOrdering

Seq(Seq(4, 5, 6), Seq(1, 2, 3)).sorted.p // Seq(Seq(1, 2, 3), Seq(4, 5, 6))
Seq(Seq(4, 5, 6), Seq(1, 2   )).sorted.p // Seq(Seq(1, 2)   , Seq(4, 5, 6))
Seq(Seq(4, 5)   , Seq(1, 2, 3)).sorted.p // Seq(Seq(4, 5)   , Seq(1, 2, 3))
```
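A size-first ordering can be sketched as follows. Comparing equal-size sequences element by element is an assumption made here; the README only states that size prevails. The name is hypothetical:

```scala
object SeqOrderingSketch {
  // Shorter sequences come first; equal sizes are compared element by element.
  def sizeFirstOrdering[A](implicit ord: Ordering[A]): Ordering[Seq[A]] =
    new Ordering[Seq[A]] {
      def compare(x: Seq[A], y: Seq[A]): Int = {
        val bySize = Integer.compare(x.size, y.size)
        if (bySize != 0) bySize
        else
          x.iterator.zip(y.iterator)
            .map { case (a, b) => ord.compare(a, b) }
            .find(_ != 0)
            .getOrElse(0)
      }
    }
}
```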

Note: for List vs Seq, see discussion on Scala Users.

Help with Maps

Most of the time, we do not want duplicates to be silently discarded:

```scala
// is this what we wanted?
Seq(1 -> "a", 2 -> "b", 2 -> "c").toMap    .p // Map(1 -> "a", 2 -> "c")

// likely not
Seq(1 -> "a", 2 -> "b", 2 -> "c").force.map.p // runtime error
Seq(1 -> "a", 2 -> "b")          .force.map.p // Map(1 -> "a", 2 -> "b")
```
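The defensive counterpart to toMap boils down to checking for duplicate keys first. A hypothetical free-function sketch, not Aptus' source:

```scala
object ForceMapSketch {
  // Like toMap, but fails with a requirement error if any key appears more than once.
  def forceMap[K, V](entries: Seq[(K, V)]): Map[K, V] = {
    val duplicates = entries.groupBy(_._1).collect { case (k, vs) if vs.size > 1 => k }
    require(duplicates.isEmpty, s"duplicate keys: ${duplicates.mkString(", ")}")
    entries.toMap
  }
}
```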

Associate left/right:

```scala
Seq("foo", "bar")                                     .force.mapLeft(_.toUpperCase).p
Seq("foo", "bar").map(_.associateLeft(_.toUpperCase)).force.map.p
// returns: Map("FOO" -> "foo", "BAR" -> "bar")

Seq("foo", "bar")                               .force.mapRight(_.size).p
Seq("foo", "bar").map(_.associateRight(_.size)).force.map.p
// returns: Map("foo" -> 3, "bar" -> 3)
```

Group by key:

```scala
Seq("foo" -> 1, "bar" -> 2, "foo" -> 3).groupByKey.p
// returns: Map(bar -> List(2), foo -> List(1, 3))

// if original order must be preserved:
Seq("bar" -> 2, "foo" -> 1, "foo" -> 3).groupByKeyWithListMap.p
// returns: ListMap(bar -> List(2), foo -> List(1, 3))
```
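With Scala 2.13, groupMap does most of the work; preserving key order takes an extra step since the resulting Map makes no ordering promise. A hypothetical sketch, not Aptus' source:

```scala
import scala.collection.immutable.ListMap

object GroupByKeySketch {
  // Groups values by key; the resulting Map does not guarantee any key order.
  def groupByKey[K, V](entries: Seq[(K, V)]): Map[K, Seq[V]] =
    entries.groupMap(_._1)(_._2)

  // Order-preserving variant: keys appear in order of first occurrence.
  def groupByKeyWithListMap[K, V](entries: Seq[(K, V)]): ListMap[K, Seq[V]] = {
    val grouped = groupByKey(entries)
    ListMap.from(entries.map(_._1).distinct.map(k => k -> grouped(k)))
  }
}
```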

Count by key:

```scala
Seq("foo" -> 1, "bar" -> 2, "foo" -> 3).countByKey.p
// returns: List((2,foo), (1,bar))
```

Count by self:

```scala
Seq("a", "b", "a", "c").countBySelf.p
// returns: Seq(("a", 2), ("b", 1), ("c", 1))
// note: ordered by DESC
```
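A stdlib-only sketch of counting by self with Scala 2.13's groupMapReduce, sorted by descending count. Breaking ties by first occurrence (sortBy is stable) is a choice made for this sketch; the README does not say how Aptus orders ties:

```scala
object CountSketch {
  def countBySelf[A](xs: Seq[A]): Seq[(A, Int)] = {
    val counts = xs.groupMapReduce(identity)(_ => 1)(_ + _)
    // distinct keeps first-occurrence order, and the stable sort preserves it for ties
    xs.distinct.map(x => x -> counts(x)).sortBy { case (_, n) => -n }
  }
}
```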

Help with Tuples

From import aptus.Tuple{2,3,4,5}_:

```scala
(1, 2)   .toSeq.p          // Seq(1, 2)
(1, 2)   .mapFirst (_ + 1) // (2, 2)
(1, 2)   .mapSecond(_ + 1) // (1, 3)
(1, 2, 3).mapThird (_ + 1) // (1, 2, 4)
```
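Such tuple helpers are straightforward extension methods. A hypothetical sketch for pairs only (not Aptus' source); note that the mapped side may change type, as in the "foo=3" example earlier:

```scala
object TupleSketch {
  implicit class Tuple2Ops[A, B](private val t: (A, B)) extends AnyVal {
    // Transforms the first element, keeping the second one as is.
    def mapFirst[A2](f: A => A2): (A2, B) = (f(t._1), t._2)

    // Transforms the second element, keeping the first one as is.
    def mapSecond[B2](f: B => B2): (A, B2) = (t._1, f(t._2))
  }
}
```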

Wrapping

```scala
"foo".in.some .p // Some("foo")
"foo".in.seq  .p // Seq ("foo")
"foo".in.list .p // List("foo")
"foo".in.left .p // Left("foo")
"foo".in.right.p // Right("foo")
// also see in.someIf/in.noneIf above
```

Sliding pairs

```scala
Seq[Int]()            .slidingPairs // Seq()
Seq(1)                .slidingPairs // Seq()
Seq(1, 2, 3, 4, 5)    .slidingPairs // Seq((1, 2), (2, 3), (3, 4), (4, 5))

Seq(1, 2, 3).slidingPairsWithPrevious.p // List((None,1), (Some(1),2), (Some(2),3))
Seq(1, 2, 3).slidingPairsWithNext    .p // List((1,Some(2)), (2,Some(3)), (3,None))
```

Consider the pure stdlib alternative:

```scala
Seq(1, 2, 3, 4, 5)
  .sliding(2)
  .map { x => (x(0), x(1)) }
  .toSeq
```
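The sliding(2) version also has an edge case: on a one-element input, sliding yields a single incomplete window, so x(1) throws. Zipping with a shifted copy avoids that. A hypothetical sketch, not Aptus' source; here zip's truncation is relied upon deliberately, since the shifted copy differs in length by exactly one:

```scala
object SlidingSketch {
  def slidingPairs[A](xs: Seq[A]): Seq[(A, A)] =
    xs.zip(xs.drop(1))

  def slidingPairsWithPrevious[A](xs: Seq[A]): Seq[(Option[A], A)] =
    (None +: xs.map(Some(_))).zip(xs)

  def slidingPairsWithNext[A](xs: Seq[A]): Seq[(A, Option[A])] =
    xs.zip(xs.drop(1).map(Some(_)) :+ None)
}
```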

Closing resources

Aptus' Closeabled boils down to:

```scala
class Closeabled[T](underlying: T, cls: Closeable) extends Closeable
```

This is convenient, for instance, when you don't want to manage pairs of Iterator/Closeable, e.g.:

```scala
// let's write lines
Seq("hello", "world").writeFileLines("/tmp/lines")

// and stream them back
val myCloseabled: SelfClosingIterator[String] = "/tmp/lines".streamFileLines()

// for instance, we can consume the content (will automatically close)
myCloseabled                   .consume(_.toList).p // as is
// <XOR>
myCloseabled.map(_.map(_.size)).consume(_.toList).p // line pre-processing
```
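The self-closing behavior can be sketched as an iterator that closes its backing resource as soon as it reports exhaustion. AutoClosingIterator is a hypothetical name and this is not Aptus' SelfClosingIterator:

```scala
import java.io.Closeable

object SelfClosingSketch {
  final class AutoClosingIterator[A](underlying: Iterator[A], resource: Closeable)
      extends Iterator[A] with Closeable {
    private var closed = false

    // Idempotent: the resource is closed at most once.
    def close(): Unit = if (!closed) { closed = true; resource.close() }

    // Closes the resource the first time the underlying iterator is found exhausted.
    def hasNext: Boolean = {
      val more = !closed && underlying.hasNext
      if (!more) close()
      more
    }

    def next(): A = underlying.next()
  }
}
```

Callers that stop early still need to call close() themselves, which is presumably what a consume-style helper takes care of.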

Orphan methods

Some methods are called directly from the aptus package object, when no natural parent type exists.

```scala
aptus.fs.homeDirectoryPath().p // "/home/tony"
aptus.hardware.totalMemory().p // 1011351552
aptus.random.uuidString()   .p // a1bffc1e-72aa-477e-ac84-e4133ffcafad
aptus.time.stamp()          .p // 240224152753

aptus.illegalState   ("freeze!") // Exception in thread "main" IllegalStateException: freeze!
aptus.illegalArgument("freeze!") // Exception in thread "main" IllegalArgumentException: freeze!

aptus.reflect.formatStackTrace().p
// returns:
/*
  java.lang.Throwable
      at aptus.aptmisc.Reflect$.formatStackTrace(Misc.scala:62)
      ...
      <where you are in your code>
*/

// ... (see more in aptus.AptusAliases)
```
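For orientation, these are plain JDK calls that plausibly sit behind helpers like these. That mapping is an assumption, as is reading the sample stamp 240224152753 as a yyMMddHHmmss layout:

```scala
import java.time.LocalDateTime
import java.time.format.DateTimeFormatter
import java.util.UUID

object OrphanSketch {
  def homeDirectoryPath(): String = System.getProperty("user.home")

  def totalMemory(): Long = Runtime.getRuntime().totalMemory()

  def uuidString(): String = UUID.randomUUID().toString

  // Assumed layout (yyMMddHHmmss), guessed from the sample output above.
  def stamp(now: LocalDateTime = LocalDateTime.now()): String =
    now.format(DateTimeFormatter.ofPattern("yyMMddHHmmss"))
}
```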

Conveying intent

These are often used to save/homogenize comments.

Sometimes we want to convey that a sequence cannot be reordered without consequences; think of it as a built-in comment:

```scala
@ordermatters val mySeq = Seq(MostImportant, SecondMostImportant, ...)
```

An annotation is favored over a type alias here so that it can be applied to other code areas than sequences.

The following are just aliases, cheap replacements for NonEmptyList-like alternatives:

```scala
val values     : Nes[Int] = Seq(1, 2, 3)
val maybeValues: Pes[Int] = Some(Seq(1, 2, 3))
```

Note: value classes don't accept require statements.
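A sketch of how such intent markers can be declared. The annotation has no runtime effect, and the alias expansions for Nes and Pes are guessed from the examples above, so both are hypothetical re-creations rather than Aptus' definitions:

```scala
import scala.annotation.StaticAnnotation

object IntentSketch {
  // Marker annotation: purely documentary, it can be placed on any definition.
  final class ordermatters extends StaticAnnotation

  // Guessed expansions: non-emptiness is a convention only, nothing enforces it.
  type Nes[A] = Seq[A]
  type Pes[A] = Option[Nes[A]]

  @ordermatters val priorities: Seq[String] = Seq("most important", "second most important")

  val values: Nes[Int] = Seq(1, 2, 3)
  val maybeValues: Pes[Int] = Some(Seq(1, 2, 3))
}
```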

IO

Plain files:

```scala
"hello world".writeFileContent("/tmp/content")
"/tmp/content".readFileContent().p // prints: "hello world"

Seq("hello", "world").writeFileLines("/tmp/lines")
"/tmp/lines".readFileLines().p // prints: Seq("hello", "world")
```

Compressed files:

```scala
"hello world".writeFileContent("/tmp/content.gz")
"/tmp/content.gz".readFileContent().p // prints: "hello world"

Seq("hello", "world").writeFileLines("/tmp/lines.gz")
"/tmp/lines.gz".readFileLines().p // prints: Seq("hello", "world")

// note: `file -i /tmp/content.gz` shows it's indeed application/gzip

"/data/bigfile.gz".streamFileLines() // returns a SelfClosingIterator[String], which closes itself once all lines have been seen
```
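Transparent compression can be sketched with java.util.zip, choosing gzip purely from the ".gz" extension. These helpers are hypothetical and only illustrate the idea; they are not Aptus' implementation:

```scala
import java.io.{BufferedReader, InputStreamReader, OutputStreamWriter}
import java.nio.charset.StandardCharsets.UTF_8
import java.nio.file.{Files, Path}
import java.util.zip.{GZIPInputStream, GZIPOutputStream}

object GzipSketch {
  private def isGzip(path: Path): Boolean = path.toString.endsWith(".gz")

  def writeFileLines(lines: Seq[String], path: Path): Unit = {
    val raw = Files.newOutputStream(path)
    val out = if (isGzip(path)) new GZIPOutputStream(raw) else raw
    val writer = new OutputStreamWriter(out, UTF_8)
    try lines.foreach(line => writer.write(line + "\n"))
    finally writer.close() // also finishes and closes the gzip stream
  }

  def readFileLines(path: Path): List[String] = {
    val raw = Files.newInputStream(path)
    val in = if (isGzip(path)) new GZIPInputStream(raw) else raw
    val reader = new BufferedReader(new InputStreamReader(in, UTF_8))
    try Iterator.continually(reader.readLine()).takeWhile(_ != null).toList
    finally reader.close()
  }
}
```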

JSON:

A special note about JSON, owing to its ubiquity (and despite its many flaws). While Gallia is my main project pertaining to data in general (especially transformation thereof), I included a minimal set of functionality in Aptus:

```scala
""" {"foo": 1}""".jsonObject // returns a com.google.gson.JsonObject
"""[{"foo": 1}]""".jsonArray // returns a com.google.gson.JsonArray

"""{"foo": 1, "bar": true}""".prettyJson.p // .compactJson is also available
/*
{
  "foo": 1,
  "bar": true
}
*/
```

In the future, a subset of Gallia will be created, which will basically offer a similar set of operations but without any concern for the underlying schema: gallia-dyn. It will offer a convenient way to perform "dynamic" transformations, and therefore handle JSON. Once ready, a subset of gallia-dyn will likely be included in Aptus for convenience, so that simple manipulations such as these will be possible out of the box:

```scala
"""{"foo": "hello", "bar": 2, "baz": true}"""
  .readObj
    .toUpperCase("foo")
    .increment  ("bar")
    .drop       ("baz")
  .printCompactJson() // """{"foo": "HELLO", "bar": 3}"""
```

URLs:

```scala
val TestResources = "https://raw.githubusercontent.com/aptusproject/aptus-core/6f4acbc/src/test/resources"

s"${TestResources}/content".readUrlContent().p // prints "hello word"
s"${TestResources}/lines"  .readUrlLines()  .p // prints: Seq("hello", "world")
```

Notes:

These may move under "...".file and "...".url respectively (TBD).
In the future we'll allow a basic POST as well.

File System

A very lightweight way to handle the file system, not meant to be comprehensive (use os-lib for more power):

```scala
"/tmp/sbt".path.isDir()
"/tmp/sbt".path.file.removeFile()
...
"/tmp/sbt".path.dir.listNames()
...
"/tmp/sbt".path.dir.listFilePathsRecursively()
```
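For comparison, JDK-only counterparts using java.nio.file. What the Aptus helpers return exactly (sorting, names versus paths) is assumed here; the sketch sorts results for determinism and closes the directory streams via scala.util.Using:

```scala
import java.nio.file.{Files, Path}
import scala.jdk.CollectionConverters._
import scala.util.Using

object FsSketch {
  def isDir(p: Path): Boolean = Files.isDirectory(p)

  // Immediate children names (files and directories), sorted.
  def listNames(dir: Path): List[String] =
    Using.resource(Files.list(dir)) { stream =>
      stream.iterator().asScala.map(_.getFileName.toString).toList.sorted
    }

  // Regular files at any depth, sorted by their string form.
  def listFilePathsRecursively(dir: Path): List[Path] =
    Using.resource(Files.walk(dir)) { stream =>
      stream.iterator().asScala.filter(p => Files.isRegularFile(p)).toList.sortBy(_.toString)
    }
}
```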

System calls

Quick-and-dirty system calls:

```scala
"echo hello"           .systemCall() // prints: "hello"
"date +%s"             .systemCall() // prints: "1622562984"
"head -1 /proc/cpuinfo".systemCall() // prints: "processor: 0"
```
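The standard library's scala.sys.process already covers the quick-and-dirty case. A hypothetical stand-in, not Aptus' implementation: the command string is split on whitespace rather than parsed by a shell, so pipes and quoting are not interpreted, and !! throws on a non-zero exit code:

```scala
import scala.sys.process._

object SystemCallSketch {
  // Runs the command and returns its trimmed standard output.
  def systemCall(command: String): String = command.!!.trim
}
```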

Backlog

At least a List_ counterpart to Seq_, maybe via code generation (again see discussion on Scala Users)
Add more useful abstractions borrowed from other languages, e.g. Python's Counter
Lots more tests to be written, though many methods in Aptus are too trivial to warrant a test, e.g. def pipeIf(test: Boolean)(f: A => A): A = if (test) f(a) else a
More useful methods remain to be ported from Aptus' prototype (not published because too messy)
See all the TODOs in the code
Also see Gallia's backlog

Contributing

Contributions welcome.

About


Releases: 10 (latest: v0.7.0, Dec 2, 2024)
