A utility library aiming to simplify the Scala coding experience.
License: Apache-2.0
aptusproject/aptus-core
"Aptus" is Latin for suitable, appropriate, fitting. It is a utility library meant to improve the Scala experience for simple tasks, when performance isn't most important. It also helps you code defensively when representing errors in types isn't important (think assert).
libraryDependencies += "io.github.aptusproject" %% "aptus-core" % "0.7.0"
Then import the following to test it out:
import aptus.all._
Though in general a more piecemeal approach is recommended:
import aptus.min._
// OR
package object someprojectpackage extends aptus.Minimal
Alongside some ad hoc imports where needed:
import aptus.Map_
import aptus.OutputFilePath
...
The library is available for Scala 3.4.0 and 2.13.
Note: gson will soon be replaced with ujson.
I created Aptus in bits over the past 10 years, as I struggled to get seemingly simple tasks done in Scala. It is not intended to be comprehensive, or particularly optimized. It should be seen more as a starting point for a project, where performance isn't most critical and compute resources aren't too limited. It can also serve as a reference, from which the basic use of underlying abstractions can be expanded upon as needed. It's also for people who enjoy Scala's type system and think types shouldn't be thrown out the window (hissing snake sound), yet don't feel the need to capture every possible error as types. Consider for instance Li Haoyi's post "Scala at Scale at Databricks", notably this passage:
Zero usage of "archetypical" Scala frameworks: Play, Akka, Scalaz, Cats, ZIO, etc.
This resonates well with aptus' goals. I like using some of the tools he mentions, but I also want to make sure I have simpler solutions at hand too.
I included all the dependencies shown in the diagram above because I find that they are required for most non-trivial projects. For instance, what application nowadays does not need to handle JSON at some point? Or parse a CSV file? Or handle a bz2 file?
Note that Aptus is heavily used in my data transformation library, Gallia, as well as most of my other projects (public and private).
Let's consider the stdlib's Seq methods .zip and .toMap for instance. Both will silently discard elements in some situations, and this behavior will almost never be the desired/expected one (if only because it may not be obvious to another maintainer). For instance, .zip will truncate the longer sequence if they are not the same size, and .toMap will discard entries with duplicate keys, keeping only the last one. In almost all real-life situations I personally encountered where either happened, it was the result of an upstream problem: I either meant for the two collections to be the same size for .zip, or I thought I wouldn't have duplicate keys when using .toMap. As a result I created two corresponding methods in aptus, .zipSameSize and .force.map, which throw a requirement runtime error when either situation occurs. I have been using them exclusively for years now, and it has more than paid off in catching errors early.
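To make the two pitfalls concrete, here is a minimal stdlib-only sketch. The helper names `zipStrict` and `toMapStrict` are hypothetical and exist only for illustration. They mimic the behavior described above using `require`, and are not Aptus' actual implementation.

```scala
object StrictCollections {
  // Hypothetical helper: like zip, but refuses to truncate silently.
  def zipStrict[A, B](left: Seq[A], right: Seq[B]): Seq[(A, B)] = {
    require(left.size == right.size, s"size mismatch: ${left.size} vs ${right.size}")
    left.zip(right)
  }

  // Hypothetical helper: like toMap, but refuses to drop duplicate keys silently.
  def toMapStrict[K, V](pairs: Seq[(K, V)]): Map[K, V] = {
    val keys = pairs.map(_._1)
    require(keys.distinct.size == keys.size, s"duplicate keys: ${keys.diff(keys.distinct).distinct}")
    pairs.toMap
  }

  def main(args: Array[String]): Unit = {
    // stdlib: the 3 is dropped without any warning
    println(Seq(1, 2, 3).zip(Seq(4, 5)))
    // stdlib: the entry 2 -> "b" is dropped without any warning
    println(Seq(1 -> "a", 2 -> "b", 2 -> "c").toMap)
    // strict variants succeed on well-formed input and throw otherwise
    println(zipStrict(Seq(1, 2, 3), Seq(4, 5, 6)))
    println(toMapStrict(Seq(1 -> "a", 2 -> "b")))
  }
}
```

Failing fast with `IllegalArgumentException` is a design choice: the error surfaces at the point where the assumption breaks, rather than later as missing data.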
We'll see another example of defensive coding in the next section about succinctness: Java's .split and StringOps.split can also discard elements silently.
A good example of succinctness is a method like splitByWholeSeparatorPreserveAllTokens from Apache Commons's StringUtils, whose semantics feel more intuitive to me than those of Java's String.split. Meanwhile using:
"foo|bar".splitBy("|")
is a lot more convenient than using:
import org.apache.commons.lang3.StringUtils
val str = "foo|bar"
if (str.isEmpty()) List(str)
else StringUtils.splitByWholeSeparatorPreserveAllTokens(str, "|").toList
It should be noted that both Java's String.split and the stdlib's StringOps.split have the very unintuitive behavior of not reporting trailing elements when empty, for instance:
println("1,2,3,,".split(',').toList) // List(1, 2, 3)
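For comparison, the JDK itself can keep trailing empty elements when `String.split` is given a negative limit. The sketch below shows that workaround. The helper name `splitKeepingTrailing` is hypothetical, and the separator is quoted because `String.split` interprets it as a regular expression.

```scala
import java.util.regex.Pattern

object SplitDemo {
  // A negative limit tells java.lang.String.split to keep trailing empty strings.
  // Pattern.quote makes separators such as "|" safe, since split expects a regex.
  def splitKeepingTrailing(s: String, separator: String): List[String] =
    s.split(Pattern.quote(separator), -1).toList

  def main(args: Array[String]): Unit = {
    println("1,2,3,,".split(',').toList.size)          // 3: trailing empties dropped
    println(splitKeepingTrailing("1,2,3,,", ",").size) // 5: trailing empties kept
    println(splitKeepingTrailing("foo|bar", "|"))      // List(foo, bar)
  }
}
```

This works, but the need to remember both the `-1` and the regex quoting is a fair illustration of why a dedicated method is more convenient.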
I try to illustrate such differences in succinctness/consistency/defensiveness of behavior throughout the examples below.
Another aspect of Aptus is practicality. For instance, I often find myself using expressions such as:
"foo=3".splitBy("=").force.tuple2.mapSecond(_.toInt)
The stdlib's counterpart would look something like:
"foo=3".split('=') match { case Array(x, y: String) => (x, y.toInt) }
Which I argue is harder to read/write and less obvious to understand (albeit not a lot more verbose).
Note: .ensuring from the stdlib does not offer a way to manipulate the value in the error message
"hello".ensuring(_.size <= 5).toUpperCase.p // prints "HELLO" - stdlib
"hello".assert(_.size <= 5).toUpperCase.p // prints "HELLO"
"hello".assert(_.size <= 5, x => s"value=${x}").toUpperCase.p // prints "HELLO" - can't do that with `ensuring()`
"hello".require(_.size <= 5).toUpperCase.p // prints "HELLO"
"hello".require(_.size <= 5, x => s"value=${x}").toUpperCase.p // prints "HELLO"
// these throw AssertionError
"hello".assert(_.size > 5).toUpperCase.p
"hello".assert(_.size > 5, x => s"value=${x}").toUpperCase.p // "assertion failed: value=hello"
Convenient for chaining, consider the pure stdlib alternative:
{
  import util.chaining._
  "hello".ensuring(_.startsWith("h")).toUpperCase.pipe(println)
}
E.g. for quick debugging:
"hello".prt // prints: "hello"
"hello".p // prints: "hello"
"hello".p.toUpperCase.p // prints: "hello", then "HELLO"
"hello".inspect(_.size).p // prints: "5", then "hello"
"hello".i(_.size).p // prints: "5", then "hello"
1.toString.p // prints "1"
1.str.p // prints "1"
"hello".p__ // prints "hello" and exits program (code 0)
"hello".i__(_.quote) // prints "\"hello\"" and exits program (code 0)

"hello".append(" you!").p // prints "hello you!"
"hello".prepend("well,").p // prints "well, hello"
"hello".appendedAll(" you!").p // prints "hello you!" - stdlib
"hello".prependedAll("well,").p // prints "well, hello" - stdlib
"hello".colon.p // prints "hello:"
"hello".tab.p // prints "hello<TAB>"
"hello".newline.p // prints "hello<new-line>"
"hello".colon("human").p // prints "hello:human"
"hello".tab("human").p // prints "hello<TAB>human"
"hello".newline("human").p // prints "hello<new-line>human"
"hello".quote.p // prints "\"hello\""
"hello|world".splitBy("|").p // prints Seq(hello, world)
"hello|world||".splitBy("|").p // prints Seq(hello, world, , ) - won't unexpectedly ignore empty trailing elements
"a\tb\tc".splitXsv('\t') // uses commons-csv under the hood to properly handle the split (eg escaping, ...)
"hello".padLeft(8, ' ').p // "   hello"
"hello".padRight(8, ' ').p // "hello   "
1.str.padLeft(3, '0').p // "001"
1.str.padRight(3, '0').p // "100"
"mykey".contains("my").p // stdlib
"mykey".notContains("MY").p // negative counterpart
// .. many more, see String_, for instance:
// - strip{Prefix,Suffix}{Guaranteed,IfApplicable}
// - remove{Guaranteed,IfApplicable}
// - toBase64
// ...
Note: see corresponding tests

3.1416.add(1).p // 4.1416
3.1416.multiplyBy(2).p // 6.2832
...
3.1416.isInBetween(fromInclusive = 3.0, toExclusive = 4.0).p // true
// likewise for Int and Long
For Double:
3.1416.maxDecimals(2).p // 3.14 - still a Double (unlike formats below)
3.1416.formatDecimals(2).p // 3.14
3.1416.exp.formatDecimals(4).p // 23.1409
3.1416.log2.formatDecimals(4).p // 1.6515
Personally, I always have to look up printf's "% notation" before using it, so a method like formatDecimals makes things a lot easier.
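For reference, this is roughly what the printf-style equivalent looks like with the stdlib alone. The `formatDecimals` function below is a hypothetical stand-in written for this sketch, not Aptus' implementation. It pins `Locale.ROOT` so the decimal separator does not depend on the machine's locale.

```scala
import java.util.Locale

object DecimalsDemo {
  // Hypothetical stand-in: builds a "%.Nf" pattern and applies it to the value.
  def formatDecimals(value: Double, decimals: Int): String =
    s"%.${decimals}f".formatLocal(Locale.ROOT, value)

  def main(args: Array[String]): Unit = {
    println(formatDecimals(3.1416, 2)) // 3.14
    println(formatDecimals(2.0, 3))    // 2.000
  }
}
```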
Aptus also helps with collections of numbers:
Seq(3, 2, 1).mean.p // 2.0
Seq(3, 2, 1).minMax.p // (1, 3)
// ... more: median, stdev, range, IQR, ... (see aptus.Seq_)

"2023-06-05".parseLocalDate.getYear.p // 2023
// also available:
// parseLocalDateTime, parseLocalTime, parseInstant, parseOffsetDateTime and parseZonedDateTime
// and
// parseLocalDateTime(pattern), ...

"hello".pipeIf(_.size <= 5)(_.toUpperCase).p // prints "HELLO"
"bonjour".pipeIf(_.size <= 5)(_.toUpperCase).p // prints unchanged
3.pipeIf(_ % 2 == 0)(_ + 1).p // prints 3 (unchanged)
4.pipeIf(_ % 2 == 0)(_ + 1).p // prints 5
val suffixOpt = Some("?")
"hello".pipeOpt(suffixOpt)(suffix => _ + suffix).p // prints "hello?"
"hello".pipeOpt(None)(suffix => _ + suffix).p // prints unchanged
See discussion on Scala Users.
There is also a mapIf counterpart:
Seq(1, 2, 3).mapIf(true)(_ + 1).p // List(2, 3, 4)
Seq(1, 2, 3).mapIf(_ < 2)(_ + 1).p // List(2, 2, 3)

"hello".in.someIf(_.size <= 5).p // prints Some("hello")
"bonjour".in.someIf(_.size <= 5).p // prints None
"hello".in.noneIf(_.size <= 5).p // prints None
"bonjour".in.noneIf(_.size <= 5).p // prints Some("bonjour")
// note: can also use shorthands: inNoneIf/inSomeIf
Convenient for chaining, consider the pure stdlib alternative:
{
  val str = "hello"
  val opt = if (str.size <= 5) Some(str) else None
  println(opt)
}
Notes:
- Option.when could also be used, but the test part isn't a predicate on the element (which would be much better).
- Someone on the Scala users list also pointed out this alternative: Some("hello").filter(_.size <= 5). While clever, I'd argue the semantics are much less obvious than "hello".in.someIf(_.size <= 5).
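The three stdlib-only spellings discussed above can be compared side by side. The `someIf`/`noneIf` extension in this sketch is a hypothetical, simplified version written for illustration (called directly on the value rather than through `.in`), and should not be read as Aptus' actual code.

```scala
object SomeIfDemo {
  // Hypothetical extension, for illustration only: the test is a predicate on the element.
  implicit class PredicateOps[A](value: A) {
    def someIf(p: A => Boolean): Option[A] = if (p(value)) Some(value) else None
    def noneIf(p: A => Boolean): Option[A] = if (p(value)) None else Some(value)
  }

  def main(args: Array[String]): Unit = {
    val str = "hello"
    println(if (str.size <= 5) Some(str) else None) // explicit if/else
    println(Option.when(str.size <= 5)(str))        // test is not a predicate on the element
    println(Some(str).filter(_.size <= 5))          // wrap first, then filter
    println(str.someIf(_.size <= 5))                // predicate-based, chainable
  }
}
```

All four expressions evaluate to the same `Option`; the difference is only in how directly the intent reads in a method chain.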
.get is polysemic in the standard library, sometimes "attempting" to get the result as with Map (returns Option[T]), sometimes "forcing" it as with Option (returns T).
Aptus' .force conveys semantics unambiguously:
val myOpt = Some("foo")
val myMap = Map("bar" -> "foo")
myOpt.force.p // prints "foo"
myMap.force("bar").p // prints "foo"
// versus stdlib way:
myOpt.get.p // prints "foo" -> forcing
myMap.get("bar").p // prints Some("foo") -> attempting

Seq(1).force.one.p // 1
Seq(1).force.option.p // Some(1)
Seq().force.option.p // None
Seq(1, 2, 3).force.distinct.p // Seq(1, 2, 3)
Seq(1, 2, 3).force.set.p // Set(1, 2, 3)
val (first, second) = Seq("foo", "bar").force.tuple2
val (first, second, third) = Seq("foo", "bar", "baz").force.tuple3
// ... and so on up to 10

Seq(1, 2).force.one // runtime error
Seq(1, 2).force.option // runtime error
Seq(1, 2, 1).force.distinct // runtime error
Seq(1, 2, 1).force.set // runtime error
Seq(1, 2, 3).force.tuple2 // runtime error
// ... and so on
The .force.one mechanism is one of the most useful operations, and a much safer bet than simply doing .head.
(None, Some(2)).toOptionalTuple.p // None
(Some(1), None).toOptionalTuple.p // None
(Some(1), Some(2)).toOptionalTuple.p // Some((1, 2))
Seq(None, None, None).toOptionalSeq.p // None
Seq(Some(1), Some(2), None).toOptionalSeq.p // None
Seq(Some(1), Some(2), Some(3)).toOptionalSeq.p // Some(Seq(1, 2, 3))

// parameter for .swap is by-name
Some("foo").swap("bar").p // None
None.swap("bar").p // Some("bar")

Seq(1, 2, 3).@@.p // [1, 2, 3]
Seq(1, 2, 3).#@@.p // #3:[1, 2, 3]
Seq(1, 2, 3).joinln // one per line
Seq(1, 2, 3).joinlnln // one per line every other line
Seq(1, 2, 3).joinln.sectionAllOff("data:") // or equivalently below
Seq(1, 2, 3).section("data:")
// returns:
/*
data:
  1
  2
  3
*/
Aptus also provides help with sorting for common cases, for instance:
Seq(Seq("d", "e", "f"), Seq("g", "h", "i"), Seq("a", "b", "c")).sorted(aptus.seqOrdering[String])
/* returns:
Seq(
  Seq("a", "b", "c"),
  Seq("d", "e", "f"),
  Seq("g", "h", "i"))
*/
Most of the time, we want to zip collections of same size, and we want to code it defensively:
Seq(1, 2, 3).zipSameSize(Seq(4, 5, 6)).p // Seq((1,4), (2,5), (3,6))
Seq(1, 2, 3).zipSameSize(Seq(4, 5)).p // runtime error
Ask yourself: what are legitimate use cases where we zip two collections of different sizes and are perfectly happy to have the longest silently truncated?
Other useful zip-related operations are:
Seq("a", "b", "c").zipWithIsFirst.map { case (x, first /* for "a" here */) => if (first) ... else ... }
Seq("a", "b", "c").zipWithIsLast.map { case (x, last /* for "c" here */) => if (last) ... else ... }
Seq("a", "b", "c").zipWithIndex.p // List((a,0), (b,1), (c,2))
Seq("a", "b", "c").zipWithRank.p // List((a,1), (b,2), (c,3))

Seq(1, 2, 3).splitAtHead.p // (1,Seq(2, 3))
Seq(1, 2, 3).splitAtLast.p // (Seq(1, 2),3)

1.containedIn(Seq(1, 2, 3)).p // true
1.notContainedIn(Seq(1, 2, 3)).p // false
// also available for Set
Note: Why not use "contains" from the stdlib instead? Consider the following situation:
val ref = Seq("2", "4", "6")
Seq(1, 2, 3).map(ref.contains(_.toString)) // cannot do that
Seq(1, 2, 3).map(x => ref.contains(x.toString)) // we need an intermediate
Seq(1, 2, 3).map(_.toString.containedIn(ref)) // unless using containedIn
Ordering sequences of sequences (size prevails):
implicit val ord: Ordering[Seq[Int]] = aptus.seqOrdering
Seq(Seq(4, 5, 6), Seq(1, 2, 3)).sorted.p // Seq(Seq(1, 2, 3), Seq(4, 5, 6))
Seq(Seq(4, 5, 6), Seq(1, 2)).sorted.p // Seq(Seq(1, 2), Seq(4, 5, 6))
Seq(Seq(4, 5), Seq(1, 2, 3)).sorted.p // Seq(Seq(4, 5), Seq(1, 2, 3))
Note: List vs Seq, see discussion on Scala Users.
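The "size prevails" rule shown in the ordering example can be sketched with the stdlib alone. The `sizeFirstOrdering` function below is a hypothetical re-implementation inferred from the three outputs listed above (shorter sequences first, then element-by-element comparison). How `aptus.seqOrdering` is actually written is not shown here.

```scala
object SeqOrderingDemo {
  // Hypothetical re-implementation: compare sizes first, then break ties element by element.
  def sizeFirstOrdering[A](implicit elem: Ordering[A]): Ordering[Seq[A]] =
    new Ordering[Seq[A]] {
      def compare(x: Seq[A], y: Seq[A]): Int = {
        val bySize = Integer.compare(x.size, y.size)
        if (bySize != 0) bySize
        else
          x.iterator
            .zip(y.iterator)
            .map { case (a, b) => elem.compare(a, b) }
            .find(_ != 0)   // first position where the two sequences differ
            .getOrElse(0)   // identical sequences
      }
    }

  def main(args: Array[String]): Unit = {
    val ord = sizeFirstOrdering[Int]
    println(Seq(Seq(4, 5, 6), Seq(1, 2, 3)).sorted(ord))
    println(Seq(Seq(4, 5), Seq(1, 2, 3)).sorted(ord))
  }
}
```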
Most of the time, we do not want duplicates to be silently discarded:
// is this what we wanted?
Seq(1 -> "a", 2 -> "b", 2 -> "c").toMap.p // Map(1 -> "a", 2 -> "c")
// likely not
Seq(1 -> "a", 2 -> "b", 2 -> "c").force.map.p // runtime error
Seq(1 -> "a", 2 -> "b").force.map.p // Map(1 -> "a", 2 -> "b")

Seq("foo", "bar").force.mapLeft(_.toUpperCase).p
Seq("foo", "bar").map(_.associateLeft(_.toUpperCase)).force.map.p
// returns: Map("FOO" -> "foo", "BAR" -> "bar")
Seq("foo", "bar").force.mapRight(_.size).p
Seq("foo", "bar").map(_.associateRight(_.size)).force.map.p
// returns: Map("foo" -> 3, "bar" -> 3)

Seq("foo" -> 1, "bar" -> 2, "foo" -> 3).groupByKey.p
// returns: Map(bar -> List(2), foo -> List(1, 3))
// if original order must be preserved:
Seq("bar" -> 2, "foo" -> 1, "foo" -> 3).groupByKeyWithListMap.p
// returns: ListMap(bar -> List(2), foo -> List(1, 3))

Seq("foo" -> 1, "bar" -> 2, "foo" -> 3).countByKey.p
// returns: List((2,foo), (1,bar))

Seq("a", "b", "a", "c").countBySelf.p
// returns: Seq(("a", 2), ("b", 1), ("c", 1))
// note: ordered by DESC
From import aptus.Tuple{2,3,4,5}_
(1, 2).toSeq.p // Seq(1, 2)
(1, 2).mapFirst(_ + 1) // (2, 2)
(1, 2).mapSecond(_ + 1) // (1, 3)
(1, 2, 3).mapThird(_ + 1) // (1, 2, 4)

"foo".in.some.p // Some("foo")
"foo".in.seq.p // Seq ("foo")
"foo".in.list.p // List("foo")
"foo".in.left.p // Left("foo")
"foo".in.right.p // Right("foo")
// also see in.someIf/in.noneIf above

Seq[Int]().slidingPairs // Seq()
Seq(1).slidingPairs // Seq()
Seq(1, 2, 3, 4, 5).slidingPairs // Seq((1, 2), (2, 3), (3, 4), (4, 5))
Seq(1, 2, 3).slidingPairsWithPrevious.p // List((None,1), (Some(1),2), (Some(2),3))
Seq(1, 2, 3).slidingPairsWithNext.p // List((1,Some(2)), (2,Some(3)), (3,None))
Consider the pure stdlib alternative:
Seq(1, 2, 3, 4, 5).sliding(2).map { x => (x(0), x(1)) }.toSeq
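One caveat with the `sliding(2)` version: on a single-element sequence, `sliding` yields one window of size 1, so `x(1)` throws. A stdlib-only alternative that avoids this is to zip the sequence with its own tail, as in the sketch below. The `pairs` helper name is hypothetical and is not how Aptus necessarily implements `slidingPairs`.

```scala
object SlidingPairsDemo {
  // Pair each element with its successor; empty and single-element inputs yield no pairs.
  def pairs[A](values: Seq[A]): Seq[(A, A)] =
    values.zip(values.drop(1))

  def main(args: Array[String]): Unit = {
    println(pairs(Seq(1, 2, 3, 4, 5))) // List((1,2), (2,3), (3,4), (4,5))
    println(pairs(Seq(1)))             // List()
    println(Seq(1).sliding(2).toList)  // List(List(1)): a window with only one element
  }
}
```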
Aptus' Closeabled boils down to:
class Closeabled[T](underlying: T, cls: Closeable) extends Closeable
Convenient for instance when you don't want to manage pairs of Iterator/Closeable, e.g.:
// let's write lines
Seq("hello", "world").writeFileLines("/tmp/lines")
// and stream them back
val myCloseabled: SelfClosingIterator[String] = "/tmp/lines".streamFileLines()
// for instance, we can consume the content (will automatically close)
myCloseabled.consume(_.toList).p // as is
// <XOR>
myCloseabled.map(_.map(_.size)).consume(_.toList).p // line pre-processing
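To illustrate the idea of pairing a value with the resource that must be closed, here is a simplified, hypothetical variant of such a wrapper. The `map` and `consume` bodies are assumptions modeled on the usage shown above; the real Aptus class may differ in both API and behavior.

```scala
import java.io.Closeable

object CloseabledDemo {
  // Simplified, hypothetical variant: a value plus the resource backing it.
  final class Closeabled[T](val underlying: T, cls: Closeable) extends Closeable {
    override def close(): Unit = cls.close()

    // Transforms the wrapped value while keeping the same resource to close.
    def map[U](f: T => U): Closeabled[U] = new Closeabled(f(underlying), cls)

    // Applies f, then closes the resource even if f throws.
    def consume[U](f: T => U): U =
      try f(underlying)
      finally close()
  }

  def main(args: Array[String]): Unit = {
    var closed = false
    val resource = new Closeable { def close(): Unit = closed = true }
    val lines = new Closeabled(Iterator("hello", "world"), resource)
    // toList forces the lazy iterator before the resource is closed
    println(lines.map(_.map(_.length)).consume(_.toList))
    println(closed)
  }
}
```

Note that `consume` must fully materialize lazy values (here via `toList`) before returning, since the resource is closed as soon as `f` completes.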
We call some methods directly from the aptus package object if no natural parent can be used.
aptus.fs.homeDirectoryPath().p // "/home/tony"
aptus.hardware.totalMemory().p // 1011351552
aptus.random.uuidString().p // a1bffc1e-72aa-477e-ac84-e4133ffcafad
aptus.time.stamp().p // 240224152753
aptus.illegalState("freeze!") // Exception in thread "main" IllegalStateException: freeze!
aptus.illegalArgument("freeze!") // Exception in thread "main" IllegalArgumentException: freeze!
aptus.reflect.formatStackTrace().p
// returns:
/*
java.lang.Throwable
  at aptus.aptmisc.Reflect$.formatStackTrace(Misc.scala:62)
  ... <where you are in your code>
*/
// ... (see more in aptus.AptusAliases)
These are often used to save/homogenize comments.
Sometimes we want to convey that a sequence cannot be reordered without consequences. Think of it as a built-in comment:
@ordermatters val mySeq = Seq(MostImportant, SecondMostImportant, ...)
An annotation is favored over a type alias here so that it can be applied to other code areas than sequences.
The following are just aliases, cheap replacements for NonEmptyList-like alternatives:
val values: Nes[Int] = Seq(1, 2, 3)
val maybeValues: Pes[Int] = Some(Seq(1, 2, 3))
Note: Value classes don't accept require statements
Plain files:
"hello world".writeFileContent("/tmp/content")
"/tmp/content".readFileContent().p // prints: "hello world"
Seq("hello", "world").writeFileLines("/tmp/lines")
"/tmp/lines".readFileLines().p // prints: Seq("hello", "world")

"hello world".writeFileContent("/tmp/content.gz")
"/tmp/content.gz".readFileContent().p // prints: "hello world"
Seq("hello", "world").writeFileLines("/tmp/lines.gz")
"/tmp/lines.gz".readFileLines().p // prints: Seq("hello", "world")
// note: "file -i /tmp/content.gz" shows it's indeed application/gzip
"/data/bigfile.gz".streamFileLines() // returns a SelfClosingIterator[String], which closes itself once all lines have been seen
A special note about JSON, owing to its ubiquity (and despite its many flaws). While Gallia is my main project pertaining to data in general (especially transformation thereof), I included a minimal set of functionality in Aptus:
"""{"foo": 1}""".jsonObject // returns a com.google.gson.JsonObject
"""[{"foo": 1}]""".jsonArray // returns a com.google.gson.JsonArray
"""{"foo": 1, "bar": true}""".prettyJson.p // .compactJson is also available
/*
{
  "foo": 1,
  "bar": true
}
*/
In the future, a subset of Gallia will be created, which will basically offer a similar set of operations but without any concern for the underlying schema: gallia-dyn. It will offer a convenient way to perform "dynamic" transformations, and therefore handle JSON. Once ready, a subset of gallia-dyn will likely be included in Aptus for convenience, so that simple manipulations such as these will be possible OOTB:
"""{"foo": "hello", "bar": 2, "baz": true}"""
  .readObj
  .toUpperCase("foo")
  .increment("bar")
  .drop("baz")
  .printCompactJson() // """{"foo": "HELLO", "bar": 3}"""
val TestResources = "https://raw.githubusercontent.com/aptusproject/aptus-core/6f4acbc/src/test/resources"
s"${TestResources}/content".readUrlContent() // prints "hello word"
s"${TestResources}/lines".readUrlLines().p // prints: Seq("hello", "world")
Notes:
- These may move under "...".file and "...".url respectively (TBD)
- In the future we'll allow a basic POST as well
A very lightweight way to handle the file system, not meant to be comprehensive (use os-lib for more power):
"/tmp/sbt".path.isDir()
"/tmp/sbt".path.file.removeFile()
...
"/tmp/sbt".path.dir.listNames()
...
"/tmp/sbt".path.dir.listFilePathsRecursively()
Quick-and-dirty system calls:
"echo hello".systemCall() // prints: "hello"
"date +%s".systemCall() // prints: "1622562984"
"head -1 /proc/cpuinfo".systemCall() // prints: "processor: 0"
- At least a List_ counterpart to Seq_, maybe via code generation (again see discussion on Scala Users)
- Add more useful abstractions borrowed from other languages, e.g. Python's Counter
- Lots more tests to be written, though many methods in aptus are too trivial to warrant a test, e.g. def pipeIf(test: Boolean)(f: A => A): A = if (test) f(a) else a
- More useful methods remain to be ported from Aptus' prototype (not published because too messy)
- See all the TODOs in the code
- Also see Gallia's backlog
Contributions welcome.