Posted on Jul 8, 2021

Let's Read – Polished Ruby Programming – Ch 1

Polished Ruby Programming is a recent release by Jeremy Evans, a well-known Rubyist working on the Ruby core team, Roda, Sequel, and several other projects. Knowing Jeremy and his experience, this was an instant buy for me, and I look forward to what we learn in this book.

You can find the book here:

https://www.packtpub.com/product/polished-ruby-programming/9781801072724

This review, like other "Let's Read" series in the past, will go through each of the chapters individually and will add commentary, additional notes, and general thoughts on the content. Do remember books are limited in how much information they can cram on a page, and they can't cover everything.

With that said let's go ahead and get started.

Chapter 1 – Getting the Most out of Core Classes

The book starts in with an overview of core classes, and the following topics:

Learning when to use core classes
Best uses for true, false, and nil objects
Different numeric types for different needs
Understanding how symbols differ from strings
Learning how best to use arrays, hashes, and sets
Working with Struct – one of the underappreciated core classes

We'll be covering each of those. From a glance this is a good overview of common confusing topics in Ruby.

Learning when to use core classes

We start out with two examples, one which uses Array and one which uses a custom class ThingList:

    things = ["foo", "bar", "baz"]
    things.each do |thing|
      puts thing
    end

    things = ThingList.new("foo", "bar", "baz")
    things.each do |thing|
      puts thing
    end

The point made here is that the first is much clearer than the second. Using ThingList introduces a lot of uncertainty versus the better-known Array, especially because, as the book mentions, why else would someone use that instead of an Array?

There are a lot of talks on extending core classes and some of the bad things that can happen as a result. One in particular is "Let's Subclass Hash - What's the worst that could happen?" by Michael Herold. The short version is that the Hashie gem tried to implement dot-access (hash[:a] can be called as hash.a) and there were all types of issues around that.

Jeremy's point here is a good one: Only go custom when you know the risks and the benefits you gain outweigh them.

Risks like performance, intuitive understanding, maintainability, and more come up frequently and should most certainly be taken into account.

Best uses for `true`, `false`, and `nil` objects

True and False

true and false are fairly universal concepts, and as mentioned, if they meet your needs you should use them. One thing to watch out for, however, is that they're instances of TrueClass and FalseClass. Ruby doesn't really have a concept of Boolean unless you're using something like Steep or Sorbet.

The first case of when to use them is a predicate method, or one that ends with ? in Ruby:

    1.kind_of?(Integer) # => true

Other examples given are around equalities and inequalities:

    1 > 2  # => false
    1 == 1 # => true

Note: === behaves very differently in Ruby, but that's a topic for a later discussion.

For me it's a matter of whether you're answering a question. For predicate methods that's clear, for equalities and inequalities maybe a bit less so. Another common use tends to be around status updates: did something succeed or fail? Granted, these tend to be more in tuple-type pairs like [true, response] or [false, error], but that's another subject for later.

Nil

Next up he gets into nil and some of the common usages:

    [].first    # => nil
    {1 => 2}[3] # => nil

nil should be understood as nothing; we return it when there's nothing to return. In the first case there's no first element of the Array, and in the second there's no key for 3.

Note: Hash can have a default value assigned through either Hash.new(0) or Hash.new { |h, k| h[k] = [] } which overrides the idea that "nothing" was there, but that's beside the point being made here.
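For the curious, here's a quick sketch of my own (not from the book) of how those two default styles behave. The names counts and groups are just for illustration:

```ruby
# Default value: returned for missing keys, but never stored in the Hash.
counts = Hash.new(0)
counts[:missing]        # => 0
counts.key?(:missing)   # => false

# += reads the default and then assigns, so the key does get stored here.
"hello".each_char { |c| counts[c] += 1 }
counts["l"]             # => 2

# Default block: runs on a missing key and can store a fresh object per key.
groups = Hash.new { |hash, key| hash[key] = [] }
groups[:even] << 2
groups[:even] << 4
groups[:even]           # => [2, 4]
groups.key?(:odd)       # => false (never accessed, so never created)
```

The block form is the one to reach for when the default is mutable, since every key gets its own Array.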

The tricky part, and one that was mentioned, is that !nil is true and !1 is false:

    !nil # => true
    !1   # => false

That gets us patterns like this to "coerce" Boolean-like values:

    !!nil # => false

In general nil should be avoided unless it's genuinely the case that there's "nothing" there. Consider this case:

    [1, 2, 3].select { |v| v > 4 } # => []

Sure, we found "nothing", but a better response is an empty Array, which is the "nothing" of this particular case. If select returned nil instead and we tried to do this, what do you think might happen?

    [1, 2, 3].select { |v| v > 4 }.map { |v| v * 2 }

You would get a NoMethodError, because nil has no map method. In this particular case with [1, 2, 3] there's "nothing" there, but in other cases like [4, 5, 6]? That's valid. One might notice some patterns here with "empty" or "nothing" values, but that strafes hard into Functional Programming territory and a very fun idea you could read more about here if you're particularly adventurous.

Point being, return sane defaults rather than nil when it makes sense.
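To make that concrete, here's a minimal sketch of my own. TAGS, tags_or_nil, and tags_or_empty are hypothetical names made up for this example, not anything from the book:

```ruby
# Hypothetical lookup table, only for illustration.
TAGS = { "ruby" => ["lang", "dynamic"] }.freeze

# Returns nil for unknown names, so every caller has to guard against it.
def tags_or_nil(name)
  TAGS[name]
end

# Returns an empty Array for unknown names, so callers can keep chaining.
def tags_or_empty(name)
  TAGS.fetch(name, [])
end

tags_or_empty("ruby").map(&:upcase)   # => ["LANG", "DYNAMIC"]
tags_or_empty("cobol").map(&:upcase)  # => []

begin
  tags_or_nil("cobol").map(&:upcase)
rescue NoMethodError => e
  puts "nil strikes again: #{e.class}"
end
```

The empty Array version lets the "nothing found" case flow through the same code path as the "something found" case.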

Bang (`!`) methods and Nil

Next up are some more confusing parts of Ruby, especially around bang (!) methods:

    "a".gsub!('b', '')                # => nil
    [2, 4, 6].select!(&:even?)        # => nil
    ["a", "b", "c"].reject!(&:empty?) # => nil

Jeremy mentions that this is done for optimization purposes, so the caller can tell that the receiver was not modified. For me it's a reason I avoid ! methods with some frequency, as I've been caught by that more than once, and oftentimes you really don't need them. My general rule is to avoid mutation and mutating methods unless absolutely necessary, as they break chaining and a lot of intuition about how Ruby works.
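Here's a small sketch of my own showing how that nil return breaks chaining; the variable names are just for illustration:

```ruby
words = ["a", "b", "c"]

# The non-bang version always returns an Array, so chaining is safe.
upcased = words.reject(&:empty?).map(&:upcase)  # => ["A", "B", "C"]

# The bang version returns nil when nothing was removed...
result = words.reject!(&:empty?)                # => nil

# ...so chaining off of it only works when a change actually happened.
begin
  words.reject!(&:empty?).map(&:upcase)
rescue NoMethodError => e
  puts e.class                                  # prints NoMethodError
end

# When something is removed, the receiver itself comes back.
mixed = ["a", "", "c"]
changed = mixed.reject!(&:empty?)               # => ["a", "c"]
```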

Caching with false and nil

In both of the examples provided:

    @cached_value ||= some_expression
    # or
    cache[:key] ||= some_expression

If some_expression is false or nil it'll reevaluate instead of being "cached" for later use. The suggested alternative is to use defined? instead:

    if defined?(@cached_value)
      @cached_value
    else
      @cached_value = some_expression
    end

Personally I lean towards guard-style statements for method-based caches, but that's a matter of preference:

    def another_expression
      return @cached_value if defined?(@cached_value)

      @cached_value = some_expression
    end
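If you want to see the difference for yourself, here's a minimal sketch. The Settings class and its method names are hypothetical and exist only to count how often the "expensive" call runs:

```ruby
class Settings
  attr_reader :lookups

  def initialize
    @lookups = 0
  end

  # Pretend this is expensive and legitimately returns false.
  def expensive_flag
    @lookups += 1
    false
  end

  # Re-runs every time, because a cached false is still falsy.
  def flag_with_or_equals
    @flag_a ||= expensive_flag
  end

  # Runs once, because it checks whether the variable was ever assigned.
  def flag_with_defined
    return @flag_b if defined?(@flag_b)

    @flag_b = expensive_flag
  end
end

a = Settings.new
3.times { a.flag_with_or_equals }
a.lookups # => 3

b = Settings.new
3.times { b.flag_with_defined }
b.lookups # => 1
```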

Hash cache

He also mentions Hashes for caching using fetch, which has some additional fun behavior:

    cache.fetch(:key) { cache[:key] = some_expression }

There are a few ways that fetch does things which may be important to mention here:

    hash = { a: 1 }      # => {:a=>1}
    hash.fetch(:a)       # => 1
    hash.fetch(:b, 1)    # => 1
    hash.fetch(:b) { 1 } # => 1
    hash.fetch(:b)       # KeyError (key not found: :b)

If you fetch a key which is not present without either a default or a provided block it'll raise a KeyError, which can be very useful for ensuring things are present.

Memory Advantages

A good point to close on is that true, false, and nil are going to be faster than most other Ruby objects due to being immediate object types. That means there's no memory allocation when they're created and no indirection when they're accessed later, making them faster than non-immediate objects.

Different numeric types for different needs

Next up we have different numeric types. Jeremy opens with a good point that in more cases than not you're probably just going to want an Integer type rather than fractional ones. Ruby also offers floats, rationals, and BigDecimal, among a few others if you count non-base-10 variants. They're all under the Numeric class.

Note: As mentioned, BigDecimal is not loaded by default and needs require 'bigdecimal'. It also has a particularly pesky compatibility break: BigDecimal.new no longer works and BigDecimal() has to be used instead. I still don't get why they didn't leave it and just alias it, but alas here we are.

He opens with an example using times:

    10.times do
      # executed 10 times
    end

It may have been a good idea here to include the block variable as well and indicate that it receives each value:

    3.times do |i|
      puts i
    end
    # 0
    # 1
    # 2

...as the example referenced a for loop equivalency, and this may lead to some confusion and the introduction of counter variables where one is already built in to cover that case.

Integer division and truncation

A common confusion point with Integers, and one that he brings up here, is what happens with truncation:

    5 / 10 # => 0
    7 / 3  # => 2

Chances are that's not exactly what was intended, so be careful when dividing to convert one of the operands to a different numeric type like Rational (because Float has its own bit of fun we cover later).

Integer division returns only the quotient and not the remainder or fractional parts. That's similar to C, and somewhat amusingly an interview question at some companies.
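As a quick aside of my own (not from the book), Ruby has a few ways to get at the parts that plain / throws away. One difference from C worth knowing: Ruby rounds the quotient toward negative infinity, where C truncates toward zero and would give -2 for the last line.

```ruby
quotient  = 7 / 3        # => 2 (quotient only)
remainder = 7 % 3        # => 1
both      = 7.divmod(3)  # => [2, 1] (quotient and remainder at once)
as_float  = 7.fdiv(3)    # => 2.3333333333333335 (Float result)
as_ratio  = 7 / 3r       # => (7/3) (exact Rational result)
negative  = -7 / 3       # => -3 (floors rather than truncating)
```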

Floats

Noted workarounds in the book use Rational or Float here:

    # or Rational(5, 10) or 5 / 10.to_r
    5 / 10r # => (1/2)

    # Float
    7.0 / 3 # => 2.3333333333333335

Float is noted as the fastest, but floats are not exact. This site has a good explanation as to why, but the short version is that there aren't enough digits to represent all numbers, and the more things you do to a Float the more apparent it becomes, as in this example:

    f = 1.1
    v = 0.0
    1000.times do
      v += f
    end
    v # => 1100.0000000000086

Rationals

Rational can get around this with more precision, but is slower in general. If you're dealing with any type of money or other things which require precision, though, Float is a bad idea to use.

If we were to do that same code using Rational instead the book shows this:

    f = 1.1r
    v = 0.0r
    1000.times do
      v += f
    end
    v # => (1100/1)

Now as far as speed, Jeremy makes an excellent point which harkens back to YAGNI (You Aren't Gonna Need It). They're maybe 2-6x slower, and micro-optimizations are rarely the bottleneck for your code.

As he mentioned in the book, rationals are great for when you need exact answers, and as mentioned earlier money is definitely one of those cases. In cases where you're just comparing numbers and not doing calculations? Yeah, Float is probably fine.
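The classic demonstration of why this matters for money, as a small sketch of my own with illustrative variable names:

```ruby
float_total    = 0.1 + 0.2
rational_total = 0.1r + 0.2r

float_total == 0.3      # => false
float_total             # => 0.30000000000000004
rational_total == 0.3r  # => true
rational_total          # => (3/10)
rational_total.to_f     # => 0.3 (convert only at the edges, e.g. for display)
```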

BigDecimal

So where does that leave BigDecimal in this equation? Let's take a look at the examples provided:

    require 'bigdecimal'

    v = BigDecimal(1) / 3
    v * 3 # => 0.999999999999999999e0

    f = BigDecimal(1.1, 2)
    v = BigDecimal(0)
    1000.times do
      v += f
    end
    v           # => 0.11e4
    v.to_s('F') # => "1100.0"

BigDecimal prints in scientific notation by default and stores arbitrary-precision decimal values, so it can deal with very large numbers. The book doesn't go into a lot of detail here, and quite frankly I've rarely had to use them in Ruby myself.

Personally I like this post by Honeybadger on the subject of currency and when BigDecimal or Rational might be used.

Understanding how symbols differ from strings

If there were a single issue in Ruby that's more confusing than most of the rest combined it would be Symbol vs String and when each is used. I have my personal opinions on this, but will save those for later.

Rails, as the book mentions, treats them indiscriminately as a solution to this annoyance, with Hash#with_indifferent_access to bypass needing to care about the difference. In the background a lot of Ruby, as the book mentions, will also do this conversion.

So what are the two?

Strings

"A string in Ruby is a series of characters or bytes, useful for storing text or binary data. Unless the string is frozen, you append to it, modify existing characters in it, or replace it with a different string."

In almost all cases I would advocate for freezing Strings. Ruby even has the frozen string literal comment to do this, which goes at the top of a file:

# frozen_string_literal: true

This has been shown to improve application performance, and is often easier to work with as mutation (especially on receivers) can have all types of unintended consequences. We won't get into functional purity wars on this, but in general mutating methods in Ruby can make it harder to reason about code, so use sparingly.

I'll mention this later, but if frozen string literals were the default a lot of the use case for Symbol would become more difficult to justify, though there would still be some marginal performance gains from their implementation.

Symbol

"A symbol in Ruby is a number with an attached identifier that is a series of characters or bytes. Symbols in Ruby are an object wrapper for an internal type that Ruby calls ID, which is an integer type. When you use a symbol in Ruby code, Ruby looks up the number associated with that identifier. The reason for having an ID type internally is that it is much faster for computers to deal with integers instead of a series of characters or bytes. Ruby uses ID values to reference local variables, instance variables, class variables, constants, and method names."

This may be a bit of a complicated way to explain a Symbol, though it does get into some important implementation details. More simply, a Symbol is an identifying text to describe a part of your Ruby code.
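One way to see the "identifier" part in practice is object identity. This is a sketch of my own, and it uses String.new so the result doesn't depend on whether frozen string literals are turned on:

```ruby
# The same symbol always refers to the same object.
same_symbol = :add.equal?(:add)        # => true

# Two strings with equal content are distinct objects here.
a = String.new("add")
b = String.new("add")
same_content = (a == b)                # => true
same_object  = a.equal?(b)             # => false

# Converting a string to a symbol lands on that one shared object.
interned = "add".to_sym.equal?(:add)   # => true

# Symbols also borrow a number of String-like methods.
:add.to_s    # => "add"
:add.length  # => 3
:add.upcase  # => :ADD
```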

Methods, for instance, can be identified by a Symbol representing their name: def add could be represented as :add elsewhere in the program, and passed to send to call the method:

    method = :add
    foo.send(method, bar)

Caveat: Personally I would prefer method_name here, as method itself is a method that can be used to get a Method object by name, which can be confusing.

Confusingly though this works as well, as mentioned by the book:

    method = "add"
    foo.send(method, bar)

As the book mentions this is because Ruby is trying to be nice to the programmer, and honestly it feels a bit self-aware to me that it knows this is confusing. Many String methods will work on a Symbol, compounding this.

The book mentions the following few examples:

    def switch(value)
      case value
      when :foo
        # foo
      when :bar
        # bar
      when :baz
        # baz
      end
    end

In this one we're using Symbols as identifying text rather than as text itself. If we wanted to do something with value, however, Symbol would not make much sense:

    def append2(value)
      value.gsub(/foo/, "bar")
    end

In this case value works as a String, so we should ensure a String is passed to it.

Personal Opinions

Personally I believe that frozen strings, if optimized, could be used as more of an alternative to Symbol. Whatever performance gains Symbol provides are not worth the confusion it causes for users, and it should be avoided.

JavaScript, for instance, has the same JSON-like syntax as Ruby but treats the keys as String values instead:

    const map = { a: 1, b: 2, c: 3 };
    map['a'] // => 1
    map.a    // => 1

Granted, that latter dot-syntax is a really bad idea in Ruby, as mentioned in the Hashie talk from RubyConf above, but that's another matter.

My main gripe is that for as much as Ruby gives value to the use of Symbol it sure likes to pretend they don't exist and coerce things to prevent users from getting errors in a lot of cases.

Anyways, personal rant over, I don't really see this changing in future versions of the language either as it would be far too large of a breaking change and not worth the migration pains on the community to do.

Learning how best to use arrays, hashes, and sets

That's a lot to cover, and honestly one chapter isn't enough to cover a substantial portion of what makes even Array interesting in Ruby, but that's not the point of this book so I digress. At the least I would highly recommend reading into Enumerable on the official docs after this chapter to get an idea of what all is possible.

Array

    [[:foo, 1], [:bar, 3], [:baz, 7]].each do |sym, i|
      # ...
    end

The example provided is a set of two-item tuples to represent data. There's not much to show here except that blocks can destructure values using arguments like sym and i here. Note that there's a really subtle thing to keep in mind on this versus a Hash though: you can have multiple instances of :foo here, but only one in a Hash, which wants unique keys.

Hash

The Hash example is very similar:

    {foo: 1, bar: 3, baz: 7}.each do |sym, i|
      # ...
    end

The book mentions that the Array solution is likely more correct from a design perspective, but that the Hash is easier to implement. I would be inclined to agree with that, except in the case mentioned above where things could get complicated.

Consider if you had a set of tags coming in from AWS as Array tuples; representing that as a Hash would be a bad idea. Keep in mind your underlying data when deciding on how to express it in Ruby.
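Here's a small sketch of what goes wrong when keys repeat. The tag data is made up for the example and isn't meant to mirror any real AWS response:

```ruby
# Hypothetical tag data with a repeated key.
tags = [[:env, "prod"], [:team, "core"], [:env, "staging"]]

# Array tuples keep every pair, in order.
tags.each { |key, value| puts "#{key}=#{value}" }
tags.size       # => 3

# Converting to a Hash silently keeps only the last value per key.
as_hash = tags.to_h
as_hash.size    # => 2
as_hash[:env]   # => "staging"

# group_by keeps every value while still allowing lookup by key.
grouped = tags.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }
grouped[:env]   # => ["prod", "staging"]
```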

Implementing an in-memory database

Now this is a more unique application of the two in a book that I've seen, and I really like that he's going for something with a bit more substance here. He starts out with generating some mock data to play with here:

    album_infos = 100.times.flat_map do |i|
      10.times.map do |j|
        ["Album#{i}", j, "Track#{j}"]
      end
    end

It should be noted that flat_map flattens after mapping (transforming) a collection, but this book does assume intermediate Ruby knowledge to be fair.
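In case flat_map is new to you, here's a scaled-down comparison of my own against plain map:

```ruby
# map alone leaves one nested Array per outer iteration.
nested = 2.times.map { |i| 2.times.map { |j| [i, j] } }
# => [[[0, 0], [0, 1]], [[1, 0], [1, 1]]]

# flat_map removes exactly one level of nesting from the results.
flat = 2.times.flat_map { |i| 2.times.map { |j| [i, j] } }
# => [[0, 0], [0, 1], [1, 0], [1, 1]]

flat == nested.flatten(1) # => true
```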

Creating Indexes - Array Tuples

The first part of this involves indexing data, or giving a clear way to look up the data from multiple angles. If we were to make a simple index function for Array it might look like this (and Rails does something similar):

    class Array
      def index_by(&block)
        indexes = {}
        self.each { |v| indexes[block.call(v)] = v }
        indexes
      end
    end

Remember that bit about unique keys though, as that does make things complicated. What if it indexes by a person's name but two people have the same name? Anyways, back to the solution they provide:

    album_artists = {}
    album_track_artists = {}

    album_infos.each do |album, track, artist|
      (album_artists[album] ||= []) << artist
      (album_track_artists[[album, track]] ||= []) << artist
    end

    album_artists.each_value(&:uniq!)

Granted for me I might have done something a bit more like this:

    require 'set'

    album_artists = Hash.new { |h, k| h[k] = Set.new }
    album_track_artists = Hash.new { |h, k| h[k] = Set.new }

    album_infos.each do |album, track, artist|
      album_artists[album].add artist
      album_track_artists[[album, track]].add artist
    end

...which prevents the need to conflate default assignment and later uniqueness constraints, as Set can only have unique values, but that also makes the solution more complicated and harder to explain in the first chapter so I can understand why it was written that way.
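Here's that Set-based variant run against a tiny stand-in dataset so the behavior is visible. sample_infos is made up for the example, and the last few lines show one caveat of default blocks that's worth knowing about:

```ruby
require "set"

# Tiny stand-in for album_infos: [album, track, artist].
sample_infos = [
  ["Album 1", 1, "Artist A"],
  ["Album 1", 2, "Artist A"],
  ["Album 1", 2, "Artist B"],
]

album_artists       = Hash.new { |h, k| h[k] = Set.new }
album_track_artists = Hash.new { |h, k| h[k] = Set.new }

sample_infos.each do |album, track, artist|
  album_artists[album].add(artist)
  album_track_artists[[album, track]].add(artist)
end

# Duplicates never make it in, so no uniq! pass is needed afterwards.
album_artists["Album 1"].to_a            # => ["Artist A", "Artist B"]
album_track_artists[["Album 1", 2]].to_a # => ["Artist A", "Artist B"]

# Caveat: merely reading a missing key creates an empty Set for it.
album_artists["Nope"]
album_artists.key?("Nope")               # => true
```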

The lookup function is amusing:

    lookup = ->(album, track = nil) do
      if track
        album_track_artists[[album, track]]
      else
        album_artists[album]
      end
    end

Why? Well, one's first instinct might be to create a method like so:

    def lookup(album, track = nil)
      # ...
    end

...but where exactly does it get album_artists and album_track_artists then? This solution avoids that by using lambda functions, which capture the local context they're defined in through what's called a closure.

Granted I think this is a bit unusual in Ruby and not quite common use, but prevents the need for wrapping all of this in a class and substantially lengthening the chapter. Not sure I'd advocate for it elsewhere though.

(You'll also note I make a point not to implement it as such myself for the length of the article)
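For anyone who hasn't run into the scoping difference before, here's a stripped-down sketch of my own. lookup_method is a hypothetical name used only to show the failure:

```ruby
album_artists = { "Album 1" => ["Artist A"] }

# A method body starts a fresh scope: outer local variables are not visible.
def lookup_method(album)
  album_artists[album]
end

begin
  lookup_method("Album 1")
rescue NameError => e
  puts e.class # prints NameError
end

# A lambda closes over the locals that exist where it is defined.
lookup = ->(album) { album_artists[album] }
lookup.call("Album 1") # => ["Artist A"]
lookup.("Album 1")     # same call, shorter syntax
```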

Creating Indexes - Nested Hashes

The second solution uses nested hashes instead:

    albums = {}
    album_infos.each do |album, track, artist|
      ((albums[album] ||= {})[track] ||= []) << artist
    end

...and as with the previous case it may be worthwhile to decouple assignment and default values by promoting that code to the initial object instantiation:

    albums = Hash.new do |h, k|
      h[k] = Hash.new { |h2, k2| h2[k2] = [] }
    end

Is it less succinct? Sure, but it's also explicit about the shape of our data which I believe to be a good tradeoff.
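A short usage sketch of my own for that nested default, with made-up album and artist names. fetch is used at the end because it ignores default blocks, which makes it handy for reading without creating keys:

```ruby
albums = Hash.new do |h, k|
  h[k] = Hash.new { |h2, k2| h2[k2] = [] }
end

# No ||= needed: both levels are created on first access.
albums["Album 1"][1] << "Artist A"
albums["Album 1"][2] << "Artist B"

albums["Album 1"][1]   # => ["Artist A"]
albums["Album 1"].keys # => [1, 2]

# fetch bypasses the default block, so nothing is stored for a miss.
missing = albums.fetch("Album 2", {})
missing.empty?         # => true
albums.key?("Album 2") # => false
```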

The lookup code, as the book does mention, becomes far more complex for this:

    lookup = ->(album, track = nil) do
      if track
        albums.dig(album, track)
      else
        a = albums[album].each_value.to_a
        a.flatten!
        a.uniq!
        a
      end
    end

What I like about this book is that Jeremy mentions the tradeoffs of each of these approaches. The Array-tuple approach takes a lot more memory, but has much faster lookup for a large number of records. The second is far more inefficient on just album lookups, but excels in nested queries.

Creating Indexes - Known Data

What he does in the next section though is an interesting insight on knowing the underlying data and what that affords us.

    albums = {}
    album_infos.each do |album, track, artist|
      album_array = albums[album] ||= [[]]
      album_array[0] << artist
      (album_array[track] ||= []) << artist
    end

    albums.each_value do |array|
      array[0].uniq!
    end

Unlike previous sections this assumes that the first item will be the artists, and 1 to 99 will be the tracks. We could explicitly model the data but that gets pretty messy:

    require 'set'

    TRACK_COUNT = 99

    # Index 0 holds all of the album's artists; indexes 1..99 hold per-track artists.
    # Array.new with a block gives each track its own Array ([] * TRACK_COUNT is just []).
    albums = Hash.new { |h, k| h[k] = [Set.new, *Array.new(TRACK_COUNT) { [] }] }

...which I don't particularly like, but does expose that this data structure is a bit perilous.

One trick here is that Ruby's dig function works with both Hash and Array, meaning numbered indexes work here, making the lookup function much simpler:

    lookup = ->(album, track = 0) do
      albums.dig(album, track)
    end

...but the code can be brittle when it comes to changing requirements unlike the other two as it's very tightly bound to the shape of the data. You can eke out some extra performance here, but it may not be worth it if you ever need to revisit and refactor it later.
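To show dig crossing from a Hash into an Array, here's a hand-built miniature of that structure. The data is made up, with index 0 holding all artists and the remaining indexes holding per-track artists:

```ruby
albums = {
  "Album 1" => [["Artist A", "Artist B"], ["Artist A"], ["Artist B"]],
}

all_artists = albums.dig("Album 1", 0) # => ["Artist A", "Artist B"]
track_two   = albums.dig("Album 1", 2) # => ["Artist B"]
no_track    = albums.dig("Album 1", 9) # => nil (index out of range)
no_album    = albums.dig("Nope", 1)    # => nil (stops at the missing key)
```

Both kinds of miss come back as nil, so the caller can't tell a missing album from a missing track without checking separately.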

Known Artist Names - Array

The next section wants to develop a feature for finding known artists names in albums versus a list of user-provided ones:

    album_artists = album_infos.flat_map(&:last)
    album_artists.uniq!

    lookup = ->(artists) do
      album_artists & artists
    end

Known Artist Names - Hash

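...but mentions that this can be slow with large counts of artists. A proposed counter-solution uses a Hash to key known artists:

    album_artists = {}
    album_infos.each do |_, _, artist|
      album_artists[artist] ||= true
    end

    lookup = ->(artists) do
      artists.select do |artist|
        album_artists[artist]
      end
    end

If all you need is a yes/no flag per name, values_at is shorter. Note that it returns the stored values (true or nil) in the same order as the input rather than the matching names, so it is not a drop-in replacement for the select version:

    lookup = ->(artists) do
      album_artists.values_at(*artists)
    end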

Known Artist Names - Set

...but the point of this exercise is to lead us to Set, so let's get to that instead:

    require 'set'

    album_artists = Set.new(album_infos.flat_map(&:last))

    lookup = ->(artists) do
      album_artists & artists
    end

The difference here is that Set is much faster than the Array approach, but not quite as fast as the Hash one. The book recommends Set for the nicer API, and Hash if you need the performance gain.
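Here are the three approaches side by side on a tiny dataset. This is a sketch of my own with made-up names that only compares the APIs; it makes no attempt to measure speed:

```ruby
require "set"

known     = ["Artist 1", "Artist 2", "Artist 3"]
requested = ["Artist 2", "Artist 9"]

# Array: intersection works out of the box.
from_array = known & requested                            # => ["Artist 2"]

# Hash: build a lookup table, then filter the requested names.
known_hash = known.to_h { |name| [name, true] }
from_hash  = requested.select { |name| known_hash[name] } # => ["Artist 2"]

# Set: same intersection API as Array, plus cheap membership checks.
known_set = Set.new(known)
from_set  = (known_set & requested).to_a                  # => ["Artist 2"]
known_set.include?("Artist 9")                            # => false
```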

Working with Struct – one of the underappreciated core classes

See, I really like Struct, especially when I'm in a REPL. Glad to see it here. Jeremy starts with an example here of a normal class:

    class Artist
      attr_accessor :name, :albums

      def initialize(name, albums)
        @name = name
        @albums = albums
      end
    end

If you've ever felt like a lot of that was redundant you'll really love Struct:

    Artist = Struct.new(:name, :albums)

...though personally I like kwargs for classes to be clear about what exactly you're passing to it, and Struct also covers that case:

    Artist = Struct.new(:name, :albums, keyword_init: true)
    Artist.new(name: 'Brandon', albums: [])

Clearer to me. Anyways, the book mentions the tradeoffs: Struct is lighter than a class but takes longer to look up attributes.
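A few of the things you get for free with that one line, as a quick sketch of my own:

```ruby
Artist = Struct.new(:name, :albums, keyword_init: true)

a = Artist.new(name: "Brandon", albums: [])
b = Artist.new(name: "Brandon", albums: [])

a.name          # => "Brandon"
a == b          # => true (value equality comes for free)
a.to_a          # => ["Brandon", []]
Artist.members  # => [:name, :albums]

# Accessors are read/write, and equality tracks the current values.
a.albums << "Debut"
a == b          # => false

# With keyword_init, a typo in a keyword is an error rather than a silent nil.
begin
  Artist.new(nme: "Typo")
rescue ArgumentError => e
  puts e.class  # prints ArgumentError
end
```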

He does mention an interesting property of Struct: what Struct.new returns is actually a Class:

    Struct.new(:a, :b).class # => Class

Subclassing Struct

Though that's not the case with instances of the generated subclass, as mentioned:

    Struct.new('A', :a, :b).new(1, 2).class # => Struct::A

...and he also notes an implementation of what the Struct.new method might look like:

    def Struct.new(name, *fields)
      unless name.is_a?(String)
        fields.unshift(name)
        name = nil
      end

      subclass = Class.new(self)

      if name
        const_set(name, subclass)
      end

      # Internal magic to setup fields/storage for subclass

      def subclass.new(*values)
        obj = allocate
        obj.initialize(*values)
        obj
      end

      # Similar for allocate, [], members, inspect

      # Internal magic to setup accessor instance methods

      subclass
    end

If you happen to pass a name like 'A' to it, it'll define a constant under Struct (Struct::A above) with that subclass attached to it. There's a bit of hand-waving on underlying details here, which would definitely take a while to cover, then the final section on actually making a new instance.

Personally I would almost rather avoid this in favor of the later mentioned subclassing:

    class SubStruct < Struct
    end

...and the above code may be a bit much for what you need to know about Struct for most cases.

Frozen Structs

There is mention in the next section about automatically freezing structs:

    A = Struct.new(:a, :b) do
      def initialize(...)
        super
        freeze
      end
    end

...which makes values immutable. Jeremy also mentions that there were several Ruby tracker issues filed to make this a more mature feature, but none made it into Ruby 3, and this is the most viable workaround.
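Here's what that workaround looks like in use, as a sketch of my own. Point is a made-up struct for the example. One thing to keep in mind is that freeze is shallow, so objects held by the struct can still be mutated:

```ruby
Point = Struct.new(:x, :y) do
  def initialize(...)
    super
    freeze
  end
end

p1 = Point.new(1, 2)
p1.frozen?  # => true

begin
  p1.x = 10
rescue FrozenError => e
  puts e.class # prints FrozenError
end

# "Changing" a value means building a new instance instead.
moved = Point.new(10, p1.y)
moved.x     # => 10

# Shallow freeze: the struct is locked, the Array inside it is not.
p2 = Point.new([1], 2)
p2.x << 99
p2.x        # => [1, 99]
```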

Personally I like the idea of immutable small data types, à la Haskell and Scala case classes, for quick usage as containers of data rather than domain objects.

Summary and Questions

The chapter ends off with a summary and some questions. Let's take a look at the questions real quick.

1. How are nil and false different from all other objects?

nil is literally nothing, and quite frequently errors you see in Ruby are due to one getting in somewhere where the application does not expect it.

false is an instance of FalseClass, so not sure I get the intent of this particular question when juxtaposed with nil. Perhaps this would be phrased better on what the intentions of these data types are instead?

2. Are all standard arithmetic operations using two BigDecimal objects exact?

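Not quite. Addition, subtraction, and multiplication on two BigDecimal values are exact, but division is not, as the BigDecimal(1) / 3 example above shows. If a Float gets on one side, even less so.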

3. Would it make sense for Ruby to combine symbols and strings?

Philosophically? I want Symbol to go away because it makes things far more complicated for new Rubyists for very very little real gains, and even trips me up on a semi-frequent basis. I dislike them for the complexity they introduce to the language.

Pragmatically? No. It should be left as is, as the fallout of changing that would break untold amounts of Ruby code and start one heck of a war in the community. It's not worth the cost, as much as I dislike it.

4. Which uses less memory for the same data - Hash or Set?

Probably Hash, but not by much. I seem to recall that Set is implemented in terms of a Hash anyways so it can't be that far off.

5. What are the only two core methods that return a new instance of Class?

Struct.new and Class.new I'd think.

Wrap Up

The Good

In general? Pragmatism. Jeremy excels in making tradeoffs and explaining why certain things are done a certain way, and that shows in a lot of his work. Is it the best solution? Maybe not, but it accounts for edge cases, and that's where he really excels: digging into those very details.

The book takes a pragmatic stance on addressing performance implications of different data structures and their usages. Not many do that.

It took time to address one of the elephants in the Ruby community around Symbol and String and had a fairly reasoned response to it. I might have liked to see the implications of removing one, but understand that that'd balloon the size of this chapter real quick.

It took a bolder stance in its introductory problem with album, which gave a lot more of a chance to explore interesting code. Too many examples feel really basic and don't really show a lot of potential concerns, and I think this book gets that right.

The Bad

Safari Books Online has an early access version with all the code line-breaked and in serif font, no highlighting. I wish Packt would fix this as that's near impossible to read as-is. I do hope the physical book fixes this.

As far as the book itself I feel like the first chapter tries to put a lot of content into one chapter, and may have been better served by breaking it up into more sections.

I do wish that the section on true, false, and nil went more into reasoned default values rather than dive into bang methods as much as it did, as those will find more use in a lot of Ruby programs to prevent errors.

Some of the examples tended to conflate assignment and concatenation behavior, and may have been better served by explicitly defining data structures above the code over ||= use.

The section on Struct veered from a very useful overview to a bit into the weeds and lost me.

Overview

I intend to keep reading and writing similar read-alongs for other chapters, and look forward to what's next.

Do I have objections with some of the content? Sure, but I have objections with my own code from last month. I just make sure to understand why decisions were made and note factors around them as I can. What makes these reviews fun is giving additional context and exploring why certain subjects are covered.

See you all in chapter 2!

Top comments (2)

Chris Born

• Oct 19 '21


Well written. Just heard Jeremy on a podcast talking about the book and finished reading through the sample chapter. I appreciate your "Let's Read" and commentary and look forward to reading the others.