github/scientist

🔬 A Ruby library for carefully refactoring critical paths.
Let's pretend you're changing the way you handle permissions in a large web app. Tests can help guide your refactoring, but you really want to compare the current and refactored behaviors under load.
```ruby
require "scientist"

class MyWidget
  def allows?(user)
    experiment = Scientist::Default.new "widget-permissions"
    experiment.use { model.check_user(user).valid? } # old way
    experiment.try { user.can?(:read, model) }       # new way
    experiment.run
  end
end
```
Wrap a `use` block around the code's original behavior, and wrap `try` around the new behavior. `experiment.run` will always return whatever the `use` block returns, but it does a bunch of stuff behind the scenes:
- It decides whether or not to run the `try` block,
- Randomizes the order in which `use` and `try` blocks are run,
- Measures the wall time and CPU time of all behaviors in seconds,
- Compares the result of `try` to the result of `use`,
- Swallows and records exceptions raised in the `try` block when overriding `raised`, and
- Publishes all this information.
The `use` block is called the **control**. The `try` block is called the **candidate**.
Creating an experiment is wordy, but when you include the `Scientist` module, the `science` helper will instantiate an experiment and call `run` for you:
```ruby
require "scientist"

class MyWidget
  include Scientist

  def allows?(user)
    science "widget-permissions" do |experiment|
      experiment.use { model.check_user(user).valid? } # old way
      experiment.try { user.can?(:read, model) }       # new way
    end # returns the control value
  end
end
```
If you don't declare any `try` blocks, none of the Scientist machinery is invoked and the control value is always returned.
The examples above will run, but they're not really *doing* anything. The `try` blocks don't run yet and none of the results get published. Replace the default experiment implementation to control execution and reporting:
```ruby
require "scientist/experiment"

class MyExperiment
  include Scientist::Experiment

  attr_accessor :name

  def initialize(name)
    @name = name
  end

  def enabled?
    # see "Ramping up experiments" below
    true
  end

  def raised(operation, error)
    # see "In a Scientist callback" below
    p "Operation '#{operation}' failed with error '#{error.inspect}'"
    super # will re-raise
  end

  def publish(result)
    # see "Publishing results" below
    p result
  end
end
```
When `Scientist::Experiment` is included in a class, it automatically sets it as the default implementation via `Scientist::Experiment.set_default`. This `set_default` call is skipped if you include `Scientist::Experiment` in a module.
Now calls to the `science` helper will load instances of `MyExperiment`.
Scientist compares control and candidate values using `==`. To override this behavior, use `compare` to define how to compare observed values instead:
```ruby
class MyWidget
  include Scientist

  def users
    science "users" do |e|
      e.use { User.all }         # returns User instances
      e.try { UserService.list } # returns UserService::User instances

      e.compare do |control, candidate|
        control.map(&:login) == candidate.map(&:login)
      end
    end
  end
end
```
If either the control block or candidate block raises an error, Scientist compares the two observations' classes and messages using `==`. To override this behavior, use `compare_errors` to define how to compare observed errors instead:
```ruby
class MyWidget
  include Scientist

  def slug_from_login(login)
    science "slug_from_login" do |e|
      e.use { User.slug_from_login login }        # returns String instance or ArgumentError
      e.try { UserService.slug_from_login login } # returns String instance or ArgumentError

      compare_error_message_and_class = ->(control, candidate) do
        control.class == candidate.class &&
          control.message == candidate.message
      end

      compare_argument_errors = ->(control, candidate) do
        control.class == ArgumentError &&
          candidate.class == ArgumentError &&
          control.message.start_with?("Input has invalid characters") &&
          candidate.message.start_with?("Invalid characters in input")
      end

      e.compare_errors do |control, candidate|
        compare_error_message_and_class.call(control, candidate) ||
          compare_argument_errors.call(control, candidate)
      end
    end
  end
end
```
Results aren't very useful without some way to identify them. Use the `context` method to add to or retrieve the context for an experiment:
```ruby
science "widget-permissions" do |e|
  e.context :user => user

  e.use { model.check_user(user).valid? }
  e.try { user.can?(:read, model) }
end
```
`context` takes a Symbol-keyed Hash of extra data. The data is available in `Experiment#publish` via the `context` method. If you're using the `science` helper a lot in a class, you can provide a default context:
```ruby
class MyWidget
  include Scientist

  def allows?(user)
    science "widget-permissions" do |e|
      e.context :user => user

      e.use { model.check_user(user).valid? }
      e.try { user.can?(:read, model) }
    end
  end

  def destroy
    science "widget-destruction" do |e|
      e.use { old_scary_destroy }
      e.try { new_safe_destroy }
    end
  end

  def default_scientist_context
    { :widget => self }
  end
end
```
The `widget-permissions` and `widget-destruction` experiments will both have a `:widget` key in their contexts.
If an experiment requires expensive setup that should only occur when the experiment is going to be run, define it with the `before_run` method:
```ruby
# Code under test modifies this in-place. We want to copy it for the
# candidate code, but only when needed:
value_for_original_code = big_object
value_for_new_code      = nil

science "expensive-but-worthwhile" do |e|
  e.before_run do
    value_for_new_code = big_object.deep_copy
  end
  e.use { original_code(value_for_original_code) }
  e.try { new_code(value_for_new_code) }
end
```
Sometimes you don't want to store the full value for later analysis. For example, an experiment may return `User` instances, but when researching a mismatch, all you care about is the logins. You can define how to clean these values in an experiment:
```ruby
class MyWidget
  include Scientist

  def users
    science "users" do |e|
      e.use { User.all }
      e.try { UserService.list }

      e.clean do |value|
        value.map(&:login).sort
      end
    end
  end
end
```
And this cleaned value is available in observations in the final published result:
```ruby
class MyExperiment
  include Scientist::Experiment

  # ...

  def publish(result)
    result.control.value         # [<User alice>, <User bob>, <User carol>]
    result.control.cleaned_value # ["alice", "bob", "carol"]
  end
end
```
Note that the `#clean` method will discard the previous cleaner block if you call it again. If for some reason you need to access the currently configured cleaner block, `Scientist::Experiment#cleaner` will return the block without further ado. (This probably won't come up in normal usage, but comes in handy if you're writing, say, a custom experiment runner that provides default cleaners.)
The `#clean` method will not be used for comparison of the results, so in the following example it is not possible to remove the `#compare` method without the experiment failing:
```ruby
def user_ids
  science "user_ids" do |e|
    e.use { [1, 2, 3] }
    e.try { [1, 3, 2] }
    e.clean { |value| value.sort }
    e.compare { |a, b| a.sort == b.sort }
  end
end
```
During the early stages of an experiment, it's possible that some of your code will always generate a mismatch for reasons you know and understand but haven't yet fixed. Instead of these known cases always showing up as mismatches in your metrics or analysis, you can tell an experiment whether or not to ignore a mismatch using the `ignore` method. You may include more than one block if needed:
```ruby
def admin?(user)
  science "widget-permissions" do |e|
    e.use { model.check_user(user).admin? }
    e.try { user.can?(:admin, model) }

    e.ignore { user.staff? } # user is staff, always an admin in the new system
    e.ignore do |control, candidate|
      # new system doesn't handle unconfirmed users yet:
      control && !candidate && !user.confirmed_email?
    end
  end
end
```
The ignore blocks are only called if the *values* don't match. Unless a `compare_errors` comparator is defined, two cases are considered mismatches: a) one observation raising an exception and the other not, b) observations raising exceptions with different classes or messages.
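As a rough pure-Ruby sketch (this is an illustration of the rule above, not Scientist's actual internals), the decision amounts to:

```ruby
# Hypothetical sketch: ignore blocks are consulted only after the control
# and candidate values fail to match.
def mismatched?(control, candidate, ignores)
  return false if control == candidate                            # values match
  return false if ignores.any? { |b| b.call(control, candidate) } # known, ignored
  true                                                            # real mismatch
end

ignores = [->(control, candidate) { control && !candidate }] # known one-way drift

mismatched?(true, true, ignores)  # => false (values match, ignores never run)
mismatched?(true, false, ignores) # => false (mismatch, but ignored)
mismatched?(1, 2, ignores)        # => true  (genuine mismatch)
```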
Sometimes you don't want an experiment to run. Say, disabling a new codepath for anyone who isn't staff. You can disable an experiment by setting a `run_if` block. If this returns `false`, the experiment will merely return the control value. Otherwise, it defers to the experiment's configured `enabled?` method.
```ruby
class DashboardController
  include Scientist

  def dashboard_items
    science "dashboard-items" do |e|
      # only run this experiment for staff members
      e.run_if { current_user.staff? }
      # ...
    end
  end
end
```
As a scientist, you know it's always important to be able to turn your experiment off, lest it run amok and result in villagers with pitchforks on your doorstep. In order to control whether or not an experiment is enabled, you must include the `enabled?` method in your `Scientist::Experiment` implementation.
```ruby
class MyExperiment
  include Scientist::Experiment

  attr_accessor :name, :percent_enabled

  def initialize(name)
    @name = name
    @percent_enabled = 100
  end

  def enabled?
    percent_enabled > 0 && rand(100) < percent_enabled
  end

  # ...
end
```
This code will be invoked for every method with an experiment every time, so be sensitive about its performance. For example, you can store an experiment in the database but wrap it in various levels of caching such as memcache or per-request thread-locals.
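As one hedged sketch of that caching idea (the class and method names here are illustrative, not part of Scientist), the rollout percentage can be memoized in a per-thread Hash so a slow lookup runs at most once per thread or request:

```ruby
# Illustrative sketch: cache the rollout percentage in a thread-local Hash
# so the (hypothetical) slow lookup runs at most once per thread/request.
class CachedExperiment
  def initialize(name)
    @name = name
  end

  # Stand-in for a slow lookup, e.g. a database or memcache read.
  def fetch_percent_enabled
    50
  end

  def percent_enabled
    cache = (Thread.current[:scientist_percents] ||= {})
    cache[@name] ||= fetch_percent_enabled
  end

  def enabled?
    percent_enabled > 0 && rand(100) < percent_enabled
  end
end
```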
What good is science if you can't publish your results?
You must implement the `publish(result)` method, and can publish data however you like. For example, timing data can be sent to graphite, and mismatches can be placed in a capped collection in redis for debugging later.
The `publish` method is given a `Scientist::Result` instance with its associated `Scientist::Observation`s:
```ruby
class MyExperiment
  include Scientist::Experiment

  # ...

  def publish(result)
    # Wall time
    # Store the timing for the control value,
    $statsd.timing "science.#{name}.control", result.control.duration
    # for the candidate (only the first, see "Breaking the rules" below),
    $statsd.timing "science.#{name}.candidate", result.candidates.first.duration

    # CPU time
    # Store the timing for the control value,
    $statsd.timing "science.cpu.#{name}.control", result.control.cpu_time
    # for the candidate (only the first, see "Breaking the rules" below),
    $statsd.timing "science.cpu.#{name}.candidate", result.candidates.first.cpu_time

    # and counts for match/ignore/mismatch:
    if result.matched?
      $statsd.increment "science.#{name}.matched"
    elsif result.ignored?
      $statsd.increment "science.#{name}.ignored"
    else
      $statsd.increment "science.#{name}.mismatched"
      # Finally, store mismatches in redis so they can be retrieved and examined
      # later on, for debugging and research.
      store_mismatch_data(result)
    end
  end

  def store_mismatch_data(result)
    payload = {
      :name            => name,
      :context         => context,
      :control         => observation_payload(result.control),
      :candidate       => observation_payload(result.candidates.first),
      :execution_order => result.observations.map(&:name)
    }

    key = "science.#{name}.mismatch"
    $redis.lpush key, payload
    $redis.ltrim key, 0, 1000
  end

  def observation_payload(observation)
    if observation.raised?
      {
        :exception => observation.exception.class,
        :message   => observation.exception.message,
        :backtrace => observation.exception.backtrace
      }
    else
      {
        # see "Keeping it clean" above:
        :value => observation.cleaned_value
      }
    end
  end
end
```
When running your test suite, it's helpful to know that the experimental results always match. To help with testing, Scientist defines a `raise_on_mismatches` class attribute when you include `Scientist::Experiment`. Only do this in your test suite!
To raise on mismatches:
```ruby
class MyExperiment
  include Scientist::Experiment
  # ... implementation
end

MyExperiment.raise_on_mismatches = true
```
Scientist will raise a `Scientist::Experiment::MismatchError` exception if any observations don't match.
To instruct Scientist to raise a custom error instead of the default `Scientist::Experiment::MismatchError`:
```ruby
class CustomMismatchError < Scientist::Experiment::MismatchError
  def to_s
    message = "There was a mismatch! Here's the diff:"

    diffs = result.candidates.map do |candidate|
      Diff.new(result.control, candidate)
    end.join("\n")

    "#{message}\n#{diffs}"
  end
end
```
```ruby
science "widget-permissions" do |e|
  e.use { Report.find(id) }
  e.try { ReportService.new.fetch(id) }
  e.raise_with CustomMismatchError
end
```
This allows for pre-processing of mismatch error exception messages.
Scientist rescues and tracks *all* exceptions raised in a `try` or `use` block, including some where rescuing may cause unexpected behavior (like `SystemExit` or `ScriptError`). To rescue a more restrictive set of exceptions, modify the `RESCUES` list:
```ruby
# default is [Exception]
Scientist::Observation::RESCUES.replace [StandardError]
```
Timeout ⏲️: If you're introducing a candidate that could possibly time out, use caution.
If an exception is raised within any of Scientist's internal helpers, like `publish`, `compare`, or `clean`, the `raised` method is called with the symbol name of the internal operation that failed and the exception that was raised. The default behavior of `Scientist::Default` is to simply re-raise the exception. Since this halts the experiment entirely, it's often a better idea to handle this error and continue so the experiment as a whole isn't canceled:
```ruby
class MyExperiment
  include Scientist::Experiment

  # ...

  def raised(operation, error)
    InternalErrorTracker.track! "science failure in #{name}: #{operation}", error
  end
end
```
The operations that may be handled here are:

- `:clean` - an exception is raised in a `clean` block
- `:compare` - an exception is raised in a `compare` block
- `:enabled` - an exception is raised in the `enabled?` method
- `:ignore` - an exception is raised in an `ignore` block
- `:publish` - an exception is raised in the `publish` method
- `:run_if` - an exception is raised in a `run_if` block
Because `enabled?` and `run_if` determine when a candidate runs, it's impossible to guarantee that it will run every time. For this reason, Scientist is only safe for wrapping methods that aren't changing data.
When using Scientist, we've found it most useful to modify both the existing and new systems simultaneously anywhere writes happen, and verify the results at read time with `science`. `raise_on_mismatches` has also been useful to ensure that the correct data was written during tests, and reviewing published mismatches has helped us find any situations we overlooked with our production data at runtime. When writing to and reading from two systems, it's also useful to write some data reconciliation scripts to verify and clean up production data alongside any running experiments.
Keep in mind that Scientist's `try` and `use` blocks run sequentially in random order. As such, any data upon which your code depends may change before the second block is invoked, potentially yielding a mismatch between the candidate and control return values. To calibrate your expectations with respect to *false negatives* arising from systemic conditions external to your proposed changes, consider starting with an experiment in which both the `try` and `use` blocks invoke the control method. Then proceed with introducing a candidate.
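A minimal pure-Ruby illustration of such a false negative (no Scientist involved, names are illustrative): both sides run identical code, but shared mutable state changes between the two sequential calls, so the observed values still differ:

```ruby
# Both "control" and "candidate" run the same lambda, yet the values differ
# because the data they read changes between the two sequential invocations.
counter = 0
read = -> { counter += 1 } # stands in for reading mutable external state

control_value   = read.call # => 1
candidate_value = read.call # => 2

control_value == candidate_value # => false: a mismatch with no behavior change
```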
As your candidate behavior converges on the controls, you'll start thinking about removing an experiment and using the new behavior.
- If there are any ignore blocks, the candidate behavior is *guaranteed* to be different. If this is unacceptable, you'll need to remove the ignore blocks and resolve any ongoing mismatches in behavior until the observations match perfectly every time.
- When removing a read-behavior experiment, it's a good idea to keep any write-side duplication between an old and new system in place until well after the new behavior has been in production, in case you need to roll back.
Sometimes scientists just gotta do weird stuff. We understand.
Science is useful even when all you care about is the timing data or even whether or not a new code path blew up. If you have the ability to incrementally control how often an experiment runs via your `enabled?` method, you can use it to silently and carefully test new code paths and ignore the results altogether. You can do this by setting `ignore { true }`, or for greater efficiency, `compare { true }`.
This will still log mismatches if any exceptions are raised, but will disregard the values entirely.
It's not usually a good idea to try more than one alternative simultaneously. Behavior isn't guaranteed to be isolated and reporting + visualization get quite a bit harder. Still, it's sometimes useful.
To try more than one alternative at once, add names to some `try` blocks:
```ruby
require "scientist"

class MyWidget
  include Scientist

  def allows?(user)
    science "widget-permissions" do |e|
      e.use { model.check_user(user).valid? } # old way

      e.try("api")     { user.can?(:read, model) }     # new service API
      e.try("raw-sql") { user.can_sql?(:read, model) } # raw query
    end
  end
end
```
When the experiment runs, all candidate behaviors are tested and each candidate observation is compared with the control in turn.
Define the candidates with named `try` blocks, omit a `use`, and pass a candidate name to `run`:
```ruby
experiment = MyExperiment.new("various-ways") do |e|
  e.try("first-way")  { ... }
  e.try("second-way") { ... }
end

experiment.run("second-way")
```
The `science` helper also knows this trick:
```ruby
science "various-ways", run: "first-way" do |e|
  e.try("first-way")  { ... }
  e.try("second-way") { ... }
end
```
If you're writing tests that depend on specific timing values, you can provide canned durations using the `fabricate_durations_for_testing_purposes` method, and Scientist will report these in `Scientist::Observation#duration` and `Scientist::Observation#cpu_time` instead of the actual execution times.
```ruby
science "absolutely-nothing-suspicious-happening-here" do |e|
  e.use { ... } # "control"
  e.try { ... } # "candidate"
  e.fabricate_durations_for_testing_purposes({
    "control"   => { "duration" => 1.0, "cpu_time" => 0.9 },
    "candidate" => { "duration" => 0.5, "cpu_time" => 0.4 }
  })
end
```
`fabricate_durations_for_testing_purposes` takes a Hash of duration & cpu_time values, keyed by behavior names. (By default, Scientist uses `"control"` and `"candidate"`, but if you override these as shown in *Trying more than one thing* or *No control, just candidates*, use matching names here.) If a name is not provided, the actual execution time will be reported instead.

We should mention these durations will be used both for the `duration` field and the `cpu_time` field.
Like `Scientist::Experiment#cleaner`, this probably won't come up in normal usage. It's here to make it easier to test code that extends Scientist.
If you need to use Scientist in a place where you aren't able to include the Scientist module, you can call `Scientist.run`:
```ruby
Scientist.run "widget-permissions" do |e|
  e.use { model.check_user(user).valid? }
  e.try { user.can?(:read, model) }
end
```
Be on a Unixy box. Make sure a modern Bundler is available. `script/test` runs the unit tests. All development dependencies are installed automatically. Scientist requires Ruby 2.3 or newer.
- RealGeeks/lab_tech is a Rails engine for using this library by controlling, storing, and analyzing experiment results with ActiveRecord.
- daylerees/scientist (PHP)
- scientistproject/scientist.net (.NET)
- joealcorn/laboratory (Python)
- rawls238/Scientist4J (Java)
- tomiaijo/scientist (C++)
- trello/scientist (node.js)
- ziyasal/scientist.js (node.js, ES6)
- TrueWill/tzientist (node.js, TypeScript)
- TrueWill/paleontologist (Deno, TypeScript)
- yeller/laboratory (Clojure)
- lancew/Scientist (Perl 5)
- lancew/ScientistP6 (Perl 6)
- MadcapJake/Test-Lab (Perl 6)
- cwbriones/scientist (Elixir)
- calavera/go-scientist (Go)
- jelmersnoeck/experiment (Go)
- spoptchev/scientist (Kotlin / Java)
- junkpiano/scientist (Swift)
- serverless scientist (AWS Lambda)
- fightmegg/scientist (TypeScript, Browser / Node.js)
- MisterSpex/misterspex-scientist (Java, no dependencies)