The “Dependency Cutout” Workflow Pattern, Part I
It’s important to be able to fix bugs in your open source dependencies, and not just work around them.
Tell me if you’ve heard this one before.
You’re working on an application. Let’s call it “FooApp”. FooApp has a dependency on an open source library, let’s call it “LibBar”. You find a bug in LibBar that affects FooApp.
To envisage the best possible version of this scenario, let’s say you actively like LibBar, both technically and socially. You’ve contributed to it in the past. But this bug is causing production issues in FooApp today, and LibBar’s release schedule is quarterly. FooApp is your job; LibBar is (at best) your hobby. Blocking on the full upstream contribution cycle and waiting for a release is an absolute non-starter.
What do you do?
There are a few common reactions to this type of scenario, all of which are bad options.
I will enumerate them specifically here, because I suspect that some of them may resonate with many readers:
1. Find an alternative to LibBar, and switch to it.
This is a bad idea because transitioning to a different core infrastructure component could be extremely expensive.
2. Vendor LibBar into your codebase and fix your vendored version.
This is a bad idea because carrying this one fix now requires you to maintain all the tooling associated with a monorepo1: you have to be able to start pulling in new versions from LibBar regularly, reconcile your changes even though you now have a separate version history on your imported version, and so on.
3. Monkey-patch LibBar to include your fix.
This is a bad idea because you are now extremely tightly coupled to a specific version of LibBar. By modifying LibBar internally like this, you’re inherently violating its compatibility contract, in a way which is going to be extremely difficult to test. You can test this change, of course, but as LibBar changes, you will need to replicate any relevant portions of its test suite (which may be its entire test suite) in FooApp. Lots of potential duplication of effort there.
4. Implement a workaround in your own code, rather than fixing it.
This is a bad idea because you are distorting the responsibility for correct behavior. LibBar is supposed to do LibBar’s job, and unless you have a full wrapper for it in your own codebase, other engineers (including “yourself, personally”) might later forget to go through the alternate, workaround codepath, and invoke the buggy LibBar behavior again in some new place.
5. Implement the fix upstream in LibBar anyway, because that’s the Right Thing To Do, and burn credibility with management while you anxiously wait for a release with the bug in production.
This is a bad idea because you are betraying your users — by allowing the buggy behavior to persist — for the workflow convenience of your dependency providers. Your users are probably giving you money, and trusting you with their data. This means you have both ethical and economic obligations to consider their interests.
As much as it’s nice to participate in the open source community and take on an appropriate level of burden to maintain the commons, this cannot sustainably be at the explicit expense of the population you serve directly.
Even if we only care about the open source maintainers here, there’s still a problem: as you are likely to come under immediate pressure to ship your changes, you will inevitably relay at least a bit of that stress to the maintainers. Even if you try to be exceedingly polite, the maintainers will know that you are coming under fire for not having shipped the fix yet, and are likely to feel an even greater burden of obligation to ship your code fast.
Much as it’s good to contribute the fix, it’s not great to put this on the maintainers.
The respective incentive structures of software development — specifically, of corporate application development and open source infrastructure development — make options 1-4 very common.
On the corporate / application side, these issues are:
- it’s difficult for corporate developers to get clearance to spend even small amounts of their work hours on upstream open source projects, but clearance to spend time on the project they actually work on is implicit. If it takes 3 hours of wrangling with Legal2 and 3 hours of implementation work to fix the issue in LibBar, but 0 hours of wrangling with Legal and 40 hours of implementation work in FooApp, a FooApp developer will often perceive it as “easier” to fix the issue downstream.
- it’s difficult for corporate developers to get clearance from management to spend even small amounts of money sponsoring upstream reviewers, so even if they can find the time to contribute the fix, chances are high that it will remain stuck in review unless they are personally well-integrated members of the LibBar development team already.
- even assuming there’s zero pressure whatsoever to avoid open sourcing the upstream changes, there’s still the fact inherent to any development team that FooApp’s developers will be more familiar with FooApp’s codebase and development processes than they are with LibBar’s. It’s just easier to work there, even if all other things are equal.
- systems for tracking risk from open source dependencies often lack visibility into vendoring, particularly if you’re doing a hybrid approach and only vendoring a few things to address work in progress, rather than a comprehensive and disciplined approach to a monorepo. If you fully absorb a vendored dependency and then modify it, Dependabot isn’t going to tell you that a new version is available any more, because it won’t be present in your dependency list. Organizationally this is bad, of course, but from the perspective of an individual developer this manifests mostly as fewer annoying emails.
But there are problems on the open source side as well. Those problems are all derived from one big issue: because we’re often working with relatively small sums of money, it’s hard for upstream open source developers to consume either money or patches from application developers. It’s nice to say that you should contribute money to your dependencies, and you absolutely should, but the cost-benefit function is discontinuous. Before a project reaches the fiscal threshold where it can be at least one person’s full-time job to worry about this stuff, there’s often no-one responsible in the first place. Developers will therefore gravitate to the issues that are either fun, or relevant to their own job.
These mutually-reinforcing incentive structures are a big reason that users of open source infrastructure, even teams who work at corporate users with zillions of dollars, don’t reliably contribute back.
The Answer We Want
All those options are bad. If we had a good option, what would it look like?
It is both practically necessary3 and morally required4 for you to have a way to temporarily rely on a modified version of an open source dependency, without permanently diverging.
Below, I will describe a desirable abstract workflow for achieving this goal.
Step 0: Report the Problem
Before you get started with any of these other steps, write up a clear description of the problem and report it to the project as an issue; specifically, in contrast to writing it up as a pull request. Describe the problem before submitting a solution.
You may not be able to wait for a volunteer-run open source project to respond to your request, but you should at least tell the project what you’re planning on doing.
If you don’t hear back from them at all, you will have at least made sure to comprehensively describe your issue and strategy beforehand, which will provide some clarity and focus to your changes.
If you do hear back from them, in the worst case scenario, you may discover that a hard fork will be necessary because they don’t consider your issue valid, but even that information will save you time, if you know it before you get started. In the best case, you may get a reply from the project telling you that you’ve misunderstood its functionality and that there is already a configuration parameter or usage pattern that will resolve your problems with no new code. But in all cases, you will benefit from early coordination on what needs fixing before you get to how to fix it.
Step 1: Source Code and CI Setup
Fork the source code for your upstream dependency to a writable location where it can live at least for the duration of this one bug-fix, and possibly for the duration of your application’s use of the dependency. After all, you might want to fix more than one bug in LibBar.
You want to have a place where you can put your edits, that will be version controlled and code reviewed according to your normal development process. This probably means you’ll need to have your own main branch that diverges from your upstream’s main branch.
Remember: you’re going to need to deploy this to your production, so testing gates that your upstream only applies to final releases of LibBar will need to be applied to every commit here.
Depending on LibBar’s own development process, this may result in slightly unusual configurations where, for example, your fixes are written against the last LibBar release tag, rather than its current5 main; if the project has a branch-freshness requirement, you might need two branches, one for your upstream PR (based on main) and one for your own use (based on the release branch with your changes).
Ideally for projects with really good CI and a strong “keep main release-ready at all times” policy, you can deploy straight from a development branch, but it’s good to take a moment to consider this before you get started. It’s usually easier to rebase changes from an older HEAD onto a newer one than it is to go backwards.
Speaking of CI, you will want to have your own CI system. The fact that GitHub Actions has become a de-facto lingua franca of continuous integration means that this step may be quite simple, and your forked repo can just run its own instance.
Optional Bonus Step 1a: Artifact Management
If you have an in-house artifact repository, you should set that up for your dependency too, and upload your own build artifacts to it. You can often treat your modified dependency as an extension of your own source tree and install from a GitHub URL, but if you’ve already gone to the trouble of having an in-house package repository, you can pretend you’ve taken over maintenance of the upstream package temporarily (which you kind of have) and leverage those workflows for caching and build-time savings as you would with any other internal repo.
Step 2: Do The Fix
Now that you’ve got somewhere to edit LibBar’s code, you will want to actually fix the bug.
Step 2a: Local Filesystem Setup
Before you have a production version on your own deployed branch, you’ll want to test locally, which means having both repositories in a single integrated development environment.
At this point, you will want to have a local filesystem reference to your LibBar dependency, so that you can make real-time edits, without going through a slow cycle of pushing to a branch in your LibBar fork, pushing to a FooApp branch, and waiting for all of CI to run on both.
This is useful in both directions: as you prepare the FooApp branch that makes any necessary updates on that end, you’ll want to make sure that FooApp can exercise the LibBar fix in any integration tests. As you work on the LibBar fix itself, you’ll also want to be able to use FooApp to exercise the code and see if you’ve missed anything - and this, you wouldn’t get in CI, since LibBar can’t depend on FooApp itself.
In short, you want to be able to treat both projects as an integrated development environment, with support from your usual testing and debugging tools, just as much as you want your deployment output to be an integrated artifact.
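To make that concrete, here’s a minimal sketch of the kind of integration test you’d want to be able to run against a locally checked-out LibBar; every specific name in it (libbar, fooapp, Batch, summarize_orders) is hypothetical, since both projects are stand-ins:

```python
# Hypothetical FooApp integration test exercising the patched LibBar codepath.
# "libbar" and "fooapp" are stand-in package names; with LibBar installed from
# a local checkout (for example, an editable install), edits to LibBar take
# effect on the next test run, with no push-and-wait-for-CI round trip.
import libbar

from fooapp.reports import summarize_orders


def test_summarize_orders_handles_empty_batch() -> None:
    # The hypothetical LibBar bug being fixed: an empty batch used to raise
    # instead of producing an empty summary.
    batch = libbar.Batch(items=[])
    assert summarize_orders(batch) == []
```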
Step 2b: Branch Setup for PR
However, for continuous integration to work, you will also need to have a remote resource reference of some kind from FooApp’s branch to LibBar. You will need 2 pull requests: the first to land your LibBar changes to your internal LibBar fork and make sure it’s passing its own tests, and then a second PR to switch your LibBar dependency from the public repository to your internal fork.
At this step it is very important to ensure that there is an issue filed on your own internal backlog to drop your LibBar fork. You do not want to lose track of this work; it is technical debt that must be addressed.
Until it’s addressed, automated tools like Dependabot will not be able to apply security updates to LibBar for you; you’re going to need to manually integrate every upstream change. This type of work is itself very easy to drop or lose track of, so you might just end up stuck on a vulnerable version.
Step 3: Deploy Internally
Now that you’re confident that the fix will work, and that your temporarily-internally-maintained version of LibBar isn’t going to break anything on your site, it’s time to deploy.
Some deployment heritage should help to provide some evidence that your fix is ready to land in LibBar, but at the next step, please remember that your production environment isn’t necessarily emblematic of that of all LibBar users.
Step 4: Propose Externally
You’ve got the fix, you’ve tested the fix, you’ve got the fix in your own production, you’ve told upstream you want to send them some changes. Now, it’s time to make the pull request.
You’re likely going to get some feedback on the PR, even if you think it’s already ready to go; as I said, despite having been proven in your production environment, you may get feedback about additional concerns from other users that you’ll need to address before LibBar’s maintainers can land it.
As you process the feedback, make sure that each new iteration of your branch gets re-deployed to your own production. It would be a huge bummer to go through all this trouble, and then end up unable to deploy the next publicly released version of LibBar within FooApp because you forgot to test that your responses to feedback still worked on your own environment.
Step 4a: Hurry Up And Wait
If you’re lucky, upstream will land your changes to LibBar. But, there’s still no release version available. Here, you’ll have to stay in a holding pattern until upstream can finalize the release on their end.
Depending on some particulars, it might make sense at this point to archive your internal LibBar repository and move your pinned release version to a git hash of the LibBar version where your fix landed, in their repository.
Before you do this, check in with the LibBar core team and make sure that they understand that’s what you’re doing and they don’t have any wacky workflows which may involve rebasing or eliding that commit as part of their release process.
Step 5: Unwind Everything
Finally, you eventually want to stop carrying any patches and move back to an officially released version that integrates your fix.
You want to do this because this is what the upstream will expect when you are reporting bugs. Part of the benefit of using open source is benefiting from the collective work to do bug-fixes and such, so you don’t want to be stuck off on a pinned git hash that the developers do not support for anyone else.
As I said in step 2b6, make sure to maintain a tracking task for doing this work, because leaving this sort of relatively easy-to-clean-up technical debt lying around is something that can potentially create a lot of aggravation for no particular benefit. Make sure to put your internal LibBar repository into an appropriate state at this point as well.
Up Next
This is part 1 of a 2-part series. In part 2, I will explore in depth how to execute this workflow specifically for Python packages, using some popular tools. I’ll discuss my own workflow, standards like PEP 517 and pyproject.toml, and of course, by the popular demand that I just know will come, uv.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!
If you already have all the tooling associated with a monorepo, including the ability to manage divergence and reintegrate patches with upstream, you already have the higher-overhead version of the workflow I am going to propose, so, never mind. But chances are you don’t have that; very few companies do. ↩
In any business where one must wrangle with Legal, 3 hours is a wildly optimistic estimate. ↩
In an ideal world every project would keep its main branch ready to release at all times, no matter what, but we do not live in an ideal world. ↩
In this case, there is no question. It’s 2b only, no not-2b. ↩
The Best Line Length
What’s a good maximum line length for your coding standard?
This is, of course, a trick question. By posing it as a question, I have created the misleading impression that it is a question, but Black has selected the correct number for you; it’s 88 which is obviously very lucky.
Thanks for reading my blog.
OK, OK. Clearly, there’s more to it than that. This is an age-old debate on the level of “tabs versus spaces”. So contentious, in fact, that even the famously opinionated Black does in fact let you change it.
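For instance, if you want to poke at that knob from Python rather than from configuration, here’s a quick sketch; it assumes Black’s format_str/FileMode API (the same setting is the --line-length command-line option, or line-length under [tool.black] in pyproject.toml):

```python
# A quick sketch of Black's configurable line length, assuming its Python API
# (black.format_str and black.FileMode).
import black

source = (
    "result = compute_something(first_argument, second_argument, "
    "third_argument, fourth_argument)\n"
)  # 92 characters when joined: just over the default limit

# At the default 88 columns, Black wraps the call across several lines...
print(black.format_str(source, mode=black.FileMode()))

# ...while at a more permissive 100 columns it leaves the line alone.
print(black.format_str(source, mode=black.FileMode(line_length=100)))
```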
Ancient History
One argument that certain silly people1 like to make is “why are we wrapping at 80 characters like we are using 80 character teletypes, it’s the 2020s! I have an ultrawide monitor!”. The implication here is that the width of 80-character terminals is an antiquated relic, based entirely around the hardware limitations of a bygone era, and modern displays can put tons of stuff on one line, so why not use that capability?
This feels intuitively true, given the huge disparity between ancient times and now: on my own display, I can comfortably fit about 350 characters on a line. What a shame, to have so much room for so many characters in each line, and to waste it all on blank space!
But... is that true?
I stretched out my editor window all the way to measure that ‘350’ number, but I did not continue editing at that window width. In order to have a more comfortable editing experience, I switched back into writeroom-mode, a mode which emulates a considerably more writerly application, which limits each line length to 92 characters, regardless of frame width.
You’ve probably noticed this too. Almost all sites that display prose of any kind limit their width, even on very wide screens.
As silly as that tiny little ribbon of text running down the middle of your monitor might look with a full-screened stereotypical news site or blog, if you full-screen a site that doesn’t set that width-limit, although it makes sense that you can now use all that space up, it will look extremely, almost unreadably bad.
Blogging software does not set a column width limit on your text because of some 80-character-wide accident of history in the form of a hardware terminal.
Similarly, if you really try to use that screen real estate to its fullest for coding, and start editing 200-300 character lines, you’ll quickly notice it starts to feel just a bit weird and confusing. It gets surprisingly easy to lose your place. Rhetorically, the “80 characters is just because of dinosaur technology! Use all those ultrawide pixels!” talking point is quite popular, but practically people usually just want a few more characters’ worth of breathing room, maxing out at 100 characters, far narrower than even the most svelte widescreen.
So maybe those 80 character terminals are holding us back a little bit, but... wait a second. Why were the terminals 80 characters wide in the first place?
Ancienter History
As this lovely Software Engineering Stack Exchange post summarizes, terminals were probably 80 characters because teletypes were 80 characters, and teletypes were probably 80 characters because punch cards were 80 characters, and punch cards were probably 80 characters because that’s just about how many typewritten characters fit onto one line of a US-Letter piece of paper.
Even before typewriters, consider the average newspaper: why do we call a regularly-occurring featured article in a newspaper a “column”? Because broadsheet papers were too wide to have only a single column; they would always be broken into multiple! Far more aggressive than 80 characters, columns in newspapers typically have 30 characters per line.
The first newspaper printing machines were custom designed and could have used whatever width they wanted, so why standardize on something so narrow?3
Science!
There has been a surprising amount of scientific research around this issue, but in brief, there’s a reason here rooted in human physiology: when you read a block of text, you are not consciously moving your eyes from word to word like you’re dragging a mouse cursor, repositioning continuously. Human eyes reading text move in quick bursts of rotation called “saccades”. In order to quickly and accurately move from one line of text to another, the start of the next line needs to be clearly visible in the reader’s peripheral vision for them to accurately target it. This limits the angle of rotation that the reader can perform in a single saccade, and, thus, the length of a line that they can comfortably read without hunting around for the start of the next line every time they get to the end.
So, 80 (or 88) characters isn’t too unreasonable for a limit. It’s longer than 30 characters, that’s for sure!
But, surely that’s not all, or this wouldn’t be so contentious in the first place?
Caveats
The screen is wide, though.
The ultrawide aficionados do have a point, even if it’s not really the simple one about “old terminals” they originally thought. Our modern wide-screen displays are criminally underutilized, particularly for text. Even adding in the big chunky file, class, and method tree browser over on the left and the source code preview on the right, a brief survey of a Google Image search for “vs code” shows a lot of editors open with huge, blank areas on the right side of the window.
Big screens are super useful as they allow us to leverage our spatial memories to keep more relevant code around and simply glance around as we think, rather than navigate interactively. But it only works if you remember to do it.
Newspapers allowed us to read a ton of information in one sitting with minimum shuffling by packing in as much as 6 columns of text. You could read a column to the bottom of the page, back to the top, and down again, several times.
Similarly, books fill both of their opposed pages with text at the same time, doubling the amount of stuff you can read at once before needing to turn the page.
You may notice that reading text in a book, even in an ebook app, is more comfortable than reading a ton of text by scrolling around in a web browser. That’s because our eyes are built for saccades, and repeatedly tracking the continuous smooth motion of the page as it scrolls to a stop, then re-targeting the new fixed location to start saccading around from, is literally more physically strenuous on your eye’s muscles!
There’s a reason that the codex was a big technological innovation over the scroll. This is a regression!
Today, the right thing to do here is to make use of horizontally split panes in your text editor or IDE, and just make a bit of conscious effort to set up the appropriate code on screen for the problem you’re working on. However, this is a potential area for different IDEs to really differentiate themselves, and build multi-column continuous-code-reading layouts that allow for buffers to wrap and be navigable newspaper-style.
Similarly, modern CSS has shockingly good support for multi-column layouts, and it’s a shame that true multi-column, page-turning layouts are so rare. If I ever figure out a way to deploy this here that isn’t horribly clunky and fighting modern platform conventions like “scrolling horizontally is substantially more annoying and inconsistent than scrolling vertically”, maybe I will experiment with such a layout on this blog one day. Until then… just make the browser window narrower so other useful stuff can be in the other parts of the screen, I guess.
Code Isn’t Prose
But, I digress. While I think that columnar layouts for reading prose are an interesting thing more people should experiment with, code isn’t prose.
The metric used for ideal line width, which you may have noticed if you clicked through some of those Wikipedia links earlier, is not “character cells in your editor window”, it is characters per line, or “CPL”.
With an optimal CPL somewhere between 45 and 95, a code-line-width of somewhere around 90 might actually be the best idea, because whitespace uses up your line-width budget. In a typical object-oriented Python program2, most of your code ends up indented by at least 8 spaces: 4 for the class scope, 4 for the method scope. Most likely a lot of it is 12, because any interesting code will have at least one conditional or loop. So, by the time you’re done wasting all that horizontal space, a max line length of 90 actually looks more like a maximum of 78... right about that sweet spot from the US-Letter page in the typewriter that we started with.
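To make that arithmetic concrete, here’s a purely illustrative sketch (every name in it is made up): the statement inside the loop starts at column 12, so a 90-column limit leaves only about 78 columns of actual content.

```python
# Purely illustrative: under a 90-column limit, code nested inside a class,
# a method, and a loop has already spent 12 columns on indentation.
class ReportGenerator:
    def render_rows(self, rows):
        lines = []
        for row in rows:
            # This statement begins 12 spaces in; the text to the right of the
            # indentation is what the CPL research is actually measuring.
            lines.append(f"{row.name:<30} {row.total:>12.2f} {row.status}")
        return lines
```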
What about soft-wrap?
In principle, source code is structured information, whose presentation could be fully decoupled from its serialized representation. Everyone could configure their preferred line width appropriate to their custom preferences and the specific physiological characteristics of their eyes, and the code could be formatted according to the language it was expressed in, and “hard wrapping” could be a silly antiquated thing.
The problem with this argument is the same as the argument against “but tabs are semantic indentation”, to wit: nope, no it isn’t. What “in principle” means in the previous paragraph is actually “in a fantasy world which we do not inhabit”. I’d love it if editors treated code this way and we had a rich history and tradition of structured manipulations rather than typing in strings of symbols to construct source code textually. But that is not the world we live in. Hard wrapping is unfortunately necessary to integrate with diff tools.
So what’s the optimal line width?
The exact, specific number here is still ultimately a matter of personal preference.
Hopefully, understanding the long history, science, and underlying physical constraints can lead you to select a contextually appropriate value for your own purposes that will balance ease of reading, integration with the relevant tools in your ecosystem, diff size, presentation in the editors and IDEs that your contributors tend to use, reasonable display in web contexts, on presentation slides, and so on.
But — and this is important — counterpoint:
No it isn’t, you don’t need to select an optimal width, because it’s already been selected for you. It is 88.
Acknowledgments
Thank you for reading, and especially thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!
I love the fact that this message is, itself, hard-wrapped to 77 characters. ↩
Let’s be honest; we’re all object-oriented python programmers here, aren’t we? ↩
Unsurprisingly, there are also financial reasons. More, narrower columns meant it was easier to fix typesetting errors and to insert more advertisements as necessary. But readability really did have a lot to do with it, too; scientists were looking at ease of reading as far back as the 1800s. ↩
I Think I’m Done Thinking About genAI For Now
The conversation isn’t over, but I don’t think I have much to add to it.
The Problem
Like many other self-styled thinky programmer guys, I like to imagine myself as a sort of Holmesian genius, making trenchant observations, collecting them, and then synergizing them into brilliant deductions with the keen application of my powerful mind.
However, several years ago, I had an epiphany in my self-concept. I finally understood that, to the extent that I am usefully clever, it is less in a Holmesian idiom, and more, shall we say, Monkesque.
For those unfamiliar with either of the respective franchises:
- Holmes is a towering intellect honed by years of training, who catalogues intentional, systematic observations and deduces logical, factual conclusions from those observations.
- Monk, on the other hand, while also a reasonably intelligent guy, is highly neurotic, wracked by unresolved trauma and profound grief. As both a consulting job and a coping mechanism, he makes a habit of erratically wandering into crime scenes, and, driven by a carefully managed jenga tower of mental illnesses, leverages his dual inabilities to solve crimes. First, he is unable to filter out apparently inconsequential details, building up a mental rat’s nest of trivia about the problem; second, he is unable to let go of any minor incongruity, obsessively ruminating on the collection of facts until they all make sense in a consistent timeline.
Perhaps surprisingly, this tendency serves both this fictional wretch of a detective, and myself, reasonably well. I find annoying incongruities in abstractions and I fidget and fiddle with them until I end up building something that a lot of people like, or perhaps something that a smaller number of people get really excited about. At worst, at least I eventually understand what’s going on. This is a self-soothing activity but it turns out that, managed properly, it can very effectively soothe others as well.
All that brings us to today’s topic, which is an incongruity I cannot smooth out or fit into a logical framework to make sense of. I am, somewhat reluctantly, a genAI skeptic. However, I am, even more reluctantly, exposed to genAI Discourse every damn minute of every damn day. It is relentless, inescapable, and exhausting.
This preamble about personality should hopefully help you, dear reader, to understand how I usually address problematical ideas by thinking and thinking and fidgeting with them until I manage to write some words — or perhaps a new open source package — that logically orders the ideas around it in a way which allows my brain to calm down and let it go, and how that process is important to me.
In this particular instance, however, genAI has defeated me. I cannot make it make sense, but I need to stop thinking about it anyway. It is too much and I need to give up.
My goal with this post is not to convince anyone of anything in particular — and we’ll get to why that is a bit later — but rather:
- to set out my current understanding in one place, including all the various negative feelings which are still bothering me, so I can stop repeating it elsewhere,
- to explain why I cannot build a case that I think should be particularly convincing to anyone else, particularly to someone who actively disagrees with me,
- in so doing, to illustrate why I think the discourse is so fractious and unresolvable, and finally
- to give myself, and hopefully by proxy to give others in the same situation, permission to just peace out of this nightmare quagmire corner of the noosphere.
But first, just because I can’t prove that my interlocutors are Wrong On The Internet, doesn’t mean I won’t explain why I feel like they are wrong.
The Anti-Antis
Most recently, at time of writing, there has been a spate of “the genAI discourse is bad” articles, almost exclusively written from the perspective of, not boosters exactly, but pragmatically minded (albeit concerned) genAI users, wishing for the skeptics to be more pointed and accurate in our critiques. This is anti-anti-genAI content.
I am not going to link to any of these, because, as part of their self-fulfilling prophecy about the “genAI discourse”, they’re also all bad.
Mostly, however, they had very little worthwhile to respond to because they were straw-manning their erstwhile interlocutors. They are all getting annoyed at “bad genAI criticism” while failing to engage with — and often failing to even mention — most of the actual substance of any serious genAI criticism. At least, any of the criticism that I’ve personally read.
I understand wanting to avoid a callout or Gish-gallop culture and just express your own ideas. So, I understand that they didn’t link directly to particular sources or go point-by-point on anyone else’s writing. Obviously I get it, since that’s exactly what this post is doing too.
But if you’re going to talk about how bad the genAI conversation is, without even mentioning huge categories of problem like “climate impact” or “disinformation”1 even once, I honestly don’t know what conversation you’re even talking about. This is peak “make up a guy to get mad at” behavior, which is especially confusing in this circumstance, because there’s an absolutely huge crowd of actual people that you could already be mad at.
The people writing these pieces have historically seemed very thoughtful to me. Some of them I know personally. It is worrying to me that their critical thinking skills appear to have substantially degraded specifically after spending a bunch of time intensely using this technology which I believe has a scary risk of degrading one’s critical thinking skills. Correlation is not causation or whatever, and sure, from a rhetorical perspective this is “post hoc ergo propter hoc” and maybe a little “ad hominem” for good measure, but correlation can still be concerning.
Yet, I cannot effectively respond to these folks, because they are making a practical argument that I cannot, despite my best efforts, find compelling evidence to refute categorically. My experiences of genAI are all extremely bad, but that is barely even anecdata. Their experiences are neutral-to-positive. Little scientific data exists. How to resolve this?2
The Aesthetics
As I begin to state my own position, let me lead with this: my factual analysis of genAI is hopelessly negatively biased. I find the vast majority of the aesthetic properties of genAI to be intensely unpleasant.
I have been trying very hard to correct for this bias, to try to pay attention to the facts and to have a clear-eyed view of these systems’ capabilities. But the feelings are visceral, and the effort to compensate is tiring. It is, in fact, the desire to stop making this particular kind of effort that has me writing up this piece and trying to take an intentional break from the subject, despite its intense relevance.
When I say its “aesthetic qualities” are unpleasant, I don’t just mean the aesthetic elements of output of genAIs themselves. The aesthetic quality of genAI writing, visual design, animation and so on, while mostly atrocious, is also highly variable. There are cherry-picked examples which look… fine. Maybe even good. For years now, there have been, famously, literally award-winning aesthetic outputs of genAI3.
While I am ideologically predisposed to see any “good” genAI art as accruing the benefits of either a survivorship bias from thousands of terrible outputs or simple plagiarism rather than its own inherent quality, I cannot deny that in many cases it is “good”.
However, I am not just talking about the product, but the process; the aesthetic experience of interfacing with the genAI system itself, rather than the aesthetic experience of the outputs of that system.
I am not a visual artist and I am not really a writer4, particularly not a writer of fiction or anything else whose experience is primarily aesthetic. So I will speak directly to the experience of software development.
I have seen very few successful examples of using genAI to produce whole, working systems. There is no shortage of highly public miserable failures, particularly from the vendors of these systems themselves, where the outputs are confused, self-contradictory, full of subtle errors and generally unusable. While few studies exist, it sure looks like this is an automated way of producing a Net Negative Productivity Programmer, throwing out chaff to slow down the rest of the team.5
Juxtapose this with my aforementioned psychological motivations (to wit, I want to have everything in the computer be orderly and make sense), and I’m sure most of you would have no trouble imagining that sitting through this sort of practice would make me extremely unhappy.
Despite this plethora of negative experiences, executives are aggressively mandating the use of AI6. It looks like without such mandates, most people will not bother to use such tools, so the executives will need muscular policies to enforce its use.7
Being forced to sit and argue with a robot while it struggles and fails to produce a working output, while you have to rewrite the code at the end anyway, is incredibly demoralizing. This is the kind of activity that activates every single major cause of burnout at once.
But, at least in that scenario, the thing ultimately doesn’t work, so there’s a hope that after a very stressful six month pilot program, you can go to management with a pile of meticulously collected evidence, and shut the whole thing down.
I am inclined to believe that, in fact, it doesn’t work well enough to be used this way, and that we are going to see a big crash. But that is not the most aesthetically distressing thing. The most distressing thing is that maybe it does work; if not well enough to actually do the work, at least ambiguously enough to fool the executives long-term.
This project, in particular, stood out to me as an example. Its author, a self-professed “AI skeptic” who “thought LLMs were glorified Markov chain generators that didn’t actually understand code and couldn’t produce anything novel”, did a green-field project to test this hypothesis.
Now, this particular project is not totally inconsistent with a world in which LLMs cannot produce anything novel. One could imagine that, out in the world of open source, perhaps there is enough “OAuth provider written in TypeScript” blended up into the slurry of “borrowed8” training data that the minor constraint of “make it work on Cloudflare Workers” is a small tweak9. It is not fully dispositive of the question of the viability of “genAI coding”.
But it is a data point related to that question, and thus it did make me contend with what might happen if it were actually a fully demonstrative example. I reviewed the commit history, as the author suggested. For the sake of argument, I tried to ask myself if I would like working this way. Just for clarity on this question, I wanted to suspend judgement about everything else; assuming:
- the model could be created with ethically, legally, voluntarily sourced training data
- its usage involved consent from labor rather than authoritarian mandates
- sensible levels of energy expenditure, with minimal CO2 impact
- it is substantially more efficient to work this way than to just write the code yourself
and so on, and so on… would I like to use this magic robot that could mostly just emit working code for me? Would I use it if it were free, in all senses of the word?
No. I absolutely would not.
I found the experience of reading this commit history and imagining myself using such a tool — without exaggeration — nauseating.
Unlike many programmers, I love code review. I find that it is one of the best parts of the process of programming. I can help people learn, and develop their skills, and learn from them, and appreciate the decisions they made, develop an impression of a fellow programmer’s style. It’s a great way to build a mutual theory of mind.
Of course, it can still be really annoying; people make mistakes, often can’t see things I find obvious, and in particular when you’re reviewing a lot of code from a lot of different people, you often end up having to repeat explanations of the same mistakes. So I can see why many programmers, particularly those more introverted than I am, hate it.
But, ultimately, when I review their code and work hard to provide clear and actionable feedback, people learn and grow and it’s worth that investment in inconvenience.
The process of coding with an “agentic” LLM appears to be the process of carefully distilling all the worst parts of code review, and removing and discarding all of its benefits.
The lazy, dumb, lying robot asshole keeps making the same mistakes over and over again, never improving, never genuinely reacting, always obsequiously pretending to take your feedback on board.
Even when it “does” actually “understand” and manages to load your instructions into its context window, 200K tokens later it will slide cleanly out of its memory and you will have to say it again.
All the while, it is attempting to trick you. It gets most things right, but it consistently makes mistakes in the places that you are least likely to notice. In places where a person wouldn’t make a mistake. Your brain keeps trying to develop a theory of mind to predict its behavior but there’s no mind there, so it always behaves infuriatingly randomly.
I don’t think I am the only one who feels this way.
The Affordances
Whatever our environments afford, we tend to do more of. Whatever they resist, we tend to do less of. So in a world where we were all writing all of our code and emails and blog posts and texts to each other with LLMs, what do they afford that existing tools do not?
As a weirdo who enjoys code review, I also enjoy process engineering. The central question of almost all process engineering is to continuously ask: how shall we shape our tools, to better shape ourselves?
LLMs are an affordance for producing more text, faster. How is that going to shape us?
Again arguing in the alternative here, assuming the text is free from errors and hallucinations and whatever, it’s all correct and fit for purpose, that means it reduces the pain of circumstances where you have to repeat yourself. Less pain! Sounds great; I don’t like pain.
Every codebase has places where you need boilerplate. Every organization has defects in its information architecture that require repetition of certain information rather than a link back to the authoritative source of truth. Often, these problems persist for a very long time, because it is difficult to overcome the institutional inertia required to make real progress rather than going along with the status quo. But this is often where the highest-value projects can be found. Where there’s muck, there’s brass.
The process-engineering function of an LLM, therefore, is to prevent fundamental problems from ever getting fixed, to reward the rapid-fire overwhelm of infrastructure teams with an immediate, catastrophic cascade of legacy code that is now much harder to delete than it is to write.
There is a scene in Game of Thrones where Khal Drogo kills himself. He does so by replacing a stinging, burning, therapeutic antiseptic wound dressing with some cool, soothing mud. The mud felt nice, addressed the immediate pain, removed the discomfort of the antiseptic, and immediately gave him a lethal infection.
The pleasing feeling of immediate progress when one prompts an LLM to solve some problem feels like cool mud on my brain.
The Economics
We are in the middle of a mania around this technology. As I have written about before, I believe the mania will end. There will then be a crash, and a “winter”. But, as I may not have stressed sufficiently, this crash will be the biggest of its kind — so big, that it is arguably not of a kind at all. The level of investment in these technologies is bananas and the possibility that the investors will recoup their investment seems close to zero. Meanwhile, that cost keeps going up, and up, and up.
Others have reported on this in detail10, and I will not reiterate that all here, but in addition to being a looming and scary industry-wide (if we are lucky; more likely it’s probably “world-wide”) economic threat, it is also going to drive some panicked behavior from management.
Panicky behavior from management, stressed that their idea is not panning out, is, famously, the cause of much human misery. I expect that even the “good” scenario, where some profit is ultimately achieved, will still involve mass layoffs rocking the industry, panicked re-hiring, and the destruction of large amounts of wealth.
It feels bad to think about this.
The Energy Usage
For a long time I believed that the energy impact was overstated. I am even on record, about a year ago, saying I didn’t think the energy usage was a big deal. I think I was wrong about that.
It initially seemed like it was letting regular old data centers off the hook. But recently I have learned that, while the numbers are incomplete because the vendors aren’t sharing information, they’re also extremely bad.11
I think there’s probably a version of this technology that isn’t a climate emergency nightmare, but that’s not the version that the general public has access to today.
The Educational Impact
LLMs are making academic cheating incredibly rampant.12
Not only is it so common as to be nearly universal, it’s also extremely harmful to learning.13
For learning, genAI is a forklift at the gym.
To some extent, LLMs are simply revealing a structural rot within education and academia that has been building for decades if not centuries. But it was within those inefficiencies and the inconveniences of the academic experience that real learning was, against all odds, still happening in schools.
LLMs produce a frictionless, streamlined process where students can effortlessly glide through the entire credential, learning nothing. Once again, they dull the pain without regard to its cause.
This is not good.
The Invasion of Privacy
This is obviously only a problem with the big cloud models, but then, the big cloud models are the only ones that people actually use. If you are having conversations about anything private with ChatGPT, you are sending all of that private information directly to Sam Altman, to do with as he wishes.
Even if you don’t think he is a particularly bad guy, maybe he won’t even create the privacy nightmare on purpose. Maybe he will be forced to do so as a result of some bizarre kafkaesque accident.14
Imagine the scenario, for example, where a woman is tracking her cycle and uploading the logs to ChatGPT so she can chat with it about a health concern. Except, surprise, you don’t have to imagine, you can just search for it, as I have personally, organically, seen three separate women on YouTube, at least one of whom lives in Texas, not only do this on camera but recommend doing this to their audiences.
Citation links withheld on this particular claim for hopefully obvious reasons.
I assure you that I am neither particularly interested in menstrual products nor genAI content, and if I am seeing this more than once, it is probably a distressingly large trend.
The Stealing
The training data for LLMs is stolen. I don’t mean like “pirated” in the sense where someone illicitly shares a copy they obtained legitimately; I mean their scrapers are ignoring both norms15 and laws16 to obtain copies under false pretenses, destroying other people’s infrastructure17 in the process.
The Fatigue
I have provided references to numerous articles outlining rhetorical and sometimes data-driven cases for the existence of certain properties and consequences of genAI tools. But I can’t prove any of these properties, either at a point in time or as a durable ongoing problem.
The LLMs themselves are simply too large to model with the usual kind of heuristics one would use to think about software. I’d sooner be able to predict the physics of dice in a casino than a 2 trillion parameter neural network. They resist scientific understanding, not just because of their size and complexity, but because unlike a natural phenomenon (which could of course be considerably larger and more complex) they resist experimentation.
The first form of genAI resistance to experiment is that every discussion is a motte-and-bailey. If I use a free model and get a bad result, I’m told it’s because I should have used the paid model. If I get a bad result with ChatGPT, I should have used Claude. If I get a bad result with a chatbot, I need to start using an agentic tool. If an agentic tool deletes my hard drive by putting os.system("rm -rf ~/") into sitecustomize.py, then I guess I should have built my own MCP integration with a completely novel, heretofore never even considered security sandbox or something?
What configuration, exactly, would let me make a categorical claim about these things? What specific methodological approach should I stick to, to get reliably adequate prompts?
For the record though, if the idea of the free models is that they are going to be provocative demonstrations of the impressive capabilities of the commercial models, and the results are consistently dogshit, I am finding it increasingly hard to care how much better the paid ones are supposed to be, especially since the “better”-ness cannot really be quantified in any meaningful way.
The motte-and-bailey doesn’t stop there though. It’s a war on all fronts. Concerned about energy usage? That’s OK, you can use a local model. Concerned about infringement? That’s okay, somewhere, somebody, maybe, has figured out how to train models consensually18. Worried about the politics of enriching the richest monsters in the world? Don’t worry, you can always download an “open source” model from Hugging Face. It doesn’t matter that many of these properties are mutually exclusive and attempting to fix one breaks two others; there’s always an answer, the field is so abuzz with so many people trying to pull in so many directions at once that it is legitimately difficult to understand what’s going on.
Even here though, I can see that characterizing everything this way is unfair to a hypothetical sort of person. If there is someone working at one of these thousands of AI companies that have been springing up like toadstools after a rain, and they really are solving one of these extremely difficult problems, how can I handwave that away? We need people working on problems, that’s like, the whole point of having an economy. And I really don’t like shitting on other people’s earnest efforts, so I try not to dismiss whole fields. Given how AI has gotten into everything, in a way that e.g. cryptocurrency never did, painting with that broad a brush inevitably ends up tarring a bunch of stuff that isn’t even really AI at all.
The second form of genAI resistance to experiment is the inherent obfuscation of productization. The models themselves are already complicated enough, but the products that are built around the models are evolving extremely rapidly. ChatGPT is not just a “model”, and with the rapid19 deployment of Model Context Protocol tools, the edges of all these things will blur even further. Every LLM is now just an enormous unbounded soup of arbitrary software doing arbitrary whatever. How could I possibly get my arms around that to understand it?
The Challenge
I have woefully little experience with these tools.
I’ve tried them out a little bit, and almost every single time the result has been a disaster that has not made me curious to push further. Yet, I keep hearing from all over the industry that I should.
To some extent, I feel like the motte-and-bailey characterization above is fair; if the technology itself can really do real software development, it ought to be able to do it in multiple modalities, and there’s nothing anyone can articulate to me about GPT-4o which puts it in a fundamentally different class than GPT-3.5.
But, also, I consistently hear that the subjective experience of using the premium versions of the tools is actually good, and the free ones are actually bad.
I keep struggling to find ways to try them “the right way”, the way that people I know and otherwise respect claim to be using them, but I haven’t managed to do so in any meaningful way yet.
I do not want to be using the cloud versions of these models with their potentially hideous energy demands; I’d like to use a local model. But there is obviously not a nicely composed way to use local models like this.
Since there are apparently zero models with ethically-sourced training data, and litigation is ongoing20 to determine the legal relationships of training data and outputs, even if I can be comfortable with some level of plagiarism on a project, I don’t feel that I can introduce the existential legal risk into other people’s infrastructure, so I would need to make a new project.
Others have differing opinions of course, including some within my dependency chain, which does worry me, but I still don’t feel like I can freely contribute further to the problem; it’s going to be bad enough to unwind any impact upstream. Even just for my own sake, I don’t want to make it worse.
This especially presents a problem because I have way too much stuff going on already. A new project is not practical.
Finally, even if I did manage to satisfy all of my quirky21 constraints, would this experiment really be worth anything? The models and tools that people are raving about are the big, expensive, harmful ones. If I proved to myself yet again that a small model with bad tools was unpleasant to use, I wouldn’t really be addressing my opponents’ views.
I’m stuck.
The Surrender
I am writing this piece to make my peace with giving up on this topic, at least for a while. While I do idly hope that some folks might find bits of it convincing, and perhaps find ways to be more mindful with their own usage of genAI tools, and consider the harm they may be causing, that’s not actually the goal. And that is not the goal because it is just so much goddamn work to prove.
Here, I must return to my philosophical hobbyhorse of sprachspiel. In this case, specifically to use it as an analytical tool, not just to understand what I am trying to say, but what the purpose for my speech is.
The concept of sprachspiel is most frequently deployed to describe the goal of the language game being played, but in game theory, that’s only half the story. Speech — particularly rigorously justified speech — has a cost, as well as a benefit. I can make shit up pretty easily, but if I want to do anything remotely like scientific or academic rigor, that cost can be astronomical. In the case of developing an abstract understanding of LLMs, the cost is just too high.
So what is my goal, then? To be king Canute, standing astride the shore of “tech”, whatever that is, commanding the LLM tide not to rise? This is a multi-trillion dollar juggernaut.
Even the rump, loser, also-ran fragment of it has the power to literally suffocate us in our homes22 if they so choose, completely insulated from any consequence. If the power curve starts there, imagine what the winners in this industry are going to be capable of, irrespective of the technology they’re building - just with the resources they have to hand. Am I going to write a blog post that can rival their propaganda apparatus? Doubtful.
Instead, I will just have to concede that maybe I’m wrong. I don’t have the skill, or the knowledge, or the energy, to demonstrate with any level of rigor that LLMs are generally, in fact, hot garbage. Intellectually, I will have to acknowledge that maybe the boosters are right. Maybe it’ll be OK.
Maybe the carbon emissions aren’t so bad. Maybe everybody is keeping them secret in ways that they don’t for other types of datacenter for perfectly legitimate reasons. Maybe the tools really can write novel and correct code, and with a little more tweaking, it won’t be so difficult to get them to do it. Maybe by the time they become a mandatory condition of access to developer tools, they won’t be miserable.
Sure, I even sincerely agree, intellectual property really has been a pretty bad idea from the beginning. Maybe it’s OK that we’ve made an exception to those rules. The rules were stupid anyway, so what does it matter if we let a few billionaires break them? Really, everybody should be able to break them (although of course, regular people can’t, because we can’t afford the lawyers to fight off the MPAA and RIAA, but that’s a problem with the legal system, not tech).
I come not to praise “AI skepticism”, but to bury it.
Maybe it really is all going to be fine. Perhaps I am simply catastrophizing; I have been known to do that from time to time. I can even sort of believe it, in my head. Still, even after writing all this out, I can’t quite manage to believe it in the pit of my stomach.
Unfortunately, that feeling is not something that you, or I, can argue with.
Acknowledgments
Thank you to my patrons. Normally, I would say, “who are supporting my writing on this blog”, but in the case of this piece, I feel more like I should apologize to them for this than to thank them; these thoughts have been preventing me from thinking more productive, useful things that I actually have relevant skill and expertise in; this felt more like a creative blockage that I just needed to expel than a deliberately written article. If you like what you’ve read here and you’d like to read more of it, well, too bad; I am sincerely determined to stop writing about this topic. But, if you’d like to read more stuff like other things I have written, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!
And yes, disinformation is still an issue even if you’re “just” using it for coding. Even sidestepping the practical matter that technology is inherently political, validation and propagation of poor technique is a form of disinformation. ↩
I can’t resolve it, that’s the whole tragedy here, but I guess we have to pretend I will to maintain narrative momentum here. ↩
The story in Creative Bloq, or the NYT, if you must ↩
although it’s not for lack of trying, Jesus, look at the word count on this ↩
These are sometimes referred to as “10x” programmers, because they make everyone around them 10x slower. ↩
Douglas B. Laney at Forbes, Viral Shopify CEO Manifesto Says AI Now Mandatory For All Employees ↩
The National CIO Review, AI Mandates, Minimal Use: Closing the Workplace Readiness Gap ↩
Matt O’Brien at the AP, Reddit sues AI company Anthropic for allegedly ‘scraping’ user comments to train chatbot Claude ↩
Using the usual tricks to find plagiarism like searching for literal transcriptions of snippets of training data did not pull up anything when I tried, but then, that’s not how LLMs work these days, is it? If it didn’t obfuscate the plagiarism it wouldn’t be a very good plagiarism-obfuscator. ↩
David Gerard at Pivot to AI, “Microsoft and AI: spending billions to make millions”, Edward Zitron at Where’s Your Ed At, “The Era Of The Business Idiot”, both sobering reads ↩
James O’Donnell and Casey Crownhart at the MIT Technology Review, We did the math on AI’s energy footprint. Here’s the story you haven’t heard. ↩
Lucas Ropek at Gizmodo, AI Cheating Is So Out of Hand In America’s Schools That the Blue Books Are Coming Back ↩
James D. Walsh at the New York Magazine Intelligencer, Everyone Is Cheating Their Way Through College ↩
Ashley Belanger at Ars Technica, OpenAI slams court order to save all ChatGPT logs, including deleted chats ↩
Ashley Belanger at Ars Technica, AI haters build tarpits to trap and trick AI scrapers that ignore robots.txt ↩
Blake Brittain at Reuters, Judge in Meta case warns AI could ‘obliterate’ market for original works ↩
Xkeeper, TCRF has been getting DDoSed ↩
Kate Knibbs at Wired, Here’s Proof You Can Train an AI Model Without Slurping Copyrighted Content ↩
and, I should note, extremely irresponsible ↩
Porter Anderson at Publishing Perspectives, Meta AI Lawsuit: US Publishers File Amicus Brief ↩
It feels bizarre to characterize what feel like baseline ethical concerns this way, but the fact remains that within the “genAI community”, this places me into a tiny and obscure minority. ↩
Ariel Wittenberg for Politico, ‘How come I can’t breathe?’: Musk’s data company draws a backlash in Memphis ↩
Stop Writing `__init__` Methods
YEARS OF DATACLASSES yet NO REAL-WORLD USE FOUND for overriding special methods just so you can have some attributes.
The History
Before dataclasses were added to Python in version 3.7 — in June of 2018 — the `__init__` special method had an important use. If you had a class representing a data structure — for example a `2DCoordinate`, with `x` and `y` attributes — you would want to be able to construct it as `2DCoordinate(x=1, y=2)`, which would require you to add an `__init__` method with `x` and `y` parameters.
The other options available at the time all had pretty bad problems:
- You could remove `2DCoordinate` from your public API and instead expose a `make_2d_coordinate` function and make it non-importable, but then how would you document your return or parameter types?
- You could document the `x` and `y` attributes and make the user assign each one themselves, but then `2DCoordinate()` would return an invalid object.
- You could default your coordinates to 0 with class attributes, and while that would fix the problem with option 2, this would now require all `2DCoordinate` objects to be not just mutable, but mutated at every call site.
- You could fix the problems with option 1 by adding a new abstract class that you could expose in your public API, but this would explode the complexity of every new public class, no matter how simple. To make matters worse, `typing.Protocol` didn’t even arrive until Python 3.8, so, in the pre-3.7 world this would condemn you to using concrete inheritance and declaring multiple classes even for the most basic data structure imaginable.
Also, an `__init__` method that does nothing but assign a few attributes doesn’t have any significant problems, so it is an obvious choice in this case. Given all the problems that I just described with the alternatives, it makes sense that it became the obvious default choice, in most cases.
However, by accepting “define a custom `__init__`” as the default way to allow users to create your objects, we make a habit of beginning every class with a pile of arbitrary code that gets executed every time it is instantiated.
Wherever there is arbitrary code, there are arbitrary problems.
The Problems
Let’s consider a data structure more complex than one that simply holds a couple of attributes. We will create one that represents a reference to some I/O in the external world: a `FileReader`.
Of course Python has its own open-file object abstraction, but I will be ignoring that for the purposes of the example.
Let’s assume a world where we have the following functions, in an imaginary `fileio` module:
- `open(path: str) -> int`
- `read(fileno: int, length: int)`
- `close(fileno: int)`
Our hypothetical `fileio.open` returns an integer representing a file descriptor1, `fileio.read` allows us to read `length` bytes from an open file descriptor, and `fileio.close` closes that file descriptor, invalidating it for future use.
With the habit that we have built from writing thousands of `__init__` methods, we might want to write our `FileReader` class like this:
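A minimal sketch of that habitual shape (the wrapper method names `read` and `close`, and the `-> bytes` return type, are illustrative, mirroring the `fileio` functions above):

```python
import fileio  # the imaginary low-level module described above


class FileReader:
    def __init__(self, path: str) -> None:
        # construction immediately does real I/O; this is the habit in question
        self._fd = fileio.open(path)

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)

    def close(self) -> None:
        fileio.close(self._fd)
```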
For our initial use-case, this is fine. Client code creates a `FileReader` by doing something like `FileReader("./config.json")`, which always creates a `FileReader` that maintains its file descriptor `int` internally as private state. This is as it should be; we don’t want user code to see or mess with `_fd`, as that might violate `FileReader`’s invariants. All the necessary work to construct a valid `FileReader` — i.e. the call to `open` — is always taken care of for you by `FileReader.__init__`.
However, additional requirements will creep in, and as they do, `FileReader.__init__` becomes increasingly awkward.
Initially we only care about `fileio.open`, but later, we may have to deal with a library that has its own reasons for managing the call to `fileio.open` by itself, and wants to give us an `int` that we use as our `_fd`; we now have to resort to weird workarounds like:
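One plausible shape for that workaround, sketched here assuming a helper named `reader_from_fd` (referenced below) that bypasses `__init__` via `object.__new__`:

```python
def reader_from_fd(fd: int) -> FileReader:
    # dodge __init__ entirely, then reach in and set private state by hand
    reader = object.__new__(FileReader)
    reader._fd = fd
    return reader
```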
Now, all those nice properties that we got from trying to force object construction to give us a valid object are gone. `reader_from_fd`’s type signature, which takes a plain `int`, has no way of even suggesting to client code how to ensure that it has passed in the right kind of `int`.
Testing is much more of a hassle, because we have to patch in our own copy of `fileio.open` any time we want an instance of a `FileReader` in a test without doing any real-life file I/O, even if we could (for example) share a single file descriptor among many `FileReader`s for testing purposes.
All of this also assumes a `fileio.open` that is synchronous. Although for literal file I/O this is more of a hypothetical concern, there are many types of networked resource which are really only available via an asynchronous (and thus: potentially slow, potentially error-prone) API. If you’ve ever found yourself wanting to type `async def __init__(self): ...` then you have seen this limitation in practice.
Comprehensively describing all the possible problems with this approach would end up being a book-length treatise on a philosophy of object oriented design, so I will sum up by saying that the cause of all these problems is the same: we are inextricably linking the act of creating a data structure with whatever side-effects are most often associated with that data structure. If they are “often” associated with it, then by definition they are not “always” associated with it, and all the cases where they aren’t associated become unwieldy and potentially broken.
Defining an `__init__` is an anti-pattern, and we need a replacement for it.
The Solutions
I believe this tripartite assemblage of design techniques will address the problems raised above:
- using `dataclass` to define attributes,
- replacing behavior that would previously have been in `__init__` with a new classmethod that does the same thing, and
- using precise types to describe what a valid instance looks like.
Using `dataclass` attributes to create an `__init__` for you
To begin, let’s refactor `FileReader` into a `dataclass`. This does get us an `__init__` method, but it won’t be an arbitrary one we define ourselves; it will get the useful constraint enforced on it that it will just assign attributes.
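A sketch of that refactoring, with the `read` and `close` wrappers carried over from the earlier sketch:

```python
from dataclasses import dataclass

import fileio


@dataclass
class FileReader:
    _fd: int  # the generated __init__ does nothing but assign this attribute

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)

    def close(self) -> None:
        fileio.close(self._fd)
```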
Except... oops. In fixing the problems that we created with our custom `__init__` that calls `fileio.open`, we have re-introduced several problems that it solved:
- We have removed all the convenience of `FileReader("path")`. Now the user needs to import the low-level `fileio.open` again, making the most common type of construction both more verbose and less discoverable; if we want users to know how to build a `FileReader` in a practical scenario, we will have to add something in our documentation to point at a separate module entirely.
- There’s no enforcement of the validity of `_fd` as a file descriptor; it’s just some integer, and the user could easily pass in an incorrect one with no error.
In isolation, `dataclass` can’t solve all our problems, so let’s add in the second technique.
Using `classmethod` factories to create objects
We don’t want to require any additional imports, or require users to go looking at any other modules — or indeed anything other than `FileReader` itself — to figure out how to create a `FileReader` for its intended usage.
Luckily we have a tool that can easily address all of these concerns at once: `@classmethod`. Let’s define a `FileReader.open` class method:
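Something like this sketch, keeping the rest of the class as before:

```python
from dataclasses import dataclass

import fileio


@dataclass
class FileReader:
    _fd: int

    @classmethod
    def open(cls, path: str) -> "FileReader":
        # the convenient, discoverable constructor: do the I/O here, then
        # let the generated __init__ simply assign the attribute
        return cls(fileio.open(path))

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)

    def close(self) -> None:
        fileio.close(self._fd)
```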
Now, your callers can replace `FileReader("path")` with `FileReader.open("path")`, and get all the same benefits.
Additionally, if we needed to `await fileio.open(...)`, and thus we needed its signature to be `@classmethod async def open`, we are freed from the constraint of `__init__` as a special method. There is nothing that would prevent a `@classmethod` from being `async`, or indeed, from having any other modification to its return value, such as returning a `tuple` of related values rather than just the object being constructed.
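For instance, a rough sketch of an asynchronous factory, assuming a hypothetical awaitable `fileio.open_async` counterpart:

```python
from dataclasses import dataclass

import fileio


@dataclass
class AsyncFileReader:
    _fd: int

    @classmethod
    async def open(cls, path: str) -> "AsyncFileReader":
        # nothing prevents a classmethod constructor from being async;
        # fileio.open_async is hypothetical, standing in for any slow,
        # networked resource acquisition
        return cls(await fileio.open_async(path))
```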
Using `NewType` to address object validity
Next, let’s address the slightly trickier issue of enforcing object validity.
Our type signature calls this thing an `int`, and indeed, that is unfortunately what the lower-level `fileio.open` gives us, and that’s beyond our control. But for our own purposes, we can be more precise in our definitions, using `NewType`:
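Something like the following couple of lines:

```python
from typing import NewType

# a distinct static type for integers that really are file descriptors
FileDescriptor = NewType("FileDescriptor", int)
```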
There are a few different ways to address the underlying library, but for the sake of brevity and to illustrate that this can be done with zero run-time overhead, let’s just insist to Mypy that we have versions of `fileio.open`, `fileio.read`, and `fileio.close` which actually already take `FileDescriptor` integers rather than regular ones.
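One way to sketch that is a typing-only stub for the imaginary module; here I also have `open` return a `FileDescriptor`, and the return annotations on `read` and `close` are assumptions:

```python
# fileio.pyi: a type stub, so there is zero run-time overhead
from myproject.types import FileDescriptor  # hypothetical home of the NewType

def open(path: str) -> FileDescriptor: ...
def read(fileno: FileDescriptor, length: int) -> bytes: ...
def close(fileno: FileDescriptor) -> None: ...
```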
We do of course have to slightly adjust `FileReader`, too, but the changes are very small. Putting it all together, we get:
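Roughly, the combined result looks like this (again, the `read`/`close` wrappers and their return types are illustrative):

```python
from dataclasses import dataclass
from typing import NewType

import fileio

FileDescriptor = NewType("FileDescriptor", int)


@dataclass
class FileReader:
    _fd: FileDescriptor

    @classmethod
    def open(cls, path: str) -> "FileReader":
        return cls(fileio.open(path))

    def read(self, length: int) -> bytes:
        return fileio.read(self._fd, length)

    def close(self) -> None:
        fileio.close(self._fd)
```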
Note that the main technique here is not necessarily using `NewType` specifically, but rather aligning an instance’s property of “has all attributes set” as closely as possible with an instance’s property of “fully valid instance of its class”; `NewType` is just a handy tool to enforce any necessary constraints on the places where you need to use a primitive type like `int`, `str` or `bytes`.
In Summary - The New Best Practice
From now on, when you’re defining a new Python class:
- Make it a dataclass2.
- Use its default `__init__` method3.
- Add `@classmethod`s to provide your users convenient and discoverable ways to build your objects.
- Require that all dependencies be satisfied by attributes, so you always start with a valid object.
- Use `typing.NewType` to enforce any constraints on primitive data types (like `int` and `str`) which might have magical external attributes, like needing to come from a particular library, needing to be random, and so on.
If you define all your classes this way, you will get all the benefits of a custom `__init__` method:
- All consumers of your data structures will receive valid objects, because an object with all its attributes populated correctly is inherently valid.
- Users of your library will be presented with convenient ways to create your objects that do as much work as is necessary to make them easy to use, and they can discover these just by looking at the methods on your class itself.
Along with some nice new benefits:
- You will be future-proofed against new requirements for different ways that users may need to construct your object.
- If there are already multiple ways to instantiate your class, you can now give each of them a meaningful name; no need to have monstrosities like `def __init__(self, maybe_a_filename: int | str | None = None):`
- Your test suite can always construct an object by satisfying all its dependencies; no need to monkey-patch anything when you can always call the type and never do any I/O or generate any side effects (see the sketch just below).
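For example, a test can build a perfectly valid reader without touching the filesystem at all; a sketch, assuming the final `FileReader` shape above:

```python
def test_construct_without_io() -> None:
    # no monkey-patching and no real I/O: satisfy the one dependency directly
    reader = FileReader(FileDescriptor(7))
    assert reader._fd == 7
```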
Before dataclasses, it was always a bit weird that such a basic feature of the Python language — giving data to a data structure to make it valid — required overriding a method with 4 underscores in its name. `__init__` stuck out like a sore thumb. Other such methods like `__add__` or even `__repr__` were inherently customizing esoteric attributes of classes.
For many years now, that historical language wart has been resolved. `@dataclass`, `@classmethod`, and `NewType` give you everything you need to build classes which are convenient, idiomatic, flexible, testable, and robust.
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more of it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor! I am also available for consulting work if you think your organization could benefit from expertise on topics like “but what is a ‘class’, really?”.
If you aren’t already familiar, a “file descriptor” is an integer which has meaning only within your program; you tell the operating system to open a file, it says “I have opened file 7 for you”, and then whenever you refer to “7” it is that file, until you `close(7)`. ↩
Or an `attrs` class, if you’re nasty. ↩
Unless you have a really good reason to, of course. Backwards compatibility, or compatibility with another library, might be good reasons to do that. Or certain types of data-consistency validation which cannot be expressed within the type system. The most common example of these would be a class that requires consistency between two different fields, such as a “range” object where `start` must always be less than `end`. There are always exceptions to these types of rules. Still, it’s pretty much never a good idea to do any I/O in `__init__`, and nearly all of the remaining stuff that may sometimes be a good idea in edge-cases can be achieved with a `__post_init__` rather than writing a literal `__init__`. ↩
Small PINPal Update
I made a new release of PINPal today and that made me want to remind you all about it.
Today on stream, I updated PINPal to fix the memorization algorithm.
If you haven’t heard of PINPal before, it is a vault password memorization tool. For more detail on what that means, you can check out the README, and why not give it a ⭐ while you’re at it.
As I started writing up an update post I realized that I wanted to contextualize it a bit more, because it’s a tool I really wish were more popular. It solves one of those small security problems that you can mostly ignore, right up until the point where it’s a huge problem and it’s too late to do anything about it.
In brief, PINPal helps you memorize new secure passcodes for things you actually have to remember and can’t simply put into your password manager, like the password to your password manager, your PC user account login, your email account1, or the PIN code to your phone or debit card.
Too often, even if you’re properly using a good password manager for your passwords, you’ll be protecting it with a password optimized for memorability, which is to say, one that isn’t random and thus isn’t secure. But I have also seen folks veer too far in the other direction, trying to make a really secure password that they then forget right after switching to a password manager. Forgetting your vault password can also be a really big deal, making you do password resets across every app you’ve loaded into it so far, so having an opportunity to practice it periodically is important.
PINPal uses spaced repetition to ensure that you remember the codes it generates.
While periodic forced password resets are a bad idea, if (and only if!) you can actually remember the new password, it is a good idea to get rid of old passwords eventually — like, let’s say, when you get a new computer or phone. Doing so reduces the risk that a password stored somewhere on a very old hard drive or darkweb data dump is still floating around out there, forever haunting your current security posture. If you do a reset every 2 years or so, you know you’ve never got more than 2 years of history to worry about.
PINPal is also particularly secure in the way it incrementally generates your password; the computer you install it on only ever stores the entire password in memory when you type it in. It stores even the partial fragments that you are in the process of memorizing using the secure `keyring` module, avoiding plain text whenever possible.
I’ve been using PINPal to generate and memorize new codes for a while, just in case2, and the change I made today was because I encountered a recurring problem. The problem was, I’d forget a token after it had been hidden, and there was never any going back. The moment that a token was hidden from the user, it was removed from storage, so you could never get a reminder. While I’ve successfully memorized about 10 different passwords with it so far, I’ve had to delete 3 or 4.
So, in the updated algorithm, the visual presentation now hides tokens in the prompt several memorizations before they’re removed. Previously, if the password you were generating was ‘hello world’, you’d see `hello world` 5 times or so, then `•••• world`; if you ever got it wrong past that point, too bad, start over. Now, you’ll see `hello world`, then `°°°° world`, then after you have gotten the prompt right without seeing the token a few times, you’ll see `•••• world` after the backend has locked it in and it’s properly erased from your computer.
If you get the prompt wrong, breaking your streak reveals the recently-hidden token until you get it right again. I also did a new release on that same livestream, so if this update sounds like it might make the memorization process more appealing, check it out via `pip install pinpal` today.
Right now this tool is still really only for a specific type of nerd — it’s command-line only, and you probably need to hand-customize your shell prompt to invoke it periodically. But I’m working on making it more accessible to a broader audience. It’s open source, of course, so you can feel free to contribute your own code!
Acknowledgments
Thank you to my patrons who are supporting my writing on this blog. If you like what you’ve read here and you’d like to read more things like it, or you’d like to support my various open-source endeavors, you can support my work as a sponsor!
Your email account password can be stored in your password manager, of course, but given that email is the root-of-trust reset factor for so many things, being able to remember that password is very helpful in certain situations. ↩
Funny story: at one point, Apple had an outage which made it briefly appear as if a lot of people needed to reset their iCloud passwords, myself included. Because I’d been testing PINPal a bunch, I actually had several highly secure random passwords already memorized. It was a strange feeling to just respond to the scary password reset prompt with a new, highly secure password and just continue on with my day, secure in the knowledge I wouldn’t forget it. ↩




