
January 3, 2017

How to Fuck Up Without Being a Fuckup

Filed under: management — Benjamin Vulpes @ 11:39 p.m.

You have fucked up. You know that you have fucked up. Ideally, you have figured this out before anyone else has, but even if you haven't, this guide will still serve as a shovel with which you may cut stairs into the side of the hole of despair in which you find yourself.

First, determine whether or not you can fix your fuckup on your own without intervention, and how long that will take. If the fix is trivial and guaranteed to work, effect it and move on. Should it hint at taking enough time to fix that someone else might reasonably discover your mistake, get on the phone immediately; the conversation should sound something like: "Sorry boss, I hosed A in manner C, effects are Y, I am considering fixes P and Q, if you have something else to suggest please do so now, otherwise I have to go clean up my mess." In the unlikely event that you fix your mistake without anyone spotting it, for the sake of honesty you should also tell your immediate superior, in more or less the same fashion: "I hosed A by doing C, effects were Y. I fixed it with P, and the effect upon our shared goals is D. Apologies for the fuckup, ship's upright again." In all cases you must own both your mess and its resolution, lest you develop a reputation for dodging either responsibility or work.

Your response to the unleashing of your immediate superior's rage, both in the case where he's discovered that you've fucked up before you have and in the case where he's still wrathful after the bleeding's been staunched, must be identical: stand there, spine erect, and accept whatever lashings of tongue or leather he decides to hand out. Under no circumstances may you engage in weepy self-flagellation, a reprehensible and cowardly behavior I've seen trotted out in an effort to bypass the critical phase of punishment.

Resolving your fuckup means your ass is getting whooped, and your having committed the crime disqualifies you on the spot from determining the appropriate degree of whoopin'. Either the universe or your superior officer will determine the appropriate punishment for your work, and you will bear it as best you can.

Weepily declaring that you're a piece of shit and so sorry and stupid &c. &c. is a blatant tactic to coerce the weak-willed into forgiving your crimes because you are obviously so contrite and already feel so bad. Insofar as you attempt to escape your due punishment, you are a coward[1]. It may have worked on your father, but it will never work on me. If you are not willing to bear punishment from those to whom you report, you are not worthy to report to them. Moreover, your boss will forever resent you for stealing from him the chance to yell at you, and you'll forever suffer from not getting yelled at (although, at least in America, your reputation in the company will erode further, and you'll simply "fail to thrive", as we say of babies who suckle vigorously but whose stressful home environs forestall secretion of the enzymes needed to digest breast milk). Yes, this means that maturation into any sort of respectable person entails a boatload of punishment at the hands of either the universe or your superiors.

Once you've taken your licks and fixed the problem, the final step in salvaging your reputation is conveying to everyone involved that you know the following things: a) the mistakes you made while fucking up, b) what you should have done instead, and c) several classes of fuckup that making this one has taught you not to make in the future.

It ain't hard: admit your mistakes, work your ass off to correct them, take your licks like a man, and demonstrate to everyone involved that you're a better person for the trials. This may make the difference between "ha ha, remember that one time you shared your passphrase with the world?" and "yeah no, you can't trust him with anything trickier than addition."

  [1] I assume here that you want to maintain good relations. If you're a thief or charlatan, I have utterly no idea why you'd read this piece. Simply start running, and don't stop until you've found your next victim.

December 2, 2016

Software Maintenance Costs and Depreciation Schedules

Filed under: management, software development — Benjamin Vulpes @ 8:47 p.m.

From Mircea Popescu's "The planetary model of software development.":

mircea_popescu: ...It's starting to look to me as if software is in the same situation, every distinct item gravitating against the Sun of practice.
diana_coman: ...FWIW as experience: this structure of the bot which is quite sane still is actually at least the 2nd total re-write basically. Not because I started with an insane structure but because the first one got totally messed up when confronted with practice basically.

...

There exists a closest-safe distance from origin, given by the specific resistance of materials (ie, hardware, programming languages and other meta-tools), wherein the software presents as a molten core surrounded by a solid crust. Should such a planet move closer to the origin, through a rationalization of its design, it will thereby implode.

Implode, shred and smear into an accretion disk due to gravitational forces, whatever. Given a static problem, the solution to which delivers some utility, and a software proggy designed and built to solve that problem, the little proggy experiences gravitational stresses from edge case handling alone: at a certain point in the boiling-off of pointless complexity, software authors encounter the brick wall of edge cases and malformed inputs, and this forms the complexity floor for a piece of software handling static business requirements. The ideal programmer burns off absolutely everything not essential to the functioning of the thing, rejects inputs aggressively, and writes the whole system to do one single thing reliably. I imagine that Phuctor is a good example of this.

Now this proggy should run forever! Untouched! It's not as though the mathematics underpinning key factorization are changing anywhere near as quickly as a JavaScript developer's taste in frameworks weathervanes around, are they?

Nevertheless, Stan finds himself condemned to rewrite the thing yearly: because the problem changes, or someone wants to ship him SSH keys, or the DB can't handle replication, or whatever. The mere fact that Stan is going to change things entails some amount of system rewrite, as the thing was designed to fit its task extremely narrowly and will not readily embrace "just a little change". Mutating software to respond to a changing problem is expensive, time-consuming, and must be planned for. A program that must respond to feedback from its use in practice survives not just the work performed upon it by going around and around its star of value at high speed, but also (to stretch the analogy) highly local gravitational gradients as its managers and operators reshape it in real time.

The model (incorrect though all models may be) does provide some utility in explaining observed phenomena. On the docket today: depreciation schedules and maintenance costs.

A very rough treatment of depreciation: "the speed at which your shit rots". For freight-hauling trucks and other hardware like lathes and mills, this is largely a function of their time-in-use. As we crank bar stock through the lathe, its working parts experience all sorts of vibratory and static loads, the components deform (perhaps permanently), shit gets into the bearings, and entropy wins like she always does.

Surely software doesn't depreciate, though! It's code! It does the same thing every time! It can't wear out, and it certainly cannot break down! If it breaks, it can't be said to ever have actually worked in the first place, and that's a different case. But working software does not depreciate! Paul Niquette expounds at length on this idea in Software Does Not Fail (archived).

Fine, your shit doesn't stink. It doesn't rot, it doesn't change, but the world in which you wrote it does. Possibly you need to handle CRLF where before you expected to only ever catch a CR or LF; or your boss procured new colleagues for you; or the only person who understood that system went sessile and ate his brain. The list of changes that could affect the relationship between the business and the code it bought is not enumerable.
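To make the CRLF example concrete, here is a minimal sketch, in Python and with hypothetical names, of the small accommodation the changed world demands of code that only ever expected a single flavor of line ending:

    # Normalizing line endings so a parser written for LF-only input survives
    # the day CRLF shows up. Names are hypothetical illustrations.
    def split_records(raw: bytes) -> list[str]:
        """Split raw input into lines, tolerating LF, CR, and CRLF endings."""
        text = raw.decode("utf-8", errors="replace")
        # str.splitlines() understands \n, \r, and \r\n alike, which is the
        # whole "handle CRLF where you only expected CR or LF" change.
        return [line for line in text.splitlines() if line]

    if __name__ == "__main__":
        legacy = b"alpha\nbravo\n"
        windowsy = b"alpha\r\nbravo\r\n"
        assert split_records(legacy) == split_records(windowsy) == ["alpha", "bravo"]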

The business manager then has two extremely difficult questions to wrestle with when handling software: on what schedule shall I depreciate this asset, and how do I estimate its maintenance costs?

A few key factors drive software depreciation schedules: team turnover rate and employee makeup, correctness-ensuring infrastructure, and the speed at which the business demands that the system respond to changing business requirements.

A high team turnover rate, coupled with an employee makeup such that new staffers take quite some time to get up to speed on system internals and aren't terrifically productive for even longer, drives depreciation rates up. Managers work against this with various strategies: "we're a Perl shop", "we only hire the top 1% of applicants", "we only hire Stanford grads". The actually useful strategies, however, are impossibly difficult for a shop hiring commodity labor: hire smart people who get along with your existing team and who understand the programming languages and environments in which your systems are written, and for the love of fuck keep turnover low, and by extension context switching as well. A mid-size team with a high turnover rate can end up with systems that nobody in your organization understands how to work on. In this particular nightmare, it may be simpler to approximate the depreciation schedule as a function of how long it takes to turn your workforce over -- if nobody's around who understood how the thing worked in the first place, it may be worth the org's money to rewrite it (especially if your hiring pipeline is composed of people chasing trends in JavaScript development...). Obviously, a team of three working on a given system for a decade or more will be negligibly bitten by this dynamic.

Correctness-ensuring infrastructure comes in many different forms: QA teams, a robust testing philosophy and the discipline to follow it, and type systems. Well, "type systems"; in practice this means a framework for ensuring the human-cogs all mesh together neatly -- Java, Objective-C or C# -- plus, of course, an overbearing IDE to combat the tendency of a mean-reverting population of programmers to write code that doesn't work. Smaller teams whose members think more highly of themselves may run on "strongly typed" languages like Haskell, and lean on that as a large part of their correctness-ensuring infrastructure. This infrastructure drives down both maintenance costs and the depreciation rate by dropping developers into an environment in which, even if they can't say for sure whether what they wrote will work, they have tooling to tell them whether they broke other parts of the system in pursuit of today's task.
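As a minimal sketch of what that tooling buys you, consider a regression test that pins down existing behavior, so whoever touches the system next finds out in seconds that he broke it rather than hearing it from a customer. The function, the invariant, and the whole scenario below are hypothetical illustrations, not anything from a system discussed here:

    # A regression test pinning down existing behavior. All names here are
    # hypothetical; substitute your own module and invariant.
    import unittest

    def normalize_price(cents: int) -> str:
        """Render an integer number of cents as a dollar string."""
        return f"${cents // 100}.{cents % 100:02d}"

    class NormalizePriceRegression(unittest.TestCase):
        def test_exact_dollars(self):
            self.assertEqual(normalize_price(500), "$5.00")

        def test_sub_dollar_amounts_keep_their_leading_zero(self):
            # The sort of case that regresses first when someone "simplifies"
            # the formatter; the suite flags it before the change ships.
            self.assertEqual(normalize_price(7), "$0.07")

    if __name__ == "__main__":
        unittest.main()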

Turnover and meta-tooling aside, possibly the largest factor in the depreciation schedule is the speed at which the system must change to accommodate feedback from practice, and how much change it must swallow. If large new features need shipping on a regular basis, you must hire high-quality programmers and pay them well over the long term in order to keep that pace up. Should you be attempting to cram a shitload of changes through your system, you will probably need to staff out sideways: not necessarily hiring the biggest and most expensive guns, but smearing human horsepower across the feature attack surface to drag the whole thing together. The peasant vector field from Seven Samurai, if you will.

A system designed and implemented by smart folks, with a limited functional attack surface, for an organization with low turnover that doesn't need many changes after delivery, or where the scope of changes is fairly well-constrained, will have a very long depreciation schedule, possibly in excess of five years. A system not so much designed as stapled together ad hoc out of JavaScript frameworks by a couple of Code Academy dropouts, which accidentally finds itself with some venture capital and customers, may need replacing within two years as the costs of maintenance balloon and the pace of feature delivery slows to a crawl.
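Back-of-the-envelope, and with every figure an illustrative assumption rather than a measurement, straight-line depreciation makes the gap between those two schedules concrete:

    # Straight-line depreciation sketch; all figures are assumptions.
    build_cost = 500_000           # USD to design and build the system

    well_built_lifetime = 5        # years before a rewrite beats upkeep
    stapled_together_lifetime = 2  # years for the ad-hoc framework pile

    print(build_cost / well_built_lifetime)        # 100000.0 USD written off per year
    print(build_cost / stapled_together_lifetime)  # 250000.0 USD written off per year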

Maintenance cost analysis concerns itself with a related problem: given an established software system that needs regular deployments (let us roll with the server-side application development model), and has an ongoing stream of new features and bug fixes applied to it, how does one get a handle on the relationship between the rate of feature development and bug patching and the dollars spent?

Maintenance encompasses, among other things: developing new features and completing tasks, fixing existing bugs, ensuring that the team has not introduced any new bugs or broken existing functionality, pushing the new code out to the servers where customers will use it, and mutating database systems in support of new code. Costs and project velocity are a function of team size and composition, the aforementioned correctness-ensuring harnesses (automated testing, mature programming languages), the cost of deployment, and development cycle time.
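One crude way to get that handle, with the model itself and every number below assumed purely for illustration, is to divide the team's monthly burn by what it actually ships:

    # Crude dollars-per-shipped-item sketch; model and numbers are assumptions.
    team_size = 4
    loaded_cost_per_dev_month = 15_000   # salary plus overhead, USD
    items_shipped_per_month = 6          # features and bug fixes that actually land

    monthly_burn = team_size * loaded_cost_per_dev_month
    print(monthly_burn)                            # 60000
    print(monthly_burn / items_shipped_per_month)  # 10000.0 USD per feature or fix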

Several team makeup and composition factors drive maintenance costs: the skill distribution among team members; how quickly they can perform tasks; how rigorously they test, or decline to test, their own work; and how much discipline they bring to the automated or manual testing process. Intelligent people who get shit done quickly and care about writing tests to cover both new bugs and new features are very valuable, but only when leading or part of teams that share those values. Average schmucks who just want to collect a paycheck and do the least amount of work without getting fired will slow the team down, and lay the foundation for an extremely sharp mean reversion if included in high-functioning teams. You can imagine the glacial speed with which entire teams composed of such people might move, and the concomitant costs.

Cost and speed are also intimately affected by the correctness-ensuring infrastructure: test suites not only protect against shipping bad software, they also accelerate developer cycle time and confidence in system correctness. Some GUI-heavy apps built with no consideration for automated testing may impose a two-minute (or more!) cycle time, as the developer compiles the app, boots it, logs in (is the person in question smart enough to hard-code login credentials during testing? Rigorous enough to ensure those changes are never committed to the codebase?), and pokes it into the questionable state. Running the whole test suite might take 30 seconds, and running just a single test might be as fast as 10 seconds (it's entirely reasonable to shoot for one test per second, and hardly a single backend system should need more than that, but ever since the last round of upgrades I've despaired of getting any sort of performance out of Xcode). A high rate of feedback with the system-under-hack is necessary not just to maintain the precious high-concentration state, but also to keep dipshits from thinking they're excused to watch a cat video and forget that their app is compiling, not to mention trashing all of the valuable cognitive state that goes into debugging years of spackled-on complexity.
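Plugging those cycle times into a single hour of focused debugging shows what is at stake; the hour is the only assumption, the cycle times are the ones quoted above:

    # Feedback loops per focused hour at the quoted cycle times.
    gui_cycle_s = 120       # compile, boot, log in, poke the app into state
    full_suite_cycle_s = 30
    single_test_cycle_s = 10

    focused_hour_s = 3600
    print(focused_hour_s // gui_cycle_s)          # 30 iterations per hour
    print(focused_hour_s // full_suite_cycle_s)   # 120 iterations per hour
    print(focused_hour_s // single_test_cycle_s)  # 360 iterations per hour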

Maintenance also entails delivering the damn code, the costs of which also vary strongly as a function of institutional history and bent. Some organizations (still!) deploy code manually, to disastrous result: see the case of Knight Capital, where a manual update process missed a server and burned through nearly half a billion dollars. Other organizations deploy code to production servers automatically, every time the canonical repository is updated and the tests pass. A given organization's position on the continuum from entirely-manual to entirely-automated code deployment is a solid data point hinting at the organization's ability (or at least that of one of its software arms) to effectively expend capital to reduce maintenance costs (provided they haven't gone unnecessarily overboard with the capex; not everyone needs a Chaos Gorilla). This dynamic explains quite a bit of the success of Heroku and other platform-as-a-service companies: "Instead of writing and maintaining a pile of shell scripts to mutate a set of servers in a data center somewhere (capital expenditure), we can rent servers from Amazon/Heroku/Google and pay a monthly fee atop that for their deployment abstractions!" For some orgs, this works really well. The breakeven point on Heroku is somewhere around 16KUSD/mo -- 12 for the operations engineer, and 4 for the colocation fees. You can buy a lot of Heroku for 16K/mo. Whether you can rent a virtual server inside a virtual server that Paul Graham is renting from Jeff Bezos without wanting to open your wrists and barf into the nearest liberal arts graduate's mouth is, however, beyond the scope of today's piece.
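Spelled out, the break-even arithmetic above (rough monthly approximations, not anyone's actual invoice):

    # Self-hosting vs. renting deployment abstractions, per month.
    ops_engineer_per_month = 12_000  # USD
    colocation_per_month = 4_000     # USD

    self_hosting_per_month = ops_engineer_per_month + colocation_per_month
    print(self_hosting_per_month)  # 16000 -- spend less than this on Heroku and renting wins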

We all want to build software that lasts, and runs unperturbed in a closet for decades. Failing that, we would like to be able to respond to changes in the real world as expeditiously as possible. The best solution is obviously the trivial one: the smartest person possible, working with his favorite tools, in a domain he knows intimately. Should we need to go to war with the budget and army we have, and not the ones we wish we had, some guidelines emerge: keep teams small and turnover low; invest (appropriately!) in correctness-ensuring infrastructure, be that tests or type systems; and automate deployments to the extent possible.

---