
December 2, 2016

Software Maintenance Costs and Depreciation Schedules

Filed under: management, software development — Benjamin Vulpes @ 8:47 p.m.

From Mircea Popescu's "The planetary model of software development.":

mircea_popescu: ...It's starting to look to me as if software is in the same situation, every distinct item gravitating against the Sun of practice.
diana_coman: ...FWIW as experience: this structure of the bot which is quite sane still is actually at least the 2nd total re-write basically. Not because I started with an insane structure but because the first one got totally messed up when confronted with practice basically.

...

There exists a closest-safe distance from origin, given by the specific resistance of materials (ie, hardware, programming languages and other meta-tools), wherein the software presents as a molten core surrounded by a solid crust. Should such a planet move closer to the origin, through a rationalization of its design, it will thereby implode.

Implode, shred and smear into an accretion disk due to gravitational forces, whatever. Given a static problem, the solution to which delivers some utility, and a software proggy designed and built to solve that problem, the little proggy experiences gravitational stresses from edge case handling alone: at a certain point in the boiling-off-of-pointless complexity, software authors encounter the brick wall of edge cases and malformed inputs, and this forms the complexity floor for a piece of software handling static business requirements. The ideal programmer burns off absolutely everything not essential to the functioning of the thing, rejects inputs aggressively, and writes the whole system to do one single thing reliably. I imagine that Phuctor is a good example of this.

Now this proggy should run forever! Untouched! It's not as though the mathematics underpinning key factorization are changing anywhere near as quickly as a Javascript developer's taste in frameworks weathervanes around, are they?

Nevertheless, Stan finds himself condemned to rewrite the thing yearly. Because the problem changes, or someone wants to ship him SSH keys, or because the DB can't handle replication, or whatever. Merely that Stan is going to change things entails some amount of system rewrite, as it was designed to fit its task extremely narrowly, and will not readily embrace "just a little change". Mutating software to respond to a changing problem is expensive, time-consuming, and must be planned for. A program that must respond to feedback from its use in practice survives not just work performed upon it from going around and around its star of value at high speeds, but also (to stretch the analogy) highly local gravitational gradients as its managers and operators reshape it in realtime.

The model (incorrect though all models may be) does provide some utility in explaining observed phenomena. On the docket today: depreciation schedules and maintenance costs.

A very rough treatment of depreciation: "the speed at which your shit rots". For freight-hauling trucks and other hardware like lathes and mills, this is largely a function of their time-in-use. As we crank bar stock through the lathe, its working parts experience all sorts of vibratory and static loads, the components deform (perhaps permanently), shit gets into the bearings, and entropy wins like she always does.

Surely software doesn't depreciate, though! It's code! It does the same thing every time! It can't wear out, and it certainly cannot break down! If it breaks, it can't be said to ever have actually worked in the first place, and that's a different case. But working software does not depreciate! Paul Niquette expounds at length on this idea in Software Does Not Fail (archived).

Fine, your shit doesn't stink. It doesn't rot, it doesn't change, but the world in which you wrote it does. Possibly you need to handle CRLF where before you expected to only ever catch a CR or LF; or your boss procured new colleagues for you; or the only person who understood that system went sessile and ate his brain. The list of changes that could affect the relationship between the business and the code it bought is not enumerable.
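To make the CRLF example concrete: a splitter written against a world that only ever sent a lone CR or LF quietly grows phantom empty lines the day CRLF shows up. A minimal sketch of the accommodating version, in Python (the function name is mine, not from any system discussed here):

    import re

    def split_lines(data):
        """Split on CRLF first, then lone CR or LF, so a CRLF never yields a phantom empty line."""
        return re.split(r"\r\n|\r|\n", data)

    print(split_lines("one\r\ntwo\rthree\nfour"))  # ['one', 'two', 'three', 'four']

The alternation order matters: try CRLF before the lone characters, or every Windows-flavored line ending splits twice.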

The business manager then has 2 extremely difficult questions to wrestle with when handling software: on what schedule shall I depreciate this asset, and how do I estimate its maintenance costs?

A few key factors drive software depreciation schedules: team turnover rate and employee makeup, correctness-ensuring infrastructure, and the speed at which the business demands that system respond to changing business requirements.

A high team turnover rate, coupled with an employee makeup such that new staffers take quite some time to get up to speed on system internals and aren't terrifically productive for even longer, drives depreciation rates up. Managers work against this with various strategies: "we're a Perl shop", "we only hire the top 1% of applicants", "we only hire Stanford grads", but the actually useful strategies are impossibly difficult for a shop hiring commodity labor: hire smart people who get along with your existing team, who understand the programming languages and environments in which your systems are written, and for the love of fuck keep turnover low and, by extension, context switching as well. A mid-size team with a high turnover rate can end up with systems that nobody in your organization knows how to work on. In this particular nightmare, it may be simpler to approximate the depreciation schedule as a function of how long it takes to turn your workforce over -- if nobody's around who understood how it worked in the first place, it may be worth the org's money to rewrite the thing (especially if your hiring pipeline is composed of people chasing trends in JavaScript development...). Obviously, a team of three working on a given system for a decade or more will be negligibly bitten by this dynamic.

Correctness-ensuring infrastructure comes in many different forms: QA teams, a robust testing philosophy and the discipline to follow it, and type systems. Well, "type systems": in practice this means a framework for ensuring the human cogs all mesh together neatly, like Java, Objective-C, or C#.net, plus of course an overbearing IDE, to combat the tendency of a mean-reverting population of programmers to write code that doesn't work. Smaller teams whose members think more highly of themselves may run on "strongly typed" languages like Haskell, and use that as a large part of their correctness-ensuring infrastructure. This infrastructure drives down both maintenance costs and the depreciation rate by dropping developers into an environment in which, even if they can't say for sure whether what they wrote will work, they have tooling to identify whether they broke other parts of the system in pursuit of today's task.
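As a sketch of the cheapest tier of such infrastructure (the module and its behavior are hypothetical, not drawn from any system named here): a pair of tests, one pinned to a previously-fixed bug, so that tomorrow's refactor can't quietly reintroduce it:

    # A hypothetical invoice module and its regression tests; runnable directly or via pytest.
    from decimal import Decimal

    def invoice_total(line_items):
        """Hypothetical system-under-test: sum of quantity * unit price per line item."""
        return sum(Decimal(qty) * Decimal(price) for qty, price in line_items)

    def test_happy_path():
        assert invoice_total([("2", "9.99"), ("1", "0.02")]) == Decimal("20.00")

    def test_regression_empty_invoice_totals_zero():
        # Pinned to a hypothetical old bug: an empty invoice once blew up instead of totalling zero.
        assert invoice_total([]) == Decimal("0")

    if __name__ == "__main__":
        test_happy_path()
        test_regression_empty_invoice_totals_zero()
        print("ok")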

Turnover and meta-tooling aside, possibly the largest factor in the depreciation schedule is the speed at which the system must change to accommodate feedback from practice, and how much change it must swallow. If large new features need shipping on a regular basis, you must hire high-quality programmers and pay them well over a long term in order to keep that pace up. Should you be attempting to cram a shitload of changes through your system, you will probably need to staff out sideways: not necessarily hiring the biggest and most expensive guns, but smearing human horsepower across the feature attack surface to drag the whole thing together. The peasant vector field from Seven Samurai, if you will.

A system designed and implemented by smart folks, with a limited functional attack surface, for an organization with low turnover, that doesn't need many changes after delivery (or where the scope of changes is fairly well-constrained), will have a very long depreciation schedule, possibly in excess of five years. A system not so much designed as stapled together ad hoc out of JavaScript frameworks by a couple of Code Academy dropouts, which accidentally finds itself with some venture capital and customers, may need replacing within two years as the costs of maintenance balloon and the pace of feature delivery retards.
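To put rough numbers on that gap, a minimal sketch in Python; the 200K build cost is a made-up figure, and straight-line depreciation is cruder than anything a real accountant would sign off on:

    def straight_line_depreciation(build_cost, salvage_value, useful_life_years):
        """Annual depreciation expense under the simplest possible (straight-line) schedule."""
        return (build_cost - salvage_value) / useful_life_years

    # Hypothetical system: 200,000 to build, worth nothing at end of life.
    print(straight_line_depreciation(200_000, 0, 5))  # 40000.0 per year on the long schedule
    print(straight_line_depreciation(200_000, 0, 2))  # 100000.0 per year on the short one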

Maintenance cost analysis concerns itself with a related problem: given an established software system that needs regular deployments (let us roll with the server-side application development model), and has an ongoing stream of new features and bug fixes applied to it, how does one get a handle on the relationship between the rate of feature development and bug patching and the dollars spent?

Maintenance encompasses (among other things): developing new features and completing tasks, fixing existing bugs, ensuring that the team has not introduced any new bugs or broken existing functionality, pushing the new code out to the servers where customers will use it, and mutation of database systems in support of new code (to cherry-pick just a few topics). Costs and project velocity are a function of team size and composition, the aforementioned correctness-ensuring harnesses (automated testing, mature programming languages), the cost of deployment, and development cycle time.
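One crude way to frame that relationship is cost per shipped change. The following toy model and every number in it are illustrative assumptions, not figures from any real team:

    def monthly_maintenance_cost(headcount, loaded_cost_per_dev, infra_cost,
                                 deploys_per_month, cost_per_deploy):
        """Toy model: people, plus correctness/deployment infrastructure, plus per-release overhead."""
        return headcount * loaded_cost_per_dev + infra_cost + deploys_per_month * cost_per_deploy

    def cost_per_change(monthly_cost, changes_shipped_per_month):
        """Dollars spent per feature or bug fix actually delivered."""
        return monthly_cost / changes_shipped_per_month

    # A hypothetical four-person team shipping 30 changes a month:
    m = monthly_maintenance_cost(4, 12_000, 1_500, 20, 100)
    print(m)                       # 51500
    print(cost_per_change(m, 30))  # ~1716.67 per shipped change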

Team makeup and composition factors that drive maintenance costs: skill distribution among the team members; how quickly team members can perform tasks; how rigorously they test or decline to test their own work; and with how much discipline they handle the automated or manual testing process. Intelligent people who get shit done quickly and care about writing tests to cover both new bugs and new features are very valuable, but only when leading or part of teams that share those values. Average schmucks who just want to collect a paycheck and do the least amount of work without getting fired will slow the team down, and set the foundation for an extremely sharp mean reversion if included in high-functioning teams. You can imagine the glacial speed with which entire teams composed of such people might move, and the concomitant costs.

Cost and speed are also intimately affected by the correctness-harnessing infrastructure: test suites not only protect against shipping bad software, they also accelerate developer cycle time and confidence in system correctness. Some GUI-heavy apps built with no consideration for automated testing may impose a two-minute (or more!) cycle time, as the developer compiles the app, boots it, logs in (is the person in question smart enough to hard-code login credentials during testing? Rigorous enough to ensure those changes are never committed to the codebase?), and pokes it into the questionable state. Running the whole test suite might take 30 seconds, and running just a single test might be as fast as 10 seconds (it's entirely reasonable to shoot for 1 test per second; hardly a single backend system needs that, but ever since the last round of upgrades I've despaired of getting any sort of performance out of Xcode). A high rate of feedback with the system-under-hack is necessary not just to maintain the precious high-concentration state but also to keep dipshits from thinking they're excused to watch a cat video and forget that their app is compiling, not to mention trashing all of the valuable cognitive state that goes into debugging years of spackled-on complexity.
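Spelling out that arithmetic with the numbers above (two minutes for the manual GUI dance, 30 seconds for a full suite, 10 seconds for a single test, and the aspirational one test per second):

    def feedback_cycles_per_hour(seconds_per_cycle):
        """How many times an hour a developer can ask 'did that work?' and get an answer."""
        return 3600 / seconds_per_cycle

    for label, secs in [("manual GUI poke", 120), ("full suite", 30),
                        ("single test", 10), ("one test per second", 1)]:
        print(label, round(feedback_cycles_per_hour(secs)))
    # manual GUI poke 30, full suite 120, single test 360, one test per second 3600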

Maintenance also entails delivering the damn code, the costs of which also vary strongly as a function of institutional history and bent. Some organizations (still!) deploy code manually, to disastrous result: see the case of Knight Capital, where a manual update process missed a server and burned through nearly half a billion dollars. Other organizations deploy code to production servers automatically, every time the canonical repository is updated and the tests pass. A given organization's position on the continuum from entirely-manual to entirely-automated code deployment is a solid data point that hints at the organization's ability (or at least that of one of its software arms) to effectively expend capital to reduce maintenance costs (but only if they haven't gone unnecessarily overboard with the capex; not everyone needs a Chaos Gorilla). This dynamic explains quite a bit of the success of Heroku and other IaaS companies: "Instead of writing and maintaining a pile of shell scripts to mutate a set of servers in a data center somewhere (capital expenditure), we can rent servers from Amazon/Heroku/Google and pay a monthly fee atop that for their deployment abstractions!" For some orgs, this works really well. The breakeven point on Heroku is somewhere around 16KUSD/mo: 12 for the operations engineer, and 4 for the colocation fees. You can buy a lot of Heroku for 15K/mo. Whether you can rent a virtual server in a virtual server that Paul Graham is renting from Jeff Bezos without wanting to open your wrists and barf into the nearest liberal arts graduate's mouth is, however, beyond the scope of today's piece.
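The breakeven arithmetic, sketched with the figures above (12K/mo for the operations engineer, 4K/mo for colocation; the helper names are mine, and the comparison ignores everything but the monthly bill):

    def self_hosting_cost_per_month(ops_engineer_monthly=12_000, colocation_monthly=4_000):
        """The do-it-yourself alternative: an operations engineer plus colocation fees."""
        return ops_engineer_monthly + colocation_monthly

    def heroku_is_cheaper(heroku_bill_monthly):
        """True while the rented-platform bill stays under the self-hosting cost."""
        return heroku_bill_monthly < self_hosting_cost_per_month()

    print(self_hosting_cost_per_month())  # 16000
    print(heroku_is_cheaper(15_000))      # True
    print(heroku_is_cheaper(20_000))      # False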

We all want to build software that lasts, and runs unperturbed in a closet for decades. Failing that, we would like to be able to respond to changes in the real world as expediently as possible. The best solution is obviously the trivial one: the smartest person possible, working with his favorite tools, in a domain he knows intimately. Should we need to go to war with the budget and army we have and not the ones we wish that we had, some guidelines emerge: keep teams small and turnover low; invest (appropriately!) in correctness-ensuring infrastructure, be that tests or type systems; and automate deployments to the extent possible.

3 Comments »

  1. Matthew Bailin says:

    Hi Ben,

    I've been a fan of your blog for a few months now. I find your writing lucid, your message clear, and your code surprisingly readable despite my

  2. Benjamin Vulpes says:

    Matthew,

    Glad that you like it. Let me know if you want that comment fixed...

    Yours,
    Ben

  3. Matthew Bailin says:

    That would be lovely. Not sure why my comment cut off like that.
