CH

February 7, 2016

V-tronics 101: A gentle introduction to The Most Serene Republic of Bitcoin's cryptographically-backed version control system

Filed under: V, common lisp, software development, tmsr — Benjamin Vulpes @ 12:00 a.m.
V-tronics 101: A gentle introduction to The Most Serene Republic of Bitcoin's cryptographically-backed version control system

As befits a properly revolutionary agglomeration of individuals, The Most Serene Republic of Bitcoin is so willing to throw out staples of the "modern development workflow" that one might go so far as to call us hidebound reactionaries lusting for the bad old days of subversion and CVS. The inherent appeal of watching not just enemies but also random idiots passing by squirming on the stake aside, V marks a sea change in how thinking people work on software together. I am not the first to note this.

What else has been written on the topic?

  • Mircea Popescu's original Ode to V (technically an ode to the Genesis Patch)
  • Mircea Popescu's The V Manual Genesis
  • Stanislav's original V
  • punkman's V
  • Shane's muchly-iterated upon V
  • my own shitty and not-working-properly-or-maybe-even-at-all V
  • phf's excellent patch flow visualizer

The operators of any of these V implementations may be found and interrogated politely in #trilema.

How does it work? What does it do?

If you have:

  • source code patches
  • signatures of those patches
  • public keys for those signatures

…V will 'press' a source tree from those patches, filtered by those signatures, and further filtered by the public keys. In other words, V delivers a hard guarantee that you (or anyone) can produce a source tree comprised only of code that people you trust authored or reviewed. This stands in stark contrast to building the "master" branch of the Power Ranger Bitcoin Client, which contains a zillion patches from people you've never heard of, much less that you trust to write code that will in point of fact touch your money.

V is a very simple beast. It loads (v)patches from one directory, detached signatures of those patches from a second directory, public keys from another directory, determines the correct order to apply those patches, and then does so.

Stan's original V shipped with 8 tools:

  • wot Lists out the keys V will use to filter the patchset. Distinct from the B,TMSR~ Web of Trust, the key dir represents the WoT that is trusted to produce a given reference implementation (RI) source tree.
  • roots Implicitly a function of the whole patches/seals/wot setup (as is the whole system), identifies and prints to the terminal all "root" patches (literarily, the patches upon which the entire patchset depends but that do not themselves depend on anything else, and technically any patch with a false antecedent).
  • antecedents Takes 1 argument, the patch for which to look up antecedents. Returns a list of patches upon which the given patch depends.
  • descendants Mirror of antecedents, returns patches that depend on its argument.
  • flow Prints the ordered list of all (valid, signed) patches that could be pressed.
  • press Takes one argument, the final patch V should apply to the RI source tree. Has the side effect of applying all patches returned by flow up to and including the patch passed in as an argument.
  • origin Takes one argument, the SHA512 hash of one of the files in the tree, and returns the patch that produced that file in that state.

vpatches and vdiff

The whole edifice is built upon a foundation of vpatches, which are produced by the hitherto unmentioned vdiff. I'll reproduce vdiff (another of Stan's productions) here, as it's brief enough to inline:

#!/bin/bash

diff -uNr $1 $2 | awk 'm = /^(---|\+\+\+)/{s="sha512sum \"" $2 "\" 2>/dev/null  " | getline x; if (s) { split(x, a, " "); o = a[1]; } else {o = "false";} print $0 " " o} !m { print $0 }'

Which, near enough as I can tell from reading the source, and as far as my experiments corroborate, mutates the output of the Unix diff tool to include hashes of each file as it is written to disk in directory a and as it is written to disk in directory b. The algo is as follows:

  1. diff the two directories
  2. Pipe the results of that diff into an awk command that:
    1. searches for either '—' or '+'
    2. runs sha512sum on the awk match immediately following
      Which you'll note is always the relative path to a specific file in the tree, and is guaranteed to be the filename because uncoerced awk splits on spaces
    3. assigns the first line from the output of sha512sum to the variable x
    4. if sha512sum returned a hash value for the file, replace the '—/+++ <filename> <timestamp>' line with a '—/+++ <filename> <hash>' line
  3. Write mutated Unix diff to standard out

Perhaps a worked example will make this a little bit more clear:

$ tree
.
-> a
--> a.txt
--> b.txt
-> b
--> a.txt
--> b.txt

2 directories, 4 files

$ diff -uNr a b
diff -uNr a/a.txt b/a.txt
--- a/a.txt     2016-02-07 05:41:46.000000000 +0000
+++ b/a.txt     2016-02-07 05:41:46.000000000 +0000
@@ -1 +1 @@
-foo
\ No newline at end of file
+bar
\ No newline at end of file

$ vdiff a b
diff -uNr a/a.txt b/a.txt
--- a/a.txt f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc6638326e282c41be5e4254d8820772c5518a2c5a8c0c7f7eda19594a7eb539453e1ed7
+++ b/a.txt d82c4eb5261cb9c8aa9855edd67d1bd10482f41529858d925094d173fa662aa91ff39bc5b188615273484021dfb16fd8284cf684ccf0fc795be3aa2fc1e6c181
@@ -1 +1 @@
-foo
\ No newline at end of file
+bar
\ No newline at end of file

$ vdiff a b
diff -uNr a/a.txt b/a.txt
--- a/a.txt f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc6638326e282c41be5e4254d8820772c5518a2c5a8c0c7f7eda19594a7eb539453e1ed7
+++ b/a.txt d82c4eb5261cb9c8aa9855edd67d1bd10482f41529858d925094d173fa662aa91ff39bc5b188615273484021dfb16fd8284cf684ccf0fc795be3aa2fc1e6c181
@@ -1 +1 @@
-foo
\ No newline at end of file
+bar
\ No newline at end of file
diff -uNr a/b.txt b/b.txt
--- a/b.txt 6ac1a0cc10e1337e14eaa72e33c0f64fa4d401f9b284c59142b595b7f1c7bf4f72da82f81c903845b242b837b237feacf7b43481187cb35f73c77b936f498188
+++ b/b.txt 67574bf4234031b94f660f9bfc5bf2b08e470d5c1fdc06a5cb67dc405c21c6be15a0a19df54fd81c50caf0a0650b5da0622f9a1108771329ebda8329d66ca5a6
@@ -1 +1 @@
-zanzibar
+zanzibare

$ tree
.
-> a
--> a.txt
--> b.txt
-> b
--> a.txt
--> b.txt
--> c.txt

2 directories, 5 files

$ vdiff a b
diff -uNr a/a.txt b/a.txt
--- a/a.txt f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc6638326e282c41be5e4254d8820772c5518a2c5a8c0c7f7eda19594a7eb539453e1ed7
+++ b/a.txt d82c4eb5261cb9c8aa9855edd67d1bd10482f41529858d925094d173fa662aa91ff39bc5b188615273484021dfb16fd8284cf684ccf0fc795be3aa2fc1e6c181
@@ -1 +1 @@
-foo
\ No newline at end of file
+bar
\ No newline at end of file
diff -uNr a/b.txt b/b.txt
--- a/b.txt 6ac1a0cc10e1337e14eaa72e33c0f64fa4d401f9b284c59142b595b7f1c7bf4f72da82f81c903845b242b837b237feacf7b43481187cb35f73c77b936f498188
+++ b/b.txt 67574bf4234031b94f660f9bfc5bf2b08e470d5c1fdc06a5cb67dc405c21c6be15a0a19df54fd81c50caf0a0650b5da0622f9a1108771329ebda8329d66ca5a6
@@ -1 +1 @@
-zanzibar
+zanzibare
diff -uNr a/c.txt b/c.txt
--- a/c.txt false
+++ b/c.txt 5a7992b1a354c174e074b9a110561fca44d6e38f45e3f9dd80207f30139587ea00f2392ad285f2287f6033348b7190779d1f48c1628926c8da30e63b8eef1e5f
@@ -0,0 +1 @@
+trichome

$ sha512sum a/a.txt
f7fbba6e0636f890e56fbbf3283e524c6fa3204ae298382d624741d0dc6638326e282c41be5e4254d8820772c5518a2c5a8c0c7f7eda19594a7eb539453e1ed7  a/a.txt

sha512sum b/a.txt
d82c4eb5261cb9c8aa9855edd67d1bd10482f41529858d925094d173fa662aa91ff39bc5b188615273484021dfb16fd8284cf684ccf0fc795be3aa2fc1e6c181  b/a.txt

Of note:

  • identical files across trees are never mentioned
  • files in tree B but not A show 'false' for their antecedent hash value
  • vdiff output shows the 512-round SHA hash of a/a.txt and a/b.txt in the expected place in the diff

Complexities

There are 2 complex subsystems in a V implementation: the GPG integration, and the toposorting. Let's start with the simple one and move to the complex one.

GPG, amusingly to me, is actually the simpler of the two. The sole complexity in working with GPG is that it really really really wants to be a stateful program, going through "key importation" gyrations, and writing those keys to a special place on disk – the "keyring". The approach I take with my V-tron is to create a brand-spanking-new GPG "homedir" on every single run, and pass the location of that homedir to all calls to GPG on that run. New run? New GPG homedir. If you go so far as to use `mktemp`, you may even choose to eschew trashing the GPG homedir at the end of the run. The most important thing to note about working with GPG in the context of a V implementation is that persistence of state past a given run is categorically forbidden, and doubly so whenever touching GPG.

By far the most complex component of V is the toposort, and the program semantics supporting it. I'll include 2 implementations: one from Stan's original v.py and one from my own derpy lispy implementation.

Original first:

def toposort(unsorted):
    sorted = []
    unsorted = dict(unsorted)
    while unsorted:
        acyclic = False
        for node, edges in unsorted.items():
            for edge in edges:
                if edge in unsorted:
                    break
            else:
                acyclic = True
                del unsorted[node]
                sorted.append((node, edges))
        if not acyclic:
            fatal("Cyclic graph!")
    return sorted

Stan delivers a beautifully terse toposort implementation here. He leverages language features in both data structure and control flow, and the resulting code is compact and readable.

It takes a single argument, the unsorted list of patches, and returns the sorted list of patches. In between, it coerces the unsorted list to a dictionary (leveraging a language feature), and starts looping over the list of patches, setting the cyclicity flag to false on each run. While iterating over each patch, it also iterates over each patch's descendents (eg, the patches that depend upon it). If it finds any of the patch's descendents in the list of yet-unsorted patches, it terminates that iteration at the patch level, and moves on to the next patch. Should it fail to find any descendents in the list of yet-unsorted patches, this indicates that the current patch is a leaf node on the patch graph, can be appended to the list of sorted patches, deleted from the list of unsorted patches, and the acyclicity flag set to true for this iteration over the unsorted list. Predictably, this leaves the root node (assuming there's only one) for last, which means the list must be reversed once it's been constructed. Once the list of unsorted patches has been exhausted, the sorted list is complete and returned to the call site.

Should at any point the iteration over patches (nodes in the implementation) complete without setting the acyclicity flag, this indicates that the patchset is cyclic and invalid. Cyclic patch sets (eg patch A depending on B depending on C which in turn depends on A) will be impossible to press, and a V implementation must guard against it.

Now, for an exercise in humility. Since the staff rarely let me program anymore, and only in environments in which I was moderately well-practiced before self-promoting to the appropriate level for my own incompetence (which is to say shell scripting and Linux sysadminnery-cum-automation, Python webapps, and Clojure whatever), I like to do my work for B,TMSR~ in Common Lisp wherever possible. Clojure having been a wonderful learning experience, and myriad people to whom I look up claiming that 99% of other languages "features" to have been implemented in CL before I was even born, I strive for competence in it. Strive being the key word. I am also slaving to wrap my head around CLOS, the Common Lisp Object System, so there's a fair bit of object-oriented programming in there.

(defmethod toposort (unsorted)
  (let1
   (loop
      while unsorted
      with sorted = '() do
        (loop
           for node in unsorted
           for edges = (get-descendents node patches)
           with acyclic = nil
           do
             (block node
               (loop for e in edges do
                    (if (find e unsorted)
                        (return-from node)))
               (setf acyclic t
                     unsorted (remove node unsorted)
                     sorted (cons node sorted)))
           finally (assert (eq t acyclic)))
      finally
        (return sorted))))

I strove for a clean port of the previously-cited implementation to CL, but there are obviously some differences. Instead of contrasting differences, I'll simply describe how mine works.

The first thing done is to make a local copy of the unsorted patches – this makes referentially-transparent calls into the rest of the project significantly easier. Stan's program makes excellent and judicious use of global state, but I am nowhere near disciplined enough to do the same to good effect. After binding a copy of the unsorted list, the toposort implementation enters its main loop.

The main loop repeats for as long as there are patches in the unsorted list. Iteration over the list of patches sets the acyclicity flag and manages the mapping of patch descendents to graph edges in Stan's semantic map. With the node, edges, and cyclicity flags assigned, the toposort function enters the 'node' block, looping over the edges and returning from the 'node' block if any patch descendants remain in the unsorted list. Should the node block fail to find descendents in the unsorted list, it performs the same actions as the equivalent Python function, setting the acyclic flag to `t`, removing the node (patch) in question from the unsorted list, and consing the node (patch) onto the sorted list.

Yes, it conses furiously. No, I don't know why that's a bad thing, or even that it is a bad thing in this context. GAZE UPON MY IGNORANCE YE MIGHTY AND TREMBLE.

Finally (heh), the toposort function returns the sorted list.

Who should use it?

People who:

  • need a version control system built on a foundation of strong crypto
    In stark contrast to version control systems with strong crypto bolted on the side, as in the case of Git (and I assume DARCS, but this is probably just underinformed slander), a robust V implementation will perform all possible cryptographic operations before talking to the user, and should any such operation fail it will complain loudly and take no other actions.
  • need to publish changes to a codebase independently of any version control system
    This is entirely possible with V! vpatches can be applied by the unix patch tool with no modifications, and in an utterly worst-case scenario it should be entirely possible for anyone to eventually cough up a toposort, if they need it badly enough.
  • people who need guaranteed code evolution provenance
    While it's possible to dig through a Git history for the commit logs, what actual hard guarantee do I have that those logs have not been conjured forth from thin air to say whatever the conjurer wishes? What V brings the world is a system for collapsing signed patches into a source tree.

Who should write their own?

Anyone who intends to work on the Reference Implementation should reimplement V (and I suspect that most doing so already have). It is not a terrifically complex task, I knocked my first version out over the course of a weekend, and a weekend is a small price to pay for the knowledge so derived and to build the confidence of those you might hope some day to call your peers in your work.

To paraphrase a frequent utterance of Mircea Popescu, the era of writing lots of software is over. The Republic is now far more interested in well-vetted software of provably known provenance, reimplementations of extant tooling by members of the WoT, and perserverance in the face of nobody giving a shit about your life's work, than we are in "more tools", "better tools" or random code delivered by random nobodies.

Where do I get a copy?

Mine's not ready for public consumption, so you can't. There's an old version kicking around on various hdd's, but it's really really not suited for consumption, so I shan't link it again. I linked Stan's original above in my sources section.

Beyond that, I strongly suggest you start at The Foundation website. Search for "V:" and "vdiff.sh". Check signatures, and read the source. Then sit down, and write your own.

Summa

This is mostly a bridge between the ultra-technical and highly manual world of V operator tooling, and the rest of the world that might be capable of reading various V sources but can't be arsed for whatever reason. I'll leave the bombast and politics to those typically responsible for such.

Footnotes:

The man(?) may be one of the more succesfully anonymous contributors to B,TMSR~.

The very same who wrote the original ERC/gribble integration! Of note to noobs is that Phil sent that to me months before he ever started talking regularly in #bitcoin-assets. This eased his entry into the fold dramatically. Contrast it with polarbeard's recent contributions: a 2,600-long patch containing both an epic rewrite of the log messages and a nuking of the rather stupid 'log rotation' feature Satoshi's bitcoin shipped with, that I had to bereate him into splitting the useful part out of so that I could review it in isolation.

The PFV implies some amount of V implementation running behind the scenes and under the hood, but as of this date phf's declined to share the operating mechanics of the thing. His implementation calls out to a patched C GPG library instead of shelling out to the GPG binaries, which is particularly interesting.

There are no users of V. One may operate V for results, but one may not simply use a V implementation and expect it to work. It is not a cell phone to be used by people who have no business touching computers, it is far more akin to the cockpit of a glider with analog dials and physical levers upon which to push and pull for the extremely-well-versed-in-how-it's-actually-supposed-to-work-and-in-which-environments.

In the same way that there are no users of V, there is no single V. At this point in time, and as far as I can tell, everyone working on The Most Serene Republic's Reference Implementation also has their own implementation of V's base functionality. Stan has his own, Phil idem, Shane one that bears the imprimatur of The Foundation (by virtue of his signing a zillion copies of it), punkman his own variant descended from Stan's, and me my own cobbled-together version.

I know that this is misformatted. Thank you for your concern.

NB: this list is actually ascii-betically sorted before it's passed to this function.

My deep and abiding hatred of OOP stems from having learned it in systems that implement it miserably. Python and Ruby: I'm looking at you.

And in point of fact when rewriting my own V implementation was bit more times than I care to remember by careless bits of global state lying around.

The function `get-descendents` takes a node and a list of patches, and examines the patchset to find the node's descendents. This is done by sucking the patch-in-question's hash out of its body (read in during patchset instantiation), and regexing all patches bodies for that hash in the antecedent position (you may remember my referencing '—/+++' in describing vpatches: hashes in the '—' line are antecedent hashes, and hashes in the '+' line are descendent patches. `get-descendents` now takes the patchlist as an argument in a vain attempt to burn the errant global state out of this program.

Loop macro aside, this is probably the most notable divergence between Stan and my toposort implementations: I have to name the block from which I want to break out, and in Python breaking out of a loop is trivial.

  1. patches unsorted []

2 Comments »

  1. [...] At phf's prodding, I present in this post a genesis vpatch and corresponding signature for my Common Lisp implementation of asciilifeform's V. In case you've forgotten, V is a hard-crypto software source distribution tool that gives The Republic delightfully hard guarantees about who has endorsed what changes to a software project. Details are here: V-tronics 101. [...]

  2. [...] V is a hard crypto source distribution tool. Reimplementing a working V will demonstrate that you understand a foundational building block of our world. [...]

RSS feed for comments on this post. TrackBack URL

Reply

« vpatch adding multi-channel support for trinque's ircbot and logbot --- How to Fuck Up Without Being a Fuckup »