
April 28, 2017

A Brief Foray Into The Swamp of TRB Boost Usage

Filed under: bitcoin — Benjamin Vulpes @ 5:13 p.m.

While I have learned little new in this sojourn, timely documentation and publication differentiate the Republic's research ratchet from the Empire's soft fecal matter of nominally peer-reviewed journals. To that end, I herein report on a short scouting trip into the borderlands of The Real Bitcoin's internals, mapping the edges of what a Boost excision might entail.

For those just tuning in, the Bitcoin reference implementation is written in the C++ "programming language". C++ is a notoriously impoverished tool, to the extent that for a great many years video game companies and other consumers of commodity programmer inventory built and maintained their own collections of abstractions to lighten the burden. "Boost", a collection of C++ libraries, eventually emerged from the primordial internet-soup in much the same fashion and with much the same result as other large open-source projects like Open Office, or The Gimp (which, while adequate for some value thereof, discerning folks generally don't call "good"; little in this life is anyways, and libraries that patch over underlying language design failures or implementation lag are fundamentally unlikely to achieve the distinction).

Circa 2011, the C++ folks published a new version of the language, incorporating some Boost libraries wholesale and reimplementing others. This release, at least in theory, provided the tooling for C++ programmers to build systems of non-trivial size without relying on Boost for niceties like...the BOOST_FOREACH iteration construct. That Bitcoin's author should have selected a software development toolchain of such poverty will forever be a stain upon his name (that and the fact that he thought Windows an adequate environment on which to build crypto-related software omfg), but his decision to lean on Boost can at least be understood in the context of working in C++ on a Microsoft machine before the C++11 release. Not excused, obviously, but understood.

Now "if we can, we must" replace Boost-isms with C++11-isms, but that's not what actually sent me down this rabbit hole in the first place. I'd been working on an only-tangentially-C++11-related patch to TRB and found myself in a position where I could either write a new function to use as a predicate in remove_if, or upgrade the codebase to C++11 and thereby buy myself lambdas (pre-11 C++ lacks anonymous functions) and range-based for loops that could in theory replace at least a megatonne of BOOST_FOREACHs in the TRB codebase, and (hope springs eternal...) possibly even the use of Boost altogether.

Here is a vpatch (descending from `wires_rev1`) to compile TRB with C++11 semantics. It's 1% compiler flags, 1% ambiguity resolution, 1% pointer-type twiddling, and 97% addition of spaces to please GCC. I've not transmitted it to the mailing list, nor signed it, as I'm not yet convinced that it's useful, or an actual "decrufting" of the codebase on its own:

c11.vpatch

I wanted to know, of all the source files, which have references to boost, and how many.

$ find . -name "*.cpp" -o -name "*.h" | xargs grep -oi "boost" | awk -F ":" {'print $1'} | uniq -c | sort -nr

    116 ./test/util_tests.cpp
     65 ./bitcoinrpc.cpp
     53 ./serialize.h
     48 ./main.cpp
     43 ./json/json_spirit_reader_template.h
     34 ./json/json_spirit_value.h
     29 ./test/script_tests.cpp
     28 ./util.cpp
     27 ./test/DoS_tests.cpp
     25 ./net.cpp
     24 ./wallet.cpp
     16 ./util.h
     14 ./main.h
     13 ./test/Checkpoints_tests.cpp
     10 ./db.cpp
      9 ./test/base58_tests.cpp
      8 ./wallet.h
      8 ./init.cpp
      7 ./test/uint256_tests.cpp
      7 ./test/uint160_tests.cpp
      6 ./test/transaction_tests.cpp
      6 ./test/miner_tests.cpp
      6 ./test/base64_tests.cpp
      6 ./net.h
      5 ./script.cpp
      4 ./checkpoints.cpp
      2 ./test/test_bitcoin.cpp
      2 ./script.h
      2 ./noui.h
      2 ./json/json_spirit_writer.h
      2 ./json/json_spirit_writer.cpp
      2 ./json/json_spirit_reader.h
      2 ./json/json_spirit_reader.cpp
      2 ./headers.h
      1 ./keystore.cpp
      1 ./json/json_spirit_utils.h

To get a rough picture of the scope of the work that might be involved in a Boost excision, I removed Boost imports from `bitcoinrpc.cpp` (in TRB's source tree), and recompiled TRB. Output interspersed with notes follows:

bitcoinrpc.cpp: In function ‘int ReadHTTPStatus(std::basic_istream<char>&)’:
bitcoinrpc.cpp:2082:5: error: ‘split’ is not a member of ‘boost’
     boost::split(vWords, str, boost::is_any_of(" "));
     ^
bitcoinrpc.cpp:2082:31: error: ‘is_any_of’ is not a member of ‘boost’
     boost::split(vWords, str, boost::is_any_of(" "));
                               ^

The code in question:

int ReadHTTPStatus(std::basic_istream<char>& stream)
{
    string str;
    getline(stream, str);
    vector<string> vWords;
    boost::split(vWords, str, boost::is_any_of(" "));
    if (vWords.size() < 2)
        return 500;
    return atoi(vWords[1].c_str());
}

As you can see, Boost is used in this context to split the string on spaces, ensure that there are at least two elements in the result, and return the second, parsed as an integer. Why it would not have sufficed to search the string for the space character and return the substring delimited by the first and second spaces, or by the first space and the end of the string, is not immediately apparent to me. C and C++ are full of caltrops in string handling, which compound when dealing with potentially external data. This nugget smells somewhat as though it were written by someone perhaps subconsciously afraid of the weaknesses of his tools, patching over that very important-to-listen-to suspicion by using what claims to be a well-reviewed and secure library for Getting Shit Done in C++.

Further up the call stack, ReadHTTPStatus is called from ReadHTTP, a function which is itself called on the RPC thread from within ThreadRPCServer2, and passed HTTP streams and return-value pointers for the callee to mutate while handling a request from connections that may or may not be authorized to poke the RPC interface. ReadHTTPStatus is also called from within CallRPC to get the HTTP response code for a call /to/ the RPC server. By my read, this particular pile of chairs can attempt to read data from the network if TRB is booted with the config variable `rpcallowip` bound to a public network interface and `rpcport` to an open port, but I invite correction on this point.

While slicing the Boost out of this call is far from intractable, it would entail adding several lines of string tokenization, not to mention the risks associated with handling strings from the net in C++.

Returning to the compilation failures:

bitcoinrpc.cpp: In function ‘int ReadHTTPHeader(std::basic_istream<char>&, std::map<std::basic_string<char>, std::basic_string<char> >&)’:
bitcoinrpc.cpp:2101:13: error: ‘trim’ is not a member of ‘boost’
             boost::trim(strHeader);
             ^
bitcoinrpc.cpp:2102:13: error: ‘to_lower’ is not a member of ‘boost’
             boost::to_lower(strHeader);
             ^
bitcoinrpc.cpp:2104:13: error: ‘trim’ is not a member of ‘boost’
             boost::trim(strValue);
             ^

Trimming and lower-casing strings is not an impossible task without Boost.

I always wanted to get into the guts of an "ad-hoc, informally specified, and bug-ridden" implementation of an HTTP (header) parser!

int ReadHTTPHeader(std::basic_istream<char>& stream, map<string, string>& mapHeadersRet)
{
    int nLen = 0;
    loop
    {
        string str;
        std::getline(stream, str);
        if (str.empty() || str == "\r")
            break;
        string::size_type nColon = str.find(":");
        if (nColon != string::npos)
        {
            string strHeader = str.substr(0, nColon);
            boost::trim(strHeader);
            boost::to_lower(strHeader);
            string strValue = str.substr(nColon+1);
            boost::trim(strValue);
            mapHeadersRet[strHeader] = strValue;
            if (strHeader == "content-length")
                nLen = atoi(strValue.c_str());
        }
    }
    return nLen;
}

Contrast the use of standard library functions to split on the colon in ReadHTTPHeader with the Boost usage in ReadHTTPStatus, which achieves approximately the same thing. Both boost::trim and boost::to_lower could be replaced with standard library calls.

Moving on to the truly hairy Boost stuffs:

bitcoinrpc.cpp: In function ‘bool HTTPAuthorized(std::map<std::basic_string<char>, std::basic_string<char> >&)’:
bitcoinrpc.cpp:2142:47: error: ‘trim’ is not a member of ‘boost’
     string strUserPass64 = strAuth.substr(6); boost::trim(strUserPass64);
                                               ^
bitcoinrpc.cpp: In function ‘bool ClientAllowed(const string&)’:
bitcoinrpc.cpp:2191:23: error: ‘asio’ has not been declared
     if (strAddress == asio::ip::address_v4::loopback().to_string())
                       ^
bitcoinrpc.cpp: In function ‘void ThreadRPCServer2(void*)’:
bitcoinrpc.cpp:2247:5: error: ‘asio’ has not been declared
     asio::ip::address bindAddress = mapArgs.count("-rpcallowip") ? asio::ip::address_v4::any() : asio::ip::address_v4::loopback();
     ^
bitcoinrpc.cpp:2249:5: error: ‘asio’ has not been declared
     asio::io_service io_service;
     ^
bitcoinrpc.cpp:2250:5: error: ‘ip’ has not been declared
     ip::tcp::endpoint endpoint(bindAddress, GetArg("-rpcport", 8332));
     ^
bitcoinrpc.cpp:2251:5: error: ‘ip’ has not been declared
     ip::tcp::acceptor acceptor(io_service, endpoint);
     ^
bitcoinrpc.cpp:2253:5: error: ‘acceptor’ was not declared in this scope
     acceptor.set_option(boost::asio::ip::tcp::acceptor::reuse_address(true));
     ^
bitcoinrpc.cpp:2253:32: error: ‘boost::asio’ has not been declared
     acceptor.set_option(boost::asio::ip::tcp::acceptor::reuse_address(true));
                                ^
bitcoinrpc.cpp:2258:9: error: ‘ip’ has not been declared
         ip::tcp::iostream stream;
         ^
bitcoinrpc.cpp:2260:9: error: ‘ip’ has not been declared
         ip::tcp::endpoint peer;
         ^
bitcoinrpc.cpp:2262:26: error: ‘stream’ was not declared in this scope
         acceptor.accept(*stream.rdbuf(), peer);
                          ^
bitcoinrpc.cpp:2262:42: error: ‘peer’ was not declared in this scope
         acceptor.accept(*stream.rdbuf(), peer);
                                          ^
bitcoinrpc.cpp: In function ‘json_spirit::Object CallRPC(const string&, const Array&)’:
bitcoinrpc.cpp:2389:5: error: ‘ip’ has not been declared
     ip::tcp::iostream stream(GetArg("-rpcconnect", "127.0.0.1"), GetArg("-rpcport", "8332"));
     ^
bitcoinrpc.cpp:2390:9: error: ‘stream’ was not declared in this scope
     if (stream.fail())
         ^
bitcoinrpc.cpp:2401:5: error: ‘stream’ was not declared in this scope
     stream << strPost << std::flush;
     ^

asio::ip::address_v4::any(), loopback(), ip::tcp::endpoint, ip::tcp::acceptor, ip::tcp::iostream, and asio::io_service have no standard library alternatives (and aren't terrifically meaningful outside of their own context of "asynchronous input/output" anyways). References to the loopback interface could be excised, such that the user must specify allowed IP addresses at runtime. Cutting the io_service out will entail a rewrite of the JSON/RPC machinery as well. Relatedly, this is an excellent example of the explosion in system complexity and dependency-graph size directly attributable to importing the HTTP stack, which in turn is directly attributable to the cultural blind spot that is "whaddaya mean, your meegroserbusses don't all talk over HTTP?"

So yes, count me among those who looked into trimming TRB into shape and concluded that a rewrite is unavoidable. Conveniently, with a reference implementation in hand, the rewrite is just lots of work. "Yesterday's Nobel is tomorrow's homework" or how did it go.

January 4, 2017

veh patch: overall improvements

Filed under: bitcoin, common lisp — Benjamin Vulpes @ 11:37 p.m.

V is not just a sharp tool, but a tool that sharpens the mind. Baking a vpatch is a political statement in a way that committing on a git tree and pushing those files to a server somewhere simply cannot be. "This does what it says on the tin." "This code bears my imprimatur." "I have thought carefully about the issues at hand and this is what I have done to address them." "I measured seven thousand times before I cut this here board." "All of these changes are necessary." etceteras.

This particular vpatch to my own implementation of V barely squeaks by with propriety: it contains just about as many changes as I could possibly batter the poor < 300 line file with. The user-facing interface has changed not one bit, but I've purged many of the "unixisms" with which the previous version was riddled. A particularly salient point was:

phf: the main folly are the unixisms all over the place. lisp works with a clear read/eval/print cycle. read means that you want to take outside input and convert it into a concrete data structure. so you shouldn't have a hash with strings in it. things like (string= "false" (gethash 'hash c)) should not happen so far down the call chain. your ~reader~ should convert the input data into a format that's easy to work with. the check could've been (if (not (hashed-path-hash c)) ...) because you ~reader~ should've already massaged it all into the kind of data computers understand. btcbase uses nil for empty hashes, you could have :empty or whatever, but certainly not carying strings and dictionaries all over the place.

Which directly informed a lot of the rewriting I put veh through.

Anyways, a patch of this size, filled with this many changes, demands a point-by-point explanation. Onwards!

 (defun files (dir &optional suffix)
+  (declare (pathname dir))
   (let ((files (directory (pathname (if suffix
                                         (format nil "~A/*.~A" dir suffix)
                                         (format nil "~A/*.*" dir))))))
     files))

On one hand, "all functions must declare the types of their inputs". On the other hand, "Why do you need this type declaration? Why don't you exercise a modicum of discipline and always pass it a pathname?". In this patch, I've added type declarations in a few places because it constrains the possible states through which the program can move, and I am a fan of that.


 (define-condition external-program-error (error)
-  ((text :initarg :text :reader text)
-   (program :initarg :program :reader program)
+  ((program :initarg :program :reader program)
    (stderr :initarg :stderr :reader stderr)
-   (stdout :initarg :stdout :reader stdout)))
+   (stdout :initarg :stdout :reader stdout)
+   (cmd :initarg :cmd :reader cmd))
+  (:report
+   (lambda (condition stream)
+     (format stream "external command ~A failed with:~%standard output:~%~A~%standard error:~%~A~%"
+             (cmd condition)
+             (stdout condition)
+             (stderr condition)))))

In the previous version, errors from external programs did not carry *any* useful information to the REPL debugger. This is a less-broken use of the Common Lisp condition signaling machinery.


-(defun run (&key cmd (args '()) (input nil))
+(defun run (cmd args &key input)
   (let* ((e (make-string-output-stream))
          (o (make-string-output-stream))
          (p (run-program cmd args :search t
                          :error e :output o
-                         :input input)))
+                         :input input))
+         (error-string (get-output-stream-string e))
+         (output-string (get-output-stream-string o)))
     (if (= 0 (process-exit-code p))
-        (values p o e)
-        (let ((s (get-output-stream-string e)))
-          (error 'external-program-error
-                 :text (format nil "program ~A called with args ~A failed with error:~%~A"
-                               cmd args s)
-                 :program p :stdout (get-output-stream-string o) :stderr (get-output-stream-string e))))))
+        (values p output-string error-string)
+        (error 'external-program-error
+               :program p :cmd cmd :stdout output-string :stderr error-string))))

The `run' function's argument list is now less wacky, redundant `get-output-stream-string' calls are gone, and `run' now returns the process, the string from stdout and the string from stderr. Now that I'm at the end of this leg of the odyssey, every time I see `values' I get this hankering to stick a struct or CLOS object in there to dial down the frequency of `multiple-value-bind' and ignore declarations that returning multiple values imposes everywhere.


 (defun rm (target)
-  (run :cmd "rm" :args `("-r" ,target)))
+  (run "rm" `("-r" ,target)))

 (defun mktempdir ()
   (let ((dir-string (format nil "/tmp/veh.~A" (gensym))))
-    (run :cmd "mkdir" :args `("-p" ,dir-string))
+    (run "mkdir" `("-p" ,dir-string))
     dir-string))

Above functions updated to use the new `run' signature.

 (defun sha512-file (path)
+  (declare (pathname path))
   (multiple-value-bind (p o e)
-      (run :cmd "sha512sum" :args `(,(namestring path)))
+      (run "sha512sum" `(,(namestring path)))
     (declare (ignore p e))
-    (let ((out-str (get-output-stream-string o)))
+    (let ((out-str o))
      (subseq out-str 0 (search " " out-str)))))

 (defun gpg (workdir &rest args)
-  (run :cmd "gpg" :args `("--no-permission-warning" "--homedir" ,workdir "--verbose" "--with-colons" ,@args)))
+  (run "gpg" `("--no-permission-warning" "--homedir" ,workdir "--verbose" "--with-colons" ,@args)))

 (defun import-keys! (gpgdir keydir)
   (let ((keypaths (mapcar 'namestring (files keydir))))
-    (run :cmd "gpg" :args `("--homedir" ,gpgdir "--batch" "--import" ,@keypaths) )))
+    (run "gpg" `("--homedir" ,gpgdir "--batch" "--import" ,@keypaths) )))

Various upgrades to use new signature for `run'.


-(define-condition bad-signature (external-program-error)
+(define-condition bad-signature (error)
   ((sig-path :initarg :sig-path :reader sig-path)
-   (patch-path :initarg :patch-path :reader patch-path)))
+   (patch-path :initarg :patch-path :reader patch-path))
+  (:report
+   (lambda (condition stream)
+     (format stream "signature verification failed for file ~A and signature ~A"
+             (patch-path condition)
+             (sig-path condition)))))

Another corrected condition definition.

 (defun verify (gpgdir sigpath blobpath)
   (handler-bind
@@ -61,14 +73,10 @@
         #'(lambda (c)
             (if
              (search "BAD signature" (slot-value c 'stderr))
-             (error 'bad-signature :text (format nil "bad signature at ~A for patch at ~A"
-                                                 sigpath blobpath)
-                    :sig-path sigpath :patch-path blobpath)))))
-    (let* ((proggy
-            (gpg gpgdir "--verify" sigpath blobpath)))
-      (if (= 0 (process-exit-code proggy))
-          t
-          nil))))
+             (error 'bad-signature
+                    :sig-path sigpath :patch-path blobpath)
+             (signal c)))))
+    (gpg gpgdir "--verify" sigpath blobpath)))

`verify' uses `run' properly, and has output strings trivially available to it; this cleaned up some cruft. It also explicitly re-signals the condition signaled by `run' if the error in question is not a bad signature.

+(defstruct hashed-path hash path)

 (defmethod parents-hashes ((p patch))
-  (loop for parent in (parents p) collecting (gethash 'hash parent)))
+  (loop for parent in (parents p) collecting (hashed-path-hash parent)))

 (defmethod childrens-hashes ((p patch))
-  (loop for child in (children p) collecting (gethash 'hash child)))
+  (loop for child in (children p) collecting (hashed-path-hash child)))
+

Here is where the "read all data in and convert it to proper data structures as soon as possible" rewrite starts to raise its head. The `defstruct' macro makes accessors for the hash and the path, hence the terseness with which this particular snippet migrated from dictionaries to structs.

+
+(defun parse-vpdata (vpd1 vpd2)
+  (let* ((v1-data (split " " vpd1))
+         (v2-data (split " " vpd2))
+         (path-1 (nth 1 v1-data))
+         (path-2 (nth 1 v2-data)))
+    (loop
+       for a in (rest (pathname-directory path-1))
+       for b in (rest (pathname-directory path-2))
+       when (string= a b) collect a into dir
+       finally
+         (return
+           (make-hashed-path
+            :hash (let ((h (nth 2 v1-data)))
+                    (if (string= "false" h) nil h))
+            :path (make-pathname :name (pathname-name path-1) :type (pathname-type path-1)
+                                 :directory `(:relative ,@dir)))))))

-(defun vpdata (patch-string regex)
-  "Looks up the hashes for a given patch. Supplied regex must have 2 registers and do the correct thing."
-  (let ((result (list)))
-    (cl-ppcre:do-register-groups (path hash)
-        ((create-scanner regex :multi-line-mode t)
-         patch-string)
-      (let ((temp-ht (make-hash-table)))
-        (setf (gethash 'path temp-ht) path)
-        (setf (gethash 'hash temp-ht) hash)
-        (pushnew temp-ht result)))
-    result))
+(defun vpdata (patch-string)
+  (declare (string patch-string))
+  (loop
+     for p in (all-matches-as-strings
+               (create-scanner "^--- (\\S+) (\\S+)$" :multi-line-mode t)
+               patch-string)
+     for c in (all-matches-as-strings
+               (create-scanner "^\\+\\+\\+ (\\S+) (\\S+)$" :multi-line-mode t)
+               patch-string)
+     collecting (parse-vpdata p c) into ps
+     collecting (parse-vpdata c p) into cs
+     finally (return (values cs ps))))

`vpdata' now conforms a lot more closely to the REPL paradigm, calling `parse-vpdata' to parse the vpatch data into useful CL data structures very early in the operation lifecycle. `parse-vpdata' contains what I consider veh's last sin, which is to store hashes as strings. It is an acceptable sin for the meantime, as the radioactive API surface area is all around the comparison of hashes as provided to veh in string form by various external tools, eg `vpatch' and `sha512sum'. As Ironclad (the One True Common Lisp Crypto Package) doesn't actually work as distributed via Quicklisp, I'm leaving these stinky string comparisons in place as warning flags around a sinkhole in the program.

I also cleaned up the regexing in `vpdata', and dropped the stateful `pushnew' and `do-register-groups' crap in favor of the higher-level `all-matches-as-strings'.


 (defun vdir (root subdir)
   (make-pathname :directory (list :relative root subdir)))
@@ -109,7 +131,7 @@
     (let ((patch-contents (make-string (file-length patch-stream))))
       (read-sequence patch-contents patch-stream)
       (multiple-value-bind (cs ps)
-          (calc-parents-and-children patch-contents)
+          (vpdata patch-contents)
         (make-instance
          'patch
          :path path :body patch-contents
@@ -119,11 +141,12 @@
   (loop for c in (children patch)
      with r = '()
      do
-       (let* ((h (gethash 'hash c))
+       (let* ((h (hashed-path-hash c))
               (match (find h all-patches :test
                            (lambda (hash patch)
-                             (member hash (parents-hashes patch)
-                                     :test #'equal)))))
+                             (and hash
+                                  (member hash (parents-hashes patch)
+                                          :test #'equal))))))
          (when match (pushnew match r :test #'equal)))
      finally (return r)))

`vdir' now uses the simplified and improved `vpdata' instead of `calc-parents-and-children', and accesses the structs produced by the new `vpdata' as one accesses structs instead of hash-maps.

`get-antecedents' and `get-descendents' now access hashed-path structs properly, and now test for the existence of a hash in hashed-path structs when calculating antecedents and descendents.

+(defun patch-seals (patch seals)
+  (remove-if-not
+   #'(lambda (s)
+       (search (file-namestring (path patch)) (file-namestring s)))
+   seals))
+
 (defun patch-signed? (patch seals gpgdir)
   (let* ((p (path patch))
-         (seals (remove-if-not #'(lambda (s) (search (file-namestring p) (file-namestring s))) seals)))
-   (loop for s in seals
-      thereis (verify gpgdir (namestring s) (namestring p)))))
+         (seals (patch-seals patch seals)))
+    (when seals
+     (loop for s in seals
+        always (verify gpgdir (namestring s) (namestring p))))))

I moved the logic that filters the full list of seals down to the set relevant to a given patch into its own function, `patch-seals', and ensured that `patch-signed?' only returns true if it is actually provided a list of seals.1


 (defun press-patch (patch target)
   (let ((patch-location (namestring (path patch)))
         (target-dir-string (namestring target)))
-    (run :cmd "patch" :args `("-E" "-d" ,target-dir-string "-p1" "-F" "0")
-         :input patch-location)
+    (run  "patch" `("-E" "-d" ,target-dir-string "-p1" "-F0") :input patch-location)
     (loop for c in (children patch) do
-         (let ((hash (gethash 'hash c))
-               (patched-pathname
-                (pathname (concatenate 'string target-dir-string (subseq (gethash 'path c) 2)))))
-           (if (not (string= "false" (gethash 'hash c)))
-               (let ((computed-hash (sha512-file patched-pathname)))
-                 (assert (string= hash computed-hash) nil
-                         "~S does not match vpatch hash of ~S for patch ~A and file ~A"
-                         computed-hash hash (file-namestring (path patch)) patched-pathname))
+         (let* ((hash (hashed-path-hash c))
+                (path (hashed-path-path c)))
+           (if (hashed-path-hash c)
+               (assert (string= hash (sha512-file (merge-pathnames path target))) nil
+                       "hash verification failed for patch ~A and file ~A"
+                       (file-namestring (path patch)) (file-namestring path))
                (assert (not (probe-file
-                             (pathname (concatenate 'string target-dir-string (subseq (gethash 'path c) 2)))))))))))
+                             (merge-pathnames path target)))))))))

 ;; user interface

@@ -202,12 +230,10 @@
       (import-keys! gpgdir wotdir)
       (when (probe-file (namestring target))
         (rm (namestring target)))
-      (run :cmd "mkdir" :args `("-p" ,(namestring target)))
+      (run "mkdir" `("-p" ,(namestring target)))
       (let* ((sorted-and-filtered-patches
-              (toposort (filter-patches all-patches seals gpgdir)))
+              (toposort (signed-patches all-patches seals gpgdir)))
              (headpos (position headpatch sorted-and-filtered-patches)))
-        (when (null headpos)
-          (error "requested head bears no signature corresponding to a key in .wot"))
         (loop for p in (subseq sorted-and-filtered-patches 0 (+ 1 headpos)) do
              (press-patch p target))))
     (when teardown
@@ -215,13 +241,16 @@
     t))

All of the unixisms previously sited in `press-patch' are now way further up the call chain, and transform strings into data structures. `press-patch' is significantly leaner and more legible, and leans heavily on the Common Lisp pathname interface. I renamed `filter-patches' to `signed-patches' to better reflect the filter that it performs.


 (defun flow (&key press-root)
-  (let ((patches (toposort
-                  (mapcar #'make-patch (files (vdir press-root "patches")))))
-        (wot (vdir press-root ".wot"))
-        (gpgdir (mktempdir)))
-    (import-keys! gpgdir wot)
-    (format t "~{~A~%~}" (mapcar (lambda (p) (file-namestring (path p)))
-                                 patches))))
+  (let* ((gpgdir (mktempdir))
+         (wotdir (vdir press-root ".wot"))
+         (seals (files (vdir press-root ".seals"))))
+    (import-keys! gpgdir wotdir)
+    (let ((patches (toposort
+                    (signed-patches
+                     (mapcar #'make-patch (files (vdir press-root "patches")))
+                     seals gpgdir))))
+     (format t "~{~A~%~}" (mapcar (lambda (p) (file-namestring (path p)))
+                                  patches)))))

`flow' now filters out unsigned patches.

 (defun wot (&key press-root)
-  (format
-   t "~{~A~%~}"
-   (mapcar #'file-namestring (files (vdir press-root ".wot")))))
+  (let ((gpgdir (mktempdir))
+        (keydir (vdir press-root ".wot")))
+    (import-keys! gpgdir keydir)
+    (multiple-value-bind
+          (p o e)
+        (gpg gpgdir "--list-public-keys")
+      (declare (ignore p e))
+      (format t "~{~{~A: ~A~}~%~}"
+       (loop for i in (split #\newline o)
+          when (string= "pub" (first (split ":" i)))
+          collect
+            (list
+             (nth 4 (split ":" i))
+             (nth 9 (split ":" i))))))))

`wot' now shells out to GPG and prints the fingerprint and user string for each key.

Aaand that's this vpatch! Lots of changes, mostly to improve Lisp semantics.

2017_cleanup.vpatch
2017_cleanup.vpatch.ben_vulpes.sig

  1. Also, please enjoy the following chapter from Annals of the Empty Set:

    VEH> (loop for i in nil always i)
    T
    


How to (actually) "learn programming"

Filed under: bitcoin, software development, tmsr — Benjamin Vulpes @ 3:03 a.m.

More or less the same way you learn any other very complicated craft with oodles of knowledge both formalized and oral: by finding the most strict and knowledgeable master you can, and slaving for him as best you can for as long as you can tolerate it. Proper apprenticeships are an unlikely model in the States, as everyone with 9 months of React under their belt expects 140KUSD per annum and a title, but you wanted to know how to actually learn programming.

Most masters that you'll find in the wild world of shartups are neither particularly masterful nor particularly willing to entertain your novicehood. This manifests in "industry" (to the extent that building javascript webapps might be called industry) as "software engineers" (lacking the "senior" honorific) training "junior software engineers" inserted into their organizations by the Diversity Machine. This is not the sort of master you'll learn much of use from, regardless of what you think of the type of master you'd like to learn from.

Since you'll not find anyone to beat 40 years of slapdash hacks into your head on the shartup circuit, you're stuck learning from the cruel, busy, cryptic and reluctant peers of The Republic, who won't be particularly useful on the curricula front.

Reading:

- Applied Cryptography, Bruce Schneier (first edition)
Read the first edition, with the blue (not red; red is the bullshit version, kudos to mod6 for the catch) cover. Schneier redacted all of the actual goodies so that he might land a job with people who find that kind of behavior appealing and not appalling.

- Common Lisp the Language, 2nd Edition, Guy Steele
The peers have largely settled on Common Lisp as a programming lingua franca. It's an entirely adequate language, featuring ~everything you'll find in "modern" programming languages like PHP or Python. While I'm not convinced that one can "learn programming" in any other way than by building things and practicing constantly and with a relentless eye towards self improvement, reading this book won't hurt you (too much).

Exercises:

- generate and secure GPG keys

This is the single most important task for anyone who intends to join The Republic. You must learn what it means to generate keys securely, how to use them securely, enumerate the kinds of threat you wish to secure your keys against, and then effect a system that tends to all of these needs.

You also must establish and practice your backup and restoration process for these keys. Everything dies, including computer hardware, so you must ensure that you never lose access to the anchor to reality and key to the door of The Republic's forum.

- set up and operate a virtual server

While I cannot recommend that you make a permanent home in a virtualized server on someone else's hardware, you need a persistent Linux box that can do...things. It more or less doesn't matter which Linux you settle on if you're reading this for advice, but you should operate under the assumptions that a) you'll be relegating the machine to the dustbin at some point and b) you'll probably want to change Linux distributions as well.

- set up an IRC bouncer

If you have the remotest dream of anyone in The Most Serene Republic of Bitcoin giving a shit about you and your problems, you'll quickly discover the importance of maintaining your own connection to the forum and not annoying the peers by reconnecting constantly. Establishing and maintaining a persistent and robust IRC connection will teach you much about the Linux and IRC client you've chosen to operate.

- set up a blog

Recount your travails in "learning programming". Muse in public. Offer your thoughts that others may know them and contradict you. This is as close as you'll get to "having a master", so have opinions, be ready to defend them, and prepare to accept that you're wrong. Don't neglect your comment system and for the love of all that is holy don't outsource it.

- operate a server

There are many ways to get into operating your own hardware, and many tradeoffs to make in the hardware procurement project. Migrating from virtualized servers to your own metal in a datacenter somewhere will illuminate all sorts of dusty corners in your head where the advocates of feeding the world with McDonald's hide the assumptions they programmed you with as a child. This project will acquaint you with the engineering tradeoffs with which programmering as a career is rife.

- run a "The Real Bitcoin" node

Once you've grown into your own hardware and have at least 5GB of RAM and 200GB of disk to spare, consider operating a TRB node. TRB is downright finicky in constrained and virtualized environments, and you're on a course to digital literacy and self-sufficiency.

Projects:

- extend Diana Coman's "foxybot", a bot for Mircea Popescu's MMORPG Eulora

MP runs an MMORPG and encourages players to automate their activities in it. Diana Coman, the current project lead/developer (do forgive the possibly-insulting title), maintains and extends both the game's codebase and that of its dominant bot, "foxybot". The link to foxybot above has a list of features the playerbase would like to see implemented.

Working in this environment will teach you about the wonders of C++ (a programming language with which one must be conversant, but that is not particularly...good) and CrystalSpace (a "game development engine" that isn't as loathsome as other engines).

- (re)implement V

V is a hard crypto source distribution tool. Reimplementing a working V will demonstrate that you understand a foundational building block of our world.
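For a sense of what "understanding the building block" entails: a vpatch binds each touched file to an antecedent hash and a resultant hash, and a press must order patches so that every antecedent state exists before it is consumed. The sketch below (in Python, with invented names and data shapes -- no particular V implementation works exactly this way) shows that core ordering computation:

```python
# A minimal sketch of V's ordering problem, assuming each vpatch has been
# reduced to a list of (antecedent, resultant) hash pairs. "false" marks a
# file that does not yet exist, i.e. a genesis edge. Illustrative only.
from collections import defaultdict

def press_order(patches):
    """Order patches so every antecedent state is produced before use."""
    produced = {}  # resultant hash -> name of the patch that produces it
    for name, edges in patches.items():
        for _, resultant in edges:
            produced[resultant] = name

    deps = defaultdict(set)  # patch -> set of patches it depends on
    for name, edges in patches.items():
        for antecedent, _ in edges:
            if antecedent != "false" and antecedent in produced:
                deps[name].add(produced[antecedent])

    order, seen = [], set()
    def visit(name):              # depth-first topological sort
        if name in seen:
            return
        seen.add(name)
        for dep in sorted(deps[name]):
            visit(dep)
        order.append(name)

    for name in sorted(patches):
        visit(name)
    return order

patches = {
    "genesis.vpatch": [("false", "aa11")],
    "hashes_and_errors.vpatch": [("aa11", "bb22")],
    "2017_cleanup.vpatch": [("bb22", "cc33")],
}
print(press_order(patches))
# ['genesis.vpatch', 'hashes_and_errors.vpatch', '2017_cleanup.vpatch']
```

A real implementation must also verify a WoT signature on every patch before it is admitted to the flow at all; the ordering is the easy half.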

- make a Lamport Parachute

Stan says it all, go read it.

- operate an IRC bot

trinque and I (but mostly trinque) have put some work into a Common Lisp IRC bot. Stand one up and keep it up.
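Not a substitute for standing up the Common Lisp bot, but as an illustration of the "keep it up" half of the exercise: the one piece of protocol handling a bot cannot get wrong is answering server PINGs, or the server drops the connection. A minimal sketch, assuming nothing beyond raw IRC lines:

```python
# Illustrative fragment, not the trinque/ben_vulpes bot: the keep-alive
# reply every IRC bot must produce to stay connected.
def irc_response(line):
    """Given one raw IRC line from the server, return the reply to send,
    or None. Only the keep-alive case is handled in this sketch."""
    line = line.rstrip("\r\n")
    if line.startswith("PING"):
        # "PING :token" must be answered with "PONG :token"
        _, _, token = line.partition(" ")
        return "PONG " + token
    return None

assert irc_response("PING :irc.example.net\r\n") == "PONG :irc.example.net"
```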

- build and host a log viewer

If you're already operating an IRC bot (and when we've made it so easy for you to do so, not doing so begins to look a bit lazy), you may contribute to The Republic's own form of distributed redundancy: many different implementations of core functionality -- in this case, log viewers. Public logs civilize the chaos and noise of IRC, and cross-referencing upgrades logs to Talmudic stature. phf hosts the canonical logs at http://btcbase.org/log , I host a set at http://bvulpes.com/logs , and Framedragger hosts a set at http://log.mkj.lt/trilema/today .

This project will acquaint you with the miseries of building wwwtronic software. Implementing search and cross-referencing will teach you even more.

Coda

There is no point to "learn programming" if you're just going to further the works of evil by battling for the empire's hegemony with JavaScript and "mobile apps". If all you desire is a good job and enough money to pay for beer, food, and box for your meat so that you may attract a girl in her late thirties who's looking to settle, go sign up for your local code school, capitalize on their placement program, and settle down to devour your brain elsewhere. The Republic will continue to fight without you, ensuring access to strong cryptography (see: FUCKGOATS, the only high-grade entropy source on the market in the whole world), and a Bitcoin implementation that keeps pestilential currency-fascists and -devaluators at bay.

The reading section is currently woefully incomplete, indicative of both the reading I've done in the field and what I consider the utility of various "programming books". Suggestions welcome.

December 22, 2016

veh.lisp genesis.vpatch

Filed under: bitcoin, software development, tmsr — Benjamin Vulpes @ 11:06 p.m.

At phf's prodding, I present in this post a genesis vpatch and corresponding signature for my Common Lisp implementation of asciilifeform's V. In case you've forgotten, V is a hard-crypto software source distribution tool that gives The Republic delightfully hard guarantees about who has endorsed what changes to a software project. Details are here: V-tronics 101.

It is useful for code-savvy folks in The Republic to reimplement basic tools like this. Multiple implementations of an ambiguous specification provide far more value than the "many eyes" mantra of open source advocates. For example, an implementation in Python might burn the eyes of a Perl hacker, and the Perl be entirely inscrutable to a man who's never touched it before, and even were such a man to sit down and learn Python for the purpose of auditing another's V implementation, it is in no way obvious that the time cost of his learning the language combined with the risk that he misses details in the audit is a better resource expenditure than simply implementing the tool again in his language of choice. Multiple implementations provide the Republic defense in depth, in stark contrast to the Silicon Valley software monocultures, and demonstrate to the Peers that the authors understand the goals and subtleties of the project in question.

phf did not just prod me to post my implementation, however. The charges are serious, so allow me to quote in full:

phf: ben_vulpes: this subthread since your response to my original statement is one example of what i'm talking about. in this case none of the v implementations are on btcbase, because nobody wants to sign own hacks, because the cost of failure is too high.

For an example of just how this notion that "the cost of failure is too high" came to be:

mircea_popescu: to put it in you'll have to sign it. if it turns out later to have a hole, people will negrate you.

To contextualize phf's comment properly: the man set up a spiffy loggotron (the one I cite here constantly, actually) and then hared off to the desert for a few weeks without first ironing out some stability issues, which left us without logs for a bothersome amount of time. While kicking a process over may be acceptable (in some contexts, on the deficit budget the Republic operates), that style of process monitoring and uptime insurance only works if someone is available to restart the process in question whenever it goes down. Which nobody was, and for which he was roundly scolded upon his return.

So yes, the reputational costs of operating critical infrastructure for The Republic (in phf's case, the canonical log of the Forum's dealings) and then letting that infrastructure fail are rather steep. Note, however, that he has since ironed the stability issues out and the whole episode has largely been left behind. No negative ratings were issued as a result, that's for damn sure.

The brouhaha that kicked off my rewrite of my V implementation is barely worth going into1 but for four details: the discovered bug was not a hole, but required that an operator attempt an action actively harmful to their own health; the implementation's author fixed the problem in short order; he was already a member in good standing of the #trilema Web of Trust; and the issue was discovered by members of the Republic rather than leveraged into an attack.

Much of the Republic's otherwise incomprehensible-to-outsiders behavior may be chalked up to precisely this sort of "trust building exercise", and there is no way to build a nation of men but this way. A strong reputation buttresses a sapper against charges of treason, leaving space for the WoT to entertain the notion that the sapper is not treasonous but has merely made a mistake. Moreover, fear of failure's repercussions must always be evaluated and mitigated in the same way that one performs security analyses: "What are the downsides here? How might these changes fuck my wotmates? How pissed could they reasonably get at me for hosing them thusly? How would I respond to allegations of treason?" Not that anyone's on the stand for such, but one must entertain the gedankenexperiment.

So, in the spirit of:

phf: but the reason i made those statements yesterday is because i think that like saying things in log is an opportunity to be corrected, so does posting a vpatch, it could be a learning experience. instead the mindset seems to be http://btcbase.org/log/2016-02-20#1411214
a111: Logged on 2016-02-20 22:45 phf: "i, ordained computron mircea lifeform, in the year of our republic 1932, having devoted three months to the verification of checksums, with my heart pure and my press clean, sit down to transcribe vee patch ascii_limits_hack, signed by the illustrious asciilifeform, mircea_popescu, ben_vulpes and all saints."

I am proud to publish a genesis vpatch for my own V implementation in Common Lisp. It is a "harem-v" (which is to say a V implementation that this individual uses in the privacy of his own workshop, and may not suit your needs or even desires), but I daresay that it is correct in the important places. Even if it is wildly incorrect in those important places, it demonstrates quite completely that The Republic outperforms classic "open source" communities by reproducing and spot-checking each other's work instead of pretending to read it and only ever actually grepping for hateful words in order to be a most respectably-woke Github contributor. I also offer it in the spirit of the above log line: to seek correction and feedback on best practices from peers more competent with the parens than myself.

genesis.vpatch
genesis.vpatch.sig

Updated 12/27/2016 with hashes_and_errors.vpatch

hashes_and_errors.vpatch
hashes_and_errors.vpatch.ben_vulpes.sig

One simplification from v.pl and v.py that I made in this implementation is that I iterate naïvely through all of the signatures (until one is found that verifies) when confirming that a patch has a signature from wot-members, rather than sorting the list of patches and signatures and making assumptions about patch/signature naming. This slows `press' operations down significantly, but `flow' calculations complete nearly instantly.
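The naive search described above looks something like the following sketch. The real thing shells out to GPG; here `verifies` is a keyed-hash stand-in (purely illustrative) so that the control flow -- and why it costs a full scan per patch -- stays visible:

```python
# Sketch of the naive signature iteration described above. A patch is
# pressable iff some signature by a WoT member verifies against it.
# `verifies` is NOT real crypto: it is a stand-in for GPG verification.
import hashlib

def verifies(patch_text, sig, pubkey):
    # Stand-in: a "signature" here is a hash keyed by the signer's key.
    return sig == hashlib.sha256((pubkey + patch_text).encode()).hexdigest()

def wot_signed(patch_text, signatures, wot):
    """Try every (signer, sig) pair until one verifies against a WoT key.

    `wot` maps signer names to keys. No assumptions are made about
    signature file naming -- hence the full scan, hence the slow press.
    """
    for signer, sig in signatures:
        if signer in wot and verifies(patch_text, sig, wot[signer]):
            return signer
    return None
```

Sorting signatures by patch name up front (as v.pl and v.py do) trades this O(patches × signatures) scan for an assumption about file naming.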

Enjoy! If you find anything heinously wrong, do let me know. I shan't be falling on my sword over it, but I will fix it if you can show me that it is in fact broken.

Updated 01/04/2017 with 2017_cleanup.vpatch

2017_cleanup.vpatch
2017_cleanup.vpatch.ben_vulpes.sig

  1. tl;dr: a V implementation was willing to press to heads for which it had no signatures. Its author has since remedied that. []

November 18, 2016

How to (Actually) Process Bitcoin Transactions

Filed under: bitcoin — Benjamin Vulpes @ 8:29 a.m.

James Stanley developed opinions about how to handle bitcoins while building a nifty, supposedly anonymous, Bitcoin/telephony bridge. Unfortunately, he labors outside the walls of the Most Serene Republic of Bitcoin, and so he's missing the typical 30-point bump in IQ that can be had for simply reading the logs1. As they say, "build one to throw away"--and it's never truer than if you don't know what you're building in the first place2.

My first treatment apparently didn't expound on the subject in adequate depth, and so here I am again: correcting the internet. This guide is less "do these three simple things" and more "do these two impossible things". You wanted to learn something, didn't you?

Assumptions:
- you (can) run a full TRB node
- you can extract transaction data from raw blocks

Running a full TRB node is table stakes for this game. It means getting up to speed with "V", The Republic's software distribution tool, and running the daemon you'll compile with mod6's offline build system on a semi-serious piece of metal. The most-public node that I run consumes over 5.7 gigs of RAM, and its current uptime is ~48 days3. Use a process supervisor (I'll leave writing up the details to trinque, but start with runit and its accompanying log service svlogd). A TRB node will shit several gigs of log lines in a month, so do consider accounting for the disk consumption.
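For the curious, the essential behavior a supervisor like runit provides reduces to a restart loop. The Python below is illustration only -- use runit and svlogd in production, as above:

```python
# The shape of what runit does for you, reduced to a sketch: run the
# command, and when it dies, run it again. Real supervisors add logging
# (svlogd), backoff policy, and clean signal handling.
import subprocess
import time

def supervise(argv, max_restarts=None, backoff=1.0):
    """Restart `argv` every time it exits. Returns the number of restarts
    performed -- only reachable when max_restarts is set, as in a test;
    a real supervisor loops forever."""
    restarts = 0
    while True:
        subprocess.call(argv)        # blocks until the process exits
        restarts += 1
        if max_restarts is not None and restarts >= max_restarts:
            return restarts
        time.sleep(backoff)          # don't spin if it crashes instantly
```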

Once your node runs reliably, you need to sync it. If you don't already have a synced node, there's no way around simply downloading the whole blockchain and verifying it sequentially (pity the poor Ethereum bagholders, whose blockchain is over 75 (!) gigs at this point4). If you're a lucky sod, and are running a previously-synced 0.5.X node, you can simply copy the bitcoin data directory to your new box and start the new bitcoind process over there. Do remember to stop the synced node first, lest you bring down the data-integrity gods' wrath upon your head. Given the times involved, if you have a 0.6.X, and possibly a 0.7.X series node running and synced, consider pulling the same trick. I have no idea if it'll work, but I am eager for reports on that kind of science.

With node in hand, you may now proceed to dissect blocks (if you've pressed a standard TRB, it has a 'dumpblock' command. Go forth and figure out how to use it). I've already written up how to extract block headers, and while I failed to beat the "binary-types" library into cleanly parsing the binary blocks (see "binary-types:define-binary-class bitcoin-block-header" for what a clean binary definition looks like in that DSL), I turned around and implemented all of the slicing code behind mimisbrunnr in just a few days once I decided to do it by hand and forgo the bijective transform5. This code is not publicly available, but if you show up in #trilema, get in the WoT and ask nicely, I might give you a copy.
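As a taste of the by-hand slicing: the 80-byte block header layout is fixed by the protocol (little-endian integers; the block hash is the byte-reversed double-SHA256 of the raw header), so it comes apart with nothing fancier than `struct`:

```python
# Slicing the 80-byte Bitcoin block header by hand, as described above.
# Field layout is fixed by the protocol: version (4 bytes), previous
# block hash (32), merkle root (32), time (4), bits (4), nonce (4).
import hashlib
import struct

def parse_header(raw):
    assert len(raw) == 80
    version, = struct.unpack("<i", raw[0:4])
    prev_block = raw[4:36][::-1].hex()     # stored little-endian on the wire
    merkle_root = raw[36:68][::-1].hex()
    timestamp, bits, nonce = struct.unpack("<III", raw[68:80])
    # The block hash is the double-SHA256 of the raw header, byte-reversed
    # for display -- this is where the leading zeroes come from.
    block_hash = hashlib.sha256(hashlib.sha256(raw).digest()).digest()[::-1].hex()
    return {"version": version, "prev_block": prev_block,
            "merkle_root": merkle_root, "time": timestamp,
            "bits": bits, "nonce": nonce, "hash": block_hash}

# The genesis block header, as dumpblock would hand it to you:
genesis = bytes.fromhex(
    "0100000000000000000000000000000000000000000000000000000000000000"
    "000000003ba3edfd7a7b12b27ac72c3e67768f617fc81bc3888a51323a9fb8aa"
    "4b1e5e4a29ab5f49ffff001d1dac2b7c")
print(parse_header(genesis)["hash"])
# 000000000019d6689c085ae165831e934ff763ae46a2a6c172b3f1b60a8ce26f
```

Transactions follow the header in the raw block and are varint-delimited; the same byte-at-a-time approach carries through, just with more bookkeeping.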

Given the ability to slice blocks into headers and transactions, you may now begin to receive transactions. Implement a system that inspects each new block (or the nth-most recent block, for n such that you can sleep at night in the face of reorgs, forklets, and double-spends) for transactions paying to your receipt address -- the hasty sapper might simply grep output scripts for their address, but they'd take on the risk of false positives if the address is discarded during the script evaluation process. When users want to deposit funds, take the amount they wish to deposit and increase that by a number of satoshis small enough that your customers won't gripe, and with enough decimal places to serve your client base (whatever that means in your business). Upon receipt of funds, credit your customer's account. Scale this by increasing the nonce range and adding additional receipt addresses into your pool. Combinatorics are your ally here.
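The satoshi-nonce scheme described above can be sketched as a book of pending deposits keyed by exact amount; all names here are illustrative:

```python
# Sketch of the deposit-matching scheme described above: each pending
# deposit is bumped by a unique satoshi nonce, so the exact output amount
# identifies the paying customer. Illustrative names and structure.
class DepositBook:
    def __init__(self, nonce_range=1000):
        self.nonce_range = nonce_range
        self.pending = {}       # exact satoshi amount -> customer
        self._next_nonce = 1

    def request_deposit(self, customer, satoshis):
        """Return the exact amount the customer must pay to be credited."""
        nonce = self._next_nonce
        if nonce >= self.nonce_range:
            # Scale by widening the range or adding receipt addresses.
            raise RuntimeError("nonce pool exhausted; add receipt addresses")
        self._next_nonce += 1
        exact = satoshis + nonce
        self.pending[exact] = customer
        return exact

    def match_output(self, satoshis):
        """Return (and clear) the customer a confirmed output pays, if any."""
        return self.pending.pop(satoshis, None)

book = DepositBook()
amount = book.request_deposit("alice", 100_000_000)   # 1 BTC plus nonce
assert book.match_output(amount) == "alice"
```

A production system would key on (address, amount) pairs and expire stale pending entries; the combinatorics of nonce range times address pool is what lets this scale.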

This is not a trivial project, but in undertaking it you'll learn rather a lot about how poorly Bitcoin is really written under the hood, rather than pulling the open-source wool over your eyes and pretending that everything is okay. Remember, Bitcoin is a spectacularly shitty prototype, but it's still the only game in town. If you value your future, learn how this thing works.

  1. Note that this is just a typical bump. Human variance is wide, doncha know. []
  2. I can't imagine anyone who doesn't fuck with Bitcoin regularly already wanting to stick their head deep enough into the snake's nest to make heads or tails of it to the point of confidence in building an automated transaction system on a webapp buildout budget. It's far easier in the short term to delegate that hard thinking to the people publishing Bitcoin clients, like the Power Rangers, or the extremely-likely-to-be-usg-compromised-at-this-point Conformal btcd (a determination made solely on the basis of their failure to maintain representation in tmsr~), or in Mr. Stanley's case--Electrum.

    Electrum has always put me off; it's a very thick layer of abstraction over...vanilla bitcoind, preferably one of the USG-blessed forks. Its website describes it as a lightweight client, but in reality it is a perfect example of the modern "software development"/"open source" imperative to break hard protocols with various promisetronic layers; in this case Electrum lusers trade the one true blockchain's hard guarantees for convenience by booting up a client that talks to god only knows which "Electrum servers" -- I certainly couldn't tell who was running them, or what protocol guarantees that software makes to its users, or if it even makes any promises beyond, you know, "making Bitcoin a safe space" or something. An Electrum user may run their own Electrum server, but even that requires, to quote, "...bitcoind, leveldb and plyvel".

    Fuck, buddy! What are you even buying at that point? If you have to run a Bitcoin node in the first place, I can think of a very few reasons to wrap it in more shitty open source software instead of getting your hands dirty, and none of them are flattering. []

  3. I calculate that this particular node has 99.962% uptime since I booted it sometime last month. During its first week of operation, it choked on something once and I had to go kick it over myself, but it's been rock solid ever since. []
  4. Never fear! Various clients are going to start pruning the blockchain any day now. Because that's a thing that makes sense; pruning a blockchain. []
  5. Were I to do this bit over again, I'd likely write all of the definitions in Ragel, as that's a proper state machine generator and well-suited to the problem at hand. Bueller? Bueller? []