CH

April 28, 2017

A Brief Foray Into The Swamp of TRB Boost Usage

Filed under: bitcoin — Benjamin Vulpes @ 5:13 p.m.

While I have learned little new in this sojurn, timely documentation and publication differentiates the Republic's research ratchet from the Empire's soft fecal matter of nominally peer-reviewed journals. To that end, I herein report on a short scouting trip into the borderlands of The Real Bitcoin's internals, mapping the edges of what a Boost excision might entail.

For those just tuning in, the Bitcoin reference implementation is written in the C++ "programming language". C++ is a notoriously impoverished tool, to the extent that for a great many years video game companies and other consumers of commodity programmer inventory built and maintained their own collection of abstractions to lighten the burden. "Boost", a collection of C++ libraries, eventually emerged from the primordial internet-soup in much the same fashion and with much the same result as other large open-source projects like Open Office, or The Gimp (which while adequate for some value thereof, discerning folks generally don't call "good"; little is in this life anyways, and libraries to patch over underlying language design failures or implementation lag are fundamentally unlikely to achieve the distinction).

Circa 2011, the C++ folks published a new version of the language, incorporating some Boost libraries wholesale, and reimplementing others. This release at least in theory provided the tooling for C++ programmers to build systems of non-trivial size without relying on Boost for niceties like...the BOOST_FOREACH construct; an iterator. That Bitcoin's author should have selected a software development toolchain of such poverty will forever be a stain upon his name (that and the fact that he thought Windows an adequate environment on which to build crypto-related software omfg), but his decision to lean on Boost can at least be understood in the context of working in C++ on a Microsoft machine before the C++11 release. Not excused, obviously, but understood.

Now "if we can, we must" replace Boost-isms with C++11-isms, but that's not what actually sent me down this rabbit hole in the first place. I'd been working on an only-tangentially-C++11 related patch to TRB and found myself in a position where I could either write a new function to use as a predicate in remove_if, or upgrade the codebase to C++11 and thereby buy myself lambdas (pre-11 C++ lacks anonymous functions) and iterators that could in theory replace at least a megatonne of BOOST_FOREACH's in the TRB codebase, and (hope springs eternal...) possibly even the use of Boost altogether.

Here is a vpatch (descending from `wires_rev1`) to compile TRB with C++11 semantics. It's 1% compiler, flags, 1% ambiguity resolution, 1% pointer type twiddling, and 97% addition of spaces to please GCC. I've not transmitted it to the mailing list, nor signed it yet as I'm not even convinced that it's useful, or an actual "decrufting" of the codebase on its own:

c11.vpatch

I wanted to know, of all the source files, which have references to boost, and how many.

$ find . -name "*.cpp" -o -name "*.h" | xargs grep -oi "boost" | awk -F ":" {'print $1'} | uniq -c | sort -nr

    116 ./test/util_tests.cpp
     65 ./bitcoinrpc.cpp
     53 ./serialize.h
     48 ./main.cpp
     43 ./json/json_spirit_reader_template.h
     34 ./json/json_spirit_value.h
     29 ./test/script_tests.cpp
     28 ./util.cpp
     27 ./test/DoS_tests.cpp
     25 ./net.cpp
     24 ./wallet.cpp
     16 ./util.h
     14 ./main.h
     13 ./test/Checkpoints_tests.cpp
     10 ./db.cpp
      9 ./test/base58_tests.cpp
      8 ./wallet.h
      8 ./init.cpp
      7 ./test/uint256_tests.cpp
      7 ./test/uint160_tests.cpp
      6 ./test/transaction_tests.cpp
      6 ./test/miner_tests.cpp
      6 ./test/base64_tests.cpp
      6 ./net.h
      5 ./script.cpp
      4 ./checkpoints.cpp
      2 ./test/test_bitcoin.cpp
      2 ./script.h
      2 ./noui.h
      2 ./json/json_spirit_writer.h
      2 ./json/json_spirit_writer.cpp
      2 ./json/json_spirit_reader.h
      2 ./json/json_spirit_reader.cpp
      2 ./headers.h
      1 ./keystore.cpp
      1 ./json/json_spirit_utils.h

To get a rough picture of the scope of the work that might be involved in a Boost excision, I removed Boost imports from `bitcoinrpc.cpp` (in TRB's source tree), and recompiled TRB. Output interspersed with notes follows:

bitcoinrpc.cpp: In function ‘int ReadHTTPStatus(std::basic_istream&)’:
bitcoinrpc.cpp:2082:5: error: ‘split’ is not a member of ‘boost’
     boost::split(vWords, str, boost::is_any_of(" "));
     ^
bitcoinrpc.cpp:2082:31: error: ‘is_any_of’ is not a member of ‘boost’
     boost::split(vWords, str, boost::is_any_of(" "));
                               ^

The code in question:

int ReadHTTPStatus(std::basic_istream<char>& stream)
{
    string str;
    getline(stream, str);
    vector<string> vWords;
    boost::split(vWords, str, boost::is_any_of(" "));
    if (vWords.size() < 2)
        return 500;
    return atoi(vWords[1].c_str());
}

As you can see, Boost is used in this context to split the string on spaces, ensure that there are at least two elements in the string, and return the second. Why it would not have sufficed to search the string for the space character and return the substring delimited by the first and second space, or first space and end of the string is not immediately apparent to me. C and C++ are full of caltrops in string handling which compounds when dealing with potentially external data. This nugget smells somewhat as though it were written by someone perhaps subconsciously afraid of the weaknesses of their tools, patching over that very important-to-listen-to suspicion by using what claims to be a well-reviewed and secure library for Getting Shit Done in C++.

Further up the call stack, ReadHTTPStatus is called from ReadHTTP, a function which is itself called on thread from within ThreadRPCServer2 and passed HTTP streams, and return value pointers for the callee to mutate while handling a request from connections that may or may not be authorized to poke the RPC interface. ReadHTTPS is also called from within CallRPC to get the HTTP response code for a call /to/ the RPC server. By my read, this particular pile of chairs can attempt to read data from the network if TRB is booted with the config variable `rpcallowip` bound to a public network interface and `rpcport` to an open port, but I invite correction on this point.

While slicing the Boost out of this call is far from intractable, it would entail adding several lines of string tokenization, not to mention the risks associated with handling strings from the net in C++.

Returning to the compilation failures:

bitcoinrpc.cpp: In function ‘int ReadHTTPHeader(std::basic_istream&, std::map, std::basic_string >&)’:
bitcoinrpc.cpp:2101:13: error: ‘trim’ is not a member of ‘boost’
             boost::trim(strHeader);
             ^
bitcoinrpc.cpp:2102:13: error: ‘to_lower’ is not a member of ‘boost’
             boost::to_lower(strHeader);
             ^
bitcoinrpc.cpp:2104:13: error: ‘trim’ is not a member of ‘boost’
             boost::trim(strValue);
             ^

Trimming and lower-casing strings is not an impossible task without Boost.

I always wanted to get into the guts of an "ad-hoc, informally specified, and bug-ridden" implementation of an HTTP (header) parser!

int ReadHTTPHeader(std::basic_istream<char>& stream, map<string, string>& mapHeadersRet)
{
    int nLen = 0;
    loop
    {
        string str;
        std::getline(stream, str);
        if (str.empty() || str == "\r")
            break;
        string::size_type nColon = str.find(":");
        if (nColon != string::npos)
        {
            string strHeader = str.substr(0, nColon);
            boost::trim(strHeader);
            boost::to_lower(strHeader);
            string strValue = str.substr(nColon+1);
            boost::trim(strValue);
            mapHeadersRet[strHeader] = strValue;
            if (strHeader == "content-length")
                nLen = atoi(strValue.c_str());
        }
    }
    return nLen;
}

Contrast the use of standard library functions in splitting on the colon in ReadHTTPHeader, with the Boost usage in ReadHTTPStatus to achieve approximately the same thing. boost::trim could be replaced with standard library calls, along with boost::to_lower.

Moving on to the truly hairy Boost stuffs:

bitcoinrpc.cpp: In function ‘bool HTTPAuthorized(std::map, std::basic_string >&)’:
bitcoinrpc.cpp:2142:47: error: ‘trim’ is not a member of ‘boost’
     string strUserPass64 = strAuth.substr(6); boost::trim(strUserPass64);
                                               ^
bitcoinrpc.cpp: In function ‘bool ClientAllowed(const string&)’:
bitcoinrpc.cpp:2191:23: error: ‘asio’ has not been declared
     if (strAddress == asio::ip::address_v4::loopback().to_string())
                       ^
bitcoinrpc.cpp: In function ‘void ThreadRPCServer2(void*)’:
bitcoinrpc.cpp:2247:5: error: ‘asio’ has not been declared
     asio::ip::address bindAddress = mapArgs.count("-rpcallowip") ? asio::ip::address_v4::any() : asio::ip::address_v4::loopback();
     ^
bitcoinrpc.cpp:2249:5: error: ‘asio’ has not been declared
     asio::io_service io_service;
     ^
bitcoinrpc.cpp:2250:5: error: ‘ip’ has not been declared
     ip::tcp::endpoint endpoint(bindAddress, GetArg("-rpcport", 8332));
     ^
bitcoinrpc.cpp:2251:5: error: ‘ip’ has not been declared
     ip::tcp::acceptor acceptor(io_service, endpoint);
     ^
bitcoinrpc.cpp:2253:5: error: ‘acceptor’ was not declared in this scope
     acceptor.set_option(boost::asio::ip::tcp::acceptor::reuse_address(true));
     ^
bitcoinrpc.cpp:2253:32: error: ‘boost::asio’ has not been declared
     acceptor.set_option(boost::asio::ip::tcp::acceptor::reuse_address(true));
                                ^
bitcoinrpc.cpp:2258:9: error: ‘ip’ has not been declared
         ip::tcp::iostream stream;
         ^
bitcoinrpc.cpp:2260:9: error: ‘ip’ has not been declared
         ip::tcp::endpoint peer;
         ^
bitcoinrpc.cpp:2262:26: error: ‘stream’ was not declared in this scope
         acceptor.accept(*stream.rdbuf(), peer);
                          ^
bitcoinrpc.cpp:2262:42: error: ‘peer’ was not declared in this scope
         acceptor.accept(*stream.rdbuf(), peer);
                                          ^
bitcoinrpc.cpp: In function ‘json_spirit::Object CallRPC(const string&, const Array&)’:
bitcoinrpc.cpp:2389:5: error: ‘ip’ has not been declared
     ip::tcp::iostream stream(GetArg("-rpcconnect", "127.0.0.1"), GetArg("-rpcport", "8332"));
     ^
bitcoinrpc.cpp:2390:9: error: ‘stream’ was not declared in this scope
     if (stream.fail())
         ^
bitcoinrpc.cpp:2401:5: error: ‘stream’ was not declared in this scope
     stream << strPost << std::flush;
     ^

asio::ip::address_v4::any(), loopback(), ip::tcp::endpoint, ip::tcp::acceptor, ip::tcp::iostream, and asio::io_service have no standard library alternatives (and aren't terrifically meaningful outside of their own context of "asynchronous input/output" anyways). References to the loopback interface could be excised, such that the user must specify allowed IP addresses at runtime. Cutting the io_service out will entail a rewrite of the JSON/RPC machinery as well. Relatedly, this is an excellent example of an explosion in system complexity and the dependency graph directly attributable to importing the HTTP stack, which in turn is directly attributable to the cultural blind spot that is "whaddaya mean, your meegroserbusses don't all talk over HTTP?"

So yes, count me among those who looked into trimming TRB into shape and concluded that a rewrite is unavoidable. Conveniently, with a reference implementation in hand, the rewrite is just lots of work. "Yesterday's Nobel is tomorrow's homework" or how did it go.

No Comments »

No comments yet.

RSS feed for comments on this post. TrackBack URL

Reply

« How to Fuck Up Without Being a Fuckup --- veh patch: overall improvements »