Sync or swim

The headline article in last month’s Communications of the ACM (“Attack of the Killer Microseconds”1) got me thinking. The article was drawing attention to the fact that computations are now happening so fast that we must now even consider the time it takes data to move from one side of a server warehouse to the other at the speed of light. Multi-threaded and (a)synchronous programming methodologies need to be re-thought, given that the overheads of supporting them are becoming more of a burden relative to the actual computation work being undertaken.

That is not what got me thinking.

What got me thinking was a different point: despite our generally held belief that asynchronous models are better performers in demanding high-throughput situations, the authors claim that synchronous APIs lead to “code that is shorter, easier-to-understand, more maintainable, and potentially even more efficient.” This last claim is supported by noting that abstracting away an underlying microsecond-scale asynchronous device (e.g. document storage) requires support for a wait primitive, and that this incurs overheads in OS support (specifically: tracking state via the heap instead of the thread stack). So, asynchronous callbacks are bad. Not only that, they’re also ugly.

Here’s a terse paraphrasing of the coding illustration presented by Barroso:

Input:   V
Program: X = Get1(V); Y = Get2(X); Result = fn(Y);

The two Get functions are assumed to involve a microsecond-scale device, so the above program would incur two very brief delays before being able to calculate the final result using the synchronous “fn” function. During those delays, the state of the program lives in the thread’s stack.
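To make that concrete, here is a minimal synchronous sketch in Python. The names (get1, get2, fn) are hypothetical stand-ins for the pseudocode above, with trivial arithmetic simulating the device operations:

```python
def get1(v):
    return v + 1   # stands in for a microsecond-scale device read

def get2(x):
    return x * 2   # stands in for a second device read

def fn(y):
    return y + 10  # ordinary synchronous computation

def program(v):
    # While each get blocks, the intermediate state (v, x, y) sits on
    # the thread's stack; no extra bookkeeping structures are needed.
    x = get1(v)
    y = get2(x)
    return fn(y)

# program(5) == 22
```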

An asynchronous equivalent would invoke Get1 and provide a callback (or “continuation”) for Get1 to invoke when it has finished. The callback would then invoke Get2, again with a callback to carry on when Get2 completes. The second callback would invoke fn, synchronously, to compute the final result. This somewhat ugly program would look like this2:

Program:    X = AsyncGet1(V,ProgramCB1);
ProgramCB1: Y = AsyncGet2(X,ProgramCB2);
ProgramCB2: Result = fn(Y);

To support the callback parameter, these AsyncGet functions now need to maintain the current state information, including references to the required callbacks, and that state occupies heap space because the context can switch away during the async operations. It’s this extra state-management overhead that leads to underperformance, while also making the code harder to follow.
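The callback chain can be sketched in Python rather than C++. This is a hedged illustration, not the paper’s code: the names (async_get1, async_get2, fn, program) are hypothetical, and the device operations are simulated with plain arithmetic. The nested closures capture the in-flight state (x, the next callback), which in Python lives on the heap, which is exactly the bookkeeping the authors object to:

```python
def async_get1(v, callback):
    # A real implementation would start a device operation and return
    # immediately, parking (v, callback) on the heap until completion.
    x = v + 1            # simulated device result
    callback(x)

def async_get2(x, callback):
    y = x * 2            # simulated device result
    callback(y)

def fn(y):
    return y + 10

def program(v, on_result):
    def program_cb1(x):           # continuation after Get1
        def program_cb2(y):       # continuation after Get2
            on_result(fn(y))
        async_get2(x, program_cb2)
    async_get1(v, program_cb1)

result = []
program(5, result.append)
# result == [22]
```

Even in a terse language, the control flow is inverted: the program reads inside-out, with each step’s successor threaded through as an argument.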

However, I’m not sure it is right to expose the callback chain and heap objects as a demonstration of unnecessary complexity, in order to “prove” that such code is longer, harder to understand and less maintainable. The authors presented their example in C++, which can be obscure at the best of times, and packed it with comments, which certainly made it look big and messy; it feels like a contrived bad example.

So I offer this:

Program: X = Get1(V); &{ Y = Get2(X); &{ Result = fn(Y); } }

In this I have introduced a simple notation, “&{…}”, to indicate the continuation following a function call. This works whether the Get functions are synchronous or asynchronous. Furthermore, a clever compiler (perhaps with some metadata hints) could easily discover the dependencies that follow a function call and automatically apply the continuation delimiters. That means a compiler should be able to introduce the necessary async support automatically, even when the code is written like this, without being explicit about continuations:

Program: X = Get1(V);    Y = Get2(X);    Result = fn(Y);

This is just the original program, with the compiler imbued with the ability to inject asynchronous support. I’m not saying that this solves the problems, as it’s obvious that even an implied (compiler discovered) continuation still needs to be translated into additional structures and heap operations, with the consequent performance overheads. However, appealing to the complexity of the program source by presenting a contrived piece of ugly C++, especially after admitting that “languages and libraries can make it somewhat easier” does their argument no service at all. The claim that the code “is arguably messier and more difficult to understand than [a] simple synchronous function-calling model” doesn’t hold up. Explicitly coding the solution in C++ is certainly “arguably messier”, but that’s more a problem of the language. Maintainable easy-to-understand asynchronous programs are achievable. Just maybe not in C++.
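This is not hypothetical compiler technology: Python’s async/await does essentially this transformation today. The sketch below (same hypothetical get1/get2/fn stand-ins as before, with asyncio.sleep(0) simulating the microsecond-scale wait) reads almost exactly like the synchronous original, while the interpreter builds the continuation machinery, coroutine frames on the heap, behind the scenes:

```python
import asyncio

async def get1(v):
    await asyncio.sleep(0)   # stands in for a microsecond-scale device wait
    return v + 1

async def get2(x):
    await asyncio.sleep(0)
    return x * 2

def fn(y):
    return y + 10

async def program(v):
    # Reads like the synchronous version; each `await` marks the same
    # continuation boundary that the "&{...}" notation denoted above.
    x = await get1(v)
    y = await get2(x)
    return fn(y)

result = asyncio.run(program(5))
# result == 22
```

The heap and scheduling overheads have not gone away, the interpreter has merely taken over managing them, but the source is as short and as readable as the synchronous form.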

As for the real focus of the paper, the need to be aware of overheads down at the microsecond scale is clear. When the “housekeeping” surrounding the core operations is taking an increasing proportion of time, because those core operations are getting faster, then you have a problem. Perhaps a programming approach that has the productivity and maintainability of simple synchronous programming while automatically incorporating the best of asynchronous approaches could be part of the solution.

  1. L. Barroso (Google) et al., “Attack of the Killer Microseconds”, CACM Vol. 60 No. 4, April 2017. DOI:10.1145/3015148
  2. For the C++ version, about 40 lines long, see the original paper.
