Historically, highly concurrent programming has been closely associated with high performance computing. Two programming models have been dominant: shared memory machines in which concurrency was expressed via multiple threads, and distributed memory machines in which concurrency was expressed via explicit message passing. It is widely agreed that both of these programming models are very challenging, even for the veteran programmer. In both cases, the programmer is directly responsible for designing the program from top to bottom and handling all of the issues of granularity, consistency, and locality necessary to achieve acceptable performance, with very little help from the runtime or operating systems.
However, a new approach to concurrent programming has been emerging over the last several years, in which the user programs in a much higher level language and relies upon the system to handle many of the challenging underlying details. To achieve this, the program is successively decomposed into simpler representations, such that each layer of the system can gradually adapt it to the hardware available.
The layers can be described as follows:
- A declarative language (DL) for compactly representing a complete program.
- A directed graph (DAG) to represent the expanded program and its resources.
- A bag of independent tasks (BOT) with explicit input and output dependencies.
- A shared-nothing cluster to which data and tasks must be assigned.
A (very incomplete) selection of systems that follow this model:
Layer | Cloud Stack | Workflow Stack | HPC Stack |
Declarative Language (DL) | Pig | Weaver | Swift-T |
Directed Acyclic Graph (DAG) | Map-Reduce | Makeflow | - |
Bag of Tasks (BOT) | JobTracker | Work Queue Master | Turbine |
Distributed Data | HDFS | Work Queue Workers | MPI |
Each layer of the system fulfills a distinct need. The declarative language (DL) at the top is compact, expressive, and easy for end users, but is intractable to analyze in the general case because it may have a high order of complexity, possibly Turing-complete. The DL can be used to generate a (large) directed acyclic graph (DAG) that represents every single task to be executed. The DAG is not a great user-interface language, but it is much more suitable for a system to perform capacity management and optimization because it is a finite structure with discrete components. A DAG executor then plays the graph, dispatching individual tasks as their dependencies are satisfied. The BOT consists of all the tasks that are ready to run, and is then scheduled onto the underlying computer, using the data dependencies made available from the higher levels.
Why bother with this sort of model? It allows us to compare the fundamental capabilities and expressiveness of different kinds of systems. For example, in the realm of compilers, everyone knows that a proper compiler consists of a scanner, a parser, an optimizer, and a code generator. Through these stages, the input program is transformed from a series of tokens to an abstract syntax tree, an intermediate representation, and eventually to assembly code. Not every compiler uses all of these stages, much less the same code, but by using a common language, it is helpful to understand, compare, and design new systems.