Sunday, November 30, 2008

Visualizing a Large Distributed System with Enavis

Two students at Notre Dame, Qi Liao and Andrew Blaich, recently received the Best Paper award at USENIX LISA for their work on Enavis, a tool that gives a visual display of network traffic collected by the Lockdown network administration tool. Enavis gives the administrator of a large network a way to browse all of the users, programs, hosts, and network connections in a system of hundreds or thousands of machines. Here is what it looks like:

The picture doesn't really do it justice: you can grab, twist, and scroll the view, and the graph reacts in real-time. It's really quite fun to play around with. You can use it to debug performance problems, chase down intruders, or just observe system behavior over time.

The challenge with any visualization is deciding what small part of the available data to display. Lockdown collects an enormous amount of data: anytime a program makes a network connection, we record the host, user, program, and port numbers. This data has been recorded continuously across hundreds of machines for about a year now. Even if you pick one moment in time, you cannot possible display all of the active data in any reasonable way.

Instead, you begin by a known starting location and a point in time, say user 33 last Thursday. What you get is a graph with user 33 at the center, out to a radius of one. If you want to see more, increase the radius, and the view expands:

There are many different ways to slice and filter the data. In the simplest case, you might be interested in known which hosts are talking to each other, or which programs or talking to each other, or which users are talking to each other. Or, you might want a mix: show what users are talking to each other, via which programs. To control all of these possibilities, Enavas has a meta-visualization: a graph that controls which data to display:

The meta-visualization represents hosts (H), users (U), and applications (A). You simply click on the graph to add or remove edges and modify the main display. For example, if the user adds an edge between H and U, then the main graph will show the relationship between hosts and users. If H has a circular link, then the main graph will show which hosts are talking to each other. The meta-visualization is a nice compact way of representing all 63 possible slices of the data.

For more information, you can read the paper about Enavis or visit the Lockdown website.

Thursday, November 13, 2008

The Wavefront Abstraction

This is the third in a series of posts on the idea of abstractions for distributed computing on clusters, clouds, and grids. An abstraction is a simple interface that allows you to scale up well-structured problems to run on hundreds or thousands of computers at once.

The Wavefront abstraction came up in a discussion with several economists. You want to compute a recurrence relation where each result depends on one or more previous results. The user provides initial conditions along the edges of a matrix, and then you can compute F at position (1,1). Once you do that, then you can compute F at (1,2) and (2,1), and so on. The work progresses like a wave across the matrix, hence the name Wavefront. Here is what it looks like:

We have a first implementation of this abstraction that can run on a Condor pool of multicore machines. You simply run it by stating the function, size of the matrix, and providing some files that state the initial conditions:

wavefront func.exe 100 100

This abstraction is interesting for several reasons. First, it needs a variable number of CPUs over time. Even if you had an infinite number of CPUs, it can only use one in the first step, two in the second, and so on until the wavefront reaches the diagonal of the matrix, after which it decreases again. So, it would be impossible to program this efficiently in a system like MPI where you have to choose a fixed number of CPUs. Instead, you want to allocate more CPUs over time. For example, here is a timeline of a Wavefront run on a 64-CPU cluster. The red line shows the number of CPUs in use, and the green line shows the percent of the problem completed:

Second, the problem has a certain degree of asychrony built in. You do not need to run each diagonal slice of the system in lock step. Instead, each cell can be computed as soon as its neighbors are down. Because of this, different parts of the problem can be delegated to different nodes, allowing them to run and finish at their own pace. If each function is fast, then you can delegate an entire square chunk of the task to a remote processor, and allow it to complete independently.

You can see this in the progress images generated by our implementation of Wavefront. These images shows the state of a workload. Green boxes indicated completed cells, blue indicate cells currently running, yellow are ready to run, and red are not able to run. This is an example of a 10x10 Wavefront running on only five processors:

You can read more about the Wavefront abstraction at the Cooperative Computing Lab.