Prof. Douglas Thain

Monday, February 24, 2020

Getting Beyond Stack Overflow for CS Students

Question and answer sites like Stack Overflow (and similar Q&A sites) have become a valuable resource for programmers struggling with tricky problems. However, I have noticed that Q&A sites can become a "trap" for students learning how to program. You can get stuck for a long time searching for a question that doesn't quite address your question.

The problem is that Q&A sites are oriented toward solving very specific questions, rather than developing a general understanding of a technology. If a post answers the very specific question that you have, that's nice, but it doesn't necessarily help you to develop your own solution to other problems. What's worse, the answers are sometimes incorrect, or not applicable to the situation that you actually have.

Here is a classic train wreck with thousands of votes: How to validate an email address in JavaScript? One can assume the poster is looking for a quick one liner to solve this problem. The problem is that the question is not sufficiently well formed. How does one define an email address? Do you want to be conservative or flexible? What does it mean to "validate" anyhow? Some of the answers posted provide colossal regular expressions, but unless you understand regular expressions, you won't know if they work correctly. (Some of the solutions don't actually work.)

If you intend to be a professional who solves problems for people, you need to be able to think through these issues yourself, rather than just copy-pasting a solution. In the case of validating email addresses, that means learning how regular expressions work from first principles, and then thinking carefully about what problem you really intend to solve.

Now, I can't prohibit anyone from looking on Stack Overflow, nor will I try. But I would like to suggest some different habits of learning that will lead to more fulfilling results and less frustration. And, it will help you to become the sort of person who answers questions on Q&A sites.

An Approach To Solving Programming Problems

Learn the atoms that make up your system.
Experiment by combining them in simple ways.
Solve your actual problem gradually by building up complexity.

Learn the Atoms

Every layer of a computer system is an abstraction that is made up of some fundamental set of basic operations, which I'll just call atoms. Each atom manipulates the system in some particular and well-documented way.

If you are learning how to manage processes in Unix, then your atoms are system calls like fork, exec, wait, kill, and exit
If you are learning regular expressions, then your atoms are concatenation (xy), alternation (x|y, closure (x*), and so forth.
If you are learning how to render graphics in OpenGL, then your atoms are these functions like glBegin, glVertex, glColor, and glEnd.
If you are learning X86 assembly language, then your atoms are instructions like MOV, ADD, SUB, CMP, and JMP.

Sometimes just figuring out what the atoms are can be tricky. This is where a good introductory guide or textbook is helpful. Without giving you every last detail of the atom, an introductory guide tells you the set of atoms needed to get started and their general relationship. (This is probably the most important thing teachers do in the classroom: help you to understand the atoms of a system.)

Once you know what the atoms are, then you need to learn how they work in detail. This is going to require some effort on your part. Find the reference manuals for those atoms and read them. If it has a man page, read it. Yes, really read it, I'm not kidding. You ought to be able to easily summarize the purpose of each atom from memory, and look up the details when necessary.

A really elegant system design (e.g. Unix or LISP) will have only a handful of atoms. (That's what makes it elegant.) A not-so-elegant system (e.g. Win32) may have hundreds, and so you can't learn it all up front. In that case, you have to pick a few atoms that appear to work together, and proceed to the next step.

Experiment by Combining Things

Now that you understand some of the atoms, start to combine them together in simple ways and test them out. Don't even try to solve your initial problem yet, just try a few things at small scale.

For example, suppose you are learning Unix process management, and you think you have a basic understanding of the fork system call. Begin by writing the smallest program that does something, perhaps this:

pid_t pid = fork();
printf("hello from pid %d\n",getpid());

Now, test your understanding by making a prediction. What do you think this program will output? Ok, now run it. Did it output what you expect? Great, continue. If not, then go back and re-read the behavior of fork to see where you misunderstood it.

If that worked, then add one more atom and see what happens. Maybe you try this:

pid_t pid = fork();
execlp("/bin/ls","ls",0);

Again, make a prediction: what do you think this will output? Test your prediction. Did you get it right? Ok, add a little more:

pid_t pid = fork();
if(pid==0) {
execlp("/bin/ls","ls",0);
}
printf("created child %d\n",pid);

As you add complexity to your examples, start to think about undesired situations and edge cases. For example, what happens if you attempt to execute a program that doesn't exist? Change your little example to test that possibility, and predict its output:

pid_t pid = fork();
if(pid==0) {
execlp("/bin/junk","ls",0);
}
printf("created child %d\n",pid);

Hmm, that probably didn't have the desired effect. Can you explain what happened? You will have to add a little bit more code to handle the possibility of execlp() failing. Take a look at your atoms and see which one makes sense.

As you go on, you will gain confidence in your understanding of the atoms that make up the system, and you will be able to focus more on the structure of programs that use the atoms, rather than stumbling over how they work individually.

Solve Your Actual Problem Gradually

After spending some time working out the atoms and simple combinations, you are ready to come back to your initial problem. With the details of each atom clearly in your head, you can rely less on internet searches and online manuals, and more on using your own brainpower to make new combinations. Suppose you are given the following goal:

Your company has an important service that needs to be running continuously, except it has a habit of crashing every 30 seconds or so. Your boss asks you to write a "watchdog" program that keeps four copies of the service running all the time. Whenever any instance of the server crashes or exits, the watchdog should restart it.

There is no answer to this specific question on Stack Overflow, so you are going to have to figure it out for yourself. If the solution doesn't jump out at you, do not despair. Try solving a simpler problem first, and then approach the full problem by adding complexity gradually. For example, you could write four versions of the solution like this:

Version 1: Run one instance of the service and wait for it to finish.
Version 2: Run four instances of the service sequentially, starting one as soon as the previous as finished.
Version 3: Run four instances of the server in parallel, and wait until all of them have finished.
Version 4: Run four instances of the service continuously, so that as soon as one of the four exits, another one is started.

As you go along, you may find that you have forgotten a detail about one of your atoms. That's fine, go back to the manual and look it up again. Perhaps you encounter a very strange error message that is not mentioned in the manual. That's a great item to search for on the internet. But if you understand your atoms well, most of your mental energy can be focused on combining them into a coherent whole.

Monday, October 8, 2018

Compilers Book, First Edition

I am happy to announce that the first edition of "Introduction to Compilers and Language Design" is now available at http://compilerbook.org. This is a free online textbook: you can access the PDFs directly, or order an inexpensive hardcover book.

Thank you to all the students who previewed this book and fixed typos and other errata. Thanks especially to Andrew Litteken, who drafted and tested the chapter on ARM assembly.

Douglas Thain,
Introduction to Compilers and Language Design,
1st edition, 2018.
http://compilerbook.org
Hardcover ISBN: 978-0-359-13804-3

I first estimated the book would be finished in 2017, but it took another semester to finalize several chapters, while also teaching a different class!

Finally, I should mention that I was inspired by Andrea and Remzi Arpaci-Dusseau, who set a fine example by publishing Operating Systems in Three Easy Pieces as a free online textbook.

Monday, May 21, 2018

Graduation Address for Catholic Engineers

I had the pleasure of speaking at the commencement ceremony for the Department of Computer Science and Engineering at the University of Notre Dame in 2018. Here is what I wrote:

Good afternoon!

It is a real pleasure to see all of these young men and women dressed up so nicely. As you may know, I spent a semester with all of the CSE students here as juniors in operating systems, compilers, and some other classes as well. It was rather early in the morning, and so as often as not, they were dressed in pajamas, flip-flops, and sometimes sweaty gym clothes. It is lovely to see everyone at their best.

I invite you all to just relax for a little while, look around you, and take in this moment. Reflect a bit on the path that led you here.

I'm sure that for most of you, it seems like you just arrived! Perhaps you took a long road trip to South Bend, and unloaded the car on a sweaty August afternoon. You met your roommate and probably wondered about their peculiar taste in music or food. You found your way around campus, met some interesting professors, did a lot of homework, and probably slept through a lecture once in a while. I'm sure every one of you has had moments of triumph, and some of tears as well. I hope that you made some lifelong friends, and maybe found romance along the way as well.

But of course, the reason that you came to Notre Dame was to struggle with the timeless questions that scholars have asked throughout the ages:

What is free will, and do humans really have it?
Are we redeemed by our faith, by works, or by the grace of God?
What is the meaning of "segmentation fault"?
Why does "git pull" default to merging instead of rebasing?

As pleasant as it is to look back upon the college years, I am here to tell you that this is not the end, but only the beginning. The world needs your talents, and you have a lot of important work ahead of you.

The modern world needs engineers, and our moment in time particularly needs Catholic engineers. To that end, I would like to offer a little reflection on the personal motto of Bishop Rhoades, which is "Veritatem in Caritate" or "Truth in Charity" This is an excellent motto for an engineer to keep in mind.

Truth in Charity: You cannot speak the truth effectively unless you speak with charity. Likewise, it makes no sense to speak with charity if what you say is not the truth.

First, truth. Our society is currently undergoing a loss of confidence in the idea of truth itself. Senator Daniel Patrick Moynihan once said, "Everyone is entitled to their own opinion, but not their own facts." and this was often repeated as a sensible guideline for political debate. Today, we are finding it more and more difficult to agree on the basic facts of a situation. And there is no point debating our opinions without first having facts: What was the high temperature today? Which car is more fuel efficient? How many people in Indiana are unemployed?

The wonder of the Internet is that it has enabled everyone access to the world's knowledge, and allowed everyone to have a voice, which is empowering. But it has become harder for the average person to distinguish between trustworthy information and outright fabrications. I just took a look at the fact-checking site snopes.com, which felt it was necessary to debunk the myth that it is common for ostriches to go downhill skiing in Japan!

But an engineer deals with physical reality and knows that the truth exists, whether we like it or not. Nature cannot be cheated. A drone that runs out of battery will fall out of the sky. A program with a race condition will lock up. A circuit that draws too much power will catch fire. An engineer's job is to stand up for the truth -- however inconvenient it may be -- because they know that Nature will come to settle the account sooner or later.

Second, charity. It's no secret that our society is short on charity, which is a concern for the well-being of our neighbors. You all know the headlines about things like subprime mortgages, cheating on emissions tests, and the sale of personal data. In these and many more cases, clever people used their talents to benefit themselves while exacting a price not just against their immediate victims, but on society as a whole. Volkswagen might be able to pay back car owners who were defrauded, but there is nothing it can do to extract the excess pollution from the atmosphere.

As engineers, you will build the machines that make society operate. In the twentieth century, that meant things like locomotives and roads and bridges. In the twenty-first century, it includes search engines, voting machines, and self-driving cars. In every single one of these cases, yes, there is a customer to satisfy, but society at large a greater interest to be protected.

And so, as engineers, I charge you to speak the truth in charity:

Some of you may work on self-driving cars, which have the potential to reduce crashes and save time and money. But when a self-driving car has a flaw that threatens pedestrians, who will speak the truth in charity? You will!

Some of you may work on artificial intelligence, which enables us to find hidden patterns in massive datasets. But when a neural network perpetuates racism because of its poorly chosen training data, who will speak the truth in charity? You will!

Some of you may work on systems for digital voting, which can make it easier for every citizen to participate in the political process. But when a voting system with a security flaw puts our democracy at risk, who will speak the truth in charity? You will!

Now, perhaps that is all a bit heavy for this moment. It's a ~~beautiful~~ (rainy) spring day, you have worked hard to be here, and it's time to celebrate. So, let me instead give you one piece of advice that is more immediate. You can put it into use today!

It is all too easy to get attached to our phones, our computers, our gadgets, and get sucked into the endless chatter of status updates, news items, upvotes, comments, and so forth. These constant distractions can prevent us from having deeper experiences with other people, and draw us away from truth and charity.

Take time every day to turn off your gadgets. Enjoy a meal without looking at your phone. Spend an evening without the TV. Take a walk without looking at your watch. Trust me, your friends, your spouse, your children, and your parents will be much happier with your full attention. You can start at dinner with your family tonight: whoever picks up their phone first during dinner pays the bilk!

It has been a pleasure to have you here at Notre Dame.

Congratulations and good luck!

Don't forget to brush your teeth.

Monday, December 4, 2017

Build a Compiler for AlbaCore in Spring 2018

Profs. Brockman and Thain are currently recruiting multiple undergraduate students to participate in a spring project at the intersection of compilers and computer architecture. The objective is to build a software toolchain (compiler, assembler, simulator, documentation, etc) that will allow programs written in C-minor to be compiled to the albaCore computer architecture. This package will be used in future offerings of logic design to assist students in running real, complex programs on custom FPGA hardware.

Juniors or seniors who have taken either compilers or architecture (or both) are invited to apply by contacting either Prof. Thain (dthain@nd.edu) or Brockman (jbb@nd.edu). The project will be offered as a three-credit undergraduate research class in Spring 2018.

Tuesday, June 27, 2017

Talk at ScienceCloud Workshop

Prof. Thain gave the opening talk, "Seamless Scientific Computing from Laptops to Clouds", at the ScienceCloud workshop preceding High Performance Distributed Computing 2017 in Washington, DC. This talk gives an overview of the problem of migrating scientific codes from the comfortable environment of a laptop to the complex environment of a cluster or a cloud, highlighting our new tools for software deployment and resource management for bioinformatics and high energy physics applications.

Wednesday, May 31, 2017

Online Course in Data Intensive Scientific Computing

We are happy to announce the pilot of a new online short course in Data Intensive Scientific Computing. This is the equivalent of a one-credit seminar which provides an introduction to the challenges of scientific computing at large scale and the tools used to address those problems.

The course was designed to augment our summer REU program in DISC, but is also suitable for undergraduate students taking research credits, and for graduating students in all disciplines looking for an introduction to topics and tools in scientific computing.

By default, the online course is ungraded: anyone is welcome to sign up, view the lectures, take the quizzes, and follow the tutorials. If you want to receive a grade, talk to a faculty member at your institution to see if they will work with you on the material.

The course is developed by Prof. Douglas Thain and Prof. Paul Brenner, produced by the Office of Digital Learning at the University of Notre Dame, and offered through the EdX Edge platform.

You can check out a sample lecture here:

And here is an overview of the course structure:

Thursday, February 2, 2017

Writing a Compilers Textbook

To my surprise, I am in the final steps of writing a textbook! You can see a sample chapter today at compilerbook.org.

The effort began in the fall of 2016, as I was putting together my materials for CSE 40243, our undergraduate class in Compilers and Language Design. This class focuses on the challenges of engineering a working language: students implement a working compiler that translates a C-like language into X86 assembly.

While there are a variety of solid textbooks that are great for a graduate course in compiler theory and optimization, none quite had the flavor I was looking for. Nearly every CS grad needs to write a parser, evaluator, or translator for some kind of little language in their career, but relatively few need to dig deeply into assembly language optimization. So, I wanted to focus on language design choices and show that simple languages are not hard to implement.

I began to combine my handwritten chalkboard notes and some sample code into a LaTeX document, and the next thing you know, I have seven chapters written. I expect to finalize everything in the spring 2017 semester.

What has made it relatively easy so far is that my compiler automatically generates many of the figures and code examples automatically, so relatively few things have to be drawn by hand. For example, this sample AST is produced automatically by the compiler emitting Graphviz DOT code from the internal representation. Neat, eh?

Following the example of Remzi and Andrea Arpaci-Dusseau with OSTEP the book will be made available for free online in PDF form, and also in an inexpensive hardcover edition printed on-demand.

Stay tuned for the release later in 2017...