As with parallel programming languages, a lexicon of moves designed for jugging in pairs would help these jugglers plan their performances. Image courtesy Dave Herholz, CC 2.0
Whether on a grid, cloud, or cluster, today’s science software runs in parallel, executing many calculations simultaneously. Programmers have found many ways to create parallel programs in traditional programming languages. Nevertheless, today there are a number of emerging languages that are designed specifically for programming in parallel. To learn more about these unusual programming languages, iSGTW caught up with John Shalf, the team lead of the Advanced Technologies Group at the National Energy Research Scientific Computing Center.
iSGTW: Thanks for joining us today, John.
Shalf: It’s my pleasure to talk to you.
iSGTW: We’ve heard about parallel programming languages. Could you explain to us what that is?
Shalf: A sequential programming language, like conventional C/C++ or Fortran, assumes a set of instructions that are executed sequentially by a single processor. A parallel programming language is designed to program systems that consist of multiple processors, and therefore have multiple simultaneous instruction streams.
iSGTW: Interesting. So why does the world need parallel programming languages?
Shalf: Clock frequencies of chips are no longer increasing, so all future improvements in computer speed will come from parallelism. This affects computing devices all the way down to the scale of cell phones.
Many programmers still think in terms of converting serial programs to run in parallel, but that approach can be very limiting because the best parallel algorithms are completely different from their serial counterparts. Parallel languages can make it easier to write parallel algorithms and the resulting code will run more efficiently because the compiler will have more information to work with.
iSGTW: So, are parallel algorithms just parallelized versions of existing serial algorithms?
Shalf: Hopefully not. The expression of a good parallel algorithm is often fundamentally different than the way you would express a problem serially. For example, you can use OpenMP directives in parallelized code written in a serial langauge. These OpenMP directives can be ignored so that it will still run correctly on a serial processor. However, it is not true that applying the keywords to any arbitrary piece of serial code will necessarily result in a parallel speedup. Although you can often achieve a speedup by parallelizing a serial algorithm, the best parallel algorithm and the best serial algorithm may be completely different. So you hinder your ability to get the best possible performance if you start from a serial approach to solving a problem.
Of course, the method of expressing a problem is entirely different from the question of algorithms – these are two different things. You really need explicitly parallel algorithms to achieve good parallel efficiency, even if you leverage elements of existing serial languages.
iSGTW: Is it possible to use a sequential programming language to create a program that can run in parallel?
Shalf: Yes. In fact, some of the most popular approaches to parallel programming leverage the syntax of existing sequential languages, such as C, and add additional ornamentation or use the type system to provide more specific guarantees of locality of effect. These changes can be exploited by the compiler and runtime environment to facilitate parallelization. So, using only a handful of keywords, one can ornament C functions or subroutines to guarantee locality of effect. The changes, in effect, turn those procedures into elements of a functional language. And as mentioned above, functional languages provide guarantees regarding the scope of side-effects that enable the runtime environment to expose the necessary concurrency.
CILK is a good example of this approach. CILK uses a very simple set of keywords to annotate functions so that they can be scheduled implicitly. Intel’s Ct is interesting because it uses special vector data-types (Vec’s) to enable implicit expression of parallelism. Ct defines a single-assignment array (a construct from functional languages) to restrict the scope of side-effects (unbounded references to remote data) to enable its specially runtime system more freedom to schedule parallel work. The keyword annotations (in the case of CILK) and the single assignment array structures (in the case of Ct) are in fact implementing functional semantics. OpenMP, on the other hand, uses annotation to express explicitly parallel constructs. These are alternative approaches to expressing parallelism.
Most parallel applications to date use just one of these forms of expressing parallelism, but with the deeper hierarchical structure of HPC systems, there has been a movement towards hybrid programming models. Hybrid programming involves combining multiple techniques together – applying a different programming paradigm to different levels of the hierarchical machine. For example, mixing MPI for inter-node communication with OpenMP for the on-node parallelization. This hierarchical form of expressing parallelism requires additional work, but matches the expression of parallelism to the distinct properties of the underlying hardware to gain additional efficiencies.
GPUs require a different approach to parallel programming. Originally Brook and Cg were used to program early GPGPUs. NVidia has popularized CUDA as the primary language to program their general purpose GPUs. More recently, the industry has worked together on the OpenCL standard as a common model for programming GPUs of different architectures (including ATI, NVidia, and even microprocessors).
iSGTW: People already know serial languages. Isn’t it easier to use these ornamented serial languages to implement parallelism, rather than learning to use a brand new language?
Shalf: I worry in general that serial languages do not provide the necessary semantic guarantees about locality of effect that is necessary for efficient parallelism. Ornamenting the language to insert the semantics of such guarantees (as we do with OpenMP scoping of arrays) is arduous, prone to error, and quite frankly not very intuitive. It seems that existing sequential languages underspecify the semantic guarantees that are required to ensure safe parallelization, and overspecify the solution to a problem, which limits opportunities for parallel scheduling. Fixing those deficiencies after-the-fact with annotations seems more unfriendly to programmer productivity than other programming models that are currently being considered.
iSGTW: So how are truly parallel programming languages different from ornamented sequential programming languages?
Shalf: It depends on the language. For example, PGAS (partioned global address space) languages such as UPC (Unified Parallel C) and Co-array Fortran offer more explicit control over threading and data location. Functional languages, such as SCALA and Haskell, offer an implicit way of exposing parallelism, making it easier to identify data dependencies so that the runtime system can implicitly schedule independent work (work that respects the dependency graph).
iSGTW: You mention SCALA and Haskell. Hasn’t Haskell been around for years? Why would functional languages be important now?
Shalf: Dataflow and other functional language research from the 80s seems to be getting renewed interest precisely because they offered some elegant approach to exposing implicit parallelism. Efficient parallelization requires strict boundaries on the scope of side-effects. The need for locality-of-effect has been examined in exquisite detail during the computer science community’s obsession with dataflow hardware back in the 1980s. Functional languages provide the most efficient approach to programming dataflow computing architectures because they provide strong semantics that enforce locality of effect, and therefore make safe, automatic parallelization tractable (if and only if you are willing to express your algorithm using a functional language).
There was a lot of interest in these kinds of functional languages in the 70s and 80s, which was largely choked off in the following decade because of the relentless improvements in sequential CPU performance during the 90s. I think we can learn a lot by retracing our steps.
iSGTW: What are some other interesting approaches to making parallel programming more tractable?
Shalf: The examples we’ve discussed so far represent general-purpose languages for expressing parallelism. There are also domain-specific languages that enable parallel algorithm expression for a more restricted problem domain. This approach can be very powerful because conventional languages tend to over-specify the solution to a problem. The Ct team is looking at supporting various parallel patterns within the language – a family of domain-specific constructs to support domain-specific parallelism. SEJITS is another approach to packaging DSL’s in a conventional language framework by allowing specializers to identify pieces of code for domain-specific parallelization and dynamically rewriting the code to create the parallel implementation.
We are looking at domain-specific languages with the goal of making it easier to write complex PDE solver codes, such as the climate code. Climate scientists find writing loops for stencil-based PDE solver kernels to be very tedious. They would much rather write the PDE equation and have the code that implements the parallelism written for them. This turns out to be relatively straightforward to accomplish at a coarse-grained level (and we have done this in the past with the Cactus framework at www.cactuscode.org), but there is a lot of work to do in order to target more complex architectures such as GPUs in the future.
iSGTW: Any thoughts about the future?
Shalf: Well, as I stated before, I think implicit parallelism and constructs derived from functional languages are likely to see a resurgence. So in some sense, some of the most important ideas for the future may well be dug up from research of the past. But the most important thing moving forward is that everyone is now affected by parallelism. So the ideas have moved from being the interest of a handful of academics and supercomputing advocates to a mainstream problem that is important to the broader computing industry – the likes of Microsoft and Intel. This means it is even more imperative that we train future computer scientists to solve problems using parallelism from the get-go. The transformation of our educational system will be as big and disruptive as the changes to our software environment. So we’d better start now.
—Miriam Boon, iSGTW