Non-Blocking Synchronization
Maurice Herlihy

Computer architecture is about to undergo, if not another revolution, then a vigorous shaking-up. The major chip manufacturers have, for the time being, simply given up trying to make processors run faster. Instead, they have recently started shipping "multicore" architectures, in which multiple processors (cores) communicate directly through shared hardware caches, providing increased concurrency instead of increased clock speed. This course will cover the basics of non-blocking synchronization in multiprocessors: the relative power of synchronization instructions, universal non-blocking constructions, and specific wait-free, lock-free, and obstruction-free algorithms and data structures.
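As a first taste of the lock-free style covered in the course, the following sketch (in Java; the class is written for this summary and not taken from the lectures) implements a counter whose increment retries a compare-and-set instead of acquiring a lock, so some thread always completes its operation even if others are delayed.

import java.util.concurrent.atomic.AtomicInteger;

// Lock-free counter: increments retry via compare-and-set instead of locking.
public class LockFreeCounter {
    private final AtomicInteger value = new AtomicInteger(0);

    // Lock-free: the system as a whole always makes progress,
    // even if individual threads are arbitrarily delayed.
    public int increment() {
        while (true) {
            int current = value.get();
            int next = current + 1;
            if (value.compareAndSet(current, next)) {
                return next;  // our CAS succeeded; the update is now visible to all threads
            }
            // CAS failed: another thread updated the value first; re-read and retry
        }
    }

    public int get() {
        return value.get();
    }
}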

Atomicity analysis for concurrent software
Shaz Qadeer

Ensuring the correctness of multithreaded programs is difficult, due to the potential for unexpected and nondeterministic interactions between threads. This problem has traditionally been addressed by devising tools for detecting race conditions, a situation where two threads simultaneously access the same data variable, and at least one of the accesses is a write. However, verifying the absence of such simultaneous-access race conditions is neither necessary nor sufficient to ensure the absence of errors due to unexpected thread interactions. Recent work has proposed that a stronger non-interference property is required, namely atomicity. Atomic methods can be assumed to execute serially, without interleaved steps of other threads. Thus, atomic methods are amenable to sequential reasoning techniques, which significantly simplifies both formal and informal reasoning about program correctness. My lectures will introduce the atomicity specification and present various analyses for checking this specification.
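A small Java sketch (the class and method names are invented for this note) makes the gap between race freedom and atomicity concrete: every access to the field below is protected by the object's lock, so there are no simultaneous-access races, yet increment() is not atomic, because two concurrent calls can interleave between its read and its write and lose an update.

// Race-free at the level of individual accesses, but not atomic.
public class Counter {
    private int count = 0;

    private synchronized int read()         { return count; }
    private synchronized void write(int v)  { count = v; }

    // Not atomic: another thread may run between read() and write(),
    // so two concurrent increments can produce a single net increase.
    public void increment() {
        int tmp = read();
        write(tmp + 1);
    }

    // Atomic: the whole read-modify-write executes without interleaving.
    public synchronized void incrementAtomic() {
        count = count + 1;
    }
}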

Dissecting Transactional Semantics and Implementations
Suresh Jagannathan, Jan Vitek

There has been much recent interest in exploring the role of software transactions as an alternative concurrency control mechanism to mutual exclusion locks. While transactions can provide numerous benefits with respect to programmability, safety, and performance, their semantics and implementations are necessarily complex.

However, by carefully examining different feature sets for software transactions, we can better understand their utility and limitations. In this course, we will cover semantic frameworks for software transactions, and implementations inspired by these frameworks. Topics include issues related to atomicity, nesting, multi-threaded transactions, and language memory models.
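To make the commit/retry flavour of such transactions concrete, here is a minimal, hand-rolled sketch in Java (not an implementation discussed in the course; the class SingleCellStm and its API are invented for this illustration): an optimistic transaction over a single shared cell reads a snapshot, runs its body, and commits with a compare-and-set, re-executing on conflict. Real STMs generalise this to many locations, nesting, and interaction with the memory model.

import java.util.concurrent.atomic.AtomicReference;
import java.util.function.UnaryOperator;

// Minimal optimistic "transaction" over one shared cell.
public class SingleCellStm<T> {
    private final AtomicReference<T> cell;

    public SingleCellStm(T initial) {
        this.cell = new AtomicReference<>(initial);
    }

    public T atomically(UnaryOperator<T> body) {
        while (true) {
            T snapshot = cell.get();          // begin: take a snapshot of the cell
            T result = body.apply(snapshot);  // run the transaction body on the snapshot
            if (cell.compareAndSet(snapshot, result)) {
                return result;                // commit succeeded
            }
            // conflict: another commit intervened since our read, so re-execute the body
        }
    }
}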

We will also describe how to leverage insights from the semantics to define implementations, not only for general-purpose concurrent programming, but also for more constrained, but equally challenging domains. Examples will include safe software-level speculation, atomic regions for realtime systems, and lightweight checkpointing.

Adaptive Algorithms for New Parallel Supports
Bruno Raffin, Jean-Louis Roch and Denis Trystram

Today, most high-performance computations run on new parallel and distributed systems (clusters or grids). These systems are characterized by new features, including heterogeneity and versatility of the resources: some computing or communication resources evolve over time (they become available, or are temporarily used by other applications). This changing context concerns not only the resources, but also the application itself: an algorithm can be modified to react to dynamic changes. Thus, we have to design mechanisms able to adapt their behaviour to changes during the execution without slowing down the computation. This talk will investigate several ways to address this important problem of adaptivity, focusing on the middleware level.

The design, implementation, and use of java.util.concurrent
Doug Lea

This short course presents the synchronization and concurrency control constructs available since Java release 5.0. The presentation includes discussion of new blocking and non-blocking algorithms designed for this package, adaptations of others, engineering issues that arise in their development and widespread use, and examples of practical usage. This course will alternate between lectures and short hands-on exercises.
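For a small, self-contained taste of the package (the task and names below are illustrative, and the code is written against a modern JDK for brevity), the following example combines an ExecutorService, a ConcurrentHashMap and a CountDownLatch.

import java.util.concurrent.*;

// Illustrative use of java.util.concurrent: a fixed thread pool runs tasks
// that record results in a concurrent map; a latch signals completion.
public class WordCounter {
    public static void main(String[] args) throws InterruptedException {
        String[] words = { "lock", "free", "lock", "latch", "pool" };
        ConcurrentMap<String, Integer> counts = new ConcurrentHashMap<>();
        CountDownLatch done = new CountDownLatch(words.length);
        ExecutorService pool = Executors.newFixedThreadPool(4);

        for (String w : words) {
            pool.execute(() -> {
                counts.merge(w, 1, Integer::sum);  // atomic per-key update
                done.countDown();
            });
        }

        done.await();     // wait until every task has finished
        pool.shutdown();
        System.out.println(counts);  // e.g. {latch=1, free=1, lock=2, pool=1}
    }
}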

Bisimulation and co-induction
Davide Sangiorgi

Bisimulation was introduced in Concurrency Theory as a way to establish when two processes are "the same", i.e., indistinguishable to an external observer interacting with them. Such a notion of indistinguishability is at the heart of any theory of processes. The success of bisimulation is largely due to the associated bisimulation proof method, which can be used to prove bisimilarities among processes.
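For concreteness, the standard definition over labelled transition systems (recalled here in LaTeX notation as a reminder; the lectures may present it differently) is the following.

A binary relation $R$ on processes is a \emph{bisimulation} if, whenever $P \mathrel{R} Q$:
(1) $P \xrightarrow{\mu} P'$ implies $Q \xrightarrow{\mu} Q'$ for some $Q'$ with $P' \mathrel{R} Q'$;
(2) $Q \xrightarrow{\mu} Q'$ implies $P \xrightarrow{\mu} P'$ for some $P'$ with $P' \mathrel{R} Q'$.
Bisimilarity, written $P \sim Q$, holds when $P \mathrel{R} Q$ for some bisimulation $R$; the bisimulation proof method thus consists of exhibiting such an $R$ containing the pair of interest.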

Bisimulation is an instance of the more general notion of co-induction. Nowadays, bisimulation and co-induction are widely used, not only in Concurrency, but, more broadly, in Computer Science, in a number of areas, including: functional languages, type theory, data types, domains, databases, compiler optimisations, program analysis, verification tools.

In the lectures, I will present bisimulation and co-induction. In particular I will discuss their associated proof method. If time permits, I will also discuss some questions and problems that today are still unsettled.

A Theoretical Basis of Communication-Centred Programming for Web Services
Nobuko Yoshida

This lecture introduces a formalisation of a novel programming discipline for Web Services, based on communication-centred specifications. This work is based on long conversations and collaborations with the W3C working group on a description language for web services, the Choreography Description Language (CDL). My participation in W3C's standardisation process started three years ago, when three pi-calculus experts (Milner, Honda and Yoshida) were invited to the CDL WG. Our specification resulted in a standard which will soon become a W3C recommendation. It has also led to a new principle of programming which consists of two distinct ways of describing communication-centric software, one centring on global message flows and another centring on end-point behaviour, distilled as formal calculi. The two paradigms of description share a common feature: structured representation of communications. The global calculus originates from CDL itself, while the local calculus is based on the pi-calculus, one of the representative calculi for communicating processes.

The theoretical part of the lecture will illustrate these two frameworks using simple but non-trivial examples, present the static and dynamic semantics of the calculi, and show a basic theory of end-point projection --- that any well-formed description in the global calculus has a precise representation in the local calculus. The theory is intended to be a step towards a pi-calculus-based foundation of Web Service technologies.
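As a purely schematic illustration (the notation below is a simplified rendering chosen for this summary, not the official CDL or calculus syntax), a global description of a one-step quote exchange and its two end-point projections could be written as:

Global description:
$\mathsf{Buyer} \to \mathsf{Seller} : s\langle \mathit{QuoteRequest} \rangle.\ \mathsf{Seller} \to \mathsf{Buyer} : s\langle \mathit{Quote} \rangle.\ \mathbf{0}$

Buyer end-point:  $\overline{s}\langle \mathit{QuoteRequest} \rangle.\ s(\mathit{Quote}).\ \mathbf{0}$
Seller end-point: $s(\mathit{QuoteRequest}).\ \overline{s}\langle \mathit{Quote} \rangle.\ \mathbf{0}$

End-point projection maps the single global description to the pair of local, pi-calculus-style behaviours.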

X10: Computing at Scale
Vijay Saraswat

We present the design of X10, a modern programming language intended to address the twin challenges of high performance and high productivity on high-end computers operating with hundreds of thousands of hardware threads. At such scales, latency and bandwidth can vary widely across the machine, making it impractical to support (in mainstream architectures) a programming model based on uniform shared memory. Similarly, it does not appear that at such scales the SPMD model can provide the needed flexibility and performance (e.g. because of load imbalance).

X10 develops a few simple and powerful ideas. First, like other Partitioned Global Address Space languages, X10 abandons the notion of a monolithic address space: a *place* contains data, together with one or more activities operating on that data. A computation may have millions of places. Aggregate data structures, such as multi-dimensional arrays, may be scattered across multiple places. Second, like Cilk, X10 emphasizes asynchronous, lightweight activities (recursive parallelism) rather than SPMD computations. An activity may spawn other activities locally or in remote places. Data in a place may be operated upon only by activities in that place. Multiple activities mutating shared data may use atomic blocks (the atomic statement) to ensure that data invariants are preserved. An activity may wait for subactivities to terminate before progressing (the finish statement). Clocks (dynamic barriers) can be created dynamically and permit activities registered on the clock to progress determinately in coordinated phases.

Third, X10 leverages the flexibility, versatility and modularity of the mainstream object-oriented model. X10 may be viewed as an extension of sequential Java to support concurrency and distribution. Fourth, the X10 design guarantees that the programmer cannot commit large classes of errors. X10 adds to the guarantees (such as type safety, memory safety) it inherits from Java: X10 computations that do not use conditional atomic blocks cannot deadlock.

Taken together, these ideas provide a very rich, scalable and yet disciplined framework for concurrent programming over distributed data structures. X10 covers those areas which are today handled by a combination of OpenMP, MPI and SPMD languages. We illustrate these ideas through several programming examples (e.g. from the NAS parallel benchmarks) and discuss implementation issues.
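As a hedged illustration of the async/finish style described above, here is a plain-Java analogue (not X10 syntax; the class and constants are chosen for this summary) of a recursive array sum, in which fork() plays the role of spawning an asynchronous activity and join() plays the role of finish.

import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveTask;

// Plain-Java analogue of X10-style recursive parallelism: each task may
// spawn a subtask ("async") with fork() and wait for it ("finish") with join().
public class ParallelSum extends RecursiveTask<Long> {
    private static final int THRESHOLD = 1_000;
    private final long[] data;
    private final int lo, hi;

    public ParallelSum(long[] data, int lo, int hi) {
        this.data = data; this.lo = lo; this.hi = hi;
    }

    @Override
    protected Long compute() {
        if (hi - lo <= THRESHOLD) {              // small enough: compute sequentially
            long sum = 0;
            for (int i = lo; i < hi; i++) sum += data[i];
            return sum;
        }
        int mid = (lo + hi) / 2;
        ParallelSum left = new ParallelSum(data, lo, mid);
        ParallelSum right = new ParallelSum(data, mid, hi);
        left.fork();                             // spawn the left half asynchronously
        long rightSum = right.compute();         // compute the right half in this task
        return left.join() + rightSum;           // wait for the spawned subtask to finish
    }

    public static void main(String[] args) {
        long[] data = new long[1_000_000];
        for (int i = 0; i < data.length; i++) data[i] = i;
        long total = new ForkJoinPool().invoke(new ParallelSum(data, 0, data.length));
        System.out.println(total);  // 499999500000
    }
}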

Acute: Type-Safe and Version-Safe Distributed Programming
Peter Sewell

This course will discuss the main programming language design issues addressed in the Acute language, which supports type-safe and abstraction-safe communication, rebinding to local resources, version change, and computation mobility. Acute builds on the pi calculus and on the type theory underlying ML-style module systems, and the first part of the course will introduce both.