CSE 443/543 High Performance Computing (3 credits)

Catalog description:

Introduction to the practical use of multi-processor workstations and supercomputing clusters. Developing and using parallel programs for solving computationally intensive problems. The course builds on basic concepts of programming and problem solving.

Prerequisite:

CSE 381

Required topics (approximate weeks allocated):

  • Introduction to parallel programming and high performance distributed computing (HPDC) (1)
    • Motivation for HPDC
    • Review of parallel programs and platforms
    • Implicit parallelism and limitations of instruction level parallelism (ILP)
    • Survey of architecture of commonly used HPDC platforms
  • Concurrency and parallelism (1)
    • Introduction to concurrency and parallelism
    • Levels of parallelism
    • Instruction level parallelism
    • SIMD versus MIMD
  • Review of C programming language and the Linux environment (1.5)
    • Review of basic programming constructs
    • Applying Java/C++ syntax and semantics to the C language
    • Introduction to problem solving using the C language
    • Introduction to Linux
    • C programming using Linux
    • C structures
  • Exploring instruction level parallelism (1)
    • Review of instruction level parallelism and sources of hazards
    • Concepts of hazard elimination via code restructuring (dependency reduction, loop unrolling; see the loop-unrolling sketch following this topic list)
    • Timing and statistical comparison of the performance of C programs
  • Introduction to parallel programming (2)
    • Principles of parallel algorithms
    • Effects of synchronization and communication latencies
    • Overview of physical and logical communication topologies
    • Using MPE for parallel graphical visualization (parallel libraries)
  • Introduction to the message-passing paradigm (0.5)
    • Principles of message-passing programming
    • The building blocks of message passing
  • Programming in MPI (3)
    • Introduction to MPI: The Message Passing Interface
    • MPI Fundamentals
    • Partitioning data versus partitioning control
    • Blocking communications and parallelism (see the MPI point-to-point sketch following this topic list)
    • MPI communication models
    • Blocking vs. non-blocking communication and the impact on parallelism
    • Developing MPI programs that exchange derived data types
    • Developing MPI programs that use structure derived data types
    • Review of portability and interoperability issues
  • Performance profiling (1)
    • Using software tools for performance profiling
    • Performance profiling of MPI programs
    • Speedup anomalies in parallel algorithms
  • Collective communications (2)
    • Introduction to collective communications
    • Distributed debugging
    • Introduction to MPI scatter/gather operations (see the scatter/gather sketch following this topic list)
    • Exploring the complete set of collective communication operations in MPI
  • Scalability and performance (1)
    • Understanding notions of scalability and performance
    • Metrics of scalability and performance
    • Asymptotic analysis of scalability and performance
  • Exams/Reviews (1)
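
The following sketches are illustrative only and are not part of the catalog description. This first one shows the kind of source-level loop unrolling referenced under "Exploring instruction level parallelism"; the array size N and the choice of four partial sums are assumptions made for the example.

    /* Illustrative sketch: 4-way manual loop unrolling with independent
     * partial sums, reducing the loop-carried dependency of a single
     * accumulator.  N is assumed to be a multiple of 4 for brevity. */
    #include <stdio.h>

    #define N 1024

    double sum_unrolled(const double *a) {
        double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
        for (int i = 0; i < N; i += 4) {
            s0 += a[i];
            s1 += a[i + 1];
            s2 += a[i + 2];
            s3 += a[i + 3];
        }
        /* Independent accumulators expose more instruction level
         * parallelism than one serial chain of additions. */
        return (s0 + s1) + (s2 + s3);
    }

    int main(void) {
        double a[N];
        for (int i = 0; i < N; i++) {
            a[i] = 1.0;
        }
        printf("sum = %f\n", sum_unrolled(a));
        return 0;
    }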
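
The next sketch, also illustrative, shows a minimal blocking point-to-point exchange of the kind covered under "Programming in MPI". The payload value and message tag are arbitrary choices for the example, and the program assumes it is launched with at least two processes (e.g., mpiexec -n 2).

    /* Illustrative sketch: blocking point-to-point communication.
     * Rank 0 sends one integer to rank 1, which receives and prints it. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char *argv[]) {
        int rank, value = 0;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            value = 42;  /* arbitrary payload for the example */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            printf("Rank 1 received %d\n", value);
        }

        MPI_Finalize();
        return 0;
    }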
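
The last sketch for this list, again illustrative rather than prescriptive, pairs MPI_Scatter with MPI_Gather as an example of the collective communications topic; the per-process work (squaring one integer) is an assumption made to keep the example short.

    /* Illustrative sketch: the root scatters one integer to each process,
     * every process computes on its piece, and the root gathers the results. */
    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char *argv[]) {
        int rank, size, item = 0, result = 0;
        int *data = NULL, *results = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank == 0) {
            data = malloc(size * sizeof(int));
            results = malloc(size * sizeof(int));
            for (int i = 0; i < size; i++) {
                data[i] = i + 1;
            }
        }

        /* Each process receives one element of the root's array. */
        MPI_Scatter(data, 1, MPI_INT, &item, 1, MPI_INT, 0, MPI_COMM_WORLD);
        result = item * item;  /* assumed per-process work for the example */
        /* The root collects one result from every process. */
        MPI_Gather(&result, 1, MPI_INT, results, 1, MPI_INT, 0, MPI_COMM_WORLD);

        if (rank == 0) {
            for (int i = 0; i < size; i++) {
                printf("%d ", results[i]);
            }
            printf("\n");
            free(data);
            free(results);
        }

        MPI_Finalize();
        return 0;
    }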

Learning Outcomes:

  1. Identify various forms of parallelism, their application, advantages, and drawbacks
    • Describe the spectrum of parallelism available for high performance computing
    • Compare and contrast the different forms of parallelism
    • Identify applications that can take advantage of a given type of parallelism and vice versa
    • Identify suitable hardware platforms on which the various forms of parallelism can be effectively realized
    • Describe the concept of semantic gap as it pertains to high level languages and HPDC platforms
  2. Effectively utilize instruction level parallelism
    • Describe the concept of instruction level parallelism
    • Identify the sources of hazards that impact instruction level parallelism using a contemporary high level programming language
    • Apply source-code level software transformations to minimize hazards and improve instruction level parallelism
    • Compare performance effects of various source-code level software transformations using a performance profiler
  3. Effectively utilize multi-core CPUs and multithreading
    • Describe the concept of multi-core architectures
    • Describe the concepts of threads and distinguish between processes & threads
    • Demonstrate creating threads using OpenMP compiler directives
    • Demonstrate the process of converting a serial program to a data parallel application
    • Demonstrate the process of converting a serial program to a task parallel application
    • Describe race conditions and side effects
    • Demonstrate the process of resolving race conditions using OpenMP critical sections (see the OpenMP sketch following these outcomes)
    • Describe the performance tradeoff of using critical sections
    • Describe the process of identifying and using multiple independent critical sections
    • Measure performance gains of multithreading
  4. Specify, trace, and implement parallel and distributed programs using the Message Passing Interface (MPI) that solve a stated problem in a clean, robust, efficient, and scalable manner
    • Describe the SPMD programming model
    • Trace, create, compile, and run an MPI parallel program on a contemporary supercomputing cluster using PBS
    • Describe, trace, and implement programs that use MPI's point-to-point blocking communications
    • Describe, trace, and implement programs that use MPI's non-blocking communications
    • Describe, trace, and implement programs that use collective communications
    • Describe, trace, and implement programs that use derived data types including vector derived data type and structure derived data types
    • Use third-party libraries compatible with MPI to develop programs
  5. Describe and empirically demonstrate concepts of parallel efficiency and scalability
    • Describe the concepts of speedup, efficiency and scalability
    • Describe the analytical metrics of speedup, efficiency, and scalability (summarized briefly after these outcomes)
    • Identify efficient and scalable parallel programs (or algorithms) using asymptotic time complexities
    • Use a performance profiler to empirically measure and compare efficiency, scalability, and speedup metrics of parallel programs
    • Use profile data to improve speedup, efficiency, and scalability of a parallel program
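
The following sketch is illustrative only and is not part of the outcomes themselves. It shows a race condition on a shared accumulator being resolved with an OpenMP critical section (outcome 3); the loop bound and the per-iteration work are assumptions for the example, and the program assumes an OpenMP-enabled compiler (e.g., gcc -fopenmp).

    /* Illustrative sketch: a shared accumulator updated inside an OpenMP
     * critical section to avoid a race condition.  Serializing the update
     * is safe but costly, which is the performance tradeoff noted above. */
    #include <omp.h>
    #include <stdio.h>

    #define N 1000000

    int main(void) {
        double sum = 0.0;

        #pragma omp parallel for
        for (int i = 0; i < N; i++) {
            double term = 1.0 / (i + 1);  /* independent per-iteration work */
            #pragma omp critical
            sum += term;                  /* serialized update avoids the race */
        }

        printf("sum = %f (max threads: %d)\n", sum, omp_get_max_threads());
        return 0;
    }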
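
For reference in outcome 5, the standard definitions used for these metrics are speedup S(p) = T(1) / T(p) and parallel efficiency E(p) = S(p) / p, where T(p) is the wall-clock time of the program on p processors.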

Graduate students:

Students taking the course for graduate credit will be expected to apply course concepts to solve computationally demanding problems, analyze experimental results, and draw inferences to verify hypotheses.