CS 395T: Real-Time Graphics Architectures,
Algorithms, and Programming Systems

Spring 2003


Tuesdays and Thursdays, 2:00-3:30pm


PAI 5.60

UT Unique #: 52089


Bill Mark, email:
Office Hours: Th 3:30-4:45 in ACES 2.118






Choose Presentation Topics

Jan. 28th


Benchmarking Graphics Hardware

Feb. 25th


Choose Final Project Topic

March 20 - list tentative topic(s)
March 27 - one-page project summary


Final Project

April 15 - informal status report
April 29 or May 1 - presentation to class
May 2 - final report


Reading and discussion oriented, with one assignment and a final project.


An undergraduate computer architecture course.  A previous computer graphics course would be very helpful, but is not required (I will quickly review the relevant material).  Good programming skills or a very tolerant project partner are a must.  I will attempt to adjust the course somewhat to match the typical background of attendees, but given the variety of material covered in the course, it is expected that students will take initiative on their own to fill gaps in background knowledge.


The raw compute performance of a PC’s graphics chip (GPU) is now greater than that of the CPU.  The GPU has recently become programmable, so it’s now reasonable to think of the GPU as a programmable single-chip parallel computer.  This seminar will cover recent developments in graphics architectures and programming systems, and will explore related topics from general-purpose parallel computation.  The seminar will also examine the connection between the algorithms used for real-time graphics, and the architectures that are chosen to support them.

Topics to be covered include but are not limited to:

• Review of basic rendering algorithms – Z-buffer and ray tracing

• Architecture of standard 3D graphics hardware pipeline, and relevant algorithms

• Programmable graphics hardware – current architectures and programming systems

• Algorithms for shadow generation

• Ray tracing architectures

• Single-chip ‘general-purpose’ parallel architectures –
stream processing, MIMD multiprocessor, etc.

• Parallel programming languages and systems – Cilk, stream languages, etc.

The seminar covers topics from a broad variety of areas, but there is a strong unifying theme:  We are discussing the ideas and topics that the instructor believes will be important for the design of future real-time graphics systems, and quite possibly for the design of more general-purpose highly-parallel single-chip computation engines.

Note that one topic this course won’t cover is the unpublished details of NVIDIA’s graphics architectures.  Although I used to work at NVIDIA, I can’t provide any information that isn’t public already.

All students in this course will be expected to read all the assigned research papers. In addition, each student will be required to present one or more (depending on enrollment) topics to the class. This will typically require presenting background material on the topic, summarizing a small number of papers, and leading a class discussion on the topic. Students will also complete one small assignment on probing the capabilities of graphics hardware.

Finally, there will be an open-ended project that can relate to any of the material covered in this course (or really any material at all with instructor's permission). The project may be done in small groups. Although these projects need not show new results, they will be "publication quality"; that is, students will be expected to explore a topic in sufficient depth (including a high quality write-up) that it could be published.  Each group will present its results to the class at the end of the semester. Ambitious students are encouraged to use this project as an opportunity to begin new research projects, and I will be happy to assist any group wishing to continue their project after the semester ends and submit their work for publication.




There are a variety of software packages that might be useful starting places for projects.
I'm pointing you to these packages for informational purposes only; you're not required to use them or learn about them.


All code and documentation should be entirely your own work. You may consult with other students about high-level design strategies, but you may not copy code or use the structure or organization of another student's program. Said another way, you may talk with one another about your programs, but you may never look at another student's code or let another student look at your own.  Obviously, you may collaborate freely with your project partner.


Grading will be determined based on four criteria: class participation, the benchmarking assignment, presentation quality/discussion leadership, and the final project.  Of these, your project is the most important, followed by your presentation.  Extra credit will be awarded depending on how "novel" your project is.  Risky projects will be rewarded, but remember that you still have to turn in a writeup at the end of the semester, even if the writeup is called "Several reasons why our fantastic new algorithm for cache management was a terrible idea".


Most class meetings have two required readings, typically published papers.
The readings will be put online (usually just as hyperlinks) or handed out in class at least one week in advance of the class in which they are discussed.
Most of the online readings are just links to papers in the ACM or IEEE digital libraries. UT has a subscription to these
services, so if you are using an on-campus computer, you should have no trouble getting the PDFs of the papers.
If you are using an off-campus computer, see this web page for information on how to configure your browser for digital library access.




Introduction and Background

January 14

(Bill Mark)


January 16

Background – Z-buffer rendering #1
(Bill Mark)

Prosise, "How Computer Graphics Work",
pp. 107-133, 148-9. [Handout]

Watt & Watt,
"Advanced Animation & Rendering Techniques", pp. 3-29.
(for "clipping algorithm" and "Phong shading", just skim)

Akenine-M. & Haines, "Real-Time Rendering", pp. 117-144.
(focus on section 5.2) [Handout]

January 21

Background – Z-buffer rendering #2
(Bill Mark)

"Pyramidal Parametrics", Lance Williams, SIGGRAPH 83.

"A Language for Shading and Lighting Calculations",
Hanrahan and Lawson, SIGGRAPH 90.

January 23

Background – VLSI trends
(Bill Mark)

Excerpt from Chapter 1 of "Digital Systems Engineering",
Dally and Poulton, pp. 12-21. [Handout]

"Billion Transistor Architectures",
Burger and Goodman, IEEE Computer, Sept. 1997.

"Will Physical Scalability Sabotage Performance Gains?",
Matzke, IEEE Computer, Sept. 1997.

Mainstream Z-buffer Architectures and Algorithms

January 28

Sort-middle Z-buffer architecture
(Bill Mark)

"InfiniteReality: A real-time graphics system",
Montrym, Baum, Dignam, and Migdal, SIGGRAPH 1997.

"Neon: A Single-Chip 3D Workstation Graphics Accelerator",
McCormack, McNamara, Gianos, and Jouppi, GrHW 1998.

Chapter 15 from "Real-Time Rendering", by Moller and Haines,
pp 669-708. [Handout]

Background reading (not required, but possibly useful):
"Reality Engine graphics", Kurt Akeley

January 30

Texture caching and latency
(Paul Navratil)

"The Design and Analysis of a Cache
Architecture for Texture Mapping",
Hakura and Gupta, ISCA 1997.

"Prefetching in a Texture Cache Architecture",
Igehy, Eldridge and Proudfoot, GrHW 1998.

Background readings (not required):
"Texture mapping polygons in perspective", Heckbert.
"Interpolation for polygon texture mapping and shading",
Heckbert and Moreton.

February 4

Texture compression
(Qiu Wu)

"Rendering from Compressed Textures",
Beers, Agrawala and Chaddha, SIGGRAPH 1996.

"DXT1 Texture Compression", Imagination Technologies,
web page. (note: not an 'original' reference, but has good explanation).

February 6

The framebuffer
(Aaron Smith)

"A Configurable Pixel Cache for Fast Image Generation",
Goris et al., IEEE Computer Graphics & Applications, 1987.

Moller and Haines ("Real-Time Rendering", 2nd edition),
Section 4.4, "Aliasing and Anti-Aliasing",
pp. 84-101. [Handout]

February 11

Programmable hardware #1:
Basic hardware capabilities
(Jason Dale)

"Parallel computers for graphics applications",
Levinthal et al., ASPLOS 1987, pp. 193-198

"A user-programmable vertex engine",
Lindholm et al., SIGGRAPH 2001, pp. 149-158.

February 13

Programmable hardware #2:
Current programming languages
(Navendu Jain)

"Real-Time Programmable Shading", excerpt from "Texturing
and Modeling: A Procedural Approach", Ebert et al., pp. 97-121.

"Introduction to the Cg Language", pp. 1-18 (PDF pp. 21-38),
and intro to "Using the Cg Runtime Library", pp. 29-32 (PDF pp. 49-52)
from "Cg toolkit user's manual", NVIDIA, December 2002.

Hanrahan and Lawson paper from Jan 21, if you didn't
read it already.

February 18

Surface displacement, tessellation,
and subdivision
(Ikrima Elhassan)

"The Reyes image rendering architecture", Cook et al.,

"Curved PN triangles", Vlachos, Peters, Boyd, and Mitchell,
Symposium on Interactive 3D Graphics, 2001

February 20

Shadow algorithms
(Ikrima Elhassan)

"Shadow algorithms for computer graphics", Frank Crow,
(focus on section titled "The Third Class: Projected Shadow Volumes")

"Real Shadows Real Time", Heidmann, IRIS Universe, 1991

"Casting curved shadows on curved surfaces", Lance Williams,

February 25

Sorting taxonomy and
alternative architectures
(Aaron Smith)

"A sorting classification of parallel rendering", Molnar et al.,
IEEE Computer Graphics and Applications, July 1994.

Talisman SIGGRAPH96 paper (PDF).

Background readings (not required):
"Architectural implications of hardware-accelerated bucket
rendering on the PC", Cox and Bhandari, Graphics Hardware 1997.

"Pomegranate: A Fully Scalable Graphics Architecture",
Eldridge, Igehy, and Hanrahan, SIGGRAPH 2000.

Introduction to parallel programming

February 27

Overview of parallel programming
and languages
(Bill Mark)

"Introduction to Parallel Processing",
Chapter 1 from "Practical Parallel Rendering",
Chalmers, David, and Reinhard (eds), A.K. Peters, 2002. [Handout]

"Parallel Languages",
Chapter 29 from "Parallel and Distributed Computing Handbook",
Zomaya (ed), McGraw-Hill, 1996. [Handout]

"Models and languages for parallel computation", Skillicorn and Talia,
ACM Computing Surveys, June 1998.
only read sections 1 through 4.0
(pp. 123-136 in the journal's page numbering)

March 4

Models of parallel computation
(Jason Chaw)

"Models of parallel computation: a survey and synthesis", Maggs et al.,
HICSS 1995.

"LogP: towards a realistic model of parallel computation", Culler et al.,
ASPLOS 1993.

March 6

Parallel graphics APIs
(Greg Johnson)

"IRIS performer: A high performance multiprocessing
toolkit for real-time 3D graphics",
Rohlf and Helman, SIGGRAPH 1994.

"The design of a parallel graphics interface", Igehy, Stoll, and Hanrahan,

March 11-13



Parallel computer architectures and languages

March 18

Stream processors
(Jason Dale)

"The Harvest System", Herwitz and Pomerene, [HANDOUT]
Proc. Western Joint Computer Conference, 1960, pp. 23-32
only need to read first 4 pages, up to "programming harvest" heading.

"Imagine: Media processing with streams", Khailany, Dally, et al.,
IEEE Micro, March/April 2001

"Efficient conditional operations for data-parallel architectures",
Kapasi, Dally, et al., proc. MICRO 2000.

March 20

Graphics on a stream processor
(Peter Djeu)

"Polygon rendering on a stream architecture",
Owens, Dally, et al., GrHW 2000.

"Comparing Reyes and OpenGL on a stream architecture",
Owens, Dally, et al., GrHW 2002

March 25

Network processors:
stream/multithread hybrids
(Bill Mark,
in place of cancelled guest lecture)

"The next generation of Intel IXP network processors",
Adiletta et al., Intel Technology Journal, August 2002.

"EPF Sees More iFlow Info:
Silicon Access Details iPP Packet Processor",
Microprocessor Report, May 2002, pp. 14-15.

"Inside the iFlow 20Gbps Packet Processor," Mike O'Connor,
in Proceedings of 2002 Embedded Processor Forum,
San Jose, CA, May 2002. [Handout]

March 27

Brook data-parallel language
(Guest lecture by Ian Buck)

"Data parallel algorithms", Hillis and Steele,
Comm. ACM, December 1986.

"Data Parallel Computation on Graphics Hardware",
Ian Buck and Pat Hanrahan,
conference submission, Jan 2003. [Handout; do not distribute]

Additional reading (not required):
"ZPL's WYSIWYG Performance Model", Chamberlain et al.,
3rd Intl. Workshop on High-Level Parallel Programming Models
and Supportive Environments, pp. 50-61, 1998.

April 1

Shared memory
(Jason Chaw)

"Cache-coherent distributed shared memory: perspectives
on its development and future challenges", Hennessy and Gupta,
Proc. of IEEE, March 1999.

"Broadcom Calisto:
a multi-channel multi-service communications platform",
Nickolls et al., Hot Chips 14, July 2002.

April 3

plus Hillis and Steele paper
(Greg Johnson)

"The Tera computer system", Alverson et al.,
Proc. Supercomputing 1990.

"Exploiting heterogeneous parallelism on a multithreaded processor",
Alverson et al., Proc. Supercomputing 1992.

"Two fundamental limits on dataflow multiprocessing", Culler et al.,
Proc. IFIP Conf. on arch and comp. techniques for ... parallelism, 1993

April 8

M-Machine and GRIDS
(Navendu Jain)

"Exploiting fine-grain thread level parallelism on the MIT multi-ALU processor",
Keckler, Dally, et al., ISCA 1998.

"A design space evaluation of grid processor architectures",
Nagarajan et al., MICRO 2001.

Background reading (not required):
"The M-machine multicomputer", Fillo et al., MICRO 1995.

Ray tracing algorithms and architectures

April 10

Background – raytracing algorithms
(Bill Mark)

Excerpts from "An Introduction to Ray Tracing", Glassner (ed), 1989.
Chapter 1 (Overview) -- pp. 1-29 (focus on pp. 1-17).
Portions of Chapter 6 (Acceleration Techniques)-- pp. 201-211, 217-226.

April 15

Memory management and
(Paul Navratil)

"Rendering complex scenes with memory-coherent ray tracing",
Pharr et al., SIGGRAPH 1997.

pp. 99-109 from "Practical Parallel Rendering", Chalmers et al.,
A.K. Peters, 2002. [Handout]

April 17

Interactive raytracing #1
(Jason Dale)

"SaarCOR -- A hardware architecture for ray tracing",
Schmittler, Wald, and Slusallek, Graphics Hardware 2002.

"Ray tracing on programmable graphics hardware",
Purcell et al., SIGGRAPH 2002.

April 22

Interactive raytracing #2
(Peter Djeu)

"Interactive ray tracing", Parker et al., Proc. I3D 1999.

"State of the art in interactive ray tracing",
Wald and Slusallek, Eurographics 2001.

April 24

Dynamic scenes
(Paul Navratil)

"Towards rapid reconstruction for animated ray tracing",
Lext and Akenine-Moller, Eurographics 2001.

"Parallel tree building on a range of shared address space multiprocessors:
algorithms and application performance", Shan and Singh,
Proc. IPPS/SPDP 1998.

Related reading (not required):
"A simple and practical method for interactive ray tracing of dynamic scenes",
Wald, Benthin and Slusallek, Technical Report, 2002.

Final project presentations

April 29

Project Presentations

1) Greg Johnson and Ikrima Elhassan
2) Aaron Smith "MPEG decoder" [PPT]
3) Jason Dale, "Considerations for an 'UltraDisplay'" [PS]

May 1

Project Presentations

1) Paul Navratil, "Compiler Assisted Optimization for Graphics"
2) Navendu Jain and Jason Chaw, "Enhancing GPU for Scientific Computing"
3) Peter Djeu


While assembling the reading list for this class, I drew in part on syllabi assembled by Pat Hanrahan and Kurt Akeley for courses on graphics architectures and advanced rendering, by Greg Humphreys for a class on "Big data in computer graphics", and by Bill Dally for a class on stream processor architectures. I based the format of the syllabus document on a format developed by Greg Humphreys and David Luebke.


© 2003, William R. Mark