CS 473: Topics in Analysis of Algorithms (Fall 2003)

WF 2:00–3:15, 203 Transportation Building

Instructor: Jeff Erickson

Announcements · Overview · Schedule · Administrivia



More and more frequently, computer systems are called on to handle massive amounts of data, much more than can fit into memory. In this context, the cost of an algorithm is more aptly measured in terms of the amount of data transferred between slow memory (disks, for example) and fast memory, rather than the number of executed instructions. Similar (although somewhat less severe) issues arise in designing algorithms and data structures to exploit caches. In extreme cases, the data is not only too large to store, but also too large to read more than once.

This course will survey ongoing research into algorithms and data structures for massive data. The course material will be roughly divided into three categories:

Despite the relative novelty of this research area, there is already more than enough material to cover three or four courses. Students are strongly encouraged to suggest topics for future lectures or discussion!


Date Subject Refs Due
Fri 28 Aug Introduction: the standard external-memory model; upper and lower bounds for scanning (Θ(n)), searching (Θ(logB n) via B-trees), and sorting (Θ(n logm n) via mergesort); external comparison trees [AV88, BM72]
Wed 03 Sep Distribution sort; (a,b)-trees; buffer trees (insertion only) [AV88Arge03, BM72, Mehlhorn84]
Fri 05 Sep Buffered insertion sort [Arge03]
Wed 10 Sep
No class today
Fri 12 Sep Orthogonal line segment intersection in O(n logm n + k) I/Os: buffered dynamic range trees, distribution sweeping [Arge03, GTVV93]
Wed 17 Sep External segment trees and interval trees [AV96, KRVV96]
Fri 19 Sep Extended segment trees; red-blue segment intersection [AVV95]
Wed 24 Sep External priority search trees and 2d range trees [ASV99] hw1
Fri 26 Sep Lower bounds for 2d orthogonal range searching; pointer machines [Chazelle, Hellerstein et al.]
Wed 01 Oct Randomized external quicksort (hw1#3), Chernoff bounds, list ranking [Motwani+Raghavan]
Fri 03 Oct Optimal list ranking in O(n logm n) I/Os, general PRAM simulation, time-forward processing [CGGTVV95, Arge03]
Wed 08 Oct Undirected breadth-first search in O(V + e logm v) I/Os, directed breadth- and depth-first search in O(V lg v + e lg v + e logm e) I/Os [MR99, BGVW00]
Fri 10 Oct Minimum spanning trees in O(e logm e lg lg B) I/Os [KS96, ABT00]
Wed 15 Oct
Optimal planar graph blocking, planar separator theorem
[AAMVV99, LT79, Fred87]
Fri 17 Oct Contour-line extraction, persistence via node copying [AAMVV99, DSST89]
Wed 22 Oct No class this week—Jeff is at an external-memory workshop
Fri 24 Oct
Wed 29 Oct Intro to cache-oblivious algorithms: linear scan in O(n) I/Os, median selection in O(n) I/Os, static binary search in O(logB n) I/Os via the van Emde Boas layout, LRU vs optimal offline paging [FLPR99, D02, ST85]
Fri 31 Oct Funnelsort: Cache-oblivious sorting in O(n logm n) I/Os [FLPR99, BF02] hw2
Wed 05 Nov Funnel heaps: O((1/B) logm n) amortized I/Os per Insert or DeleteMin [ABDHM02, BF02]
Fri 07 Nov
No class today
Wed 12 Nov
Fri 14 Nov
Wed 19 Nov
Fri 21 Nov
Wed 26 Nov
Thanksgiving break
Fri 28 Nov
Wed 01 Dec
Fri 03 Dec
Wed 08 Dec
Fri 10 Dec
??? ?? Dec
Take-home final exams due


Whenever possible, the title of each work links to a free online copy. Journal/proceedings data links to the publisher's official web page for the paper; an institutional subscription may be required to access the paper itself.

*Stars indicate authors who were graduate students when that paper was written.
**Double stars indicate undergraduate authors.

Pankaj K. Agarwal, Lars Arge, T. M. Murali*, Kasturi R. Varadarajan*, and Jeffrey S. Vitter. I/O-efficient algorithms for contour-line extraction and planar graph blocking. Proc. ACM-SIAM Symposium on Discrete Algorithms, 117-126, 1999.
Lars Arge, Gerth Stølting Brodal, and Laura Toma*. On external-memory MST, SSSP, andd multi-way planar graph separation. Proc. Scandinavian Workshop on Algorithms and Theory, 433-447, 2000.
Lars Arge*. The buffer tree: A new technique for optimal I/O algorithms. Proc. 4th Workshop on Algorithms and Data Structures, 334-345, 1995. Lecture Notes in Computer Science 955, Springer-Verlag, Berlin.
Introduces buffer trees.
Lars Arge. The buffer tree: A technique for designing batched external data structures. Algorithmica 37:1-24, 2003.
Full version of [Arge95].
Lars Arge,Vasilis Samoladas*, and Jeffrey Scott Vitter. On two-dimensional indexability and optimal range search indexing. Proc. ACM Symposium on Principles of Database Systems, 346-357, 1999.
External priority search trees and 2d range trees.
Alok Aggarwal and Jeffrey Scott Vitter. The input/output complexity of sorting and related problems. Communications of the ACM 31(9):1116-1127, 1988.
Introduces the standard external-memory model; proves tight upper and lower bounds for searching, scanning, sorting, FFTs, matrix inversion, and other problems.
Lars Arge* and Jeffrey Scott Vitter. Optimal dynamic interval management in external memory. Proc. IEEE Symposium on Foundations of Computer Science, 560-569, 1996.
Introduces external interval trees and external segment trees; the main contribution is the idea of multislab lists.
Lars Arge*, Darren E. Vengroff*, and Jeffrey Scott Vitter. External-memory algorithms for processing line segments in geographic information systems. Proc. ESA '95. Algorithmica, to appear.
Optimal algorithms for endpoint-dominance, red-blue segment intersection, batched planar point location, and the like, using a combination of distribution sweeping and buffer trees.
Adam Buchsbaum, Michael Goldwasser, Suresh Venkatasubramanian, and Jeffrey R. Westbrook. On external-memory graph traversal. Proc. ACM-SIAM Symposium on Discrete Algorithms, 859-860, 2000.
Rudolf Bayer and Edward M. McCreight. Organization and maintenance of large ordered indexes. Acta Informatica 1:173-189, 1972.
Introduces B-trees.
Yi-Jen Chiang*, Michael T. Goodrich, Eddie F. Grove, Roberto Tamassia, Darren E. Vengroff*, and Jeffrey S. Vitter. External-memory graph algorithms. Proc. ACM-SIAM Symposium on Discrete Algorithms, 139--149, 1995.
Michael T. Goodrich, J.-J. Tsay*, Darren E. Vengroff*, and Jeffrey Scott Vitter. External-memory computational geometry. Proc. IEEE Symposium on Foundations of Computing, 714-723, 1993.
Introduces distribution sweeping.
Donald E. Knuth. The Art of Computer Programming, Volume 3: Sorting and Searching, second edition. Addison-Wesley, 1998.
Chapter 5.4 discusses external sorting in Knuth's usual exquisite/excruciating detail, but not in our standard external-memory model. Chapters 6.2.3 and 6.2.4 discuss B-trees and their variants: B+-trees, B*-trees, (a,b)-trees, etc.
Paris C. Kannelakis, Sridhar Ramaswamy, Darren E. Vengroff*, and Jeffrey Scott Vitter. Indexing for data models with constraints and classes. Journal of Computer and System Sciences 52(3):589-612, 1996.
Introduces metablock trees, which are used as underflow structures in external segment and interval trees in [AV96].
Vijay Kumar and Eric Schwabe. Improved algorithms and data structures for solving graph problems in external memory. Proc. IEEE Symposium on Parallel and Distributed Processing, 167-177, 1996.
Kurt Mehlhorn. Data Structures and Algorithms 1: Sorting and Searching. EATCS Monographs on Theoretical Computer Science, Springer-Verlag, 1984.
Chapter III 5.2 describes and analyzes (a,b)-trees.
Kamesh Munagala and Abhiram Ranade. I/O-complexity of graph algorithms. Proc. ACM-SIAM Smposium on Discrete Algorithms, 687-694, 1999.


Instructor: Jeff Erickson (jeffe@cs.uiuc.edu)
Tentative office hours: Thursday 1:30-3:30, at Green Street Coffee House
Course website: http://www.cs.uiuc.edu/~jeffe/teaching/473/
Course newsgroup: uiuc.class.cs473
This is an advanced graduate-level algorithms class, aimed primarily (but not exclusively!) at students interested in algorithms research. Students are expected to have a solid grounding in basic algorithm design and analysis techniques. A minimal prerequisite is an introductory graduate-level algorithms course like CS 373. Mathematically and/or algorithmically mature undergraduates are welcome.
Grades will be based on 4-5 homeworks and a take-home final. Students can substitute a survey/implementation/research project for the take-home final with my permission and the understanding that my standards for course projects are unreasonable.
Reading Material:
There is no textbook for this class. The lecture material will be taken almost entirely from recent conference and journal papers; I will provide pointeds to electronic copies of these papers whenever possible. However, the following survey papers are highly recommended:
See also course syllabi/notes/slides by Lars Arge (I/O efficient algorithms), Gerth Stølting Brodal (External memory algorithms and data structures), S. Muthukrishnan and Graham Cormode (Seminar on processing massive data sets), and the EEF summer school on massive data sets.

Jeff Erickson (jeffe@cs.uiuc.edu) 17 Sep 2003