More and more frequently, computer systems are called on to handle massive amounts of data, much more than can fit into memory. In this context, the cost of an algorithm is more aptly measured in terms of the amount of data transferred between slow memory (disks, for example) and fast memory, rather than the number of executed instructions. Similar (although somewhat less severe) issues arise in designing algorithms and data structures to exploit caches. In extreme cases, the data is not only too large to store, but also too large to read more than once.
This course will survey ongoing research into algorithms and data structures for massive data. The course material will be roughly divided into three categories:
Despite the relative novelty of this research area, there is already more than enough material to cover three or four courses. Students are strongly encouraged to suggest topics for future lectures or discussion!
Date | Subject | Refs | Due |
Fri 28 Aug | Introduction: the standard external-memory model; upper and lower bounds for scanning (Θ(n)), searching (Θ(logB n) via B-trees), and sorting (Θ(n logm n) via mergesort); external comparison trees | [AV88, BM72] | |
Wed 03 Sep | Distribution sort; (a,b)-trees; buffer trees (insertion only) | [AV88, Arge03, BM72, Mehlhorn84] | |
Fri 05 Sep | Buffered insertion sort | [Arge03] | |
Wed 10 Sep |
No class today
|
||
Fri 12 Sep | Orthogonal line segment intersection in O(n logm n + k) I/Os: buffered dynamic range trees, distribution sweeping | [Arge03, GTVV93] | |
Wed 17 Sep | External segment trees and interval trees | [AV96, KRVV96] | |
Fri 19 Sep | Extended segment trees; red-blue segment intersection | [AVV95] | |
Wed 24 Sep | External priority search trees and 2d range trees | [ASV99] | hw1 |
Fri 26 Sep | Lower bounds for 2d orthogonal range searching; pointer machines | [Chazelle, Hellerstein et al.] | |
Wed 01 Oct | Randomized external quicksort (hw1#3), Chernoff bounds, list ranking | [Motwani+Raghavan] | |
Fri 03 Oct | Optimal list ranking in O(n logm n) I/Os, general PRAM simulation, time-forward processing | [CGGTVV95, Arge03] | |
Wed 08 Oct | Undirected breadth-first search in O(V + e logm v) I/Os, directed breadth- and depth-first search in O(V lg v + e lg v + e logm e) I/Os | [MR99, BGVW00] | |
Fri 10 Oct | Minimum spanning trees in O(e logm e lg lg B) I/Os | [KS96, ABT00] | |
Wed 15 Oct |
Optimal planar graph blocking, planar separator theorem
|
[AAMVV99, LT79, Fred87] | |
Fri 17 Oct | Contour-line extraction, persistence via node copying | [AAMVV99, DSST89] | |
Wed 22 Oct | No class this weekJeff is at an external-memory workshop | ||
Fri 24 Oct | |||
Wed 29 Oct | Intro to cache-oblivious algorithms: linear scan in O(n) I/Os, median selection in O(n) I/Os, static binary search in O(logB n) I/Os via the van Emde Boas layout, LRU vs optimal offline paging | [FLPR99, D02, ST85] | |
Fri 31 Oct | Funnelsort: Cache-oblivious sorting in O(n logm n) I/Os | [FLPR99, BF02] | hw2 |
Wed 05 Nov | Funnel heaps: O((1/B) logm n) amortized I/Os per Insert or DeleteMin | [ABDHM02, BF02] | |
Fri 07 Nov |
No class today
|
||
Wed 12 Nov | |||
Fri 14 Nov | |||
Wed 19 Nov | |||
Fri 21 Nov | |||
Wed 26 Nov |
Thanksgiving break
|
||
Fri 28 Nov | |||
Wed 01 Dec | |||
Fri 03 Dec | |||
Wed 08 Dec | |||
Fri 10 Dec | |||
??? ?? Dec |
Take-home final exams due
|
Whenever possible, the title of each work links to a free online copy. Journal/proceedings data links to the publisher's official web page for the paper; an institutional subscription may be required to access the paper itself.
*Stars indicate authors who were graduate students when that paper was written.
**Double stars indicate undergraduate authors.