More and more frequently, computer systems are called on to handle massive amounts of data, much more than can fit into memory. In this context, the cost of an algorithm is more aptly measured in terms of the amount of data transferred between slow memory (disks, for example) and fast memory, rather than the number of executed instructions. Similar (although somewhat less severe) issues arise in designing algorithms and data structures to exploit caches. In extreme cases, the data is not only too large to store, but also too large to read more than once.
This course will survey ongoing research into algorithms and data structures for massive data. The course material will be roughly divided into three categories:
Despite the relative novelty of this research area, there is already more than enough material to cover three or four courses. Students are strongly encouraged to suggest topics for future lectures or discussion!
Date  Subject  Refs  Due 
Fri 28 Aug  Introduction: the standard externalmemory model; upper and lower bounds for scanning (Θ(n)), searching (Θ(log_{B} n) via Btrees), and sorting (Θ(n log_{m} n) via mergesort); external comparison trees  [AV88, BM72]  
Wed 03 Sep  Distribution sort; (a,b)trees; buffer trees (insertion only)  [AV88, Arge03, BM72, Mehlhorn84]  
Fri 05 Sep  Buffered insertion sort  [Arge03]  
Wed 10 Sep 
No class today


Fri 12 Sep  Orthogonal line segment intersection in O(n log_{m} n + k) I/Os: buffered dynamic range trees, distribution sweeping  [Arge03, GTVV93]  
Wed 17 Sep  External segment trees and interval trees  [AV96, KRVV96]  
Fri 19 Sep  Extended segment trees; redblue segment intersection  [AVV95]  
Wed 24 Sep  External priority search trees and 2d range trees  [ASV99]  hw1 
Fri 26 Sep  Lower bounds for 2d orthogonal range searching; pointer machines  [Chazelle, Hellerstein et al.]  
Wed 01 Oct  Randomized external quicksort (hw1#3), Chernoff bounds, list ranking  [Motwani+Raghavan]  
Fri 03 Oct  Optimal list ranking in O(n log_{m} n) I/Os, general PRAM simulation, timeforward processing  [CGGTVV95, Arge03]  
Wed 08 Oct  Undirected breadthfirst search in O(V + e log_{m} v) I/Os, directed breadth and depthfirst search in O(V lg v + e lg v + e log_{m} e) I/Os  [MR99, BGVW00]  
Fri 10 Oct  Minimum spanning trees in O(e log_{m} e lg lg B) I/Os  [KS96, ABT00]  
Wed 15 Oct 
Optimal planar graph blocking, planar separator theorem

[AAMVV99, LT79, Fred87]  
Fri 17 Oct  Contourline extraction, persistence via node copying  [AAMVV99, DSST89]  
Wed 22 Oct  No class this week—Jeff is at an externalmemory workshop  
Fri 24 Oct  
Wed 29 Oct  Intro to cacheoblivious algorithms: linear scan in O(n) I/Os, median selection in O(n) I/Os, static binary search in O(log_{B} n) I/Os via the van Emde Boas layout, LRU vs optimal offline paging  [FLPR99, D02, ST85]  
Fri 31 Oct  Funnelsort: Cacheoblivious sorting in O(n log_{m} n) I/Os  [FLPR99, BF02]  hw2 
Wed 05 Nov  Funnel heaps: O((1/B) log_{m} n) amortized I/Os per Insert or DeleteMin  [ABDHM02, BF02]  
Fri 07 Nov 
No class today


Wed 12 Nov  
Fri 14 Nov  
Wed 19 Nov  
Fri 21 Nov  
Wed 26 Nov 
Thanksgiving break


Fri 28 Nov  
Wed 01 Dec  
Fri 03 Dec  
Wed 08 Dec  
Fri 10 Dec  
??? ?? Dec 
Takehome final exams due

Whenever possible, the title of each work links to a free online copy. Journal/proceedings data links to the publisher's official web page for the paper; an institutional subscription may be required to access the paper itself.
*Stars indicate authors who were graduate students when that paper was written.
**Double stars indicate undergraduate authors.