Posted by Azriel Rosenfeld (ar@cfar.umd.edu) to compgeom-discuss, May 20, 1996:

Comments on Section 5 ("Computer Vision") of the CG Impact Task Force report

by Azriel Rosenfeld (ar@cfar.umd.edu)

Avis and Fukuda recently posted some critical comments on the report, particularly regarding how unrealistic the theoretical model used in CG is. This impels me to make a few comments about the section of the report that deals with applications in my own field, computer vision. I would encourage readers in related fields (image processing, GIS, etc.) to consider doing likewise.

Computer vision (CV) does indeed make heavy use of pattern matching techniques, both in model-based object recognition and in 3D reconstruction (stereopsis, structure from motion). Obviously, CG has a lot to tell CV about the matching of geometric (e.g., point) patterns; but the scope of applicability of CG algorithms is limited by the fact that in CV, the patterns are of bounded sizes - relatively sparse sets of "feature points" detected in digital images of fixed dimensions, on the order of 1000 x 1000 pixels. (Image sensor resolution grows very slowly over the years; TV stayed at 500 lines for a long time. Note that the human eye too has limited spatial resolution, and the high-acuity part of the retina produces images of essentially TV resolution.)

This also limits the sizes of other representations of an image; the boundary of a region (e.g., represented by a "chain code") typically has (integer) length proportional to the region's diameter, which is bounded by the image diameter. The numbers get bigger if we deal with (discretely sampled, e.g. 30 frames/second) time-varying imagery, which might provide challenges for dynamic CG; but here too, frame rates don't grow arbitrarily, since the current rates are good enough to capture/mimic "continuous" motion as far as humans are concerned.
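For readers who have not met chain codes, here is a minimal sketch in Python, assuming the common 8-direction (Freeman) convention with code 0 pointing east and codes increasing counterclockwise. The names chain_code and square_boundary are illustrative only and do not come from the report. It shows the bound for a simple region: a square spanning the full 1000 x 1000 image has a chain of fewer than 4000 codes.

```python
# Freeman 8-direction chain code: code k stands for a step of DIRS[k] = (dx, dy).
# Assumed convention: x grows to the right, y grows upward, 0 = east,
# codes increase counterclockwise.
DIRS = [(1, 0), (1, 1), (0, 1), (-1, 1), (-1, 0), (-1, -1), (0, -1), (1, -1)]
STEP_TO_CODE = {step: code for code, step in enumerate(DIRS)}


def chain_code(boundary):
    """Encode a closed boundary as a list of direction codes.

    `boundary` is an ordered list of (x, y) pixels in which each pixel is an
    8-neighbor of the next, and the last is an 8-neighbor of the first.
    """
    codes = []
    n = len(boundary)
    for i in range(n):
        x0, y0 = boundary[i]
        x1, y1 = boundary[(i + 1) % n]  # wrap around to close the curve
        codes.append(STEP_TO_CODE[(x1 - x0, y1 - y0)])
    return codes


def square_boundary(side):
    """Ordered boundary pixels of an axis-aligned square, side >= 2 pixels.

    Hypothetical helper used only to generate test input.
    """
    m = side - 1
    pts = [(x, 0) for x in range(m)]          # bottom edge, left to right
    pts += [(m, y) for y in range(m)]         # right edge, bottom to top
    pts += [(x, m) for x in range(m, 0, -1)]  # top edge, right to left
    pts += [(0, y) for y in range(m, 0, -1)]  # left edge, top to bottom
    return pts


if __name__ == "__main__":
    print(chain_code(square_boundary(3)))
    print(len(chain_code(square_boundary(1000))))
```

The chain length here is 4 * (side - 1), i.e. linear in the side of the square. A highly convoluted boundary can of course be longer, which is why the statement above says "typically".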

Image representation is a topic common to image processing and computer vision; the examples given in this subsection are illustrations of the first, not of the second. Incidentally, digital images are obtained by regular sampling at grid points, not by sampling at "arbitrary" points; the "weighted Voronoi diagram" suggestion is overkill compared with standard bilinear interpolation on the grid. Perhaps the suggested solution is worthy of the problem; "resolution enhancement" of a single digital image is a thankless task - you can't create information that isn't there. "Magnification" might have been a more realistic term.
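For concreteness, here is a minimal sketch of bilinear interpolation on a regular grid, in plain Python. It assumes a grayscale image stored as a list of rows with at least 2 x 2 pixels; the names bilinear and magnify are illustrative and do not come from the report. Each output value is a weighted average of the four surrounding grid samples, so magnifying an image this way never produces a value outside the range of the original samples.

```python
def bilinear(img, x, y):
    """Interpolate a grid image at real-valued (x, y).

    `img` is a list of rows (img[row][col]), at least 2 x 2; x runs along
    columns and y along rows. Coordinates outside the grid are clamped.
    """
    h, w = len(img), len(img[0])
    x = min(max(x, 0.0), w - 1.0)
    y = min(max(y, 0.0), h - 1.0)
    # Top-left corner of the 2 x 2 cell containing (x, y); the min() keeps
    # the cell inside the grid when (x, y) lies on the last row or column.
    x0 = min(int(x), w - 2)
    y0 = min(int(y), h - 2)
    fx, fy = x - x0, y - y0
    top = (1 - fx) * img[y0][x0] + fx * img[y0][x0 + 1]
    bottom = (1 - fx) * img[y0 + 1][x0] + fx * img[y0 + 1][x0 + 1]
    return (1 - fy) * top + fy * bottom


def magnify(img, factor):
    """Resample `img` on a grid `factor` times finer (integer factor >= 1)."""
    h, w = len(img), len(img[0])
    out_h = (h - 1) * factor + 1
    out_w = (w - 1) * factor + 1
    return [
        [bilinear(img, c / factor, r / factor) for c in range(out_w)]
        for r in range(out_h)
    ]


if __name__ == "__main__":
    small = [[0, 10], [20, 30]]
    for row in magnify(small, 2):
        print(row)
```

The original grid samples reappear unchanged in the output, and everything in between is an average of them: this is magnification, not added resolution.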

Optimization methods of image segmentation are also quite extensively used in CV; possibly the CG community has ideas in this area that haven't occurred to us, but the reverse is even more likely. In any case, segmentation by optimization is a solution in search of a problem; as stated in the first sentence of this subsection, a real image needs to be segmented into parts that correspond to different objects in the scene, not into an "optimal" set of regions.

In my opinion, there are good reasons for the often-lamented lack of interaction between CG and CV. The CG community is welcome to look for geometric problems that CV appears to suggest, but they shouldn't expect that the CV community will rush to adopt their solutions.