Introducing Computer Vision.

...studies how to reconstruct, interpret and understand a 3D scene from its 2D images...

Step on Computer Vision

Computer vision is the technology concerned with the computational understanding and use of the information present in visual images. In part, computer vision is analogous to the transformation of visual sensation into visual perception in biological vision. For this reason, the motivation, objectives, formulation, and methodology of computer vision frequently intersect with knowledge about their counterparts in biological vision. However, the goal of computer vision is primarily to enable engineering systems to model and manipulate the environment by using visual sensing.

Computer vision begins with the acquisition of images. A camera produces a grid of samples of the light received from different directions in the scene. The position within the grid where a scene point is imaged is determined by the perspective transformation. The amount of light recorded by the sensor from a certain scene point depends upon the type of lighting, the reflection characteristics and orientation of the surface being imaged, and the location and spectral sensitivity of the sensor.
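The perspective transformation mentioned above can be illustrated with a minimal sketch of the ideal pinhole camera model. The function names, the focal length expressed in pixels, and the principal point (cx, cy) are assumptions made for this illustration; they do not come from the encyclopedia text, and a real camera would also need lens distortion and calibration handled.

```python
import math

def project_point(point, focal_px, cx, cy):
    """Project a 3D point (X, Y, Z), given in camera coordinates, onto the
    image plane of an ideal pinhole camera (assumed model, no distortion)."""
    X, Y, Z = point
    if Z <= 0:
        raise ValueError("point must lie in front of the camera (Z > 0)")
    # Perspective division: image position scales inversely with depth.
    u = focal_px * X / Z + cx
    v = focal_px * Y / Z + cy
    return (u, v)

def to_grid_cell(u, v, width, height):
    """Map continuous image coordinates to a (column, row) cell of the
    sample grid, or return None if the point falls outside the image."""
    col, row = math.floor(u), math.floor(v)
    if 0 <= col < width and 0 <= row < height:
        return (col, row)
    return None

# Hypothetical camera: 640 x 480 grid, focal length of 800 pixels.
u, v = project_point((0.5, 0.25, 2.0), 800.0, 320.0, 240.0)
cell = to_grid_cell(u, v, 640, 480)
```

Doubling the depth Z of a point halves its offset from the principal point, which is why depth cannot be read off a single image without further assumptions.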

One central objective of image interpretation is to infer the three-dimensional (3D) structure of the scene from images that are only two-dimensional (2D). The missing third dimension necessitates that assumptions be made about the scene so that the image information can be extrapolated into a three-dimensional description. To this end, a variety of three-dimensional cues present in the image are exploited. The two-dimensional structure of an image or the three-dimensional structure of a scene must be represented so that the structural properties required for various tasks are easily accessible. For example, the hierarchical two-dimensional structure of an image may be represented through a pyramid data structure, which records the recursive embedding of the image regions at different scales. Each region's shape and homogeneity characteristics may themselves be suitably coded. Alternatively, the image may be recursively split into parts in some fixed way (for example, into quadrants) until each part is homogeneous. This approach leads to a tree data structure.

As in two dimensions, the three-dimensional structures estimated from the image-based cues may be used to define three-dimensional representations. The shape of a three-dimensional volume or object may be represented by its three-dimensional axis and the manner in which the cross section about the axis changes along the axis. Analogous to the two-dimensional case, the three-dimensional space may also be recursively divided into octants to obtain a tree description of the occupancy of space by objects.
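The recursive split into quadrants described above can be sketched as follows. This is a minimal illustration, assuming a square image whose side is a power of two, stored as a list of rows, and assuming that "homogeneous" means every sample in a region has the same value; the names build_quadtree and count_leaves are made up for this example.

```python
def build_quadtree(image, x=0, y=0, size=None):
    """Recursively split a square region into quadrants until each
    region is homogeneous (all samples equal, an assumed criterion)."""
    if size is None:
        size = len(image)
    first = image[y][x]
    homogeneous = all(
        image[r][c] == first
        for r in range(y, y + size)
        for c in range(x, x + size)
    )
    if homogeneous:
        # Leaf node: one value describes the whole region.
        return {"x": x, "y": y, "size": size, "value": first}
    half = size // 2
    # Internal node: top-left, top-right, bottom-left, bottom-right.
    return {
        "x": x, "y": y, "size": size,
        "children": [
            build_quadtree(image, x, y, half),
            build_quadtree(image, x + half, y, half),
            build_quadtree(image, x, y + half, half),
            build_quadtree(image, x + half, y + half, half),
        ],
    }

def count_leaves(node):
    """Count the homogeneous regions stored in the tree."""
    if "children" not in node:
        return 1
    return sum(count_leaves(child) for child in node["children"])

image = [
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
    [0, 0, 1, 1],
]
tree = build_quadtree(image)
```

In this example three of the four top-level quadrants are already uniform, so only the bottom-right quadrant is split further, down to single samples. The same recursion over eight octants of a cubic volume would give the tree description of space occupancy mentioned for the three-dimensional case.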

A second central objective of image interpretation is to recognize the scene contents. Recognition involves identifying an object based on a variety of criteria. It may involve identifying a certain object in the image as one seen before. A simple example is where the object appearance, such as its color and shape, is compared with that of the known, previously seen objects. A more complex example is where the identity of the object depends on whether it can serve a certain function, for example, drinking (to be recognized as a cup) or sitting (to be recognized as a chair). This requires reasoning from the various image attributes and the derivative three-dimensional characteristics to assess if a given object meets the criteria of being a cup or a chair. Recognition, therefore, may require extensive amounts of knowledge representation, reasoning, and information retrieval. Visual learning is aimed at identifying relationships between the image characteristics and a result based thereupon, such as recognition or a motor action.
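The simple case above, in which an object's appearance is compared with that of previously seen objects, can be sketched as a nearest-neighbor lookup. This is only an illustration under strong assumptions: each object is summarized by a hypothetical feature vector (mean red, green, and blue values plus a height-to-width ratio standing in for shape), and similarity is plain Euclidean distance. The source does not prescribe any particular features or matching rule.

```python
import math

def recognize(features, known_objects):
    """Return the label of the known object whose feature vector is
    closest (Euclidean distance) to the observed features."""
    if not known_objects:
        raise ValueError("no known objects to compare against")
    return min(
        known_objects,
        key=lambda label: math.dist(features, known_objects[label]),
    )

# Hypothetical appearance descriptors: (mean R, mean G, mean B, height/width).
known_objects = {
    "cup": (0.8, 0.2, 0.2, 1.0),
    "chair": (0.4, 0.3, 0.1, 2.0),
}

observed = (0.75, 0.25, 0.2, 1.1)
label = recognize(observed, known_objects)
```

Function-based recognition, such as deciding whether an object can serve as a cup, would need reasoning over derived three-dimensional properties and is beyond what a direct appearance match like this can capture.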

In manufacturing, vision-based sensing and interpretation systems help in automatic inspection, such as identification of cracks, holes, and surface roughness; counting of objects; and alignment of parts. Computer vision helps in proper manipulation of an object, for example, in automatic assembly, automatic painting of a car, and automatic welding. Autonomous navigation, used, for example, in delivering material on a cluttered factory floor, has much to gain from vision to improve on the fixed, rigid paths taken by vehicles that follow magnetic tracks prelaid on the floor. Recognition of symptoms, for example, in a chest x-ray, is important for medical diagnosis. Classification of satellite pictures of the Earth's surface to identify vegetation, water, and crop types is another important function. Automatic visual detection of storm formations and movements of weather patterns is crucial for analyzing the huge amounts of global weather data that constantly pour in from sensors.

This page is a citation from McGraw-Hill Concise Encyclopedia of Engineering. © 2002 by The McGraw-Hill Companies, Inc.