Visual Scene Analysis

Anderson (1990) noted, "A host of concepts has been taken from computer science and used in psychological theories." This is certainly true in the area of visual scene analysis. It is a prime example of artificial intelligence research influencing scientists' understanding of human cognitive processes.

Visual scene analysis is a computer's version of visual perception. To analyze a visual scene, the computer must identify objects and relationships between objects, labeling each correctly. It must be able to answer questions about the scene. If provided with a robotic arm or graphic equivalent, it must be able to manipulate objects in response to commands such as, "Place the triangular block on the long, thin block." These performances (labeling, answering questions, manipulating) serve as behavioral proof that the program "understands" a scene.

When do researchers say a computer "understands" a scene?

What was the "block world"?

How does a computer analyze a scene? In classic 1960s work, a team at MIT (Massachusetts Institute of Technology) tried to teach a computer to recognize various arrangements of a simple block world. This world consisted of a computer representation of blocks on a table. To interpret the scene, the computer had to assign a meaning or interpretation to each line in the scene; for example, it had to know that one line represented the edge of a shadow, another represented the edge of a block facing the viewer, and so forth. When every line in the picture was accurately labeled, the picture was said to be understood (Waltz, 1975).

The MIT "block world"

Initially, researchers tried to ignore shadows, figuring they were an unnecessary complication to the task. However, it turned out that shadows provided important clues. Shadows helped to identify objects and the relative positions of objects. So shadows were included in the block world.

Why did the program include shadows? What other features proved to be critically important?

The MIT team found lines, edges, and corners to be critical features of a visual scene. First, the computer isolated lines and edges (areas of sudden contrast in the visual scene). Then the computer followed lines to corners where they intersected or bumped into other lines. Each line segment could be interpreted in 11 different ways (as an outer edge of an object, the edge of a shadow, and so forth). Before the computer could successfully interpret the scene, it had to pick one of the 11 meanings to assign to each line segment in the scene.
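The first step, isolating areas of sudden contrast, can be illustrated with a minimal sketch. This is not the MIT program. The function name find_edges, the tiny brightness grid, and the threshold of 50 are all assumptions chosen for illustration.

```python
def find_edges(image, threshold=50):
    """Return the (row, col) positions where brightness changes sharply.

    `image` is a list of rows of brightness values (0 = dark, 255 = bright).
    A position counts as an edge if it differs from its right-hand or
    lower neighbor by at least `threshold` (an arbitrary, assumed cutoff).
    """
    edges = set()
    rows, cols = len(image), len(image[0])
    for r in range(rows):
        for c in range(cols):
            # Compare with the neighbor to the right.
            if c + 1 < cols and abs(image[r][c] - image[r][c + 1]) >= threshold:
                edges.add((r, c))
            # Compare with the neighbor below.
            if r + 1 < rows and abs(image[r][c] - image[r + 1][c]) >= threshold:
                edges.add((r, c))
    return edges


# A dark region (10) next to a bright region (200): the boundary is an edge.
image = [
    [10, 10, 200, 200],
    [10, 10, 200, 200],
    [10, 10, 200, 200],
]
print(sorted(find_edges(image)))
```

In this toy picture the only sudden contrast is the vertical boundary between the dark and bright halves, so the edge positions all fall in the column just left of that boundary.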

What is an "arrow" vertex? A "fork"?

Four types of vertex identified by Guzman (1969)

Junctions of two or more lines are called vertexes (or vertices). Each vertex represents a corner of an object or a place where one object (or a shadow) cuts in front of another. The figure above shows four types of vertexes identified by Guzman (1969).
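One way a program might sort junctions into types is by counting the lines that meet at a point and measuring the angles between them. The sketch below is an illustration, not Guzman's procedure. It assumes the four types are the ones usually named L, fork, arrow, and T in the line-labeling literature, and the function name classify_vertex and the one-degree tolerance are hypothetical.

```python
def classify_vertex(directions, tolerance=1.0):
    """Guess a vertex type from the directions of the lines leaving it.

    `directions` lists the angle (in degrees) of each line at the vertex.
    Assumed rules: two lines make an L; three lines make a T if two of
    them form a straight line, an arrow if one gap between neighboring
    lines is wider than 180 degrees, and a fork otherwise.
    """
    angles = sorted(d % 360 for d in directions)
    # Gap between each line and the next one, going around the vertex.
    gaps = [
        (angles[(i + 1) % len(angles)] - angles[i]) % 360
        for i in range(len(angles))
    ]
    if len(angles) == 2:
        return "L"
    if len(angles) == 3:
        widest = max(gaps)
        if abs(widest - 180) <= tolerance:
            return "T"
        if widest > 180:
            return "arrow"
        return "fork"
    return "other"


print(classify_vertex([90, 210, 330]))  # three evenly spread lines
print(classify_vertex([60, 90, 120]))   # three lines bunched on one side
```

The first call describes three lines spread evenly around the point, which the rules above treat as a fork. The second describes three lines bunched together like an arrowhead with its shaft, which the rules treat as an arrow.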

To interpret a visual scene, the computer has to assign a meaning to each vertex as well as each line segment. Consider the vertex called an arrow. In the following diagram are two arrows, each marked with a dot at its tip.

Upward and downward arrow vertices

An arrow can be an upward pointing inner corner (first diagram) or a downward pointing outer corner (second diagram). But it cannot be both at once. If the computer decides (based on other information such as shadows) that the arrow on the right is an outer edge of a small block sitting on a larger block, then the arrow can only be a downward pointing outer corner. Once the decision is made, this vertex is interpreted. That helps the computer interpret other parts of the scene.
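The idea that one decision narrows down the others can be sketched as a small elimination routine. This is a simplified illustration in the spirit of the filtering described by Waltz (1975), not his actual program. The junction names, line names, and labels ("outer", "inner") are hypothetical, and the sketch ignores details such as edge direction.

```python
def consistent(option, neighbor_options):
    """True if every line this option shares with a neighbor can be
    given the same label by at least one of the neighbor's options."""
    for line, label in option.items():
        touching = [o for o in neighbor_options if line in o]
        if touching and not any(o[line] == label for o in touching):
            return False
    return True


def propagate(candidates):
    """Repeatedly discard interpretations that no neighbor agrees with.

    `candidates` maps each junction name to a list of possible
    interpretations; each interpretation maps a line name to a label.
    """
    changed = True
    while changed:
        changed = False
        for name in candidates:
            kept = [
                opt for opt in candidates[name]
                if all(consistent(opt, candidates[other])
                       for other in candidates if other != name)
            ]
            if len(kept) < len(candidates[name]):
                candidates[name] = kept
                changed = True
    return candidates


candidates = {
    # The arrow starts with two possible readings (hypothetical labels).
    "arrow": [
        {"shaft": "outer", "left": "outer", "right": "outer"},  # outer corner
        {"shaft": "inner", "left": "inner", "right": "inner"},  # inner corner
    ],
    # Suppose shadow evidence has already settled this neighboring corner.
    "block_corner": [{"left": "outer"}],
}
result = propagate(candidates)
print(result["arrow"])
```

Because the neighboring corner has already fixed the shared line as an outer edge, the inner-corner reading of the arrow is discarded and only the outer-corner reading remains.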


Copyright © 2007 Russ Dewey