In situ Behavior of Deep-Sea Animals: 3-D Videography from Submersibles and Motion Analysis

by Peggy P. Hamner and William M. Hamner

Movements of animals in three-dimensional (3-D) space cannot be measured accurately with standard two-dimensional (2-D) photographic systems. To understand the behavior of individual animals and their interactions in a three-dimensional medium, it is essential to use three-dimensional videography. We describe a 3-D video system for use on either manned submersibles or ROVs, and provide an example from the deep sea that demonstrates the necessity for three-dimensional measurements. Animals are recorded simultaneously on two videotapes as they swim within a calibrated volume of water in front of the submersible, and are then automatically tracked through time and in three-dimensional space with a commercially available 3-D tracking system.

Rationale for 3-D Measurement

An important facet of every organism's existence is its pattern of movement within its habitat, both in space and through time. Ethologists routinely record animal behavior optically in the field for later analysis. However, the usual single-camera view provides only a two-dimensional image of behavioral phenomena that often occur in three-dimensional space, particularly if the animal moves through air or water. Tracking undisturbed animals in 3-D provides far more accurate information about movements than do estimates derived from 2-D measurements. We have developed and tested a two-camera video system which can be mounted on a submersible to record objects in the deep sea for subsequent automated 3-D analysis of behavior.

Video System for a Submersible

We use two monochrome video cameras corrected for underwater parallax (Walton, 1988). The cameras are fitted with fixed-focus lenses and wide angle adapters and are permanently mounted in custom housings. Wide angle adapters are necessary because the cameras on the submersible toe in toward each other so that their fields of view intersect approximately 2 m in front of the submersible. Each camera's field of view must be as wide as possible because images outside the volume of water viewed by both cameras are recorded on only one videotape, and are useless for 3-D analysis.

The camera housings are rigidly mounted on either side of the submersible because the positions of the cameras relative to each other must remain constant to track an object accurately. A control box remotely powers both cameras and electrically controls the irises. When operated inside a manned submersible, the control box is connected to the cameras via a cable which penetrates the submersible's hull to transmit video and control signals (Fig. 1). Video recording is controlled from two VCRs inside the submersible. Sequences on the two videotapes must be synchronized for subsequent identification of identical frames. We use a time-code generator which lays down a time-code signal simultaneously on one audio track of each VCR. When the original tapes are dubbed onto work tapes, the audio time code is transformed into a visual code in minutes, seconds, and frame numbers (at 30 fps) that appears on every frame. The greatest advantage of the time code over periodic synchronizing signals, such as a strobe flash or tone, is that the continuous visible code allows the user to be sure that identical sequences of frames on multiple videotapes are selected for digitizing.
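The frame bookkeeping that a visible time code makes possible can be sketched in a few lines of Python. This is an illustration only, not part of the recording system: it assumes a non-drop-frame code written as MM:SS:FF at the 30 fps stated above, and all function names are hypothetical.

```python
FPS = 30  # frame rate stated in the text


def timecode_to_frame(code: str, fps: int = FPS) -> int:
    """Convert an 'MM:SS:FF' time code to an absolute frame count."""
    minutes, seconds, frames = (int(part) for part in code.split(":"))
    if not 0 <= seconds < 60 or not 0 <= frames < fps:
        raise ValueError(f"invalid time code: {code!r}")
    return (minutes * 60 + seconds) * fps + frames


def frame_to_timecode(frame: int, fps: int = FPS) -> str:
    """Convert an absolute frame count back to 'MM:SS:FF'."""
    total_seconds, frames = divmod(frame, fps)
    minutes, seconds = divmod(total_seconds, 60)
    return f"{minutes:02d}:{seconds:02d}:{frames:02d}"


def matching_frames(start: str, end: str, fps: int = FPS) -> list[str]:
    """List every time code from start to end (inclusive).

    The same list identifies the frames to digitize on each work tape,
    because both tapes carry the same code.
    """
    first = timecode_to_frame(start, fps)
    last = timecode_to_frame(end, fps)
    return [frame_to_timecode(f, fps) for f in range(first, last + 1)]


sequence = matching_frames("00:10:00", "00:11:14")
print(len(sequence), sequence[0], sequence[-1])
```

Under these assumptions, a sequence running from 00:10:00 to 00:11:14 contains 45 frames, or 1.5 seconds of video, and the same 45 codes pick out the matching frames on both tapes.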

We use Super-VHS or Hi8 videocassette recorders because of their high video resolution. For recording from the Johnson Sea-Link, where space was limited, we used two portable VCRs. The model we chose had two audio inputs, allowing the observer to record commentary on the second audio track while the time code was recorded on the first. A portable monitor attached to the outputs of the two VCRs displayed what was actually recorded and permitted the operator to hear the time-code signal being recorded on the audio tracks.

Photogrammetric Data Collection

In order to track identified targets, the spatial dimensions of the volume under observation must be known. For our automated tracking system, discussed below, each camera view is calibrated with a minimum of 6 non-coplanar targets within the fields of view of both cameras (Fig. 2). Calibration provides information about spatial relationships between the cameras and the recorded targets which is used to compute the 3-D trajectories of objects moving through the calibrated volume.
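The requirement for at least six targets follows naturally if calibration uses the Direct Linear Transformation (DLT) named in the title of Walton (1988). The standard DLT has 11 unknown coefficients per camera, and each target contributes two equations (one per image coordinate), so six targets give 12 equations. The Python sketch below shows that least-squares fit. It is an illustration under that assumption, not the tracking system's own code; the function names, the use of NumPy, and the demonstration coefficients are hypothetical.

```python
import numpy as np


def dlt_calibrate(world_xyz, image_uv):
    """Fit the 11 standard DLT coefficients of one camera.

    world_xyz: (n, 3) known target positions; image_uv: (n, 2) digitized
    image positions of the same targets. Requires n >= 6.
    """
    world_xyz = np.asarray(world_xyz, dtype=float)
    image_uv = np.asarray(image_uv, dtype=float)
    if len(world_xyz) < 6:
        raise ValueError("need at least 6 non-coplanar targets")
    rows, rhs = [], []
    for (x, y, z), (u, v) in zip(world_xyz, image_uv):
        # Two linear equations per target, one for u and one for v.
        rows.append([x, y, z, 1, 0, 0, 0, 0, -u * x, -u * y, -u * z])
        rows.append([0, 0, 0, 0, x, y, z, 1, -v * x, -v * y, -v * z])
        rhs.extend([u, v])
    coeffs, *_ = np.linalg.lstsq(np.array(rows), np.array(rhs), rcond=None)
    return coeffs


def dlt_project(coeffs, xyz):
    """Image position (u, v) predicted by the DLT coefficients."""
    x, y, z = xyz
    c = coeffs
    denom = c[8] * x + c[9] * y + c[10] * z + 1.0
    u = (c[0] * x + c[1] * y + c[2] * z + c[3]) / denom
    v = (c[4] * x + c[5] * y + c[6] * z + c[7]) / denom
    return np.array([u, v])


# Demonstration with made-up coefficients and the 8 corners of a 71 cm cube.
TRUE = np.array([4.0, 0.5, 1.0, 100.0, 0.3, 4.2, 0.8, 120.0, 0.002, 0.001, 0.004])
corners = [(x, y, z) for x in (0.0, 71.0) for y in (0.0, 71.0) for z in (0.0, 71.0)]
digitized = [dlt_project(TRUE, corner) for corner in corners]
fitted = dlt_calibrate(corners, digitized)
print(np.abs(dlt_project(fitted, (20.0, 30.0, 40.0)) - dlt_project(TRUE, (20.0, 30.0, 40.0))).max())
```

With noise-free synthetic data the fitted coefficients reproduce the image position of a new point to within rounding error. Real digitized targets carry measurement noise, which is one reason to use more than the minimum six.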

For dives with the Johnson Sea-Link we calibrated the cameras with a "stick" box made of 12 black rods, each 71.0 cm long, inserted into 8 white plastic cubes which served as calibration targets on each corner of the box (Fig. 2). The cameras on the Johnson Sea-Link were positioned so that the grid filled approximately 3/4 of the monitor screen in each camera's view when it was held in front of the submersible. The two cameras were focused approximately 2 m in front of the submersible's sphere. Once the cameras were positioned, the motionless calibration grid was recorded with both cameras. The cameras were calibrated at depth from the submersible by holding the stick box in front of the cameras with the claw. Calibration at depth is easier than calibration on deck because of the uniform black background beyond the targets. We recorded animal behavior with the 3-D video system from the Johnson Sea-Link both in midwater and while the submersible rested on the ocean floor. One example is presented below.
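For calibration software, the stick box reduces to a list of eight target coordinates. The Python sketch below generates them and checks the geometry described above. It assumes, which the text does not state, that the target centers are spaced exactly one rod length (71.0 cm) apart and that the origin sits at one corner; the true spacing depends on how far the rods seat into the cubes. Function names are hypothetical.

```python
from itertools import product

import numpy as np

ROD_LENGTH_CM = 71.0  # rod length from the text, used as target spacing (assumption)


def stick_box_targets(side_cm=ROD_LENGTH_CM):
    """Return the 8 corner targets of the box as an (8, 3) array in cm."""
    return np.array(list(product((0.0, side_cm), repeat=3)))


def count_edges(targets, side_cm=ROD_LENGTH_CM):
    """Count pairs of targets exactly one side length apart (the rods)."""
    n_edges = 0
    for i in range(len(targets)):
        for j in range(i + 1, len(targets)):
            if np.isclose(np.linalg.norm(targets[i] - targets[j]), side_cm):
                n_edges += 1
    return n_edges


def is_non_coplanar(targets):
    """True if the targets span all three dimensions."""
    centered = np.asarray(targets, dtype=float) - np.asarray(targets[0], dtype=float)
    return bool(np.linalg.matrix_rank(centered) == 3)


targets = stick_box_targets()
print(len(targets), count_edges(targets), is_non_coplanar(targets))
```

The check confirms 8 targets joined by 12 rods, and that the full set is non-coplanar, whereas the four corners of any single face would not be.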

Data Analysis

Once multiple, synchronized images are recorded, they must be analyzed photogrammetrically. Manual computations for such temporal sequences, particularly of individuals living in groups, are tedious. We found a company, Motion Analysis Corporation, Santa Rosa, California, that sells a three-dimensional tracking system (ExpertVision) that digitizes multiple target images in real time, automatically computes x,y,z coordinates from their positions on synchronized frames of multiple tapes (Table 1), and plots their 3-D trajectories over time, using the digitized views of the calibration cubes (Fig. 2) for spatial reference.
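How x,y,z coordinates can be recovered from two synchronized views is sketched below in Python. The sketch assumes each camera has already been calibrated with the standard 11-coefficient Direct Linear Transformation (the method named in the title of Walton, 1988); it is not a description of ExpertVision's internals. Each view supplies two linear equations in the three unknowns, so two views give four equations that are solved by least squares. The function names and camera coefficients are hypothetical.

```python
import numpy as np


def dlt_project(coeffs, xyz):
    """Image position (u, v) of a 3-D point for one calibrated camera."""
    x, y, z = xyz
    c = coeffs
    denom = c[8] * x + c[9] * y + c[10] * z + 1.0
    u = (c[0] * x + c[1] * y + c[2] * z + c[3]) / denom
    v = (c[4] * x + c[5] * y + c[6] * z + c[7]) / denom
    return np.array([u, v])


def dlt_reconstruct(coeffs_per_camera, uv_per_camera):
    """Least-squares 3-D position from the same target seen in 2+ views."""
    rows, rhs = [], []
    for c, (u, v) in zip(coeffs_per_camera, uv_per_camera):
        # Rearranged DLT equations: linear in the unknown x, y, z.
        rows.append([c[0] - u * c[8], c[1] - u * c[9], c[2] - u * c[10]])
        rows.append([c[4] - v * c[8], c[5] - v * c[9], c[6] - v * c[10]])
        rhs.extend([u - c[3], v - c[7]])
    xyz, *_ = np.linalg.lstsq(
        np.array(rows, dtype=float), np.array(rhs, dtype=float), rcond=None
    )
    return xyz


# Made-up coefficients standing in for two calibrated cameras.
CAM_A = [4.0, 0.5, 1.0, 100.0, 0.3, 4.2, 0.8, 120.0, 0.002, 0.001, 0.004]
CAM_B = [3.5, -0.4, -1.2, 300.0, 0.2, 4.0, 0.9, 110.0, -0.002, 0.001, 0.004]

point = (20.0, 30.0, 40.0)  # hypothetical target position, cm
views = [dlt_project(CAM_A, point), dlt_project(CAM_B, point)]
print(np.round(dlt_reconstruct([CAM_A, CAM_B], views), 3))
```

A target visible in only one view yields two equations for three unknowns and cannot be located, which is why images outside the volume seen by both cameras are unusable for 3-D analysis.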

Data Example

One example of target tracking, from the LEO750 project in September 1990, is presented to illustrate the importance of using a 3-D rather than two-dimensional system for measurements of motion. On dive no. 2257 we set the Johnson Sea-Link on the bottom at about 710 m and recorded the interaction between a sergestid shrimp and a fish when their swimming paths converged. The sergestid swam into view from the port side and the fish from the starboard side. The sergestid was slightly above the fish when they met, and from the original video it appears that at least one of the sergestid's long, trailing antennae touched the fish. The encounter startled both animals and they both abruptly changed course at increased speeds.

The data presented encompass 45 frames of video, or 1.5 seconds. The paths of both the sergestid and the fish initially continued in the same directions they had been swimming since they first appeared. Fig. 3 shows the 2-D paths of the sergestid and the fish as they were computed from each individual camera angle. Note that the computed path directions of the animals are strikingly different in the two views. Nonetheless, after the x and y coordinates for these two views were combined to track the animals in three-dimensional space, the true paths and other motion-related parameters could be calculated.

Fig. 4 is a plot of the sergestid's speed over 1.5 seconds, calculated by the ExpertVision system from these coordinates. For the first 17 frames, the sergestid's speed averaged 12.2 cm s⁻¹. Its lobstertail response to the proximity of the fish reached a maximum speed of 140.0 cm s⁻¹ in 3 frames (0.1 sec). In the same sequence, the fish averaged 15.6 cm s⁻¹ for the first 17 frames. In response to the sergestid, the fish abruptly changed course and accelerated to a maximum speed of 90.8 cm s⁻¹ in 5 frames (0.17 sec). By frame 33, the last frame in which its image could be digitized, the fish had decelerated to 53.0 cm s⁻¹.
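The arithmetic behind such speed figures is simple once 3-D coordinates exist for every frame. The Python sketch below converts frame counts to time at 30 fps and computes frame-to-frame speed from a path. It is a generic illustration, not the ExpertVision algorithm, and the path is invented to show how a single 2-D view can understate speed.

```python
import math

FPS = 30  # video frame rate stated in the text


def frames_to_seconds(n_frames, fps=FPS):
    """Duration of n_frames of video in seconds."""
    return n_frames / fps


def speeds_cm_per_s(path_cm, fps=FPS):
    """Speed between consecutive frames for a path of coordinate tuples (cm)."""
    return [math.dist(p, q) * fps for p, q in zip(path_cm, path_cm[1:])]


# Hypothetical path: 0.3, 0.4 and 1.2 cm per frame along x, y and z.
path_3d = [(0.0, 0.0, 0.0), (0.3, 0.4, 1.2), (0.6, 0.8, 2.4)]
path_2d = [(x, y) for x, y, _ in path_3d]  # what a view along the z axis would record

print(frames_to_seconds(45), frames_to_seconds(3))
print(speeds_cm_per_s(path_3d), speeds_cm_per_s(path_2d))
```

In this invented case the animal moves at about 39 cm s⁻¹, but a view that misses the z component reports about 15 cm s⁻¹.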

In this example of three-dimensional behavioral quantification, the inaccuracy of relying on 2-D measurements is clear. When the swimming sergestid and fish converged, the two animals reacted at almost the same time, changing speeds and directions. While each camera view individually suggests that the interaction affected the two animals, taken separately they provide conflicting information. Fig. 3 shows the paths of the sergestid and fish calculated by the EV3D program for each view. Because of the camera angles, the sergestid appears to dart upward vertically along two different diagonals, and the fish actually appears to swim in opposite directions. Any behavioral measurements based on either one of these two-dimensional views alone would be wrong.

Recommendations for Future Developments

To obtain quantitative in situ behavioral information about animals that live in a three-dimensional medium, 3-D sampling tools are absolutely essential. In the deep sea, optical sampling is limited to short distances because of lighting difficulties, but where it is applicable it is the most accurate method available for recording behavioral phenomena.

Correct lighting is important for subsequent digitization of recorded images. The digitizing computer outlines target images by "thresholding", that is, setting the grey-level transition that defines the edge of the targets against the background. If lighting is uneven, contrast between the target and the background changes from frame to frame, so the required threshold level changes as well and the computer is unable to digitize the target throughout the sequence. Broad-beam lights are essential because the light field must be as even as possible.
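Thresholding, and the way uneven lighting defeats it, can be illustrated with a minimal Python sketch. It assumes a bright target on a dark background and a single fixed grey-level threshold; the function name, the 0 to 255 grey scale, and the tiny test frames are hypothetical and stand in for real digitized video.

```python
import numpy as np


def threshold_centroid(frame, threshold):
    """Return the (x, y) centroid of pixels at or above threshold, or None.

    frame is a 2-D array of grey levels; x is the column index and y the
    row index of the target's center.
    """
    mask = np.asarray(frame) >= threshold
    if not mask.any():
        return None  # the target cannot be digitized in this frame
    rows, cols = np.nonzero(mask)
    return float(cols.mean()), float(rows.mean())


# A 5 x 5 frame with a 2 x 2 bright target on a black background.
well_lit = np.zeros((5, 5))
well_lit[1:3, 2:4] = 200.0

# The same target in a poorly lit frame: contrast drops below the threshold.
poorly_lit = well_lit * 0.4

THRESHOLD = 128
print(threshold_centroid(well_lit, THRESHOLD))
print(threshold_centroid(poorly_lit, THRESHOLD))
```

With the threshold held constant, the target is located in the well-lit frame but lost in the dim one, which is the failure that an even light field is meant to prevent.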

Cameras on the submersible should be spaced as far apart as possible. Increased separation improves the accuracy of measuring coordinates along the axis perpendicular to the plane of the cameras. When possible, additional cameras should be used to increase the probability of the target remaining in more than one view. With additional cameras, two could be aligned side by side to provide a stereo view which could be projected in 3-D. Humans see the world with stereo optics, and the ability to review three-dimensional events in 3-D would assist us in understanding complex behaviors.
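The benefit of wider camera spacing can be estimated with the textbook relation for two parallel pinhole cameras, in which range Z = fB/d (focal length f in pixels, baseline B, disparity d) and a disparity error of δd produces a range error of roughly Z²·δd/(fB). The Python sketch below applies that relation. It is a simplification: the toed-in cameras described above differ in detail, and the focal length, baselines, and one-pixel digitizing error are hypothetical values chosen only for illustration.

```python
def range_error_cm(range_cm, baseline_cm, focal_px, disparity_error_px=1.0):
    """Approximate range error for parallel stereo cameras.

    Uses dZ ~= Z**2 * dd / (f * B), the first-order error of Z = f * B / d.
    """
    return range_cm ** 2 * disparity_error_px / (focal_px * baseline_cm)


RANGE_CM = 200.0   # about 2 m in front of the submersible, as in the text
FOCAL_PX = 800.0   # hypothetical focal length in pixels

for baseline_cm in (50.0, 100.0):  # hypothetical camera separations
    error = range_error_cm(RANGE_CM, baseline_cm, FOCAL_PX)
    print(f"baseline {baseline_cm:.0f} cm: range error about {error:.2f} cm")
```

Under these assumptions, doubling the separation halves the range error at a given distance, consistent with the recommendation to mount the cameras as far apart as possible.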

In an earlier report (Hamner et al. 1988) we discussed three-dimensional videography and described several systems of 3-D video projection. Once the elements of three-dimensional image collection, viewing, and computer analysis are combined, we will have an extremely powerful new tool for analyzing behavior in the deep sea. (Peggy and Bill Hamner are scientists in the Department of Biology at UCLA)


Acknowledgments

We thank the crews of the Johnson Sea-Link submersibles for their able and indispensable assistance in operating the 3-D video system. We thank Ms. Sadie Harrison for her help in analyzing behavioral sequences. The work described here was funded by NSF grant OCE 86-16487, NOAA/NURP contract NA88AA-H-UR020, and NOAA/NURP subcontract SC 02791.


References

Hamner, W. M., C. T. Prewitt, and E. Kristof. 1988. Quantitative analysis of the abundance, swimming behavior, and interactions of midwater organisms. P. 307-317 in: Global Venting, Midwater, and Benthic Ecological Processes (M. P. DeLuca and I. Babb, eds.). NURP Res. Rep. 88-4.

Walton, J. S. 1988. Underwater tracking in three dimensions using the Direct Linear Transformation and a video-based motion analysis system. 3 pp. (unnumbered) in: Underwater Imaging (D. J. Holloway, ed.). SPIE Proceedings 980, San Diego, CA, 18 Aug. 1988.
