The labeled fishes in the wild image dataset is provided by NOAA Fisheries (National Marine Fisheries Service) to encourage development, testing, and performance assessment of automated image analysis algorithms for unconstrained underwater imagery.
The dataset includes images of fish, invertebrates, and the seabed that were collected using camera systems deployed on a remotely operated vehicle (ROV) for fisheries surveys. Annotation data are included in accompanying data files (.dat, .vec, and .info) that describe the locations of the marked fish targets in the images.
The manuscript (Cutter et al., 2015) demonstrates methods for automated detection of fish based on classifiers developed using the training image dataset, and evaluated using the test set. This dataset is offered for further development of detection of fish or invertebrates in complex environments; tracking of multiple animal targets in video image sequences; recognition and classification of animal species; measurement of animals in stereo image pairs; and characterization of seabed habitats.
Recommended citation: Cutter, G.; Stierhoff, K.; Zeng, J. (2015) "Automated detection of rockfish in unconstrained underwater videos using Haar cascades and a new image dataset: labeled fishes in the wild," IEEE Winter Conference on Applications of Computer Vision Workshops, pp. 57-62.
The NOAA scientists who are stewards of these data may have archives of images that can provide additional opportunities for collaboration to apply and assess algorithms. Credit for use of these datasets should be provided in publications, as described in the “how-to-cite.txt” documents included in the dataset archive or as shown above.
Labeled Fishes in the Wild image dataset (v. 1.1) (Download 423 MB).
Labeled fishes in the wild has three components: a training and validation positive image set (verified fish), a negative image set (non-fish), and a test image set. The training and test sets have accompanying annotation data that define the location and extent of each marked fish target object in the images. These represent bounding rectangles defined by expert analysts, and are in the format of .dat files used by OpenCV.
Training and validation positive image set: contains images of rockfish (Sebastes spp.) and other associated species near the seabed, collected using a forward-oblique-looking digital still camera deployed on a remotely operated vehicle (ROV) by the Southwest Fisheries Science Center during surveys of rocky seabed environments offshore of southern California. Still frames from these cameras represent instances during a survey when the ROV was moving slowly, so motion effects are not a factor. The training set comprises 929 image files, containing 1005 marked fish with associated annotations (their marked locations and bounding rectangles). The marks define fish of various species, sizes, and ranges to the camera, and include portions of different background composition.
Training and validation negative image set: includes 3167 images. The 147 seabed negative images provided in the downloadable archive are regions containing no fish that were extracted from the labeled fishes in the wild training and test image sets. The remaining 3020 images are available from the tutorial on OpenCV HaarTraining and from the data negatives directory.
Test image set: contains an image sequence from ROV survey video footage, collected using the ROV’s high-definition (HD; 1080i) video camera during a near-seabed survey of fish. The video clip (“TEST_VIDEO_ROV10.mp4”; 210 frames at 3 frames per second (fps)) used to evaluate detectors for this study represents every 10th frame of the original video sequence (2-minute duration, approximately 30 fps). All fish targets are annotated for the 210-frame, 3 fps test video. Annotations of fish in the test video include a descriptor, “verified” or “apparent”: verified indicates that a video analyst could identify the object as a fish, whereas apparent objects were believed to be fish but were not verifiable based on attributes visible in a single frame. These apparent fish may appear as faint blobs in the distance. The distinction is made in the annotation data because some classifiers may detect these apparent fish, although we neither expect nor necessarily want a detector to do so. That is, if a classifier is detecting those apparent fish, then it is probably also detecting many non-fish targets in the images, making it inefficient and impractical. A total of 2061 fish objects were marked in the annotated frames of the dataset test video. Of those, 1008 were verified fish and 1053 were apparent fish. During the sequence the ROV is moving; the background appears to be moving and is illuminated from different directions (as the ROV moves and rotates); small particles in the water current stream past; fish are still or moving at various speeds; fish are oriented in many directions; some fish are hidden partially behind rocks or in crevices; and some indistinct fish-like objects appear in the distance.
The original Labeled fishes in the wild dataset (v1.0, Dec. 2014) included only the decimated test video sequence ("Test_ROV_video_h264_decim.mp4"), which consists of just the marked frames from the original video. One tenth of the frames of the full frame-rate video were marked for locations of fish targets. This version of the dataset (v1.1, Jan. 2015) also contains the full test video sequence ("Test_ROV_video_h264_full.mp4"). Both the full and decimated videos have accompanying text files with analyst marks (following OpenCV .dat file conventions). Generally, for m marks, the format is:
Video-filename(frame#) #-of-marks x1 y1 w1 h1 x2 y2 w2 h2 ... xm ym wm hm
For example, in the case of two marks, the final eight values define the two bounding rectangles:
Test_ROV_video_h264_full.mp4(fr_14) 2 1021 362 94 63 953 289 90 61
The marks file for the decimated video ("Test_ROV_video_h264_decim_marks.dat") indicates the frame number in both the decimated and the full sequence, e.g.:
Test_ROV_video_h264_decim.mp4(fr_1)(fullfr_14) 2 1021 362 94 63 953 289 90 61
There are 2101 frames in the full video and 210 frames in the decimated video, but only 206 frames were marked; i.e., a few of the examined frames did not contain fish.
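The marks format described above can be read with a few lines of code. The following is a minimal Python sketch, not part of the dataset distribution: the names `Mark`, `FrameMarks`, and `parse_marks_line` are hypothetical, and the sketch assumes that every line follows exactly the two example layouts shown above, with whitespace-separated integer values and x, y, w, h giving each bounding rectangle.

```python
import re
from typing import List, NamedTuple, Optional


class Mark(NamedTuple):
    """One bounding rectangle as listed in the marks file (x, y, w, h)."""
    x: int
    y: int
    w: int
    h: int


class FrameMarks(NamedTuple):
    """All marks for one annotated frame (hypothetical container)."""
    video: str
    frame: int                 # frame number in the named video
    full_frame: Optional[int]  # frame number in the full video, if given
    marks: List[Mark]


# Assumed layout: name(fr_N) or name(fr_N)(fullfr_M), then count and values.
_LINE_RE = re.compile(
    r"^(?P<video>\S+?)\(fr_(?P<frame>\d+)\)"
    r"(?:\(fullfr_(?P<full>\d+)\))?\s+(?P<rest>.*)$"
)


def parse_marks_line(line: str) -> FrameMarks:
    """Parse one line of a marks file into a FrameMarks record."""
    match = _LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized marks line: {line!r}")
    values = [int(v) for v in match.group("rest").split()]
    if not values:
        raise ValueError(f"missing mark count: {line!r}")
    count, coords = values[0], values[1:]
    # Each mark contributes four values, so m marks need 4 * m numbers.
    if len(coords) != 4 * count:
        raise ValueError(f"expected {4 * count} values, got {len(coords)}")
    marks = [Mark(*coords[i:i + 4]) for i in range(0, len(coords), 4)]
    full = match.group("full")
    return FrameMarks(
        video=match.group("video"),
        frame=int(match.group("frame")),
        full_frame=int(full) if full is not None else None,
        marks=marks,
    )


if __name__ == "__main__":
    example = ("Test_ROV_video_h264_decim.mp4(fr_1)(fullfr_14) "
               "2 1021 362 94 63 953 289 90 61")
    print(parse_marks_line(example))
```

Checking the declared mark count against the number of values is a deliberate choice: it lets a malformed or truncated line fail loudly instead of silently yielding partial rectangles. If the actual files deviate from the two example layouts, the regular expression would need to be adjusted.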