TL;DR: This work describes a new visual fiducial system that uses a 2D bar code style “tag”, allowing full 6 DOF localization of features from a single image, incorporating a fast and robust line detection system, a stronger digital coding system, and greater robustness to occlusion, warping, and lens distortion.
Abstract: While the use of naturally-occurring features is a central focus of machine perception, artificial features (fiducials) play an important role in creating controllable experiments, ground truthing, and in simplifying the development of systems where perception is not the central objective. We describe a new visual fiducial system that uses a 2D bar code style “tag”, allowing full 6 DOF localization of features from a single image. Our system improves upon previous systems, incorporating a fast and robust line detection system, a stronger digital coding system, and greater robustness to occlusion, warping, and lens distortion. While similar in concept to the ARTag system, our method is fully open and the algorithms are documented in detail.
TL;DR: A parametric model for a computer-controlled moveable camera on a pan-tilt head that expresses the transform relating object space to image space as a function of the control variables of the camera is developed.
Abstract: The report develops a parametric model for a computer-controlled moveable camera on a pan-tilt head. The model expresses the transform relating object space to image space as a function of the control variables of the camera. We constructed a calibration system for measuring the model parameters, which has a demonstrated accuracy more than adequate for our present needs. We have also identified the major source of error in model measurement to be undesired image motion, and have developed means of measuring and compensating for some of it and eliminating other parts of it. The system can measure systematic image distortions if they become the major accuracy limitation. We show how to generalize the model to handle small systematic errors due to aspects of pan-tilt head geometry not presently accounted for. The report demonstrates the model's application in stereo vision and shows how it can be applied as a predictive device in locating objects of interest and centering them in an image. (Author)
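To make the idea of a transform parameterized by the camera's control variables concrete, here is a minimal sketch. It is not the report's model: it assumes an ideal pinhole camera whose pan and tilt axes both pass through the optical center, with world axes x right, y down, z forward. The function names, the focal length, and the image center are illustrative assumptions.

```python
import math

def camera_rotation(pan, tilt):
    """Camera-to-world rotation R = Ry(pan) * Rx(tilt), angles in radians.

    Assumed convention (not from the report): pan rotates about the
    world y axis, tilt about the camera x axis.
    """
    cp, sp = math.cos(pan), math.sin(pan)
    ct, st = math.cos(tilt), math.sin(tilt)
    return [
        [cp, sp * st, sp * ct],
        [0.0, ct, -st],
        [-sp, cp * st, cp * ct],
    ]

def project(point, pan, tilt, focal=800.0, center=(320.0, 240.0),
            cam_pos=(0.0, 0.0, 0.0)):
    """Map a 3D object-space point to pixel coordinates (u, v).

    Returns None when the point lies behind the camera.
    """
    R = camera_rotation(pan, tilt)
    d = [point[i] - cam_pos[i] for i in range(3)]
    # World-to-camera is the transpose of R: pc = R^T * d.
    pc = [sum(R[j][i] * d[j] for j in range(3)) for i in range(3)]
    if pc[2] <= 0.0:
        return None
    u = focal * pc[0] / pc[2] + center[0]
    v = focal * pc[1] / pc[2] + center[1]
    return (u, v)

def forward_direction(pan, tilt):
    """Unit vector along the optical axis, expressed in world coordinates."""
    R = camera_rotation(pan, tilt)
    return (R[0][2], R[1][2], R[2][2])
```

Under these assumptions, a point that lies along `forward_direction(pan, tilt)` projects to the image center, which is the property one would use to predict the pan and tilt needed to center an object. A real pan-tilt head has offsets between its axes and the optical center, which is the kind of geometric detail the abstract says the full model and its generalization account for.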
TL;DR: The computer program described here, the WALKER model, maps images into a description in which a person is represented by a series of hierarchical levels, i.e. a person has an arm, which has a lower arm, which has a hand.
TL;DR: A general computational treatment of how mammals are able to deal with visual objects and environments that tries to cover the entire range from behavior and phenomenological experience to detailed neural encodings in crude but computationally plausible reductive steps.
Abstract: This paper presents a general computational treatment of how mammals are able to deal with visual objects and environments. The model tries to cover the entire range from behavior and phenomenological experience to detailed neural encodings in crude but computationally plausible reductive steps. The problems addressed include perceptual constancies, eye movements and the stable visual world, object descriptions, perceptual generalizations, and the representation of extrapersonal space. The entire development is based on an action-oriented notion of perception. The observer is assumed to be continuously sampling the ambient light for information of current value. The central problem of vision is taken to be categorizing and locating objects in the environment. The critical step in this process is the linking of visual information to symbolic object descriptions; this is called indexing, from the analogy of identifying a book from index terms. The system must also identify situations and use this knowledge to guide movement and other actions in the environment. The treatment focuses on the different representations of information used in the visual system. The four representational frames capture information in the following forms: retinotopic, head-based, symbolic, and allocentric. The functional roles of the four frames, the communication among them, and their suggested neurophysiological realization constitute the core of the paper. The model is perforce crude, but appears to be consistent with all relevant findings.