TL;DR: A new definition of scale-space is suggested, and a class of algorithms that realize it using a diffusion process is introduced; the diffusion coefficient is chosen to vary spatially so as to encourage intraregion smoothing rather than interregion smoothing.
Abstract: A new definition of scale-space is suggested, and a class of algorithms used to realize a diffusion process is introduced. The diffusion coefficient is chosen to vary spatially in such a way as to encourage intraregion smoothing rather than interregion smoothing. It is shown that the 'no new maxima should be generated at coarse scales' property of conventional scale space is preserved. As the region boundaries in the approach remain sharp, a high-quality edge detector which successfully exploits global information is obtained. Experimental results are shown on a number of images. Parallel hardware implementations are made feasible because the algorithm involves elementary, local operations replicated over the image.
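As an illustration of a spatially varying diffusion coefficient, the following is a minimal 1D sketch, not the paper's algorithm. The function name `diffuse_1d`, the exponential conductance `exp(-(g / kappa) ** 2)`, and the parameters `kappa`, `dt`, and `steps` are all assumptions chosen for the example; the abstract does not specify them.

```python
import math


def diffuse_1d(signal, steps=50, kappa=10.0, dt=0.2):
    """Explicit 1D diffusion with an edge-stopping conductance (illustrative only)."""
    u = [float(v) for v in signal]
    n = len(u)
    for _ in range(steps):
        # Difference between each sample and its right neighbour.
        grad = [u[i + 1] - u[i] for i in range(n - 1)]
        # Assumed conductance: close to 1 for small differences (inside a region),
        # close to 0 for large differences (across a region boundary).
        flux = [g * math.exp(-(g / kappa) ** 2) for g in grad]
        new_u = u[:]
        for i in range(n):
            right = flux[i] if i < n - 1 else 0.0  # zero flux at the right border
            left = flux[i - 1] if i > 0 else 0.0   # zero flux at the left border
            new_u[i] = u[i] + dt * (right - left)
        u = new_u
    return u


# A noisy step: small oscillations inside each region, one large jump between them.
noisy_step = [0, 1, 0, 1, 0, 1, 100, 101, 100, 101, 100, 101]
smoothed = diffuse_1d(noisy_step)
```

With these assumed settings the oscillations inside each half are flattened while the jump between the halves stays sharp. Because every update is a local operation on neighbouring samples, the same step could be replicated over an image in parallel, as the abstract notes.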
TL;DR: It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks, which helps improve the performance and training of neural networks for classification.
Abstract: Several means for improving the performance and training of neural networks for classification are proposed. Crossvalidation is used as a tool for optimizing network parameters and architecture. It is shown that the remaining residual generalization error can be reduced by invoking ensembles of similar networks.
TL;DR: The use of natural symmetries (mirror images) in a well-defined family of patterns (human faces) is discussed within the framework of the Karhunen-Loeve expansion, which results in an extension of the data and imposes even and odd symmetry on the eigenfunctions of the covariance matrix.
Abstract: The use of natural symmetries (mirror images) in a well-defined family of patterns (human faces) is discussed within the framework of the Karhunen-Loeve expansion. This results in an extension of the data and imposes even and odd symmetry on the eigenfunctions of the covariance matrix, without increasing the complexity of the calculation. The resulting approximation of faces projected from outside of the data set onto this optimal basis is improved on average.
TL;DR: A systematic reconstruction-based method for deciding the highest-order Zernike moments required in a classification problem is developed, and the superiority of Zernike moment features over regular moments and moment invariants is experimentally verified.
Abstract: The problem of rotation-, scale-, and translation-invariant recognition of images is discussed. A set of rotation-invariant features is introduced. They are the magnitudes of a set of orthogonal complex moments of the image known as Zernike moments. Scale and translation invariance are obtained by first normalizing the image with respect to these parameters using its regular geometrical moments. A systematic reconstruction-based method for deciding the highest-order Zernike moments required in a classification problem is developed. The quality of the reconstructed image is examined through its comparison to the original one. The orthogonality property of the Zernike moments, which simplifies the process of image reconstruction, makes the suggested feature selection approach practical. Features of each order can also be weighted according to their contribution to the reconstruction process. The superiority of Zernike moment features over regular moments and moment invariants was experimentally verified.
TL;DR: An interpretation of image texture as a region code, or carrier of region information, is emphasized and examples are given of both types of texture processing using a variety of real and synthetic textures.
Abstract: A computational approach for analyzing visible textures is described. Textures are modeled as irradiance patterns containing a limited range of spatial frequencies, where mutually distinct textures differ significantly in their dominant characterizing frequencies. By encoding images into multiple narrow spatial frequency and orientation channels, the slowly varying channel envelopes (amplitude and phase) are used to segregate textural regions of different spatial frequency, orientation, or phase characteristics. Thus, an interpretation of image texture as a region code, or carrier of region information, is emphasized. The channel filters used, known as the two-dimensional Gabor functions, are useful for these purposes in several senses: they have tunable orientation and radial frequency bandwidths and tunable center frequencies, and they optimally achieve joint resolution in space and in spatial frequency. By comparing the channel amplitude responses, one can detect boundaries between textures. Locating large variations in the channel phase responses allows discontinuities in the texture phase to be detected. Examples are given of both types of texture processing using a variety of real and synthetic textures.
TL;DR: A description of the transferable belief model, which is used to quantify degrees of belief based on belief functions, is given and a set of axioms justifying Dempster's rule for the combination of belief functions induced by two distinct evidences is presented.
Abstract: A description of the transferable belief model, which is used to quantify degrees of belief based on belief functions, is given. The impact of the open- and closed-world assumptions on conditioning is discussed. The nature of the frame of discernment on which a degree of belief will be established is discussed. A set of axioms justifying Dempster's rule for the combination of belief functions induced by two distinct pieces of evidence is presented.
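For readers unfamiliar with Dempster's rule of combination, a minimal sketch follows. Mass functions are represented as dictionaries from `frozenset` focal elements to masses; this representation, the function name `combine`, and the `normalize` flag are assumptions made for the example. The flag is meant to illustrate one common reading of the two assumptions mentioned above: under a closed world the conflicting mass is normalized away, whereas under an open world it is kept on the empty set.

```python
def combine(m1, m2, normalize=True):
    """Combine two mass functions with Dempster's rule (illustrative sketch).

    m1, m2: dicts mapping frozenset focal elements to masses summing to 1.
    normalize: if True, redistribute the conflicting mass (closed-world style);
               if False, leave it on the empty set (open-world style).
    """
    combined = {}
    for a, mass_a in m1.items():
        for b, mass_b in m2.items():
            # The product of masses supports the intersection of the two focal sets.
            inter = a & b
            combined[inter] = combined.get(inter, 0.0) + mass_a * mass_b
    if not normalize:
        return combined
    conflict = combined.pop(frozenset(), 0.0)
    if conflict >= 1.0:
        raise ValueError("the two sources are in total conflict")
    return {focal: mass / (1.0 - conflict) for focal, mass in combined.items()}


A, B = frozenset({"a"}), frozenset({"b"})
AB = A | B
source_1 = {A: 0.6, AB: 0.4}
source_2 = {B: 0.5, AB: 0.5}
closed_world = combine(source_1, source_2)
open_world = combine(source_1, source_2, normalize=False)
```

In this example the conflicting mass is 0.6 × 0.5 = 0.3, so the closed-world result divides the remaining masses by 0.7, while the open-world result keeps 0.3 on the empty set.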
TL;DR: The state of the art of online handwriting recognition during a period of renewed activity in the field is described, based on an extensive review of the literature, including journal articles, conference proceedings, and patents.
Abstract: This survey describes the state of the art of online handwriting recognition during a period of renewed activity in the field. It is based on an extensive review of the literature, including journal articles, conference proceedings, and patents. Online versus offline recognition, digitizer technology, and handwriting properties and recognition problems are discussed. Shape recognition algorithms, preprocessing and postprocessing techniques, experimental systems, and commercial products are examined.
TL;DR: The proper way to apply the scale-space theory to discrete signals and discrete images is by discretization of the diffusion equation, not the convolution integral.
Abstract: A basic and extensive treatment of discrete aspects of the scale-space theory is presented. A genuinely discrete scale-space theory is developed and its connection to the continuous scale-space theory is explained. Special attention is given to discretization effects, which occur when results from the continuous scale-space theory are to be implemented computationally. The 1D problem is solved completely in an axiomatic manner. For the 2D problem, the author discusses how the 2D discrete scale space should be constructed. The main results are as follows: the proper way to apply the scale-space theory to discrete signals and discrete images is by discretization of the diffusion equation, not the convolution integral; the discrete scale space obtained in this way can be described by convolution with a kernel that is the discrete analog of the Gaussian kernel; a scale-space implementation based on the sampled Gaussian kernel might lead to undesirable effects and computational problems, especially at fine levels of scale; the 1D discrete smoothing transformations can be characterized exactly and a complete catalogue is given; all finite support 1D discrete smoothing transformations arise from repeated averaging over two adjacent elements (the limit case of such an averaging process is described); and the symmetric 1D discrete smoothing kernels are nonnegative and unimodal, in both the spatial and the frequency domain.
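The remark about repeated averaging over two adjacent elements can be made concrete with a short sketch. It covers only the simplest case, equal weights of 1/2, which produces the binomial kernels; the function names `average_adjacent` and `repeated_average_kernel` are hypothetical and not taken from the paper.

```python
def average_adjacent(values):
    """Average each pair of adjacent elements, treating values outside as zero."""
    padded = [0.0] + [float(v) for v in values] + [0.0]
    return [(padded[i] + padded[i + 1]) / 2.0 for i in range(len(padded) - 1)]


def repeated_average_kernel(n):
    """Kernel obtained by applying the two-element average n times to a unit impulse."""
    kernel = [1.0]
    for _ in range(n):
        kernel = average_adjacent(kernel)
    return kernel


kernel = repeated_average_kernel(4)  # [1, 4, 6, 4, 1] / 16
```

The resulting kernel has finite support, sums to one, is nonnegative, and rises to a single peak before falling, which is consistent with the properties listed in the abstract for symmetric 1D smoothing kernels.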
TL;DR: In this paper, a method for recovery of compact volumetric models for shape representation of single-part objects in computer vision is introduced, where the model recovery is formulated as a least-squares minimization of a cost function for all range points belonging to a single part.
Abstract: A method for recovery of compact volumetric models for shape representation of single-part objects in computer vision is introduced. The models are superquadrics with parametric deformations (bending, tapering, and cavity deformation). The input for the model recovery is three-dimensional range points. Model recovery is formulated as a least-squares minimization of a cost function for all range points belonging to a single part. During an iterative gradient descent minimization process, all model parameters are adjusted simultaneously, recovering position, orientation, size, and shape of the model, such that most of the given range points lie close to the model's surface. A specific solution among several acceptable solutions, which are all minima in the parameter space, can be reached by constraining the search to a part of the parameter space. The many shallow local minima in the parameter space are avoided as a solution by using a stochastic technique during minimization. Results using real range data show that the recovered models are stable and that the recovery procedure is fast.
TL;DR: A method that combines region growing and edge detection for image segmentation is presented. Its success on simple tool images is thought to be because the objects shown occupy areas of many pixels, making it easy to select parameters that separate signal information from noise.
Abstract: A method that combines region growing and edge detection for image segmentation is presented. The authors start with a split-and-merge algorithm wherein the parameters have been set up so that an over-segmented image results. Region boundaries are then eliminated or modified on the basis of criteria that integrate contrast with boundary smoothness, variation of the image gradient along the boundary, and a criterion that penalizes for the presence of artifacts reflecting the data structure used during segmentation (quadtree in this case). The algorithms were implemented in the C language on a Sun 3/160 workstation running under the Unix operating system. Simple tool images and aerial photographs were used to test the algorithms. The impression of human observers is that the method is very successful on the tool images and less so on the aerial photograph images. It is thought that the success in the tool images is because the objects shown occupy areas of many pixels, making it easy to select parameters to separate signal information from noise.
TL;DR: A recursive filtering structure is proposed that drastically reduces the computational effort required for smoothing, performing the first and second directional derivatives, and carrying out the Laplacian of an image.
Abstract: A recursive filtering structure is proposed that drastically reduces the computational effort required for smoothing, performing the first and second directional derivatives, and carrying out the Laplacian of an image. These operations are done with a fixed number of multiplications and additions per output point independently of the size of the neighborhood considered. The key to the approach is, first, the use of an exponentially based filter family and, second, the use of the recursive filtering. Applications to edge detection problems and multiresolution techniques are considered, and an edge detector allowing the extraction of zero-crossings of an image with only 14 operations per output element at any resolution is proposed. Various experimental results are shown.
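To show why a recursive structure gives a fixed cost per output point, here is a sketch of a first-order exponential smoother run forward and then backward over a 1D signal. It is a deliberately simplified stand-in, not the filter family proposed in the paper; the name `exp_smooth` and the parameter `alpha` are assumptions.

```python
def exp_smooth(signal, alpha=0.5):
    """Two-pass first-order recursive smoothing of a non-empty 1D signal.

    Each pass costs two multiplications and one addition per sample, whatever
    the value of alpha, even though a smaller alpha widens the effective
    neighbourhood being averaged.
    """
    n = len(signal)
    forward = [0.0] * n
    acc = float(signal[0])
    for i in range(n):
        # Causal pass: blend the new sample with the running state.
        acc = alpha * signal[i] + (1.0 - alpha) * acc
        forward[i] = acc
    out = [0.0] * n
    acc = forward[-1]
    for i in range(n - 1, -1, -1):
        # Anticausal pass over the causal result gives a two-sided response.
        acc = alpha * forward[i] + (1.0 - alpha) * acc
        out[i] = acc
    return out


impulse_response = exp_smooth([0, 0, 0, 1, 0, 0, 0])
```

A direct convolution with a wide kernel would need more operations as the kernel grows; here the amount of smoothing is controlled by `alpha` alone, with no change in the work done per sample.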
TL;DR: The method of Fourier descriptors is extended to produce a set of normalized coefficients which are invariant under any affine transformation (translation, rotation, scaling, and shearing) and allows considerable robustness when applied to images of objects which rotate in all three dimensions.
Abstract: The method of Fourier descriptors is extended to produce a set of normalized coefficients which are invariant under any affine transformation (translation, rotation, scaling, and shearing). The method is based on a parameterized boundary description which is transformed to the Fourier domain and normalized there to eliminate dependencies on the affine transformation and on the starting point. Invariance to affine transforms allows considerable robustness when applied to images of objects which rotate in all three dimensions, as is demonstrated by processing silhouettes of aircraft maneuvering in three-space.
TL;DR: A method for the determination of camera location from two-dimensional (2-D) to three-dimensional (3-D) straight line or point correspondences is presented, and good results can be obtained in the presence of noise if more than the minimum required number of correspondences are used.
Abstract: A method for the determination of camera location from two-dimensional (2-D) to three-dimensional (3-D) straight line or point correspondences is presented. With this method, the computations of the rotation matrix and the translation vector of the camera are separable. First, the rotation matrix is found by a linear algorithm using eight or more line correspondences, or by a nonlinear algorithm using three or more line correspondences, where the line correspondences are either given or derived from point correspondences. Then, the translation vector is obtained by solving a set of linear equations based on three or more line correspondences, or two or more point correspondences. Eight 2-D to 3-D line correspondences or six 2-D to 3-D point correspondences are needed for the linear approach; three 2-D to 3-D line or point correspondences for the nonlinear approach. Good results can be obtained in the presence of noise if more than the minimum required number of correspondences are used.
TL;DR: By performing real-time measurements of the time durations between the keystrokes when a password is entered and using pattern-recognition algorithms, three online recognition systems were devised and tested.
Abstract: An approach to securing access to computer systems is described. By performing real-time measurements of the time durations between the keystrokes when a password is entered and using pattern-recognition algorithms, three online recognition systems were devised and tested. Two types of passwords were considered: phrases and individual names. A fixed phrase was used in the identification system. Individual names were used as a password in the verification system and in the overall recognition system. All three systems were tested and evaluated. The identification system used 10 volunteers and gave an indecision error of 1.2%. The verification system used 26 volunteers and gave an error of 8.1% in rejecting valid users and an error of 2.8% in accepting invalid users. The overall recognition system used 32 volunteers and gave an error of 3.1% in rejecting valid users and an error of 0.5% in accepting invalid users.
TL;DR: A coding scheme is presented based on a single fixed binary encoded illumination pattern, which contains all the information required to identify the individual strikes visible in the camera image and a prototype measurement system based on this coding principle is presented.
Abstract: The problem of strike identification in range image acquisition systems based on triangulation with periodically structured illumination is discussed. A coding scheme is presented based on a single fixed binary encoded illumination pattern, which contains all the information required to identify the individual strikes visible in the camera image. Every sample point indicated by the light pattern is made identifiable by means of a binary signature, which is locally shared among its closest neighbors. The applied code is derived from pseudonoise sequences, and it is optimized so that it can make the identification fault-tolerant to the largest extent. A prototype measurement system based on this coding principle is presented. Experimental results obtained with the measurement system are also presented.
TL;DR: In this article, a method for distinguishing metal and dielectric material surfaces from the polarization characteristics of specularly reflected light is introduced, which is completely passive and requires only the sensing of transmitted radiance of reflected light through a polarizing filter positioned in multiple orientations in front of a camera sensor.
Abstract: A computationally simple yet powerful method for distinguishing metal and dielectric material surfaces from the polarization characteristics of specularly reflected light is introduced. The method is completely passive, requiring only the sensing of transmitted radiance of reflected light through a polarizing filter positioned in multiple orientations in front of a camera sensor. Precise positioning of lighting is not required. An advantage of using a polarization-based method for material classification is its immunity to color variations, which so commonly exist on uniform material samples. A simple polarization-reflectance model, called the Fresnel reflectance model, is developed. The fundamental assumptions are that the diffuse component of reflection is completely unpolarized and that the polarization state of the specular component of reflection is dictated by the Fresnel reflection coefficients. The material classification method presented results axiomatically from the Fresnel reflectance model, by estimating the polarization Fresnel ratio. No assumptions are required about the functional form of the diffuse and specular components of reflection. The method is demonstrated on some common objects consisting of metal and dielectric parts.
TL;DR: Direct analytical methods are discussed for solving Poisson equations of the general form Δu = f on a rectangular domain, and experiments indicate that results comparable to those using multigrid can be obtained in a very small number of iterations.
Abstract: Direct analytical methods are discussed for solving Poisson equations of the general form Δu = f on a rectangular domain. Some embedding techniques that may be useful when boundary conditions (obtained from stereo and occluding boundary) are defined on arbitrary contours are described. The suggested algorithms are computationally efficient owing to the use of fast orthogonal transforms. Applications to shape from shading, lightness and optical flow problems are also discussed. A proof for the existence and convergence of the flow estimates is given. Experiments using synthetic images indicate that results comparable to those using multigrid can be obtained in a very small number of iterations.
TL;DR: A system using two Polaroid transducers is described that correctly discriminates between corners and planes for inclination angles within ±10 degrees of the transducer orientation; extending it to a multitransducer array allows the system to operate over an extended range.
Abstract: A multitransducer, pulse/echo-ranging system is described that differentiates corner and plane reflectors by exploiting the physical properties of sound propagation. The amplitudes and ranges of reflected signals for the different transmitter and receiver pairs are processed to determine whether the reflecting object is a plane or a right-angle corner. In addition, the angle of inclination of the reflector with respect to the transducer orientation can be measured. Reflected signal amplitude and range values, as functions of inclination angle, provide the motivation for the differentiation algorithm. A system using two Polaroid transducers is described that correctly discriminates between corners and planes for inclination angles within ±10 degrees of the transducer orientation. The two-transducer system is extended to a multitransducer array, allowing the system to operate over an extended range. An analysis comparing processing effort to estimation accuracy is performed.
TL;DR: A two-stage method of image segmentation based on gray level cooccurrence matrices that robustly segments an image into homogeneous areas and generates an edge map is described and extends easily to general edge operators.
Abstract: A two-stage method of image segmentation based on gray level cooccurrence matrices is described. An analysis of the distributions within a cooccurrence matrix defines an initial pixel classification into both region and interior or boundary designations. Local consistency of pixel classification is then implemented by minimizing the entropy of local information, where region information is expressed via conditional probabilities estimated from the cooccurrence matrices, and boundary information via conditional probabilities which are determined a priori. The method robustly segments an image into homogeneous areas and generates an edge map. The technique extends easily to general edge operators. An example is given for the Canny operator. Applications to synthetic and forward-looking infrared (FLIR) images are given.
TL;DR: An algorithm is presented to recognize and locate partially distorted 2D shapes without regard to their orientation, location, and size and works reasonably well in the presence of a moderate amount of noise.
Abstract: An algorithm is presented to recognize and locate partially distorted 2D shapes without regard to their orientation, location, and size. The algorithm first calculates the curvature function from the digitized image of an object. The points of local maxima and minima extracted from the smooth curvature are used as control points to segment the boundary and to guide the boundary-matching procedure. The boundary-matching procedure considers two shapes at a time, one shape from the template databank, and the other from the object being classified. The procedure tries to match the control points in the unknown shape to those of a shape from the template databank, and estimates the translation, rotation, and scaling factors to be used to normalize the boundary of the unknown shape. The chamfer 3/4 distance transformation and a partial distance measurement scheme constitute the final step in measuring the similarity between the two shapes. The unknown shape is assigned to the class corresponding to the minimum distance. The algorithm has been successfully tested on partial shapes using two sets of data, one with sharp corners and the other with curve segments. This algorithm not only is computationally simple, but also works reasonably well in the presence of a moderate amount of noise.
TL;DR: A precise definition of digital skeletons and a mathematical framework for the analysis of a class of thinning algorithms, based on morphological set transformation, are presented, and an algorithm based on a derived necessary and sufficient condition for the thinning process is developed.
Abstract: A precise definition of digital skeletons and a mathematical framework for the analysis of a class of thinning algorithms, based on morphological set transformation, are presented. A particular thinning algorithm (algorithm A) is used as an example in the analysis. Precise definitions and analyses associated with the thinning process are presented, including the proof of convergence, the condition for one-pixel-thick skeletons, and the connectedness of skeletons. In addition, a necessary and sufficient condition for the thinning process in general is derived, and an algorithm (algorithm B) based on this condition is developed. Experimental results are used to compare the two thinning algorithms, and issues involving noise immunity and skeletal bias are addressed.
TL;DR: An alternative to multigrid relaxation that is much easier to implement and more generally applicable is presented and the relationship of this approach to other multiresolution relaxation and representation schemes is discussed.
Abstract: An alternative to multigrid relaxation that is much easier to implement and more generally applicable is presented. Conjugate gradient descent is used in conjunction with a hierarchical (multiresolution) set of basis functions. The resultant algorithm uses a pyramid to smooth the residual vector before the direction is computed. Simulation results showing the speed of convergence and its dependence on the choice of interpolator, the number of smoothing levels, and other factors are presented. The relationship of this approach to other multiresolution relaxation and representation schemes is also discussed.
TL;DR: The authors give a detailed account of a system environment for treating general problems of image and speech understanding by providing a framework for representing declarative and procedural knowledge based on a suitable definition of a semantic network.
Abstract: The authors give a detailed account of a system environment for treating general problems of image and speech understanding. A framework for representing declarative and procedural knowledge based on a suitable definition of a semantic network is provided. The syntax and semantics of the network are clearly defined. The pragmatics of the network in its use for pattern understanding are defined by several rules which are problem-independent. This allows one to formulate problem-independent control algorithms. Complete software environments are available to handle the described structures. The general applicability of the network system is demonstrated by short descriptions of three applications from different task domains.
TL;DR: A method for extracting manufacturing shape features from the boundary representation of a polyhedral object is presented; virtual links in its cavity graphs are determined by combining topologic and geometric evidence, using a combination of Dempster-Shafer decision theory and clustering techniques.
Abstract: A method for extracting manufacturing shape features from the boundary representation of a polyhedral object is presented. In this approach, the depressions of the part are represented as cavity graphs, which are in turn used as a basis for hypothesis generation-elimination. The proposed cavity graphs are an extended representation in which the links reflect the concavity of the intersection between two faces, and the node labels reflect the relative orientation of the faces comprising the depression. Because previous methods have limited success in handling interactions, emphasis is put on automatic analysis of depressions which are formed by the interactions of primitive features. It is shown that although there is a unique subgraph for each primitive feature, not every cavity graph corresponds to a unique set of primitive features. Consequently, since the cavity graph of a depression may not be the union of the representations for the involved primitives, the concept of virtual links for the formal analysis of the depressions based on cavity graphs is introduced. Finally, a suitable method for automatic determination of the virtual links is presented. This method is based on combining topologic and geometric evidence, and uses a combination of Dempster-Shafer decision theory and clustering techniques to reach its conclusions. Experimental results are presented for a number of examples.
TL;DR: It is shown that for polyhedral objects there are two fundamental visual events: (1) the projections of an edge and a vertex coincide; and (2) the projections of three nonadjacent edges intersect at a point.
Abstract: An algorithm for computing the aspect graph for polyhedral objects is described. The aspect graph is a representation of three-dimensional objects by a set of two-dimensional views. The set of viewpoints on the Gaussian sphere is partitioned into regions such that in each region the qualitative structure of the line drawing remains the same. At the boundaries between adjacent regions are the accidental viewpoints where the structure of the line drawing changes. It is shown that for polyhedral objects there are two fundamental visual events: (1) the projections of an edge and a vertex coincide; and (2) the projections of three nonadjacent edges intersect at a point. The geometry of the object is reflected in the locus of the accidental viewpoints. The algorithm computes the partition together with a representative view for each region of the partition.
TL;DR: An improved application of probabilistic relaxation to edge labeling is presented, which uses a dictionary to represent permitted labelings of the entire context-conveying neighborhood of each pixel.
Abstract: An improved application of probabilistic relaxation to edge labeling is presented. The improvement derives from the use of a representation of the edge process that is internally consistent and which utilizes a more complex description of edge structure. The application uses a dictionary to represent permitted labelings of the entire context-conveying neighborhood of each pixel. Details are given of the dictionary approach and the related representation of the edge process. A comparison with other edge-postprocessing strategies is provided.
TL;DR: A fast pixel-based algorithm is developed that uses careful code optimization and selective processing to achieve fast extraction of lines for use in vision-guided mobile robot navigation.
Abstract: There are two basic ways to improve the speed of a low-level vision algorithm: careful code optimization and selective processing. Reducing the computational effort expended on each pixel reduces the time required to process an image by a constant factor. Selective processing on a limited portion of an image using a focus of attention can decrease overall computation by orders of magnitude. A fast pixel-based algorithm is developed that uses these principles to achieve fast extraction of lines for use in vision-guided mobile robot navigation. It builds upon an algorithm for extracting lines by grouping pixels with similar gradient orientation. It allows parametric control of computational resources required to extract lines with particular characteristics.
TL;DR: Overall, the expected-outcome model of two-player games is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent.
Abstract: The expected-outcome model, in which the proper evaluation of a game-tree node is the expected value of the game's outcome given random play from that node on, is proposed. Expected outcome is considered in its ideal form, where it is shown to be a powerful heuristic. The ability of a simple random sampler that estimates expected outcome to outduel a standard Othello evaluator is demonstrated. The sampler is combined with a linear regression procedure to produce efficient expected-outcome estimators. Overall, the expected-outcome model of two-player games is shown to be precise, accurate, easily estimable, efficiently calculable, and domain-independent.
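The idea of estimating a node's value by random play can be sketched on a much smaller game than Othello. The example below uses tic-tac-toe purely for brevity; the board encoding, the function names, the sample count, and the fixed seed are all assumptions made for illustration and do not come from the paper.

```python
import random

# Index triples that form a winning line on a 3x3 board stored as 9 cells.
LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]


def winner(board):
    """Return 'X' or 'O' if that player has completed a line, else None."""
    for a, b, c in LINES:
        if board[a] != " " and board[a] == board[b] == board[c]:
            return board[a]
    return None


def random_playout(board, to_move, rng):
    """Play uniformly random moves to the end; +1 if X wins, -1 if O wins, 0 for a draw."""
    cells = list(board)
    while True:
        won = winner(cells)
        if won is not None:
            return 1.0 if won == "X" else -1.0
        empty = [i for i, cell in enumerate(cells) if cell == " "]
        if not empty:
            return 0.0
        cells[rng.choice(empty)] = to_move
        to_move = "O" if to_move == "X" else "X"


def expected_outcome(board, to_move, samples=2000, seed=0):
    """Monte Carlo estimate of the expected outcome (from X's view) under random play."""
    rng = random.Random(seed)
    total = sum(random_playout(board, to_move, rng) for _ in range(samples))
    return total / samples


estimate = expected_outcome(" " * 9, "X")
```

A move chooser built on this sketch would evaluate each child position with `expected_outcome` and pick the best one for the side to move. The estimator needs only the rules of the game, which is the sense in which the abstract calls the model domain-independent.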
TL;DR: A method of recognizing partially occluded objects is presented in which each object is represented by a set of landmarks and it is shown that any invariant function under a similarity transformation is a function of the sphericity.
Abstract: A method of recognizing partially occluded objects is presented in which each object is represented by a set of landmarks. Given a scene consisting of partially occluded objects, a model object in the scene is hypothesized by matching the landmarks of the model with those in the scene. A measure of similarity between two landmarks is needed to perform this matching. A local shape measure, sphericity, is introduced. It is shown that any invariant function under a similarity transformation is a function of the sphericity. To match landmarks between the model and the scene, a table of compatibility is constructed. A technique, known as hopping dynamic programming, is described to guide the landmark matching through the compatibility table. The location of the model in the scene is estimated with a least-squares fit among the matched landmarks. A heuristic measure is then computed to decide if the model is in the scene.
TL;DR: It is shown that locating the FOE precisely is difficult when displacement vectors are corrupted by noise and errors, and a more robust performance can be achieved by computing a 2D region of possible FOE locations (termed the fuzzy FOE) instead of looking for a single-point FOE.
Abstract: The computation of sensor motion from sets of displacement vectors obtained from consecutive pairs of images is discussed. The problem is investigated with emphasis on its application to autonomous robots and land vehicles. The effects of 3D camera rotation and translation upon the observed image are discussed, particularly the concept of the focus of expansion (FOE). It is shown that locating the FOE precisely is difficult when displacement vectors are corrupted by noise and errors. A more robust performance can be achieved by computing a 2D region of possible FOE locations (termed the fuzzy FOE) instead of looking for a single-point FOE. The shape of this FOE region is an explicit indicator of the accuracy of the result. It has been shown elsewhere that given the fuzzy FOE, a number of powerful inferences about the 3D scene structure and motion become possible. Aspects of computing the fuzzy FOE are emphasized, and the performance of a particular algorithm on real motion sequences taken from a moving autonomous land vehicle is shown.