TL;DR: It is shown that when the environment is piecewise linear, it provides a powerful constraint on the kind of matches that exist between two images of the scene when the camera motion is unknown: points and lines in the same plane are related by a collineation, and the camera motion and the plane equation can be recovered from an estimate of the matrix of this collineation.
Abstract: We show in this article that when the environment is piecewise linear, it provides a powerful constraint on the kind of matches that exist between two images of the scene when the camera motion is unknown. For points and lines located in the same plane, the correspondence between the two cameras is a collineation. We show that the unknowns (the camera motion and the plane equation) can be recovered, in general, from an estimate of the matrix of this collineation. The two-fold ambiguity that remains can be removed by looking at a second plane, by taking a third view of the same plane, or by using a priori knowledge about the geometry of the plane being looked at. We then show how to combine the estimation of the matrix of collineation and the obtaining of point and line matches between the two images, by a strategy of Hypothesis Prediction and Testing guided by a Kalman filter. We finally show how our approach can be used to calibrate a system of cameras.
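As a sketch of why the collineation matrix carries the unknowns, the standard plane-induced relation can be written out. The notation is an assumption, not taken from the abstract: the plane satisfies $n^{\top}X = d$ in the first camera frame, the second camera frame is $X' = RX + t$, and $m$, $m'$ are homogeneous image coordinates in normalized (calibrated) form, with $\simeq$ meaning equality up to scale.

```latex
\[
  X' = R X + t
     = R X + t\,\frac{n^{\top} X}{d}
     = \Bigl(R + \frac{t\,n^{\top}}{d}\Bigr) X ,
  \qquad\text{hence}\qquad
  m' \simeq H\,m , \quad H \simeq R + \frac{t\,n^{\top}}{d} .
\]
```

Under these assumptions $H$ is a $3 \times 3$ matrix defined up to scale (8 degrees of freedom), which matches the count of unknowns: 3 for $R$, 3 for $t/d$ and 2 for the unit normal $n$. The translation appears only through $t/d$, so its magnitude cannot be separated from the plane distance.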
TL;DR: In this paper, a review of feature selection for multidimensional pattern classification is presented, and the potential benefits of Monte Carlo approaches such as simulated annealing and genetic algorithms are compared.
Abstract: We review recent research on methods for selecting features for multidimensional pattern classification. These methods include nonmonotonicity-tolerant branch-and-bound search and beam search. We describe the potential benefits of Monte Carlo approaches such as simulated annealing and genetic algorithms. We compare these methods to facilitate the planning of future research on feature selection.
TL;DR: This paper describes algorithms for computing a tree distance that can be applied to problems including pattern recognition, syntactic tree comparison and classification, and any tree comparison in which the structures preserved by the structure preserving mapping are important.
Abstract: This paper describes algorithms for computing the tree distance based on the structure preserving mapping. The distance is defined as the minimum sum of the weights of edit operations needed to transform tree Tα to tree Tβ under the restriction of the structure preserving mapping. The edit operations allow substituting one vertex of a tree for another, deleting a vertex from a tree and inserting a vertex into a tree. The proposed algorithms determine the distance between Tα and Tβ in time O(NαNβLα) or O(NαNβLβ), and in space O(NαNβ), where Nα and Nβ are the numbers of vertices of Tα and Tβ, and Lα and Lβ are the numbers of leaves of Tα and Tβ, respectively. The time complexity is close to the unapproachable lowest bound O(NαNβ). Improved algorithms are presented. This tree distance can be applied to problems including pattern recognition, syntactic tree comparison and classification, and any tree comparison in which the structures preserved by the structure preserving mapping are important.
TL;DR: The Dynamic Pyramid as discussed by the authors is a model to solve the correspondence problem of image sequences, where a robust estimation of local displacements is combined with controlled continuity constraints, and the displacement term of the functional is based on robust local binary correlations derived from the signs of the bandpass filtered images.
Abstract: The Dynamic Pyramid is a model to solve the correspondence problem of image sequences. A robust estimation of local displacements is combined with controlled continuity constraints. At the heart of the model is the functional of an elastic membrane whose elastic constants are subject to variation. The continuity control function is derived from the tension in the displacement vector field at grayvalue edges. The displacement term of the functional is based on robust local binary correlations derived from the signs of the bandpass filtered images. The basic representation of the model is the pyramid: The original images are converted into Laplacian pyramids, the signs of which are the features to determine the local displacements as well as the continuity control function. The vector field is built up as a pyramid from coarse to fine, giving the final displacement vector field at the finest level.
TL;DR: A generalization of subgraph isomorphism for the fault-tolerant interpretation of disturbed line images has been achieved and constrained continuous optimization techniques, such as relaxation labeling and neural network strategies, solve recognition problems within a reasonable time, even in rather complex relational structures where heuristics can fail.
Abstract: A generalization of subgraph isomorphism for the fault-tolerant interpretation of disturbed line images has been achieved. Object recognition is effected by optimal matching of a reference graph to the graph of a distorted image. This optimization is based on the solution of linear and quadratic assignment problems. The efficiency of the procedures developed for this objective has been proved in practical applications. NP-complete problems such as subgraph recognition need exhaustive computation if exact (branch-and-bound) algorithms are used. In contrast to this, heuristics are very fast and sufficiently reliable for less complex relational structures of the kind investigated in the first part of this paper. Constrained continuous optimization techniques, such as relaxation labeling and neural network strategies, solve recognition problems within a reasonable time, even in rather complex relational structures where heuristics can fail. They are also well suited to parallelism. The second part of this paper is devoted exclusively to them.
TL;DR: An effective method for signature separation from nonhomogeneous noisy background is introduced and a solution to the problem of simulated signature verification in off-line systems is introduced.
Abstract: This paper introduces an effective method for signature separation from nonhomogeneous noisy background. It also introduces a solution to the problem of simulated signature verification in off-line systems. Extraction of shape and density features and the effectiveness of using each and both of them are discussed in the light of experimental results.
TL;DR: This article proposes an approach to identify the layout of a document page by dividing it recursively into nested rectangular areas and uses it as a basis for a document layout model, which is able to control an automatic interpretation mechanism for deriving a high level representation of the contents of a document.
Abstract: The realization of the paper-free office seems to be more difficult than expected. Therefore, good paper-computer interfaces are necessary to transform paper documents into an electronic form, which allows the use of a filing and retrieval system. An electronic document page is an optically scanned and digitized representation of a printed page. Document analysis is the problem of interpreting and labeling the constituents of the document. Although there are very reliable optical character recognition (OCR) methods, the process could be very inefficient. To prune the search space and to become more efficient, some search supporting methods have to be developed. This article proposes an approach to identify the layout of a document page by dividing it recursively into nested rectangular areas. The procedure is used as a basis for a document layout model, which is able to control an automatic interpretation mechanism for deriving a high level representation of the contents of a document. We have implemented our method in Common Lisp on a Symbolics 3640 Workstation and have run it for a large population of office documents. The results obtained have been very encouraging and have convincingly confirmed the soundness of our approach.
TL;DR: A unified approach to feature extraction for segmentation purposes by means of the rank-order filtering of grey values in a neighbourhood of each pixel of a digitized image is outlined.
Abstract: The aim of this paper is to outline a unified approach to feature extraction for segmentation purposes by means of the rank-order filtering of grey values in a neighbourhood of each pixel of a digitized image. In the first section an overview of rank-order filtering for image processing is given, and a fast histogram algorithm is proposed. Section 2 deals with the extraction of a “locally most representative grey value”, defined as the maximum of the local histogram density function. In Section 3 several textural features are described, which can be extracted from the local histogram by means of rank-order filtering, and their properties are discussed. Section 4 formulates some general requirements to be met by the process of image segmentation, and describes a method based upon the features introduced in the former sections. In the last section some experimental results applied to aerial views obtained with the segmentation method of Sect. 4 are reported. These test images have been analyzed within the scope of an investigation centered on terrain recognition for agricultural and ecological purposes.
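A minimal sketch of the two ideas named in this abstract, rank-order filtering and a most representative local grey value, may help fix the terminology. It is a naive version that sorts each window and is not the fast histogram algorithm the paper proposes. The function names, the square window and the tie-breaking rule are assumptions made for illustration, and the "most representative grey value" is approximated here by the mode of the raw local histogram.

```python
from collections import Counter

def neighbourhood(image, row, col, radius):
    """Grey values in the (2*radius+1)^2 window around (row, col), clipped at the borders."""
    rows, cols = len(image), len(image[0])
    return [
        image[r][c]
        for r in range(max(0, row - radius), min(rows, row + radius + 1))
        for c in range(max(0, col - radius), min(cols, col + radius + 1))
    ]

def rank_order_filter(image, rank, radius=1):
    """Replace each pixel by the value at relative rank in its sorted window.

    rank = 0.0 gives the local minimum, 0.5 the median, 1.0 the local maximum.
    """
    out = []
    for r in range(len(image)):
        out_row = []
        for c in range(len(image[0])):
            values = sorted(neighbourhood(image, r, c, radius))
            index = int(rank * (len(values) - 1))
            out_row.append(values[index])
        out.append(out_row)
    return out

def local_mode(image, row, col, radius=1):
    """Most frequent grey value in the window (ties go to the smallest value)."""
    counts = Counter(neighbourhood(image, row, col, radius))
    best = max(counts.values())
    return min(value for value, n in counts.items() if n == best)

# Toy 3x3 image with one bright outlier in the centre.
image = [
    [10, 10, 10],
    [10, 200, 10],
    [12, 12, 12],
]
median_image = rank_order_filter(image, 0.5)
```

On this toy image the median filter and the local mode both replace the outlier at the centre with the dominant background value, while the rank 1.0 filter keeps it.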
TL;DR: This paper surveys the applications of vision to fish sorting, fish fillet sorting and detection of surface and sub-surface defects (such as worms and bones) and stresses the specific implementation context, needed performances, illumination, detection principle, as well as vision algorithms.
Abstract: This paper surveys the applications of vision to fish sorting, fish fillet sorting and detection of surface and sub-surface defects (such as worms and bones). It stresses the specific implementation context, needed performances, illumination, detection principle, as well as vision algorithms. Also analyzed are the optical properties of fish. Examples of results are given.
TL;DR: An original knowledge representation scheme named KRP based on Petri net theory is proposed, and the inference procedure similar to "intersection search" in semantic networks is given.
Abstract: An original knowledge representation scheme named KRP, based on Petri net theory, is proposed. The formal description of the scheme and an inference procedure similar to "intersection search" in semantic networks are given.
TL;DR: A new approach to the recognition of multi-font printed Chinese characters by encoding a character in terms of two pre-defined stroke relations, namely, relative position relation and relative direction relation.
Abstract: This paper describes a new approach to the recognition of multi-font printed Chinese characters. The basic idea is to encode a character in terms of two pre-defined stroke relations, namely, relative position relation and relative direction relation. The code-mapping method chosen in our system possesses two main advantages: the first is that the tree-like data base can be easily extended, and the second is that the processing time is independent of the amount of data base. Since the stability of the extracted strokes greatly affects the coding results, a new stroke merging method, which has been experimentally proven to extract strokes more steadily, is also proposed.
TL;DR: Three architectures of current interest—the pyramid, the linear array and the n-cube—are examined in relation to their ability to perform iconic and symbolic image processing and it is concluded that the success of a particular architecture depends crucially upon the type of application which is addressed.
Abstract: Three architectures of current interest—the pyramid, the linear array and the n-cube—are examined in relation to their ability to perform iconic and symbolic image processing. For each architecture an appropriate problem is identified and a matching implementation selected. An assessment of the characteristics of each implementation is given, together with an estimation of the performance on a number of fundamental operations. The relative merits of the designs are discussed but it is concluded that the success of a particular architecture depends crucially upon the type of application which is addressed, and that therefore no universally applicable ranking can be derived in the absence of a comprehensive benchmarking exercise.
TL;DR: The design of a multifont character recognizer which uses a binary decision tree to classify a character on the basis of 197 geometric features is described, which was highly sensitive to typeface and error rates varied between 10 percent and 0.1 percent.
Abstract: An optical character reader for processing typeset documents must be able to handle proportional spacing, the presence of touching characters and a wide variety of type fonts. This paper describes the design of a multifont character recognizer which uses a binary decision tree to classify a character on the basis of 197 geometric features. The algorithm for designing the decision tree is based upon an entropy minimization procedure, and makes no assumptions on the distribution or independence of the binary features. The decision tree classifier provides confidence measures which may be used to reduce the substitution error rate at the expense of higher rejection rates. Methods of reducing the overall error rate by combining the decision tree classifier with other classifiers were examined. In particular, the paper evaluates the performance of a classifier using a combination of multiple decision trees, template matching and contextual post-processing. Error rates were highly sensitive to typeface and varied between 10 percent and 0.1 percent. Computer processing times for the various stages of the system are presented.
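The entropy-minimization idea behind choosing a binary test at a tree node can be sketched as follows. This is a generic illustration rather than the paper's design procedure, whose details the abstract does not give. The function names and the toy data are assumptions. The criterion works from empirical class counts only, so it needs no model of the feature distribution or independence.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def split_entropy(samples, labels, feature):
    """Average class entropy that remains after testing one binary feature."""
    total = len(labels)
    result = 0.0
    for value in (0, 1):
        branch = [lab for s, lab in zip(samples, labels) if s[feature] == value]
        if branch:
            result += len(branch) / total * entropy(branch)
    return result

def best_feature(samples, labels):
    """Index of the binary feature whose test leaves the least class entropy."""
    n_features = len(samples[0])
    return min(range(n_features), key=lambda f: split_entropy(samples, labels, f))

# Toy data: feature 0 separates the two classes, feature 1 is uninformative.
samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = ["c", "c", "e", "e"]
chosen = best_feature(samples, labels)
```

Applying `best_feature` recursively to each branch would grow a binary decision tree. The class proportions at a leaf could serve as a confidence measure for rejection, though whether the paper derives its confidence measures this way is not stated.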
TL;DR: With extensive training, it can be proven that this formalism may provide a very promising result even in handling erroneous writing such as missing a stroke, wrong writing sequence etc.
Abstract: The target of this recognition system is the set of handwritten Chinese characters input from tablet devices with stroke-sequence and stroke-count being free but within the constraint of normal writing. A formalism based upon an initial stroke-sequence decision tree and position matching has been developed for recognizing handwritten Chinese characters. This formalism has the advantages of using the features of strokes, stroke-sequence, and geometric relations but avoids the disadvantages caused by the instability of all of the above features. With extensive training, it can be proven that this formalism may provide a very promising result even in handling erroneous writing such as missing a stroke, wrong writing sequence etc.
TL;DR: The design of the SIL-ICON Compiler is described and an application example to design a text editor using the Heidelberg Icon Set is presented in detail.
Abstract: The SIL-ICON Compiler is a software system for the specification, interpretation, prototyping and generation of icon-oriented systems. In this paper, the design of the SIL-ICON Compiler is described. An application example to design a text editor using the Heidelberg Icon Set is presented in detail.
TL;DR: The development of the speech understanding and dialog system EVAR is described, covering the knowledge bases that contain the raw linguistic knowledge and the preprocessors that convert it to the specialized form needed by the processing algorithms.
Abstract: This article describes the work in the development of the speech understanding and dialog system EVAR. The relevant knowledge bases containing the raw linguistic knowledge and the preprocessors converting this to the specialized form needed by the processing algorithms are treated. Processing so far covers the level of the speech signal up to the level of pragmatic analysis. Some topics of ongoing and future work are mentioned briefly.
TL;DR: A high level process is presented that finds the main plane structures of the scene and labels them with a semantic description using the interpreter CLASSIC, and the advantages of using a knowledge based system are discussed.
Abstract: We are interested in the semantic interpretation of 3-D data obtained by a stereovision algorithm, in the context of indoor scenes. This paper presents a high level process built to find the main plane structures of the scene and to label them with a semantic description. This process is made up of three stages: first, the sorting and enhancement of the input data (3-D segments); second, the generation of affine planes from the sorted 3-D segments; and finally, the semantic description of these planes by a classification expert system using the interpreter CLASSIC. This process has been tested on various stereo images and the quality of the results, in robustness and accuracy, is quite good. The advantages of using a knowledge based system are discussed.
TL;DR: A new method recognizing and locating partially occluded two-dimensional parts is presented, integrated within a vision system of a Flexible Assembly Workcell to accomplish the automatic assembly of partially overlapping parts.
Abstract: A new method for recognizing and locating partially occluded two-dimensional parts is presented. The objects are described by a set of segments derived from the polygonal coding of their contours, and by the geometrical relationships between the segments. Rewriting rules are used to improve the stability of the polygonal coding. The identification process utilizes a robust hypothesis generator, from which the segment search, assisted by the spatial relationships, is propagated into the scene. The originality of this method relies mainly on the use of structural relationships between the segments to select the robust initialization hypotheses, and the use of structural search to achieve hypothesis propagation. These last two points, together with the use of a hash-coding technique to improve the location of predicted segments, greatly reduce the combinatorics and make the algorithm particularly rapid and effective. This approach is integrated within a vision system of a Flexible Assembly Workcell to accomplish the automatic assembly of partially overlapping parts.
TL;DR: This paper gives a framework for solving the scene analysis problem in a parallel processing environment, using split-level relaxation, and shows that it is indeed advantageous to use multiprocessors to solve this problem.
Abstract: The goal of high level vision is to identify a set of regions in a given image. This has been called by various names: the scene labeling problem (Ref. 1), the consistent labeling problem (Ref. 2), the constraint satisfaction problem (Ref. 3), Waltz filtering (Ref. 4), the satisfying assignment problem (Ref. 5), etc. There are several approaches to solve this problem, including backtracking, graph matching and relaxation. A new method called split-level relaxation, which is based on discrete relaxation, was proposed in Ref. 6. It takes care of multiple semantic constraints by considering each of them independently. The problem is known to be NP-complete, so it takes a long time to solve. With the advent of multiprocessors, it is now imperative to see if the problem can be solved faster in the average case. In this paper we give a framework for solving the scene analysis problem in a parallel processing environment, using split-level relaxation. Experiments done on a multiprocessor show that it is indeed advantageous to use multiprocessors to solve this problem.
TL;DR: A character recognition method for learning Kanji by CAL that can identify characters whose stroke order or stroke connection is incorrect, and identify which of the strokes has been written incorrectly.
Abstract: We have developed a character recognition method for learning Kanji by CAL. Using this method, students can identify characters whose stroke order or stroke connection is incorrect, and identify which of the strokes has been written incorrectly. The system can recognize 99.7% of 120,000 test patterns with stroke order errors, stroke connection errors, or both. For correctly recognized characters, the error detection ratio was greater than 99.9%. The processing time was only 0.5 second/character when run on a 16-bit personal computer.
TL;DR: In this work the performance and computer time requirements of 15 classifiers are compared in images modeled by two-dimensional Gaussian Markov random fields which are represented by a causal autoregressive model of the second order.
Abstract: In this work the performance and computer time requirements of 15 classifiers are compared in images modeled by two-dimensional Gaussian Markov random fields which are represented by a causal autoregressive model of the second order. The per-pixel classifier and the object classifier directly or indirectly utilizing spectral-spatial characteristics of images are among them. The probability of misclassification (PMC), calculated analytically and experimentally on modeled data, was used as a measure of classifier performance. The influence on the PMC of such factors as the object size and form, the inadequacy of classifier and data models, and the accuracy of spatial correlation estimation is investigated. The following main results are obtained. The performance of object classifiers is much better than that of per-pixel classifiers. The PMC of object classifiers decreases rapidly with the increase of the size of an object. The performance of object classifiers indirectly incorporating spatial characteristics of an object (OCIND) and that of object classifiers directly incorporating spatial characteristics (OCDIR) is similar for the linear decision rule. The performance of OCDIR is much better than that of OCIND for the quadratic decision rule. Computationally, averaging object classifiers are fastest, next are OCIND and finally OCDIR. The cross-shaped object classifier is better and faster than the square-shaped object classifier for the same number of pixels.
TL;DR: It is proved that, from the viewpoint of computational logic, resolution and paramodulation mechanisms are complete and sound for fuzzy logic with equality, and fuzzy equality is incorporated into the theory of term rewriting systems to give an algorithmic solution to the word problems in fuzzy algebra.
Abstract: The concept of fuzzy equality and its relation to the first order predicate calculus are discussed. It is proved that, from the viewpoint of computational logic, resolution and paramodulation mechanisms are complete and sound for fuzzy logic with equality. A term rewriting system, that is, a set of left-to-right directional equations, provides an essential computational paradigm for word problems in universal algebra. We incorporate fuzzy equality into the theory of this computation system and give an algorithmic solution to the word problems in fuzzy algebra.
TL;DR: This paper presents recent progress on some of the main avenues of object-based methods, second generation techniques which make use of contour-texture modeling, new results in neurophysiology and psychophysics, and scene analysis.
Abstract: The digital representation of an image requires a very large number of bits. The goal of image coding is to reduce this number, as much as possible, and to reconstruct a faithful duplicate of the original picture. Early efforts in image coding, solely guided by information theory, led to a plethora of methods. The compression ratio reached a plateau around 10:1 a couple of years ago. Recent progress in the study of the brain mechanism of vision and scene analysis has opened new vistas in picture coding. Directional sensitivity of the neurones in the visual pathway combined with the separate processing of contours and textures has led to a new class of coding methods capable of achieving compression ratios as high as 100:1. This paper presents recent progress on some of the main avenues of object-based methods. These second generation techniques make use of contour-texture modeling, new results in neurophysiology and psychophysics, and scene analysis.
TL;DR: It is found that Chinese characters are actually not only artistically elegant and culturally rich but also semantically meaningful and intelligently sound.
Abstract: This article discusses some intelligence aspects of Chinese characters. Some basic concepts of two-dimensional pattern representation and artificial intelligence such as semantic networks, forward chaining, deduction and the resolution principle are used to analyze and interpret the syntactic structure, representation, semantics and evolution of Chinese characters. The concept of degrees of ambiguity and the principle of new characters are investigated. It is found that Chinese characters are actually not only artistically elegant and culturally rich but also semantically meaningful and intelligently sound. Finally some topics for future research such as intelligent pattern recognition for Chinese characters, automatic learning and translation, and knowledge-based Chinese language understanding are discussed.
TL;DR: By using the hashing function designed in this paper, 1303 Mandarin phonetic symbol transcriptions will be hashed to 1303 locations in the way of one-to-one correspondence.
Abstract: In this paper, we consider the problem of how to design a minimal perfect hashing function which is suitable for the Mandarin Phonetic Symbols system. Our main idea is inspired by Chang’s letter-oriented minimal perfect hashing scheme. By using our hashing function, 1303 Mandarin phonetic symbol transcriptions will be hashed to 1303 locations in the way of one-to-one correspondence.
TL;DR: Numerical results give evidence that the estimator which is a simple extension of the scalar median has an overall performance that is the same or better than the other two proposed estimators.
Abstract: Three different kinds of median type estimators for use in applications where the underlying probability distributions are multivariate are proposed and analyzed. The numerical complexity and the statistical characteristics of the estimators are studied and discussed. Numerical results give evidence that the estimator which is a simple extension of the scalar median has an overall performance that is the same or better than the other two proposed estimators.
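The abstract does not define its three estimators. As a hedged illustration, one natural reading of "a simple extension of the scalar median" is the component-wise (marginal) median, sketched below. Treating it as the paper's estimator is an assumption, and the function name and sample data are made up for the example.

```python
from statistics import median

def marginal_median(vectors):
    """Component-wise median of a list of equal-length vectors."""
    return tuple(median(component) for component in zip(*vectors))

# Five 2-D samples, one of them a gross outlier.
samples = [(1, 10), (2, 30), (3, 20), (100, -500), (2, 25)]
estimate = marginal_median(samples)
```

Each coordinate is handled independently, so the cost is one scalar median per dimension and the single outlier does not drag the estimate away from the bulk of the data. The result need not coincide with any of the input vectors, which is one way this estimator differs from median definitions that select a sample point.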
TL;DR: The umbra transform serves as a connection between gray-scale morphology and the classical two-valued morphology of G. Matheron and H. Hadwiger and thus applies to both morphological image and signal processing.
Abstract: The umbra transform serves as a connection between gray-scale morphology and the classical two-valued morphology of G. Matheron and H. Hadwiger. From a general set-theoretic perspective, the umbra transform of an image (or signal) results in an infinite set, even in the discrete case. By employing bound matrix image representation it is possible to represent the umbra by a finite data structure, the result being an approach that is both intuitive and computational. Moreover, the method is essentially dimensionally independent and thus applies to both morphological image and signal processing.
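The link between gray-scale and two-valued morphology can be illustrated on a 1-D integer signal. The umbra of f is the set of points (x, y) with y <= f(x), and gray-scale dilation is the top surface of the Minkowski sum of two umbrae. The sketch below makes the umbra finite by cutting it off at the signal's minimum. That truncation is an assumption chosen for illustration and is not the bound matrix representation of the paper. All function names are hypothetical.

```python
def umbra(signal, floor):
    """Finite part of the umbra of an integer signal: all (x, y) with floor <= y <= f(x)."""
    return {(x, y) for x, fx in enumerate(signal) for y in range(floor, fx + 1)}

def minkowski_sum(a, b):
    """Two-valued (set) dilation of point set a by point set b."""
    return {(xa + xb, ya + yb) for xa, ya in a for xb, yb in b}

def top_surface(point_set):
    """Recover a signal from a point set by taking the highest y above each x."""
    width = max(x for x, _ in point_set) + 1
    return [max(y for x, y in point_set if x == col) for col in range(width)]

def dilate_direct(f, g):
    """Gray-scale dilation: (f ⊕ g)(x) = max over z of f(x - z) + g(z)."""
    return [
        max(f[x - z] + g[z] for z in range(len(g)) if 0 <= x - z < len(f))
        for x in range(len(f) + len(g) - 1)
    ]

def dilate_via_umbra(f, g):
    """Same dilation computed with set operations on the truncated umbrae."""
    return top_surface(minkowski_sum(umbra(f, min(f)), umbra(g, min(g))))

f = [1, 3, 2]
g = [0, 1]
direct = dilate_direct(f, g)
via_umbra = dilate_via_umbra(f, g)
```

Both routes give the same signal here, since the top of the Minkowski sum depends only on the highest point of each umbra column. The set-based route costs far more, which is why a compact finite representation of the umbra matters in practice.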
TL;DR: A differential geometric view is taken in defining orientation selection and algorithms for actually doing it are developed, which are formulated in mathematical terms as the inference of a vector field of tangents to the contours.
Abstract: Orientation selection is the inference of orientation information out of images. It is one of the foundations on which other visual structures are built, since it must precede the formation of contours out of pointillist data and surfaces out of surface markings. We take a differential geometric view in defining orientation selection and develop algorithms for actually doing it. The goal of these algorithms is formulated in mathematical terms as the inference of a vector field of tangents (to the contours), and the algorithms are studied in both abstract and computational forms. They are formulated as matching problems, and algorithms for solving them are reduced to biologically plausible terms. We show that two different matching problems are necessary, the first for 1-dimensional contours (which we refer to as Type I processes) and the second for 2-dimensional flows (or Type II processes).
TL;DR: All the three approaches to recognize the finals of Mandarin syllables are found to be very efficient in terms of relatively high recognition rate and short computation time, and the MSVQ Approach providing the highest recognition rate at the shortest computation time is most attractive.
Abstract: A long-term research project toward Mandarin speech recognition techniques for very large vocabulary and unlimited text is considered. By carefully examining the special structures of Chinese language, the first-stage goal is set to be the design of efficient techniques to recognize the finals of Mandarin syllables. In this paper, three special approaches to do this are proposed. The Segmental Model Approach defines the final models by dividing the finals into several segments according to the acoustic structures of the speech signals. The Three-pass Approach uses three consecutive passes to classify the finals into small sets and improve the recognition efficiency. The Multi-section Vector Quantization (MSVQ) Approach, on the other hand, significantly reduces the necessary computation time by incorporating the branch-and-bound algorithm and common codebook concept with the MSVQ techniques. Extensive computer simulations are performed first to optimize each approach by choosing the best set of parameters then to compare the performance of the three approaches. It was found that all the three approaches are very efficient in terms of relatively high recognition rate and short computation time, and the MSVQ Approach provides the highest recognition rate at the shortest computation time, thus it is most attractive.
TL;DR: This study proposes a method for recognizing the lexical tones in Mandarin speech based on Vector Quantization (VQ) and Hidden Markov Models (HMM), and shows that the tone of the second syllable may be affected by the preceding syllable.
Abstract: This study proposes a method for recognizing the lexical tones in Mandarin speech. The method is based on Vector Quantization (VQ) and Hidden Markov Models (HMM). The pitch periods are extracted to derive the feature vectors which represent pitch height and pitch contour slope. One HMM is trained by the feature vectors of monosyllables for each tone. Then the HMMs are used to recognize the tone of monosyllables and disyllables. For the monosyllables, the accuracy rate can be 93.75% for speaker-independent cases. For the disyllables, the accuracy rates are 93% for the first syllables and 90% for the second syllables. This shows that the tone of the second syllable may be affected by the preceding syllable. This degradation also reveals the fact of tone variation in Mandarin speech.
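The recognition step described here, one discrete HMM per tone scored on a sequence of VQ codewords, can be sketched with the standard forward algorithm. Everything specific in the example is an assumption for illustration: the two toy "tones", the two-state left-to-right topology, the two-symbol codebook (0 for low pitch, 1 for high pitch) and all probabilities. The abstract gives none of these, and training is not shown.

```python
def forward_likelihood(observations, initial, transition, emission):
    """P(observations | model) for a discrete HMM, computed by the forward algorithm."""
    n_states = len(initial)
    alpha = [initial[s] * emission[s][observations[0]] for s in range(n_states)]
    for symbol in observations[1:]:
        alpha = [
            sum(alpha[p] * transition[p][s] for p in range(n_states)) * emission[s][symbol]
            for s in range(n_states)
        ]
    return sum(alpha)

def classify_tone(observations, models):
    """Pick the tone whose HMM assigns the codeword sequence the highest likelihood."""
    return max(models, key=lambda tone: forward_likelihood(observations, *models[tone]))

# Hypothetical models: (initial, transition, emission) per tone.
left_to_right = [[0.5, 0.5], [0.0, 1.0]]
models = {
    "rising": ([1.0, 0.0], left_to_right, [[0.9, 0.1], [0.1, 0.9]]),
    "falling": ([1.0, 0.0], left_to_right, [[0.1, 0.9], [0.9, 0.1]]),
}

decision = classify_tone([0, 0, 1, 1], models)
```

For long utterances the forward variables underflow, so a practical version would rescale `alpha` at each step or work with log probabilities.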