TL;DR: Representing a symbolic picture as a 2-D string allows an efficient and natural way to construct iconic indexes for pictures; the paper also proves the necessary and sufficient conditions that characterize ambiguous pictures for reduced 2-D strings as well as normal 2-D strings.
Abstract: In this paper, we describe a new way of representing a symbolic picture by a two-dimensional string. A picture query can also be specified as a 2-D string. The problem of pictorial information retrieval then becomes a problem of 2-D subsequence matching. We present algorithms for encoding a symbolic picture into its 2-D string representation, reconstructing a picture from its 2-D string representation, and matching a 2-D string with another 2-D string. We also prove the necessary and sufficient conditions to characterize ambiguous pictures for reduced 2-D strings as well as normal 2-D strings. This approach thus allows an efficient and natural way to construct iconic indexes for pictures.
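The encoding and matching steps described above can be sketched as follows. This is a simplified illustration, not the paper's algorithms: the function names, the list-of-groups representation, and the alphabetical ordering inside a group are assumptions, and the matcher only checks that relative left/right and below/above order is preserved for uniquely named symbols. It does not implement the paper's specific subsequence-matching definitions or handle ambiguous pictures.

```python
def encode_2d_string(objects):
    """Encode (symbol, x, y) triples as a pair (u, v) of projections.

    Each projection is a list of groups; symbols in the same group share a
    coordinate, and consecutive groups are separated by the '<' relation.
    """
    def project(axis):
        coords = sorted({obj[axis] for obj in objects})
        return [sorted(o[0] for o in objects if o[axis] == c) for c in coords]

    return project(1), project(2)  # u: along x, v: along y


def to_text(groups):
    """Render one projection in a reduced, '<'-separated text form."""
    return " < ".join("".join(group) for group in groups)


def sign(n):
    return (n > 0) - (n < 0)


def matches(query, picture):
    """Simplified match: every pair of query symbols must keep the same
    relative order in x and in y inside the picture (unique symbols assumed)."""
    pos = {s: (x, y) for s, x, y in picture}
    if any(s not in pos for s, _, _ in query):
        return False
    for i, (s1, x1, y1) in enumerate(query):
        for s2, x2, y2 in query[i + 1:]:
            (px1, py1), (px2, py2) = pos[s1], pos[s2]
            if sign(x2 - x1) != sign(px2 - px1):
                return False
            if sign(y2 - y1) != sign(py2 - py1):
                return False
    return True


# Hypothetical picture: a, b, c on the bottom row, d directly above a.
PICTURE = [("a", 0, 0), ("b", 1, 0), ("c", 2, 0), ("d", 0, 1)]

u, v = encode_2d_string(PICTURE)
print(to_text(u))  # ad < b < c
print(to_text(v))  # abc < d
print(matches([("a", 0, 0), ("c", 1, 0)], PICTURE))  # a left of c, same row
```

A query is written in the same coordinate form as a picture, so the same encoder can be applied to both before comparison.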
TL;DR: A general purpose string solver, called Z3-str, is developed as an extension of the Z3 SMT solver through its plug-in interface, which treats strings as a primitive type, thus avoiding the inherent limitations observed in many existing solvers that encode strings in terms of other primitives.
Abstract: Analyzing web applications requires reasoning about strings and non-strings cohesively. Existing string solvers either ignore non-string program behavior or support a limited set of string operations. In this paper, we develop a general purpose string solver, called Z3-str, as an extension of the Z3 SMT solver through its plug-in interface. Z3-str treats strings as a primitive type, thus avoiding the inherent limitations observed in many existing solvers that encode strings in terms of other primitives. The logic of the plug-in has three sorts, namely, bool, int and string. The string-sorted terms include string constants and variables of arbitrary length, with functions such as concatenation, sub-string, and replace. The int-sorted terms are standard, with the exception of the length function over string terms. The atomic formulas are equations over string terms, and (in)-equalities over integer terms. Not only does our solver have features that enable whole program symbolic, static and dynamic analysis, but it also performs better than other solvers in our experiments. The application of Z3-str in remote code execution detection shows that its support of a wide spectrum of string operations is key to reducing false positives.
TL;DR: In this paper, a device for automatically identifying the language of a text from a plurality of languages extracts words from the text and constructs all of the character strings contained in each extracted word.
Abstract: After prestoring first character strings that occur frequently in words of languages and second character strings that are atypical therein, a device for automatically identifying the language of a text from a plurality of languages extracts words from the text and constructs all of the character strings contained in each extracted word. Each string in an extracted word is compared to the first and second strings of a particular language. If the word contains a first string, a score of the language is increased by a coefficient depending in particular on the position of the first string in the word. If the word contains a second string, the score is decreased by a coefficient associated with the second string. The highest of the scores corresponding to the predetermined languages identifies the language of the text.
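The scoring scheme described above might look roughly like the sketch below. Everything concrete in it is an assumption made for illustration: the `PROFILES` table, its coefficient values, the rule that a frequent string counts double at the start or end of a word, and the choice to score every occurrence rather than once per word. None of these come from the source.

```python
import re

# Hypothetical per-language tables: frequent strings raise the score,
# atypical strings lower it. Values are illustrative only.
PROFILES = {
    "english": {"frequent": {"th": 1.0, "ing": 1.0}, "atypical": {"sch": 1.0}},
    "german": {"frequent": {"sch": 1.0, "ung": 1.0}, "atypical": {"th": 1.0}},
}


def substrings(word):
    """Yield every contiguous substring of a word with its start index."""
    for i in range(len(word)):
        for j in range(i + 1, len(word) + 1):
            yield word[i:j], i


def identify_language(text, profiles):
    """Return (best_language, scores) for the given text."""
    scores = {lang: 0.0 for lang in profiles}
    words = re.findall(r"[^\W\d_]+", text.lower())  # runs of letters
    for word in words:
        for s, start in substrings(word):
            at_edge = start == 0 or start + len(s) == len(word)
            for lang, table in profiles.items():
                if s in table["frequent"]:
                    # Assumed position rule: edge matches count double.
                    factor = 2.0 if at_edge else 1.0
                    scores[lang] += factor * table["frequent"][s]
                if s in table["atypical"]:
                    scores[lang] -= table["atypical"][s]
    best = max(scores, key=scores.get)
    return best, scores


print(identify_language("the thing is walking", PROFILES)[0])
print(identify_language("die schule und die zeitung", PROFILES)[0])
```

Enumerating all substrings is quadratic in word length, which is acceptable here because individual words are short.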
TL;DR: Punycode is an instance of Bootstring that uses particular parameter values specified by this document, appropriate for IDNA, and allows a string of basic code points to uniquely represent any string of code points drawn from a larger set.
Abstract: Punycode is a simple and efficient transfer encoding syntax designed for use with Internationalized Domain Names in Applications (IDNA). It uniquely and reversibly transforms a Unicode string into an ASCII string. ASCII characters in the Unicode string are represented literally, and non-ASCII characters are represented by ASCII characters that are allowed in host name labels (letters, digits, and hyphens). This document defines a general algorithm called Bootstring that allows a string of basic code points to uniquely represent any string of code points drawn from a larger set. Punycode is an instance of Bootstring that uses particular parameter values specified by this document, appropriate for IDNA.
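The reversible transformation described above can be tried with the `punycode` codec that ships with Python's standard library. The helper names `encode_label` and `decode_label` are hypothetical, and the sketch only adds or strips the `xn--` prefix that IDNA uses for encoded labels; it skips the normalization and validation steps a full IDNA implementation performs.

```python
def encode_label(label):
    """Encode one host name label; pure-ASCII labels pass through unchanged."""
    if label.isascii():
        return label
    return "xn--" + label.encode("punycode").decode("ascii")


def decode_label(label):
    """Reverse encode_label for labels carrying the 'xn--' prefix."""
    if label.lower().startswith("xn--"):
        return label[4:].encode("ascii").decode("punycode")
    return label


encoded = encode_label("bücher")
print(encoded)                # xn--bcher-kva
print(decode_label(encoded))  # bücher
```

The ASCII letters of the input appear literally before the final hyphen, and the suffix after it encodes where the non-ASCII code point is inserted.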
TL;DR: In this article, a method for rewriting source text receives a source text string in a first natural language, automatically generates alternative strings in that same language, and uses machine translation confidence in a second natural language to select one alternative as a candidate replacement.
Abstract: A method for rewriting source text includes receiving source text including a source text string in a first natural language. The source text string is translated (S208) with a machine translation system to generate a first target text string in a second natural language. A translation confidence for the source text string is computed (S210), based on the first target text string. At least one alternative text string is generated (S216), where possible, in the first natural language by automatically rewriting the source string. Each alternative string is translated (S218) to generate a second target text string in the second natural language. A translation confidence is computed (S220) for the alternative text string based on the second target string. Based on the computed translation confidences, one of the alternative text strings may be selected as a candidate replacement for the source text string and may be proposed to a user on a graphical user interface.
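The control flow of the steps above can be sketched as below. The translator, confidence estimator, and rewriter are passed in as callables because the source does not specify them; the toy stand-ins at the bottom are purely hypothetical. The selection rule (propose the alternative with the highest confidence, and only if it beats the original) is also an assumption, since the source says only that a selection is made based on the computed confidences.

```python
def propose_rewrite(source, translate, confidence, rewrite):
    """Return (alternative, confidence) for the best rewrite, or None.

    translate(text) -> target text
    confidence(source_text, target_text) -> float, higher is better
    rewrite(text) -> iterable of alternative source strings
    """
    base_conf = confidence(source, translate(source))
    best = None
    for alternative in rewrite(source):
        alt_conf = confidence(alternative, translate(alternative))
        if alt_conf > base_conf and (best is None or alt_conf > best[1]):
            best = (alternative, alt_conf)
    return best


# Toy stand-ins, for illustration only (not real MT components).
def toy_translate(text):
    return text.upper()


def toy_confidence(source_text, target_text):
    # Pretend shorter translations are more reliable.
    return 1.0 / (1 + len(target_text.split()))


def toy_rewrite(text):
    simplified = text.replace("in order to", "to")
    return [simplified] if simplified != text else []


result = propose_rewrite(
    "Press the button in order to start the scan",
    toy_translate, toy_confidence, toy_rewrite,
)
print(result)
```

In an interactive setting, the returned alternative would be shown to the user as a suggestion rather than applied automatically.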