TL;DR: The invention is a markup language conforming to the SGML standard, in which document type definitions divide electronic documents into blocks, each associated with logical fields specific to its block type.
Abstract: The invention includes a markup language according to the SGML standard in which document type definitions are created under which electronic documents are divided into blocks that are associated with logical fields that are specific to the type of block. Each of many different types of electronic documents can have a record mapping to a particular environment, such as a legacy environment of a banking network, a hospital's computer environment for electronic record keeping, a lending institution's computer environment for processing loan applications, or a court or arbitrator's computer system. Semantic document type definitions for various electronic document types (including, for example, electronic checks, mortgage applications, medical records, prescriptions, contracts, and the like) can be formed using mapping techniques between the logical content of the document and the block that is defined to include such content. Also, the various document types are preferably defined to satisfy existing customs, protocols and legal rules.
TL;DR: The XRANK system is presented, designed to handle the novel features of XML keyword search, which naturally generalizes a hyperlink based HTML search engine such as Google and can be used to query a mix of HTML and XML documents.
Abstract: We consider the problem of efficiently producing ranked results for keyword search queries over hyperlinked XML documents. Evaluating keyword search queries over hierarchical XML documents, as opposed to (conceptually) flat HTML documents, introduces many new challenges. First, XML keyword search queries do not always return entire documents, but can return deeply nested XML elements that contain the desired keywords. Second, the nested structure of XML implies that the notion of ranking is no longer at the granularity of a document, but at the granularity of an XML element. Finally, the notion of keyword proximity is more complex in the hierarchical XML data model. In this paper, we present the XRANK system that is designed to handle these novel features of XML keyword search. Our experimental results show that XRANK offers both space and performance benefits when compared with existing approaches. An interesting feature of XRANK is that it naturally generalizes a hyperlink based HTML search engine such as Google. XRANK can thus be used to query a mix of HTML and XML documents.
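The first challenge named in the abstract, returning deeply nested elements rather than whole documents, can be illustrated with a minimal Python sketch. This is not the XRANK algorithm: it does no ranking, ignores hyperlinks and keyword proximity, and uses a naive whitespace tokenizer. The function names and the sample document are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def subtree_words(elem):
    """Lowercased set of whitespace-separated words in elem's subtree."""
    return set(" ".join(elem.itertext()).lower().split())


def deepest_matches(elem, keywords):
    """Return the most deeply nested elements whose subtree contains every keyword."""
    wanted = {k.lower() for k in keywords}
    if not wanted <= subtree_words(elem):
        return []  # this subtree cannot satisfy the query
    hits = []
    for child in elem:
        hits.extend(deepest_matches(child, keywords))
    # If no child covers all keywords on its own, elem is the deepest match.
    return hits or [elem]


# Hypothetical sample document, for illustration only.
doc = ET.fromstring(
    "<workshop>"
    "<paper><title>XML ranking</title><body>keyword search over XML</body></paper>"
    "<paper><title>HTML crawling</title><body>hyperlink analysis</body></paper>"
    "</workshop>"
)
matches = deepest_matches(doc, ["keyword", "search"])
print([m.tag for m in matches])  # prints ['body']
```

The query returns the nested `body` element rather than the enclosing `paper` or the whole document. A system like the one described would additionally have to rank such elements.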
TL;DR: This paper explains the need for a query language for Extensible Markup Language (XML) data sources, provides a tutorial overview of XQuery, and includes several examples of its use.
Abstract: The World Wide Web Consortium has convened a working group to design a query language for Extensible Markup Language (XML) data sources. This new query language, called XQuery, is still evolving and has been described in a series of drafts published by the working group. XQuery is a functional language comprised of several kinds of expressions that can be nested and composed with full generality. It is based on the type system of XML Schema and is designed to be compatible with other XML-related standards. This paper explains the need for an XML query language, provides a tutorial overview of XQuery, and includes several examples of its use.
TL;DR: Wikipedia is an encyclopedia composed of millions of articles in different languages, and anyone can edit an article using a wiki markup language that offers a simplified alternative to HTML.
Abstract: Wikipedia is a well known free content, multilingual encyclopedia written collaboratively by contributors around the world. Anybody can edit an article using a wiki markup language that offers a simplified alternative to HTML. This encyclopedia is composed of millions of articles in different languages.
TL;DR: A system and method are presented for translating a document from one language to another using different translation resources depending on the document or the portion of it being translated. The sections of the document are marked with Standard Generalized Markup Language (SGML) tags.
Abstract: A system and method for translating a document from one language to another language using different translation resources depending on the document or portion of the document being translated. The original document which is to be translated contains information indicating the dictionary or translation rules which are to be utilized for the translation. The information contained within the document used to indicate different sections is encoded using Standard Generalized Markup Language (SGML) tags. Documents which have been previously translated can be used to train the translation system. Also, a side-by-side display of the original document and the translated document is presented to allow the user to compare both the original and translated document.
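The idea of tags inside the document selecting the translation resource for each section can be sketched as follows. This is a rough illustration under stated assumptions, not the patented method: the `sec` tag, its `dict` attribute, and the toy word-for-word dictionaries are all hypothetical, and Python's `html.parser` stands in for a real SGML parser.

```python
from html.parser import HTMLParser

# Toy word-for-word dictionaries; entries are illustrative only.
DICTIONARIES = {
    "legal": {"court": "tribunal", "contract": "contrat"},
    "general": {"court": "cour", "house": "maison"},
}


class SectionTranslator(HTMLParser):
    """Translates text using the dictionary named by the enclosing <sec> tag."""

    def __init__(self):
        super().__init__()
        self.stack = ["general"]  # default dictionary outside any section
        self.output = []

    def handle_starttag(self, tag, attrs):
        if tag == "sec":  # hypothetical section tag
            self.stack.append(dict(attrs).get("dict", self.stack[-1]))

    def handle_endtag(self, tag):
        if tag == "sec" and len(self.stack) > 1:
            self.stack.pop()

    def handle_data(self, data):
        # Fall back to the general dictionary for unknown dictionary names.
        table = DICTIONARIES.get(self.stack[-1], DICTIONARIES["general"])
        self.output.extend(table.get(word, word) for word in data.split())


def translate(marked_up):
    parser = SectionTranslator()
    parser.feed(marked_up)
    parser.close()
    return " ".join(parser.output)


sample = '<sec dict="legal">court contract</sec> <sec dict="general">court house</sec>'
print(translate(sample))  # prints: tribunal contrat cour maison
```

The same source word ("court") is rendered differently depending on the section it appears in, which is the behavior the abstract attributes to section-specific dictionaries and rules.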