TL;DR: Describes ongoing work on a system that transforms page-oriented document images into “reflowable document images”: HTML representations of a page image that adapt to display devices of different sizes while preserving the original appearance of the image as much as possible and avoiding OCR errors.
Abstract: The paper describes ongoing work on a system that transforms page-oriented document images into “reflowable document images”, HTML representations of a page image that adapt to display devices of different sizes while preserving the original appearance of the image as much as possible and avoiding OCR errors. The approach to document layout analysis used by the system is outlined, and the strengths and limitations of HTML for this application are discussed.
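The abstract does not spell out how the HTML is generated. As a rough illustration only, the sketch below assumes that layout analysis has already segmented each paragraph into word bounding boxes and that each word has been cropped into its own image file. Emitting those crops as inline images lets the browser re-wrap the lines without any OCR. The function name `words_to_html`, the `(x0, y0, x1, y1)` box format, and the file-naming scheme are all hypothetical.

```python
from html import escape


def words_to_html(paragraphs, image_prefix="word"):
    """Emit one <p> per paragraph and one inline <img> per word box.

    `paragraphs` is a list of paragraphs; each paragraph is a list of
    (x0, y0, x1, y1) pixel boxes in reading order (assumed input format).
    """
    parts = ["<html><body>"]
    n = 0  # running index used to name the cropped word images
    for para in paragraphs:
        imgs = []
        for (x0, y0, x1, y1) in para:
            src = escape(f"{image_prefix}_{n}.png", quote=True)
            imgs.append(
                f'<img src="{src}" width="{x1 - x0}" height="{y1 - y0}" alt="">'
            )
            n += 1
        # Whitespace between inline images gives the browser break points.
        parts.append("<p>" + " ".join(imgs) + "</p>")
    parts.append("</body></html>")
    return "\n".join(parts)


if __name__ == "__main__":
    print(words_to_html([[(0, 0, 30, 10), (35, 0, 80, 10)], [(0, 20, 50, 30)]]))
```

Because each word stays an image, the original glyph shapes are preserved; the trade-off is that the text is not searchable or selectable.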
TL;DR: A tool was designed to automatically test how the same reflowable document is typeset in a variety of office software; both its test efficiency and the accuracy of its results are better than those of manual testing.
Abstract: To test how the same reflowable document is typeset in a variety of office software, and to point out the differences between the typesetting results, a tool for automatically testing the typesetting effect of reflowable documents was designed. The tool first transforms the document under test into an image that conforms to the test requirements. Next, for documents in OOXML format, an automated color-marking method uses the editability of the reflowable document to look up the reverse correlation between layout objects and typesetting elements, while documents in other formats are analyzed by regularization analysis based on the image features of typesetting elements. Finally, the effect of the typesetting elements is analyzed at the pixel level. According to the experimental results, tests run with the tool cover 70% of the level-1 common function points. Moreover, both the test efficiency and the accuracy of the results are better than those of manual testing.
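The abstract gives no detail on the pixel-level analysis. A minimal sketch of what such a comparison could look like is shown below, assuming two renderings of the same page have already been rasterized to equal-sized grids of pixel values. The function `pixel_diff` and its return format are illustrative assumptions, not the tool's actual method.

```python
def pixel_diff(img_a, img_b):
    """Compare two equal-sized images given as lists of pixel rows.

    Returns (ratio, bbox): the fraction of differing pixels and the
    bounding box (top, left, bottom, right) of the differences,
    or (0.0, None) when the images are identical.
    """
    if len(img_a) != len(img_b) or any(
        len(ra) != len(rb) for ra, rb in zip(img_a, img_b)
    ):
        raise ValueError("images must have the same dimensions")

    # Collect the (row, column) position of every mismatching pixel.
    diffs = [
        (r, c)
        for r, (ra, rb) in enumerate(zip(img_a, img_b))
        for c, (pa, pb) in enumerate(zip(ra, rb))
        if pa != pb
    ]
    if not diffs:
        return 0.0, None

    total = sum(len(row) for row in img_a)
    rows = [r for r, _ in diffs]
    cols = [c for _, c in diffs]
    return len(diffs) / total, (min(rows), min(cols), max(rows), max(cols))


if __name__ == "__main__":
    a = [[0, 0, 0], [0, 1, 0]]
    b = [[0, 0, 0], [0, 0, 1]]
    print(pixel_diff(a, b))
```

In practice the bounding box would be mapped back to the typesetting element it overlaps, so that a difference can be reported per element rather than per page.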
TL;DR: A computer-based method for creating a high-fidelity page layout document is presented; it includes assigning an identifier to each of a plurality of elements in a reflowable document.
Abstract: A computer-based method for creating a high-fidelity page layout document is provided. The method includes: assigning an identifier to each element of a plurality of elements in a reflowable document; creating a fixed page layout document, including the identifiers, from the reflowable document; parsing the fixed page layout document into a plurality of elements based on the identifiers, each element being associated with an identifier; linking the elements of the reflowable document to the elements of the fixed page layout document based on the identifiers; and creating a final document based on the reflowable document, the fixed page layout document, and the identifiers, each element of the final document having a fixed position on a page.
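The steps listed in the abstract can be sketched as a small pipeline. This is a toy illustration under stated assumptions, not the claimed implementation: the class names, the `e0`, `e1`, ... identifier scheme, and especially `toy_layout` (a stand-in for a real layout engine that places a fixed number of elements per page) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class FlowElement:  # element of the reflowable document
    ident: str
    content: str


@dataclass
class FixedElement:  # element parsed from the fixed page layout document
    ident: str
    page: int
    x: float
    y: float


@dataclass
class FinalElement:  # content plus a fixed position on a page
    ident: str
    content: str
    page: int
    x: float
    y: float


def assign_identifiers(contents):
    """Step 1: give every reflowable element a unique identifier."""
    return [FlowElement(f"e{i}", c) for i, c in enumerate(contents)]


def toy_layout(flow, per_page=2, line_height=20.0):
    """Steps 2-3 (stand-in): produce positioned elements carrying the ids."""
    fixed = []
    for i, el in enumerate(flow):
        page, slot = divmod(i, per_page)
        fixed.append(FixedElement(el.ident, page + 1, 0.0, slot * line_height))
    return fixed


def link_and_build(flow, fixed):
    """Steps 4-5: link the two documents by id and build the final one."""
    by_id = {f.ident: f for f in fixed}
    return [
        FinalElement(el.ident, el.content, by_id[el.ident].page,
                     by_id[el.ident].x, by_id[el.ident].y)
        for el in flow
    ]


if __name__ == "__main__":
    flow = assign_identifiers(["Title", "Paragraph 1", "Paragraph 2"])
    for el in link_and_build(flow, toy_layout(flow)):
        print(el)
```

The identifier is what keeps content and geometry connected: the final document takes its content from the reflowable side and its position from the fixed-layout side.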