About: TUGboat is an academic journal published by the TeX Users Group. The journal publishes mainly in the areas of computer science and medicine. Its ISSN is 0896-3207. Over its lifetime, the journal has published 132 publications, which have received 93 citations. The journal is also known as: Tug boat.
TL;DR: This work analyzes how the semantic enrichment of formulae improves the format conversion process and shows that considering the textual context of formulae reduces the error rate of such conversions.
Abstract: Mathematical formulae represent complex semantic information in a concise form. Especially in Science, Technology, Engineering, and Mathematics, mathematical formulae are crucial to communicate information, e.g., in scientific papers, and to perform computations using computer algebra systems. Enabling computers to access the information encoded in mathematical formulae requires machine-readable formats that can represent both the presentation and content, i.e., the semantics, of formulae. Exchanging such information between systems additionally requires conversion methods for mathematical representation formats. We analyze how the semantic enrichment of formulae improves the format conversion process and show that considering the textual context of formulae reduces the error rate of such conversions. Our main contributions are: (1) providing an openly available benchmark dataset for the mathematical format conversion task consisting of a newly created test collection, an extensive, manually curated gold standard and task-specific evaluation metrics; (2) performing a quantitative evaluation of state-of-the-art tools for mathematical format conversions; (3) presenting a new approach that considers the textual context of formulae to reduce the error rate for mathematical format conversions. Our benchmark dataset facilitates future research on mathematical format conversions as well as research on many problems in mathematical information retrieval. Because we annotated and linked all components of formulae, e.g., identifiers, operators and other entities, to Wikidata entries, the gold standard can, for instance, be used to train methods for formula concept discovery and recognition. Such methods can then be applied to improve mathematical information retrieval systems, e.g., for semantic formula search, recommendation of mathematical content, or detection of mathematical plagiarism.
TL;DR: This talk presents a report on work-in-progress, aimed at developing the primitive commands for pdfTeX needed to support the production of fully tagged PDF documents, and writing appropriate TeX and LaTeX macros to make effective use of the new primitives.
Abstract: Recently PDF has been accepted as a standard for production of electronic documents, as ISO 32000-1:2008, with an acronym of PDF/UA (for “Universal Accessibility”). The second draft ISO 32000-2:2009 is to include specifications for including MathML tagging of mathematical environments and expressions. This talk presents a report on work-in-progress, aimed at: developing the primitive commands for pdfTeX needed to support the production of fully tagged PDF documents; writing appropriate TeX and LaTeX macros to make effective use of the new primitives; authoring changes to internal LaTeX structures to use these macros automatically at appropriate places within the existing code-base for LaTeX. This is work that is being undertaken together with Han Thế Thanh, author of pdfTeX [Thanh, Han Thế; Thesis – pdfTeX, published as: TUGboat, 21:4, (2000). http://www.tug.org/TUGboat/Contents/contents21-4.html], who has added some new primitive commands to an experimental version of this software tool.
TL;DR: The background to this multi-year project to enhance LaTeX to fully and naturally support the creation of structured document formats, in particular the “tagged PDF” format as required by accessibility standards such as PDF/UA, is outlined.
TL;DR: There is quite some room left for improvement in the proposed solution to the challenges of a production system for mathematical serials with both an electronic and paper version.
Abstract: We present the recent development of a production system for mathematical serials with both an electronic and paper version. The challenges were many: (i) no house style layout should be imposed, as the journals come from different publishing houses and may have very different typographical options; (ii) produce screen-optimised and printer-friendly output at once; (iii) avoid any duplication of information so that every aspect of the publications is always in sync (Web site metadata, table of contents, ...); thus (iv) generate on the fly each article's page numbers and the XML metadata at the published volume level from one master LaTeX source file tree. Using available technology (pdflatex, pdfpages.sty and \write18), the proposed solution to these problems appeared amazingly simple and easy to use. However, we'll show that there is quite some room left for improvement.
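The single-source idea in points (iii) and (iv) can be illustrated with a minimal sketch. It is not the authors' system, which the abstract says is built on pdflatex, pdfpages.sty and \write18. It only shows, in Python, how page ranges and volume-level XML metadata might both be derived from one ordered article list so that they cannot drift apart. The function names, the XML element and attribute names, and the sample volume are all hypothetical.

```python
import xml.etree.ElementTree as ET


def assign_pages(articles, first_page=1):
    """Give each article a start and end page from its page count.

    `articles` is a list of (article_id, page_count) pairs in volume order.
    (Hypothetical input format, chosen for illustration only.)
    """
    assigned = []
    next_page = first_page
    for article_id, page_count in articles:
        start = next_page
        end = start + page_count - 1
        assigned.append((article_id, start, end))
        next_page = end + 1
    return assigned


def volume_metadata(volume, articles, first_page=1):
    """Build volume-level XML metadata from the same article list.

    The element and attribute names are assumptions, not a real schema.
    """
    root = ET.Element("volume", number=str(volume))
    for article_id, start, end in assign_pages(articles, first_page):
        ET.SubElement(
            root,
            "article",
            id=article_id,
            firstpage=str(start),
            lastpage=str(end),
        )
    return ET.tostring(root, encoding="unicode")


# Hypothetical volume: three articles with 12, 7 and 20 pages.
articles = [("a01", 12), ("a02", 7), ("a03", 20)]
print(assign_pages(articles))
print(volume_metadata(5, articles))
```

Because the table of contents and the metadata are both computed from `articles`, changing one article's length updates every downstream page number in a single pass.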