TUG 2015 – Day 3 – first part

Day 3 was opened by Kaveh Bazargan and Jagath AR. They talked about today’s requirements of publishers, who demand XML instead of markup within the text. Along the way, Kaveh Bazargan showed problems with XML itself: he gave examples of encodings that are formally valid XML but crazy in meaning, such as wrapping every single letter in its own tag, or expressing a plus-minus sign as a plus symbol with an underline tag. He reviewed the classic publishing chain, from author to publisher to peer reviewer to copy editor and finally to the typesetter, with possible loops. Then he showed the cloud approach, which is not linear but star-like: when publishing in the cloud, the XML file sits in the middle, and all involved parties work directly with that file. He demonstrated an online editor on the River Valley Technologies platform which supports editing, reviewing and correcting in the browser. There, the file is always saved as XML and rendered to HTML or PDF on the fly. Authors effectively edit XML, but TeX is used in the background: specifically, TeX paginates the XML documents and produces high-quality, even enriched, PDF output in different styles from the same XML source. Jagath AR showed some examples of enriched PDFs, such as PDFs carrying several layers for screen color mode, black and white, and CMYK coloring, all in the same PDF file.

Joachim Schrod gave an experience report about TeX in a commercial setting:

The purpose is producing all written communication for an online bank. This usually means small documents, but millions of them, and with legal requirements attached. Document types include letters with standardized or individual content, PIN/TAN letters, account statements, credit card statements, and share notes. Some may contain forms, some contain PDF attachments from third parties. The client actually types LaTeX, but within templates: only simple LaTeX is used, no math, and there are only three special characters, the backslash \ and the braces { }. You can imagine that the usual TeX special characters can have a very different meaning in this context: just think of $ and % in a bank. The client uses a web application based on a reduced tinyMCE editor. Usage has to be simple, with low latency, and it needs to be restricted for production. There are just a few special environments, tailored to the corporate identity style, to ease use. The output goes to different channels: a PDF file (with letterhead), a printed letter (without letterhead, as it is already pre-printed on the paper), and an archive. Reproducing a document on a different channel without any reformatting, even after years, is a legal requirement, so you need a storage strategy.
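To illustrate how such a restriction might be set up (a minimal sketch of my own, not Schrod’s actual code): in TeX, the special meaning of characters is controlled by category codes, so everything except \, { and } can be demoted to an ordinary printable character.

    % minimal sketch, plain TeX: demote all specials except \ { } to
    % ordinary characters (catcode 12), so they simply print as themselves
    \catcode`\$=12  % no math shift: $ prints literally
    \catcode`\&=12  % no alignment tab
    \catcode`\#=12  % no macro parameter character
    \catcode`\^=12  \catcode`\_=12  \catcode`\~=12
    % after the next line, % itself stops working as a comment character
    \catcode`\%=12
    Fees rise by 3% on amounts over $1,000.
    \bye

A production setup would of course bury this in the format and add the corporate templates on top.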

Besides manually written letters, there are jobs for automated mass production, such as producing account statements each month. Standard processing steps are

  • generate
  • format
  • output
  • archive

The engine is plugin-based, using document parameters and templates. Different representations need to be produced, such as a draft, an online version with letterhead, and a paper version without, as mentioned above. Calibration to the in-house printers may be needed, and additional empty pages have to be inserted when sending to a print shop. Folding machine control has to be implemented, and different archiving systems need to be supported. All of this has to happen after formatting, since it must be possible to reproduce the archived file in every style and on any output channel, even after ten years.

In this environment, they still use the classic DVI format with \special commands, with different DVI drivers for each purpose. DVI is better suited here since the produced files are much smaller than PDF files, and storage costs money. The interactive preview has to be fast, and small jobs have to be processed quickly and in high volume. So they use a TeX fmt file with preprocessed macros. There is not even a document class selection: the class is preloaded, and no standard packages are loaded at run time, as they are preloaded into the fmt as well. The remaining package files are very short; they contain no code, just selections. Compiling a document goes down from 1.5 s to 0.06 s, for example, a factor of about 25. This is a big thing in mass production: a sample requirement is to generate and format 400,000 documents within a fixed time frame.
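As a rough sketch of the approach (all file and macro names here are hypothetical): a format is dumped once with iniTeX, and every job then starts from the frozen, preloaded state.

    % bankfmt.tex -- dump the custom format once:  tex -ini bankfmt.tex
    \input plain        % base macros (a LaTeX base would load latex.ltx instead)
    \input bankmacros   % in-house templates and environments, preloaded here
    \dump               % writes bankfmt.fmt and ends the run

    % every production job then skips all macro loading:
    %   tex -fmt=bankfmt letter.tex

Since all macro processing is done at dump time, each run only has to read the tiny document source itself.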

To make production even more efficient, tabular material is typeset using \vbox and \hbox instead of LaTeX’s tabular environments; TeX is very fast in this regard. Jobs can be parallelized, and the source code is actually piped into the TeX process. Instead of running TeX on each small file, large container files are generated, with around 50,000 documents for a single TeX run; the resulting DVI file is then split up in post-processing. Every file, all used graphics and all used fonts get a timestamp for storing and reproducing.
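A minimal sketch of what such box-based tabular material might look like (column layout and macro name invented for illustration): each row is one \hbox with fixed-width cells, so TeX’s alignment machinery is never invoked.

    % one statement row built directly from boxes instead of tabular
    \def\stmtline#1#2#3{%
      \hbox to\hsize{%
        \hbox to .2\hsize{#1\hss}%   date, flush left
        \hbox to .6\hsize{#2\hss}%   booking text, flush left
        \hbox to .2\hsize{\hss#3}%   amount, flush right
      }}
    \stmtline{2015-07-24}{Card payment}{-42.00}
    \stmtline{2015-07-25}{Salary}{2,500.00}

Since every cell width is known in advance, no trial pass over the table is needed.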

The whole process has to be robust, reliable and fast, and it should use few resources in terms of memory and storage. The whole setting shows that traditional TeX is still useful today, even outside academia and the publishing industry.

The next talk, by S.K. Venkatesan, presented TeX as a three-stage rocket; the stages are

  • line breaking paragraphs
  • making a single long scroll page
  • cutting it into pages using a cookie-cutter algorithm

On such infinitely long pages, he simply placed footnotes after their paragraph.
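A rough sketch of how the scroll stage might look in plain TeX (my own reading of the idea, not code from the talk): if the page goal is made effectively infinite, TeX never triggers a page break, and the whole document accumulates as one galley that is shipped out at the end.

    \vsize=\maxdimen            % page goal of about 5.75 m: never reached in practice
    \output={\shipout\box255}   % ship the single accumulated page at \end

A real output routine would do more, of course; the cookie-cutter stage would then slice this long page into page-sized pieces.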

He compared paragraphs as produced by browsers from HTML with CSS to those generated by TeX, and furthermore spoke about the coexistence of TeX and HTML.

After a break, Joseph Wright followed with a talk about the \parshape primitive. He gave a live demo instead of slides. The LaTeX3 team has developed a new interface to \parshape based on three different concepts: margins, measure, and cutouts. He demonstrated setting margins to absolute values and to values relative to the previous paragraph, indenting lines differently within a paragraph, shaping paragraphs, and producing cutouts with the new interface. An open challenge is that the interface is still line-based rather than based on the heights or lengths of objects.
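For reference, the underlying primitive expects a line count n followed by n pairs of indent and line width; lines beyond the n-th reuse the last pair. A small self-contained example (using e-TeX’s \dimexpr for the arithmetic):

    \parshape 3
      0pt \hsize                      % line 1: full measure
      2em \dimexpr\hsize-2em\relax    % line 2: indented by 2em
      4em \dimexpr\hsize-4em\relax    % lines 3 and up: indented by 4em
    This paragraph needs enough text to wrap over at least three
    lines, so that the successive indents become visible.

One can see why a higher-level interface is attractive: the raw pairs have to be recalculated by hand whenever the measure changes.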

Julien Cretel gave the next talk, about functional data structures in TeX. First he explained what Haskell is, namely a purely functional language, and gave a quicksort example as a demo. He said that he wanted to do algorithmic things within TeX. One could delegate them to an external program, but we often like to use TeX whether or not it is actually the best choice; many of us prefer solving things in TeX instead of calling Matlab or the like. At the least it is a nice intellectual pursuit.

Julien Cretel wants to implement semantics like this in TeX

    data Tree a = Empty | Node (Tree a) a (Tree a)

He asked the audience for feedback, for example on whether the implementation should be done in plain TeX or with LaTeX3. He plans to focus on a subset, writing the algorithms in Haskell first and then translating them to TeX or LaTeX code.
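To make the idea more concrete, here is a speculative sketch of my own (not code from the talk) of one constructor-based encoding in plain TeX: the constructors are plain macros, and a fold over the tree is obtained by defining what each constructor means.

    % Tree a = Empty | Node (Tree a) a (Tree a), constructors as macros;
    % defining them as below makes pure expansion compute the tree's size
    \def\Empty{0}
    \def\Node#1#2#3{(1+#1+#3)}  % 1 + size(left) + size(right); the value #2 is unused
    % size of  Node (Node Empty 'a' Empty) 'b' Empty  -- prints "size = 2"
    \message{size = \the\numexpr\Node{\Node{\Empty}{a}{\Empty}}{b}{\Empty}\relax}

A different fold would simply redefine the two constructors; whether such encodings scale to real functional data structures is precisely the open question.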

So it was more an open discussion than a presentation; maybe we will see an implementation next year. Comments from the audience included:

  • It’s easier to implement Haskell in TeX than to implement TeX in Haskell.
  • Why implement it in TeX if you already have Haskell? Well, as above: for the challenge.
  • Arthur Reutenauer suggested working with the tree structures that TeX uses for implementing hyphenation.
  • Somebody remarked that TeX is Turing complete… we know, but that doesn’t help with the *how*.
  • And LuaTeX? Lua is imperative, not functional.

More talks followed, and the banquet in the evening, so I still have things to tell. But for today I need to take a break, and I am sending this first part to the news portal right away. There is a lot to write about, and more interesting things are coming in the next post soon.