
Format conversions

Conversion between (La)TeX and others

troff
troff-to-latex (available as support/troff-to-latex), written by Kamal Al-Yahya at Stanford University (California, USA), assists in the translation of a troff document into LaTeX format. It recognises most -ms and -man macros, plus most eqn and some tbl preprocessor commands. Anything fancier needs to be done by hand. Two style files are provided. There is also a man page (which converts very well to LaTeX...). The program is copyrighted but free. An enhanced version of this program, tr2latex, is available in support/tr2latex

The DECUS TeX distribution (see sources of software) also contains a program which converts troff to TeX.

WordPerfect
wp2latex (available as support/wp2latex) is a PC program written in Turbo Pascal by R. C. Houtepen at the Eindhoven University in the Netherlands. It converts WordPerfect 5.0 documents to LaTeX. Pascal source is included. Users find it ``helpful'' and ``decent'' in spite of some limitations. It gets high marks for handling font changes, but cannot make indices, tables of contents, margins or graphics, and can't handle features new in WordPerfect version 5.1, in particular the equation formatter. The program is copyrighted but free.

Glenn Geers of the University of Sydney, Australia (glenn@qed.physics.su.oz.au) is translating wp2latex into C and adding some WordPerfect 5.1 features, in particular its equation handling. His work is in the glenn subdirectory of support/wp2latex

PC-Write
pcwritex.arc, from support/pcwritex, is a print driver for PC-Write that ``prints'' a PC-Write V2.71 document to a TeX-compatible disk file. It was written by Peter Flynn at University College, Cork, Republic of Ireland.

runoff
Peter Vanroose's (vanroose@esat.kuleuven.ac.be) conversion program is written in VMS Pascal. The sources and a VAX executable are available from support/rnototex

refer/tib
There are a few programs for converting bibliographic data between BibTeX and refer/tib formats. They are in biblio/bibtex/utils/refer-tools

In spite of the directory name, it also contains a shell script to convert BibTeX to refer. The collection is not maintained.

RTF
A program for converting Microsoft's Rich Text Format to TeX is available in support/rtf2tex; it was written and is maintained by Robert Lupton (rhl@astro.princeton.edu). There is also a converter to LaTeX by Erwin Wechtl, in support/rtf2latex

Translation to RTF may be done (for a somewhat constrained set of LaTeX documents) by TeX2RTF, which can produce ordinary RTF and Windows Help RTF (as well as HTML; see (La)TeX conversion to HTML). TeX2RTF is supported on various Unix platforms and under Windows 3.1; it is available from support/tex2rtf

Microsoft Word
A rudimentary program for converting MS-Word to LaTeX is wd2latex, for MS-DOS (dviware/wd2latex); a better idea, however, is to convert the document to RTF format and use one of the RTF converters mentioned above.

A FAQ that deals specifically with conversions between TeX-based formats and word processor formats is regularly posted to comp.text.tex, is available via http://www.kfa-juelich.de/isr/1/texconv/texconv.html and is archived as help/wp-conv/wp-conv.zip

A group at Ohio State University (USA) is working on a common document format based on SGML, with the ambition that any format could be translated to or from this one. FrameMaker provides ``import filters'' to aid translation from alien formats (presumably including TeX) to FrameMaker's own.

Conversion from (La)TeX to plain ASCII

The aim here is to emulate the Unix nroff, which formats text as best it can for the screen, from the same input as the Unix typesetting program troff.

Ralph Droms (droms@bucknell.edu) has a style file and a program that provide the LaTeX equivalent of nroff, though it doesn't do a good job with tables and mathematics. The software is available in support/txt; the original dvi2tty often does an acceptable job and is available in dviware/dvi2tty

Another possibility is to use screen.sty (available as macros/latex209/contrib/misc/screen.sty) together with a dvi2tty program of some kind; you might try dviware/crudetype as well. A further option is the LaTeX-to-ASCII conversion program, l2a (support/l2a), although this is really more of a de-TeXing program.

The canonical de-TeXing program is detex (support/detex), which removes all comments and control sequences from its input before writing it to its output. Its original purpose was to prepare input for a dumb spelling checker.
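
To make the idea of de-TeXing concrete, here is a minimal Python sketch of a filter that strips comments and control sequences. It is only an illustration of the general approach, not the algorithm detex itself uses; the function name crude_detex is hypothetical, and corner cases (a line break \\ directly followed by a comment, verbatim text, mathematics) are deliberately ignored.

```python
import re


def crude_detex(source: str) -> str:
    """Very rough de-TeXing: drop comments, control sequences and braces."""
    # Remove comments: a % not preceded by a backslash, up to end of line.
    text = re.sub(r"(?<!\\)%.*", "", source)
    # Keep a few escaped specials (\%, \&, \$, \#, \_) as the bare character.
    text = re.sub(r"\\([%&$#_])", r"\1", text)
    # Drop control words (\emph, \section*) and other control symbols (\,).
    text = re.sub(r"\\[A-Za-z]+\*?|\\.", "", text)
    # Drop the grouping braces that are left behind.
    text = text.replace("{", "").replace("}", "")
    # Collapse runs of spaces and tabs, keeping line breaks.
    return re.sub(r"[ \t]+", " ", text).strip()


if __name__ == "__main__":
    print(crude_detex(r"Some \emph{important} text, 50\% done. % note"))
```

The result is plain prose of the kind a simple spelling checker can read; anything that depends on what a macro actually expands to is lost.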

Conversion from SGML or HTML to TeX

SGML is a very important system for document storage and interchange, but it has no formatting features; its companion ISO standard DSSSL (http://www.jclark.com/dsssl/) is designed for writing transformations and formatting, but this has not yet been widely implemented. Some SGML authoring systems (e.g., SoftQuad Author/Editor) have formatting abilities, and there are high-end specialist SGML typesetting systems (e.g., Miles33's Genera). However, the majority of SGML users probably transform the source to an existing typesetting system when they want to print. TeX is a good candidate for this. There are three approaches to writing a translator:

  1. Write a free-standing translator in the traditional way, with tools like yacc and lex; this is hard, in practice, because of the complexity of SGML.
  2. Use a specialist language designed for SGML transformations; the best known are probably Omnimark and Balise. They are expensive, but powerful, incorporating SGML query and transformation abilities as well as simple translation.
  3. Build a translator on top of an existing SGML parser. By far the best-known (and free!) parser is James Clark's nsgmls, and this produces a much simpler output format, called ESIS, which can be parsed quite straightforwardly (one also has the benefit of an SGML parse against the DTD). Two good public domain packages use this method; both allow the user to write `handlers' for every SGML element, with plenty of access to attributes, entities, and information about the context within the document tree.

    If these packages don't meet your needs for an average SGML typesetting job, you need the big commercial stuff.
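
The third approach can be sketched in a few lines of Python, reading the line-oriented output of nsgmls and calling a handler per element. This is a rough illustration only, not the interface of either package mentioned above. The element names (TITLE, PARA, EM, XREF), the TARGET attribute and the LaTeX they map to are hypothetical, only a handful of ESIS commands are handled, and TeX special characters in the data are not escaped.

```python
import re
from typing import Callable, Dict, Iterable, List, Tuple

# Handler table: element name -> (function producing the opening text from
# the attributes, closing text). The element names are hypothetical; a real
# DTD defines its own.
HANDLERS: Dict[str, Tuple[Callable[[Dict[str, str]], str], str]] = {
    "TITLE": (lambda attrs: "\\section{", "}\n"),
    "PARA": (lambda attrs: "", "\n\n"),
    "EM": (lambda attrs: "\\emph{", "}"),
    "XREF": (lambda attrs: "\\ref{" + attrs.get("TARGET", "") + "}", ""),
}
NO_HANDLER: Tuple[Callable[[Dict[str, str]], str], str] = (lambda attrs: "", "")


def unescape(data: str) -> str:
    """Undo the two most common ESIS escapes: \\n (record end) and \\\\."""
    return re.sub(r"\\(\\|n)",
                  lambda m: "\n" if m.group(1) == "n" else "\\", data)


def esis_to_latex(lines: Iterable[str]) -> str:
    out: List[str] = []
    pending: Dict[str, str] = {}  # attributes of the next start tag
    for raw in lines:
        line = raw.rstrip("\n")
        if not line:
            continue
        code, rest = line[0], line[1:]
        if code == "A":  # "Aname type value" precedes its start tag
            parts = rest.split(" ", 2)
            if len(parts) == 3:  # IMPLIED attributes carry no value
                pending[parts[0]] = unescape(parts[2])
        elif code == "(":  # start of element
            out.append(HANDLERS.get(rest, NO_HANDLER)[0](pending))
            pending = {}
        elif code == ")":  # end of element
            out.append(HANDLERS.get(rest, NO_HANDLER)[1])
        elif code == "-":  # character data
            out.append(unescape(rest))
        # every other command (such as the final "C") is ignored here
    return "".join(out)
```

In use, the function would be fed the standard output of nsgmls line by line. The point of the design is that the parser has already validated the document against the DTD, so the handlers can stay simple.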

Since HTML is simply an example of SGML, we do not need a specific system for HTML. However, Nathan Torkington (Nathan.Torkington@vuw.ac.nz) developed html2latex from the HTML parser in NCSA's Xmosaic package. The program takes an HTML file and generates a LaTeX file from it. The conversion code is subject to NCSA restrictions, but the whole source is available as support/html2latex

Michel Goossens and Janne Saarela published a very useful summary of SGML, and of public domain tools for writing and manipulating it, in TUGboat 16(2).

(La)TeX conversion to HTML

TeX is a typesetting language, not a markup system. With properly-used LaTeX, you may be luckier, but don't expect a free lunch. Remember that a) if you want a really good Web document, you had better redesign it from scratch, and b) HTML (even HTML3) has pretty poor `typesetting' facilities, and anything beyond the trivial will probably need to end up a graphic.

LaTeX2HTML (support/latex2html) is a package by Nikos Drakos (mostly Perl scripts) that breaks up a LaTeX document into one or more components, and links them together so that they can be read over the World-Wide Web as a hypertext document. It defines a mapping between LaTeX intra-document references and hyperlinks, and extends the mechanisms to permit reference to other (possibly remote) documents and other Internet resources. It translates LaTeX accented and other characters (as best it can) to things that World-Wide Web browsers can display, and translates mathematics (and other things that browsers can't deal with) to images that can be loaded in-line into the hypertext document.
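
As a small illustration of the character translation step, the Python sketch below maps a few LaTeX accent commands to HTML character entities. It is not how LaTeX2HTML itself is implemented (that package is written in Perl); the function name and the table are assumptions made for this example, and only accent and letter pairs with a Latin-1 entity are converted.

```python
import re

# Accent command -> (entity suffix, letters for which the entity exists).
ACCENTS = {
    "'": ("acute", "aeiouyAEIOUY"),
    "`": ("grave", "aeiouAEIOU"),
    "^": ("circ", "aeiouAEIOU"),
    '"': ("uml", "aeiouyAEIOU"),
    "~": ("tilde", "anoANO"),
}

# Matches \'e as well as \'{e}; the letter ends up in group 2 or group 3.
ACCENT_RE = re.compile(r"""\\(['`^"~])(?:\{([A-Za-z])\}|([A-Za-z]))""")


def accents_to_entities(text: str) -> str:
    def replace(match: "re.Match[str]") -> str:
        suffix, letters = ACCENTS[match.group(1)]
        letter = match.group(2) or match.group(3)
        if letter in letters:
            return "&" + letter + suffix + ";"
        return match.group(0)  # no such entity: leave the input untouched

    return ACCENT_RE.sub(replace, text)


if __name__ == "__main__":
    print(accents_to_entities(r"Universit\"at Z\"urich, caf\'{e}"))
```

Combinations with no entity are passed through unchanged, which is one reason a converter can only translate such characters ``as best it can''.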

LaTeX2HTML needs Perl, the PBM utilities, dvips, GhostScript, and other sundries; it assumes it is running on a Unix system. Michel Goossens and Janne Saarela published a detailed discussion of LaTeX2HTML, and how to tailor it, in TUGboat 16(2).

There are two alternative strategies:

  1. Free-standing LaTeX to HTML translations. Hard, but not impossible. Julian Smart's TeX2RTF (available from support/tex2rtf) does a plausible job on a subset of LaTeX;
  2. Writing an HTML-output backend in LaTeX itself. See Sebastian Rahtz' paper in TUGboat 16(3) for a discussion of how to go about this for the general case of SGML.

Making hypertext documents from TeX

If you want on-line hypertext with a (La)TeX source, probably on the World Wide Web, consider four technologies (which overlap):

  1. Try direct LaTeX conversion to HTML; see (La)TeX conversion to HTML;
  2. Rewrite your document using Texinfo (see Texinfo macro package), and convert that to HTML;
  3. Look at Adobe Acrobat, an electronic delivery system guaranteed to preserve your typesetting perfectly. See Making Acrobat documents from LaTeX;
  4. Invest in the HyperTeX conventions (standardised \special commands); there are supporting macro packages for plain TeX and LaTeX.

The HyperTeX project aims to extend the functionality of all the LaTeX cross-referencing commands (including the table of contents) to produce \special commands which are parsed by DVI processors conforming to the HyperTeX guidelines; it provides general hypertext links, including those to external documents.

The HyperTeX specification says that conformant viewers/translators must recognize the following set of \special commands:

href:
html:<a href = "href_string">
name:
html:<a name = "name_string">
end:
html:</a>
image:
html:<img src = "href_string">
base_name:
html:<base href = "href_string">

The href, name and end commands are used to do the basic hypertext operations of establishing links between sections of documents.
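
To show how the href, name and end commands fit together, the following Python sketch generates the TeX source for a link and for a link target. The helper names are hypothetical, and the compact form href="..." (without spaces around the equals sign) is an assumption about what viewers accept; URLs containing characters that are special to TeX (such as #, % or ~) would need extra care that is not shown here.

```python
def hypertex_link(url: str, text: str) -> str:
    """Wrap text in an href \\special and the matching end \\special."""
    return '\\special{html:<a href="%s">}%s\\special{html:</a>}' % (url, text)


def hypertex_anchor(name: str, text: str) -> str:
    """Mark text as a named target that an href of "#name" can point at."""
    return '\\special{html:<a name="%s">}%s\\special{html:</a>}' % (name, text)


if __name__ == "__main__":
    # A target in one place of the document, and a link to it elsewhere.
    print(hypertex_anchor("intro", "Introduction"))
    print(hypertex_link("#intro", "see the introduction"))
```

In practice the supporting macro packages emit these \special commands automatically, so an author rarely needs to write them by hand.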

Further details are available on http://xxx.lanl.gov/hypertex/; there are two commonly-used implementations of the specification, a modified xdvi and (recent releases of) dvips. Output from the latter may be used in recent releases of GhostScript or Acrobat Distiller.

Making Acrobat documents from LaTeX

There are now two general routes to Acrobat output: Adobe's original `distillation' route, and the use of PDFTeX (see the PDFTeX project).

For simple documents (with no hyper-references), you can either

(Note that the PDFwriter route is a dead end: it can only be used in this simple mode, as it cannot create hyperlinks.)

To translate all the LaTeX cross-referencing into Acrobat links, you need a LaTeX package to suitably redefine the internal commands. There are two of these for LaTeX2e, both based on the HyperTeX specification (see Making hypertext documents from TeX): Sebastian Rahtz's hyperref (available from macros/latex/contrib/supported/hyperref), and Michael Mehlich's hyper (available from macros/latex/contrib/supported/hyper). Hyperref can operate using PDFTeX primitives rather than the HyperTeX conventions. You can use dvips or Y&Y's DVIPSONE to translate the DVI into PostScript acceptable to Distiller.

Sadly, there is no free implementation of all of Distiller's functionality, but GhostScript (version 4.00 onwards) provides some restricted distilling capability, and Distiller itself is now remarkably cheap (for academics at least).

Adobe's Acrobat Reader is available for a very wide range of platforms. For those still omitted, GhostScript (versions 3.51 onwards) can display and print PDF files.

Work on a DVI to PDF translator is in progress, but shows no sign of immediate release (Sergey Lesenko spoke about the work at TUG'96 and again at TUG'97).

