3. The qwertz Document Type Definition

All of the qwertz document "styles", except bibliographies, are defined in a single SGML document type definition (DTD), called qwertz. It is essentially an SGML reconstruction of Lamport's LaTeX [Lamport86]. We have not attempted to include every feature of LaTeX in this DTD, but have included the features we use regularly. Others may of course find that something they deem important is missing. We welcome suggestions for improvements or extensions.

We will be making use of several parameter entities in this DTD:

<!entity % emph
" em | it | bf | sf | sl | tt " >

<!entity % xref
" label | ref | pageref | cite | ncite " >

<!entity % inline
" (#pcdata | f | x | %emph; | sq | %xref)* " >

<!entity % list
" list | itemize | enum | descrip " >

<!entity % par
"  %list; | comment | lq " >

<!entity % mathpar " dm | eq " >

<!entity % thrm
" def | prop | lemma | coroll | proof | theorem " >

<!entity % litprog " code | verb " >

<!entity % sectpar
" %par; | figure | tabular | table | %mathpar; |
%thrm; | %litprog; ">


These are just macros used in the definitions of various elements, to avoid retyping and to ease maintenance. The emph parameter lists the various kinds of emphasis. The inline parameter is for the elements which may be used anywhere within the document. The list parameter is for various kinds of lists. The par parameter lists several basic kinds of elements at the level of paragraphs. The mathpar parameter includes the elements for displayed mathematical formulas. The thrm parameter is for the set of elements used to represent such things as definitions, theorems and proofs. The litprog parameter is for literate programming elements. Finally, the sectpar parameter lists the elements which may occur at the level of paragraphs within sections (or chapters). Notice that this parameter uses other parameters.
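
To see how a parameter such as sectpar resolves once its nested parameters are substituted, here is a minimal Python sketch. It is an illustration only, not part of the qwertz tools: the dictionary repeats the declarations above, and the helper names expand and names are made up for this example. Real SGML parsers perform this substitution themselves while reading the DTD.

```python
import re

# Replacement text of the parameter entities declared above.
PARAMS = {
    "list": " list | itemize | enum | descrip ",
    "par": "  %list; | comment | lq ",
    "mathpar": " dm | eq ",
    "thrm": " def | prop | lemma | coroll | proof | theorem ",
    "litprog": " code | verb ",
    "sectpar": " %par; | figure | tabular | table | %mathpar; | %thrm; | %litprog; ",
}


def expand(name):
    """Return the text of a parameter entity with nested %name; references
    substituted (hypothetical helper, for illustration only)."""
    text = PARAMS[name]
    # Each %name; reference is replaced by the expansion of that entity.
    return re.sub(r"%(\w+);", lambda m: expand(m.group(1)), text)


def names(name):
    """List the element names that an expanded parameter entity stands for."""
    return [part.strip() for part in expand(name).split("|")]


print(names("sectpar"))
```

The printed list begins with the four list elements contributed by the list parameter by way of par, which shows that a change to list propagates to every parameter built on it.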

Several kinds of documents may be written using LaTeX: articles, reports, books, letters and slide (or transparency) presentations. The qwertz DTD includes two others as well: notes, for documents such as notes to yourself which do not require a title, sections, footnotes and the like; and manpage, for Unix manual pages.

<!element qwertz o o
(sect | chapt | article | report |
book | letter | telefax | slides | notes | manpage ) >


Notice that sections (sect) and chapters (chapt) may also be processed separately, before being put together into an article, report or book.

LaTeX also includes BibTeX, a program for creating bibliographies whose entries can be easily cited in LaTeX documents. The qwertz document type for this purpose is described in Chapter 5.

3.1 General Purpose Entities and Elements

This section describes the SGML entities and elements available in all qwertz documents.

<!entity % general system -- general purpose characters -- >
%general;


3.1.1 Character Entities

Most characters are created just by typing the character wanted on the keyboard. This simple method does not suffice when the character wanted isn't in the character set available, or at least not associated with a key on the keyboard, or when the character currently has special meaning to SGML or, perhaps, TeX. In this section, a fairly large number of general purpose character entities will be presented. Symbols and characters which may be used only in mathematical formulas will be discussed separately, in section 3.1.9.

When may it be necessary to use an entity reference to produce some character? There are three cases to watch out for:

SGML Concrete Syntax Delimiters.

Although the SGML standard allows alternative concrete syntaxes to be defined, we use the so-called reference concrete syntax in the qwertz document types. In this reference syntax, < is the start tag open character, and </ is the end tag open delimiter. The other SGML delimiter authors should be aware of is &, the entity reference open delimiter of the reference syntax.

The appropriate entity to use to generate these characters depends on the context. Normally, use lt to represent < and amp to get &, when these appear in strings which might otherwise be interpreted as starting tags or entity references. However, within the code or verb elements for literate programming, described in section 3.1.8, use the ero entity to represent & and the etago entity for the sequence </.

<!entity   lt    sdata "<" >
<!entity   amp   sdata "&" >
<!entity   ero   sdata "&" >
<!entity   etago sdata "</" >


SGML Short Reference Delimiters.

In SGML document types short reference maps may be defined which allow single characters to be interpreted as arbitrarily complex sequences of characters, including SGML tags and entity references. Thus, to know precisely when a certain character will be interpreted literally or as a short reference (i.e. macro) for something else, one has to know which map is in effect in the context of the current element. Just about all punctuation characters which are not used as delimiters in the concrete syntax can be used as short reference delimiters:

" # % ' ( ) * + , - : ; = @ [ ] ^ _ { | } ~


For each of these characters, there is an SGML entity which may be used to generate the ASCII character in the printed document, listed in table 3.1. Usually, it will not be necessary to use these entities; the character can simply be typed and will be interpreted literally. However, if the results are not as expected, check to see if there is a map in effect at that point in the document in which the character has been redefined. As maps are associated with elements, the section in this manual describing an element will also direct you to a description of the applicable map, if there is one.

As it turns out, one important use of character maps is to generate exactly the character typed in the printed document. That is, the map is used to hide the special meaning of the character to the underlying formatter (e.g. TeX), replacing the character with the formatting instructions for generating the character. This has been the main use of maps in our qwertz document type definitions.

<!entity   dquot sdata "&quot;" >
<!entity   num  sdata "#"    >
<!entity   percnt sdata "%" >
<!entity   quot sdata "'"  >
<!entity   lpar sdata "&lpar;"   >
<!entity   rpar sdata "&rpar;"   >
<!entity   ast  sdata "&ast;"    >
<!entity   plus sdata "&plus;"   >
<!entity   comma sdata "&comma;" >
<!entity   hyphen sdata "&hyphen;"  >
<!entity   colon sdata "&colon;" >
<!entity   semi  sdata "&semi;"  >
<!entity   equals sdata "&equals;" >
<!entity   commat sdata "&commat;" >
<!entity   lsqb  sdata "["  >
<!entity   rsqb  sdata "]"  >
<!entity   circ sdata "&circ;"   >
<!entity   lowbar sdata "_" >
<!entity   lcub  sdata "{"     >
<!entity   verbar sdata "|" >
<!entity   rcub  sdata "}"    >
<!entity   tilde sdata "~"   >


TeX Special Characters.

Ideally, it should be possible to hide the conventions of the underlying formatting system completely. In fact, SGML parsers which implement the full ISO standard have a feature which makes this possible. However, the SGML parser we are using does not include this feature: the only characters which can serve as short references are the characters allowed for this purpose by the reference concrete syntax. Unfortunately, this reference syntax does not allow &, $ and \, all of which are special TeX characters, to be used as short references. Thus, the entities for these three characters (amp, dollar and bsol) must usually be used to produce them. (The $ and \ characters may be used directly within the verb and code elements, discussed below in section 3.1.8. Also, within these elements use the ero entity to represent & in strings which might otherwise be interpreted as entity references.)

<!entity   bsol sdata "\" >
<!entity   dollar sdata "$" >


3.1.2 Spacing, Dashes and Ellipsis

The meaning of the ordinary space character is context sensitive. Sometimes there is a space within a single word. Such spaces can be typed using the nonbreakable space (nbsp) entity, which keeps the word from being broken at that point at the end of a line. There are also contexts where one wants a fixed amount of space to appear, which the formatter may not shrink in order to clean up the arrangement of words or characters on the line. There are three entities for this purpose: emsp denotes the amount of horizontal space required for the character "M". An ensp is just half as wide as an emsp, and a thin space (thinsp) is 1/6 of an emsp. Notice that these are relative amounts, depending on the font being used.

There are also three different kinds of dashes: hyphen, which was already mentioned above, is to be used for intra-word dashes, as in the word "intra-word". However, the hyphen entity was not actually necessary here, as the - character was not being used in this context as a short reference. ndash is to be used for number ranges, such as "23–56", and mdash is an alternative delimiter for parenthetical comments — certainly you've seen them used this way — perhaps to avoid too frequent use of commas or parentheses. Finally, the hellip entity produces an ellipsis (…).

<!entity nbsp sdata "~" >
<!entity emsp sdata "&emsp;" >
<!entity ensp sdata "&ensp;" >
<!entity thinsp sdata "&thinsp;" >
<!entity mdash sdata "&mdash;" >
<!entity ndash sdata "&ndash;" >
<!entity hellip sdata "&hellip;" >


3.1.3 Foreign Languages

There is a large set of entities for other Western European languages. Altogether, there are entities for almost all of the foreign language characters in ISO 8859, the Latin 1 character set for Western European languages. Only the four Icelandic characters are missing. Conveniently, these entities are all available in the usual Adobe PostScript fonts, as well as in TeX. Thus, all of the entities defined here can be printed in TeX, on PostScript printers, or displayed on any Latin 1 device. Depending on the computer and editor, it may also be possible to type these Latin 1 characters directly, instead of having to use these entities. A simple filter could translate Latin 1 files into ASCII files, replacing non-ASCII characters by entity references. The entity names chosen here for these characters conform to the SGML standard.

<!entity aacute sdata '&aacute;' >
<!entity Aacute sdata '&Aacute;' >
<!entity acirc sdata '&acirc;' >
<!entity Acirc sdata '&Acirc;' >
<!entity agrave sdata '&agrave;' >
<!entity Agrave sdata '&Agrave;' >
<!entity aring sdata '&aring;' >
<!entity atilde sdata '&atilde;' >
<!entity Atilde sdata '&Atilde;' >
<!entity auml sdata '&auml;' >
<!entity Auml sdata '&Auml;' >
<!entity aelig sdata '&aelig;' >
<!entity AElig sdata '&AElig;' >
<!entity ccedil sdata '&ccedil;' >
<!entity Ccedil sdata '&Ccedil;' >
<!entity eacute sdata '&eacute;' >
<!entity Eacute sdata '&Eacute;' >
<!entity ecirc sdata '&ecirc;' >
<!entity egrave sdata '&egrave;' >
<!entity Egrave sdata '&Egrave;' >
<!entity euml sdata '&euml;' >
<!entity Euml sdata '&Euml;' >
<!entity iacute sdata '&iacute;' >
<!entity Iacute sdata '&Iacute;' >
<!entity icirc sdata '&icirc;' >
<!entity Icirc sdata '&Icirc;' >
<!entity igrave sdata '&igrave;' >
<!entity Igrave sdata '&Igrave;' >
<!entity iuml sdata '&iuml;' >
<!entity Iuml sdata '&Iuml;' >
<!entity ntilde sdata '&ntilde;' >
<!entity Ntilde sdata '&Ntilde;' >
<!entity oacute sdata '&oacute;' >
<!entity Oacute sdata '&Oacute;' >
<!entity ocirc sdata '&ocirc;' >
<!entity Ocirc sdata '&Ocirc;' >
<!entity ograve sdata '&ograve;' >
<!entity Ograve sdata '&Ograve;' >
<!entity oslash sdata '&oslash;' >
<!entity Oslash sdata '&Oslash;' >
<!entity otilde sdata '&otilde;' >
<!entity ouml sdata '&ouml;' >
<!entity Ouml sdata '&Ouml;' >
<!entity szlig sdata '&szlig;' >
<!entity uacute sdata '&uacute;' >
<!entity Uacute sdata '&Uacute;' >
<!entity ucirc sdata '&ucirc;' >
<!entity ugrave sdata '&ugrave;' >
<!entity Ugrave sdata '&Ugrave;' >
<!entity uuml sdata '&uuml;' >
<!entity Uuml sdata '&Uuml;' >
<!entity yacute sdata '&yacute;' >
<!entity Yacute sdata '&Yacute;' >
<!entity yuml sdata '&yuml;' >


The qwertz document types were developed in a German research center, so we have included entities for the German characters with shorter names than the entity names used in the SGML standard. Notice that these are just synonyms for the standard entities, which are also included.

<!entity Ae '&Auml;' >
<!entity ae '&auml;' >
<!entity Oe '&Ouml;' >
<!entity oe '&ouml;' >
<!entity Ue '&Uuml;' >
<!entity ue '&uuml;' >
<!entity sz '&szlig;' >


3.1.4 Other Symbols

Finally, there are entities for a few miscellaneous symbols, such as §, ¶, (c), ¬, ÷, ±, ×, and μ. All of these entities name symbols in the Latin 1 character set. They may be used anywhere within a document. (In particular, the mathematical symbols shown here need not be within one of the formula elements described below, in section 3.1.9.) The entity names for these, and all the other character entities discussed above, are listed in table 3.1. A document which does not include mathematical formulas or graphics and which uses only the character entities defined in this chapter can be displayed or printed using a single Latin 1 font.

<!entity gt sdata "&gt;" >
<!entity sect sdata "&sect;">
<!entity para sdata "&para;">
<!entity copy sdata "(c)">
<!entity iexcl sdata "&iexcl;" >
<!entity iquest sdata "&iquest;" >
<!entity cent sdata "&cent;" >
<!entity pound sdata "£" >
<!entity not sdata "&not;" >
<!entity divide sdata "&divide;" >
<!entity plusmn sdata "&plusmn;" >
<!entity times sdata "&times;" >
<!entity mu sdata "&mu;" >


Table 3.1: General Purpose Characters

AElig Æ Aacute Á Acirc Â Ae Ä
Agrave À Atilde Ã Auml Ä Ccedil Ç
Eacute É Egrave È Euml Ë Iacute Í
Icirc Î Igrave Ì Iuml Ï Ntilde Ñ
Oacute Ó Ocirc Ô Oe Ö Ograve Ò
Oslash Ø Ouml Ö Uacute Ú Ue Ü
Ugrave Ù Uuml Ü Yacute Ý aacute á
acirc â ae ä aelig æ agrave à
amp & aring å ast * atilde ã
auml ä bsol \ ccedil ç cent ¢
circ ˆ colon : comma , commat @
copy (c) divide ÷ dollar $ dquot "
eacute é ecirc ê egrave è emsp
ensp equals = euml ë gt >
hellip … hyphen ‐ iacute í icirc î
iexcl ¡ igrave ì iquest ¿ iuml ï
lcub { lowbar _ lpar ( lsqb [
lt < mdash — mu μ nbsp
ndash – not ¬ ntilde ñ num #
oacute ó ocirc ô oe ö ograve ò
oslash ø otilde õ ouml ö para ¶
percnt % plus + plusmn ± pound £
quot ' rcub } rpar ) rsqb ]
sect § semi ; sz ß szlig ß
thinsp tilde ~ times × uacute ú
ucirc û ue ü ugrave ù uuml ü
verbar | yacute ý yuml ÿ
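
Section 3.1.3 remarks that a simple filter could replace Latin 1 characters by entity references. The following Python sketch shows one way such a filter might look. It is our illustration, not a tool shipped with qwertz. The function name latin1_to_entities is hypothetical, and the code points are taken from Python's standard html.entities table, which uses the same entity names for these characters. Characters for which section 3.1.3 declares no entity are passed through unchanged, which is an assumption of this sketch.

```python
from html.entities import name2codepoint

# Entity names declared in section 3.1.3 (the shorter German synonyms
# such as ae and sz are left out on purpose).
NAMES = """aacute Aacute acirc Acirc agrave Agrave aring atilde Atilde auml Auml
aelig AElig ccedil Ccedil eacute Eacute ecirc egrave Egrave euml Euml iacute
Iacute icirc Icirc igrave Igrave iuml Iuml ntilde Ntilde oacute Oacute ocirc
Ocirc ograve Ograve oslash Oslash otilde ouml Ouml szlig uacute Uacute ucirc
ugrave Ugrave uuml Uuml yacute Yacute yuml""".split()

# Map each Latin 1 character to the name of its entity.
CHAR_TO_ENTITY = {chr(name2codepoint[name]): name for name in NAMES}


def latin1_to_entities(text):
    """Replace every character that has an entity in NAMES by &name;."""
    return "".join(
        "&%s;" % CHAR_TO_ENTITY[ch] if ch in CHAR_TO_ENTITY else ch
        for ch in text
    )


# The escapes \u00fc and \u00df stand for u-umlaut and sharp s.
print(latin1_to_entities("M\u00fcnchen, Stra\u00dfe"))  # M&uuml;nchen, Stra&szlig;e
```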

3.1.5 Sentences, Paragraphs, Emphasis and Quotations

Sentences need not be marked up with tags. There is no sentence element as such. Rather, these are marked implicitly using the usual conventions for beginning and ending sentences.

Paragraphs are delimited with the p tag. Both the starting tag and ending tag are optional.

<!element p o o ( %inline | %sectpar )+ >
<!entity ptag '<p>' >
<!entity psplit '</p><p>' >

<!shortref pmap
"&#RS;B" null
"&#RS;B&#RE;" psplit
"&#RS;&#RE;" psplit
'"' qtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar >

<!usemap pmap p>
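
As a rough illustration of what a short reference map such as pmap does, the Python sketch below rewrites each single-character short reference as an explicit reference to the entity the map names. It is a deliberate simplification and not how an SGML parser is implemented: the record start and record end sequences of the map are not modelled, and the function name apply_shortrefs is made up for this example.

```python
# Single-character short references of pmap, copied from the declaration
# above: each delimiter stands for a reference to the named entity.
PMAP = {
    '"': "qtag", "[": "ftag", "~": "nbsp", "_": "lowbar", "#": "num",
    "%": "percnt", "^": "circ", "{": "lcub", "}": "rcub", "|": "verbar",
}


def apply_shortrefs(text, shortref_map):
    """Rewrite each short reference delimiter as an entity reference.

    Simplified sketch: the &#RS;, &#RE; and B sequences of the real map
    are ignored here.
    """
    return "".join(
        "&%s;" % shortref_map[ch] if ch in shortref_map else ch
        for ch in text
    )


print(apply_shortrefs("50% of x_1", PMAP))  # 50&percnt; of x&lowbar;1
```

Seen this way, typing % inside a paragraph is equivalent to typing the percnt entity reference, which is how the map hides the special meaning of % from TeX.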


Sentences or phrases within paragraphs can be emphasized in a number of ways. The em tag is used to choose the default form of emphasis, which is usually italic type, but depends on the style of the background text. If the background text is formatted in italic type, as it usually is in definitions, for example, then emphasized text will be formatted using a plain, roman typeface. However, various forms of emphasis can be explicitly chosen. These include: bold face (bf), italics (it), sans serif (sf), slanted (sl), and typewriter (tt) styles.

<!element em - - (%inline)>
<!element bf - - (%inline)>
<!element it - - (%inline)>
<!element sf - - (%inline)>
<!element sl - - (%inline)>
<!element tt - - (%inline)>


The tt element simulates a "typewriter". That is, with a couple of exceptions, characters are printed exactly as they appear on the display. This is useful for including small segments of computer code within paragraphs. See section 3.1.8 on literate programming for more information.

Sentences within paragraphs can be quoted using the short quote (sq) tag, as in <sq>The rain in Spain falls mainly on the plain.</>, but this is usually not necessary. In most contexts where one will want to use quotations, there is a map allowing the " symbol to be used as a short reference for both the starting and ending sq tags. So one can just type: "The rain in Spain falls mainly on the plain."

Quotations extending over a number of paragraphs are marked using the long quote (lq) element. Long quotes are formatted in LaTeX by indenting the left and right margins. For example, [Lamport86, p. xiii]:

The LaTeX document preparation system is a special version of Donald Knuth's TeX program. TeX is a sophisticated program designed to produce high-quality typesetting, especially for mathematical text. …

LaTeX represents a balance between functionality and ease of use. Since I implemented most of it myself, there was also a continual compromise between what I wanted to do and what I could do in a reasonable amount of time. …

<!element sq - - (%inline)>

<!entity   ftag     '<f>'    -- formula begin -- >
<!entity   qendtag  '</sq>'>

<!shortref sqmap
"&#RS;B" null
'"' qendtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar >

<!usemap   sqmap    sq >

<!element lq - - (p*)>


3.1.6 Lists

Four types of lists are supported, which differ according to the type of label used to mark each item in the list. Use itemize to create a list in which each item is marked with some symbol such as a dash or bullet. The enum tag is used to create an enumeration, i.e. a list in which each item is labelled with a number (or letter) indicating its rank or position in the list. The list type of list does not label the items at all. Finally, use descrip to create a list in which each item is labelled by some tag of your own choice. Lists of various types can be nested. For example:

<itemize>
<item>
A level one item.
<item> Here's level two:
<enum>
<item> A level two item.
<item> Here's level three:
<enum>
<item> A level three item.
<item>Here's level four:
<descrip>
<tag/Red./  Is the color of my true love's hair.
<tag/Blue./  Is a property of some movies.
<tag/Yellow./  Characterizes some forms of journalism.
</descrip>
<item>A last level three item
</enum>
<item>A last level two item
</enum>
<item>A last level one item.
</itemize>


This is formatted by LaTeX as:

• A level one item.
• Here's level two:
    1. A level two item.
    2. Here's level three:
        1. A level three item.
        2. Here's level four:
            Red.
                Is the color of my true love's hair.
            Blue.
                Is a property of some movies.
            Yellow.
                Characterizes some forms of journalism.
        3. A last level three item
    3. A last level two item
• A last level one item.

<!element itemize - - (item+)>
<!element list - - (item+)>
<!element enum - - (item+)>
<!element descrip - - ((tag?, (%inline; | %sectpar;)*, p*)+) >
<!element item o o ((%inline; | %sectpar;)*, p*) >
<!element tag - o (%inline)>
<!usemap global (list,itemize,enum,descrip)>


For reasons having to do with our translation into LaTeX, line feeds within tag elements are translated into spaces, using the oneline short reference map:

<!entity space " ">
<!entity null "">
<!shortref oneline
"&#RS;&#RE;" null
"&#RS;B&#RE;" null
'"' qtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>
<!usemap oneline tag>


3.1.7 Figures and Tables

Figures and tables are floating elements; they may appear at a different location in the printed version of the document than in the input file. There is a location (loc) attribute, which can be used to influence the location chosen by the formatter. The value of the loc attribute is a string of up to four letters, where each letter declares a location at which the figure or table may appear, as follows:

h.

At the same relative location as it appears in the SGML input file (i.e. here).

t.

At the top of a page.

b.

At the bottom of a page.

p.

On a separate page containing only figures and tables.

The default value of the loc attribute is tbp.
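
The reading of a loc value described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the DTD declares loc as cdata, so the SGML parser itself does not check the letters, and the function name parse_loc and the wording of the placement descriptions are ours.

```python
# Meaning of each letter allowed in the loc attribute, as listed above.
PLACEMENTS = {
    "h": "here",
    "t": "top of a page",
    "b": "bottom of a page",
    "p": "separate page of figures and tables",
}


def parse_loc(loc="tbp"):
    """Return the placements named by a loc string, in the order typed.

    Raises ValueError for more than four letters or for an unknown letter.
    The default argument mirrors the default value declared in the DTD.
    """
    if len(loc) > 4:
        raise ValueError("loc may have at most four letters: %r" % loc)
    unknown = [ch for ch in loc if ch not in PLACEMENTS]
    if unknown:
        raise ValueError("unknown loc letter(s): %s" % "".join(unknown))
    return [PLACEMENTS[ch] for ch in loc]


print(parse_loc("ht"))  # ['here', 'top of a page']
```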

A figure is a graphic combined with an optional caption. Two types of figures are currently supported. The first, and easiest, is to use the eps tag to include an Encapsulated PostScript file in the document. Encapsulated PostScript files are centered horizontally on the page. The size of the graphic is its "natural" size; i.e. the size it would have if printed directly on a PostScript printer. You need only know the name of the file containing the graphic.

Encapsulated PostScript graphics can be created using a variety of different editors. If you are using Unix with an X11-based graphical user-interface, you may want to try idraw, which stores its documents directly as Encapsulated PostScript files. Other interesting X11-based drawing programs are xfig and tgif.

For example, to include the graphic contained in an Encapsulated PostScript file named issues.ps, you would type:

<figure>
<eps file="issues">
<caption>An <tt>idraw</> Drawing </>
</figure>


This would then appear as in figure issues.

Notice that the ".ps" extension is not to be included in the file attribute of the eps element, but that the actual file must include the ".ps" extension.

The second possibility is to use the placeholder (ph) tag to leave space in which to later paste the graphic, in the old, reliable manner. For example, to leave 10 cm space for some graphic, type:

<figure>
<ph vspace="10cm">
</figure>


Be sure not to leave a space between the number and the unit of measurement used, which may be cm, mm or in.

<!element figure - - ((eps | ph ), caption?)>
<!attlist figure
loc cdata "tbp">

<!element eps - o empty  >
<!attlist eps
file cdata #required>
<!element ph - o empty >
<!attlist ph
vspace cdata #required>

<!element caption - o (%inline)>

<!usemap oneline caption>


Next, there is a tabular element. Using LaTeX, tabulars must be small enough to fit on a single page. The current tabular element has been kept quite simple. It certainly does not (yet) offer all the flexibility of LaTeX. However, it may well be that it is sufficient for most users. More complex tables can, depending on your choice of formatters, be created using LaTeX or Unix's tbl program, with the x element, or with any program capable of generating Encapsulated PostScript, which can then be included using an eps element.

A tabular consists of a number of rows, separated by the rowsep element, each of which consists of a number of columns separated by the colsep element.

The format of the tabular is controlled by the column alignment (ca) attribute. For each column in the tabular there is a letter in the ca attribute: 1) c for centered; 2) l for flush left; or 3) r for flush right. In addition, | can be used to insert vertical lines running the complete height of the table. This will be made clear in the example which is coming shortly.

First, however, let me describe the short reference map defined for tabulars. Rather than typing <colsep> and <rowsep> explicitly, one can just type | to separate columns, and @ to separate rows. Also, within tabulars, [ can be used to start a mathematical formula, and " starts short quotes as usual. (The other short references just hide any special meaning the character may have to TeX.)

<!entity % tabrow "(%inline, (colsep, %inline)*)" >
<!element tabular - -
(%tabrow, (rowsep, hline?, %tabrow)*, caption?) >

<!attlist tabular
ca cdata #required>

<!element rowsep - o empty>
<!element colsep - o empty>
<!element hline  - o empty>

<!entity rowsep "<rowsep>">
<!entity colsep "<colsep>">

<!shortref tabmap
"&#RE;" null
"&#RS;&#RE;" null
"&#RS;B&#RE;" null
"&#RS;B" null
"B&#RE;" null
"BB"  null
"&#SPACE;" null
"&#TAB;" null
"@" rowsep
"|" colsep
"[" ftag
'"' qtag
"_" thinsp
"~" nbsp
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub >

<!usemap  tabmap tabular>


The hline element can be used to draw a horizontal line along the length of the table, to separate rows.

A table element consists of a tabular followed by an optional caption. Unlike a tabular, a table is a floating "body", like a figure. It may be moved to another (nearby) location within the formatted document. A tabular, however, appears at the same place in the formatted document as in the SGML source file.

<!element table   - - (tabular, caption?) >
<!attlist table
loc cdata "tbp">


Here is how table 3.1 was typed:

<table>
<tabular ca="ll|ll">
ae   |       &ae   | Ae   |   &Ae       @
oe   |       &oe   | Oe   |   &Oe       @
ue   |       &ue   | Ue   |   &Ue       @
sz   |       &sz   | amp  |   &amp      @
bsol |       &bsol | circ |   &circ     @
.
.
.
Dagger |       &Dagger | sect  |  &sect     @
para   |       &para   | copy  |  &copy     @
mdash  |       &mdash  | tilde |  &tilde
</tabular>
<caption><label id="GPC">
General Purpose Characters
</caption>
</table>
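
To make the roles of the ca attribute and of the | and @ short references concrete, here is a small Python sketch that splits the body of a tabular into rows and cells. It is a simplified illustration, not the parser's actual behaviour: hline elements, captions and formulas containing | are not handled, and the function names parse_ca and split_tabular are hypothetical.

```python
def parse_ca(ca):
    """Return one alignment per column, skipping the | rule markers."""
    alignments = {"l": "left", "c": "center", "r": "right"}
    return [alignments[ch] for ch in ca if ch != "|"]


def split_tabular(body):
    """Split tabular source text into rows (at @) and cells (at |)."""
    rows = []
    for row in body.split("@"):
        cells = [cell.strip() for cell in row.split("|")]
        if any(cells):  # ignore an empty row left by a trailing @
            rows.append(cells)
    return rows


body = """
ae   |       &ae   | Ae   |   &Ae       @
oe   |       &oe   | Oe   |   &Oe       @
ue   |       &ue   | Ue   |   &Ue
"""

print(parse_ca("ll|ll"))
for row in split_tabular(body):
    print(row)
```

Each row of the example yields four cells, matching the four letters of ca="ll|ll". The | inside the ca value only requests a vertical line and does not add a column.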


3.1.8 Literate Programming

The original motivation behind the development of these document types was to create an environment for literate programming in an arbitrary programming language similar to Donald Knuth's WEB system for literate programming in Pascal [Knuth84]. The basic idea is to include the source code of a program inside of its documentation, instead of the other way around: including comments within the source code.

The features offered here to support literate programming, or merely the documentation of existing programs, have been kept to a minimum. Snippets of code can be mentioned within sentences using the tt tag. These are formatted using a typewriter font suitable for program code, but the spacing and indentation of the code is not retained. Within tt elements, the only characters which may not be literally interpreted are $, \, &, and </. For the $ and \ symbols, always use the dollar and bsol entities. For the & and < symbols, use the amp and lt entities if the string in which they occur could be mistaken for an entity reference, an element start tag or an element end tag.

To include larger segments of code, retaining its line breaks, tabulation and spacing, use the code tag or the verb tag. Within these tags just about all characters are interpreted literally. The exceptions are:

1. As SGML entities may be used within verb and code elements, use the ero entity to represent the & symbol in strings which might otherwise be mistaken for entity references. (Notice that the amp entity is not used to represent & in this context.)
2. As there must be some way of ending such elements, use the etago entity to represent </ in strings which might otherwise be interpreted as end tags. (Do not use the lt entity for this purpose here.) Start tags can be typed literally in this context, without using entities.
3. Unfortunately TeX peeks through a bit here as well: the string \end{verbatim} may not occur within code or verb elements. Presumably this will not often be a problem.
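
The first two exceptions, and the contrasting rule for ordinary text, can be sketched as two small Python helpers. This is an illustration only. The function names are made up, and the helpers escape every occurrence of the delimiters, which is more conservative than the rules above require, since the entities are strictly needed only where a string could be mistaken for markup.

```python
def escape_text(s):
    """Escape SGML delimiters for ordinary text: & -> &amp;, < -> &lt;."""
    # & is replaced first so that the & of an inserted &lt; is not
    # escaped a second time.
    return s.replace("&", "&amp;").replace("<", "&lt;")


def escape_code(s):
    """Escape for the body of a code or verb element:
    & -> &ero; and </ -> &etago; (start tags are left alone)."""
    return s.replace("&", "&ero;").replace("</", "&etago;")


print(escape_text("AT&T <x>"))            # AT&amp;T &lt;x>
print(escape_code('a && b; s = "</p>"'))  # a &ero;&ero; b; s = "&etago;p>"
```

Neither helper touches $ or \. In ordinary text and in tt elements these still need the dollar and bsol entities, while in code and verb elements they may be typed directly.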

For example, to include the "hello world" C program in a document, just type:

<code>
main ()
{
/* This is the famous hello world program */

printf("hello world\n");
}
</code>


When formatted, spaces and line breaks are preserved:

main ()
{
/* This is the famous hello world program */

printf("hello world\n");
}


Notice that no entities were required in this code. With few exceptions, it should be possible to just wrap verb or code tags around existing pieces of code without change.

The idea of literate programming is that the documentation is the program, so there must be some way of extracting the source code from the SGML document. Just how to do this is described in a later chapter.

The user must have a means of indicating which pieces of code are to be included in the source code, and in which order. Our solution to this problem is very simple: Only code elements are to be extracted, and they are extracted in the same order as they appear in the document. That is, verb elements are not extracted, and may be used, e.g., for examples or draft versions of the code included for explanatory or tutorial purposes.

code and verb elements may be formatted differently. Using our translation into LaTeX, for example, code elements are distinguished by being bracketed by lines the width of the page.
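
The extraction rule just stated (only code elements, in document order) can be sketched in Python as follows. This is not the extraction tool supplied with qwertz. It is a minimal illustration that assumes lower-case <code> start tags and full </code> end tags, uses a regular expression rather than an SGML parser, and undoes only the ero and etago entities. The function name extract_code is hypothetical.

```python
import re

# Matches the body of each code element; verb elements are not matched.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def extract_code(document):
    """Concatenate the bodies of all code elements, in document order."""
    chunks = []
    for body in CODE_RE.findall(document):
        # Undo the two entities prescribed inside code elements. etago is
        # handled first so that "&ero;etago;" correctly yields "&etago;".
        body = body.replace("&etago;", "</").replace("&ero;", "&")
        chunks.append(body.strip("\n"))
    return "\n".join(chunks) + "\n"


doc = """<p>First the function:
<code>
int twice(int x) { return 2 * x; }
</code>
<verb>
int draft(int x);   /* tutorial only, not extracted */
</verb>
<code>
int main() { return twice(1) &ero;&ero; 0; }
</code>
"""

print(extract_code(doc))
```

The verb element in the sample is skipped, so the draft declaration never reaches the extracted source, while the two code elements come out in the order in which they were written.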

<!element code - - rcdata>
<!element verb - - rcdata>

<!shortref ttmap
"&#RS;B" null
'#'     num
'%'     percnt
'~'     tilde
'_'     lowbar
'^'     circ
'{'     lcub
'}'     rcub
'|'     verbar >

<!usemap ttmap  tt>


3.1.9 Mathematical Formulas

The qwertz document types include elements for describing mathematical formulas completely within SGML, similar to the system described in [daphne89]. To start, there are a fairly large number of entities for mathematical symbols. (The set of entities chosen are for the symbols available in both TeX and in the PostScript Symbol font.) Although this may be a minor irritation for seasoned TeX users, we have decided to follow the naming conventions for mathematical symbols adopted in the SGML Standard [Smith88]. The complete set of mathematical symbols currently defined, including the Greek alphabet, is listed in tables 3.2 and 3.3, in alphabetical order.

<!entity % math system -- math symbols -- >
%math;


Table 3.2: Math Symbols

Prime ″ aleph ℵ and ∧ ang ∠
ap ≈ arr ↓ bottom ⊥ bull •
cap ∩ cir ○ clubs ♣ congr &congr;
cup ∪ diams ♦ divide ÷ dot ˙
empty ∅ equiv ≡ exist ∃ forall ∀
ge ≥ hArr ⇔ harr ↔ hearts ♥
image ℑ infin ∞ isin ∈ lArr ⇐
lang ⟨ larr ← le ≤ mid ∣
minus − nabla ∇ ne ≠ nequiv ≢
not ¬ notin ∉ nsub ⊄ nsube ⊈
nsup ⊅ nsupe ⊉ nvDash ⊭ nvdash ⊬
oplus ⊕ or ∨ otimes ⊗ part ∂
plusmn ± prime ′ prop ∝ rArr ⇒
rang ⟩ rarr → real ℜ setmn ∖
spades ♠ square □ sub ⊂ sube ⊆
sup ⊃ supe ⊇ times × uArr ⇑
uarr ↑ vDash ⊨ vdash ⊢

Table 3.3: Greek Letters

alpha α beta β gamma γ
Gamma Γ delta δ Delta Δ
epsi ε zeta ζ eta η
thetas &thetas; Theta Θ iota ι
kappa κ lambda λ mu μ
nu ν xi ξ Xi Ξ
pi π Pi Π rho ρ
sigma σ sigmav ς Sigma Σ
tau τ upsi υ Upsi ϒ
phis &phis; Phi Φ chi χ
psi ψ Psi Ψ omega ω
Omega Ω

TeX symbols not in table 3.2 may nonetheless be generated, by defining an entity using the mc element. For example, to print the $\leadsto$ symbol, you could first define an entity, perhaps using the name adopted for this symbol in the SGML standard:

<!entity rarrw "<mc/<x/\leadsto//">


Of course, this approach is TeX dependent. But this dependency is clearly noted at the beginning of the document, and it would be an easy matter to replace the TeX command for such entities with the appropriate commands for some other formatter.

The mc tag used in this entity definition is for math characters. The entity could have been defined using only the x tag described in section misc , but it is "safer" to use the mc tag when defining entities which are only to be used within formulas, as the SGML parser will complain if they are used elsewhere. If x were used instead, such errors would first be caught by TeX.

<!element  mc  - - cdata >


There are a number of parameters for formulas. These will most likely be of little interest to most users, but are stated here for the sake of completeness.

<!entity % sppos     "tu" >
<!entity % fcs       "%sppos;|phr" >
<!entity % fcstxt    "#pcdata|mc|%fcs;" >
<!entity % fscs      "rf|v|fi" >
<!entity % limits    "pr|in|sum" >
<!entity % fbu       "fr|lim|ar|root" >
<!entity % fph       "unl|ovl|sup|inf" >
<!entity % fbutxt    "(%fbu;) | (%limits;) |
(%fcstxt;)|(%fscs;)|(%fph;)" >
<!entity % fphtxt    "p|#pcdata" >


There are three elements for representing formulas: f, for ordinary short formulas appearing "in-line"; dm for displayed formulas to be centered on a line (or lines) by themselves; and eq for displayed formulas which are to be numbered sequentially throughout the document (i.e. so-called "equations").

<!element  f        - - ((%fbutxt;)*) -(footnote) >

<!entity   fendtag  '</f>'   -- formula end -- >

<!shortref fmap
"&#RS;B" null
"&#RS;B&#RE;" null
"&#RS;&#RE;" null
"_" thinsp
"~" nbsp
"]" fendtag
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>

<!usemap   fmap     f >

<!element  dm       - - ((%fbutxt;)*) -(footnote)>
<!element  eq       - - ((%fbutxt;)*) -(footnote)>

<!shortref dmmap
"&#RE;" space
"_" thinsp
"~" nbsp
"]" fendtag
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>

<!usemap dmmap (dm,eq)>


Usually it is not necessary to type the starting and ending tags of the f element explicitly: [ and ] are short reference delimiters, allowing one to simply type, for example, [&alpha &rarr &beta], instead of <f>&alpha &rarr &beta</f> to represent α → β. TeX users will appreciate that this notation is no more verbose than TeX.

The only characters of interest in fmap are _, ~ and ]. _ is a short reference for thinsp, which adds a little extra horizontal space. ~ means nbsp, which in turn denotes a non-breaking space. TeX will not start a new line at an nbsp. Finally, ] is used to end the formula. The other characters in this map just protect us from any special meaning TeX gives them.

The dmmap is much the same as the fmap. There are just two differences: 1) ] is not a short reference for the f closing tag (and instead has its literal meaning), and 2) carriage returns and new lines are replaced by spaces, for reasons having to do with the way TeX formats formulas. Use the tu element, defined a bit later, to force line breaks in formulas.
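
To illustrate the three formula elements side by side, here is a sketch of the same implication typed in-line, as a displayed formula, and as a numbered equation. The surrounding sentence is made up for the example:

The rule [&alpha &rarr &beta] is used below.

<dm>&alpha &rarr &beta</dm>

<eq>&alpha &rarr &beta</eq>
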

Of course, formulas consist of more than just a string of math symbols. There are elements for representing fractions (fr), products (pr), integrals (in), sums (sum), roots (root) and arrays (ar). Each of these will be described next.

A fraction consists of a numerator (nu) and a denominator (de). For example, 12/37 can be written as:

[<fr><nu>12<de>37</fr>]


Of course, this is rather lengthy. For simple fractions such as this, you may prefer to just type [12/37], which is formatted by LaTeX in the same way. On the other hand, if you are an SGML purist, you may prefer not to do this, as it makes assumptions about the formatting system being used.

<!element  fr       - - (nu,de) >
<!element  nu       o o ((%fbutxt;)*) >
<!element  de       o o ((%fbutxt;)*) >
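
Since nu and de may themselves contain any formula text, fractions can be nested. As a sketch, assuming the tag omission rules declared above, the continued fraction $\frac{1}{1 + \frac{1}{x}}$ might be typed as:

[<fr><nu>1<de>1 + <fr><nu>1<de>x</fr></fr>]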


Products, integrals and sums all have a similar structure, consisting of a lower limit (ll), an upper limit (ul) and an optional operand (opd).

<!element  ll       o o ((%fbutxt;)*) >
<!element  ul       o o ((%fbutxt;)*) >
<!element  opd      - o ((%fbutxt;)*) >
<!element  pr       - - (ll,ul,opd?) >
<!element  in       - - (ll,ul,opd?) >
<!element  sum      - - (ll,ul,opd?) >


So, for example,

$$\sum_{i=1}^{n} x_i = \int_{0}^{1} f$$

was typed as:

<dm>
<sum><ll>i=1<ul>n<opd>x<inf>i</></sum> =
<in><ll>0<ul>1<opd>f</in>
</dm>


This example also shows how to represent subscripts, using the inf tag. There is also a sup tag for superscripts.
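
A minimal sketch combining both tags in an in-line formula, which should correspond to $x^2 + y_i$:

[x<sup>2</> + y<inf>i</>]
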

For operators with upper and lower limits other than products, sums or integrals, use the lim element.

<!element  lim      - - (op,ll,ul,opd?) >
<!element  op       o o (%fcstxt;|rf|%fph;) -(tu) >


For example,

$$\bigcup_{i=0}^{n} \{\alpha_i \rightarrow \beta\}$$

was typed as

<!entity bigcup "<mc>\bigcup</>">
...
<dm>
<lim>&bigcup<ll>i=0<ul>n</>
<opd>{&alpha<inf>i</> &rarr &beta}</>
</lim>
</dm>

Notice that it isn't necessary to type the op tag here.

Roots can be represented using the, what else, root element. By default, root produces square roots. The n attribute of root can be used for other roots. For example, type [<root n=3/x+y/] to get $\sqrt[3]{x+y}$.

<!element  root     - - ((%fbutxt;)*) >
<!attlist  root
n cdata "">
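
When the n attribute is left at its default, a square root results. As a sketch, $\sqrt{x^2 + y^2}$ might be typed with explicit start and end tags as:

[<root>x<sup>2</> + y<sup>2</></root>]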


Arrays, or matrices, consist of a sequence of rows, each of which contains a sequence of columns. Every row in the array must contain the same number of columns. Rows are separated by the arr tag; columns by the arc tag. The array itself is delimited by the ar tag.

<!element col o o ((%fbutxt;)*) >
<!element row o o (col, (arc, col)*) >

<!element  ar       - - (row, (arr, row)*) >
<!attlist  ar
ca     cdata    #required >
<!element  arr      - o empty >
<!element  arc      - o empty >


This is a place where an SGML short reference map has proven useful:

<!entity   arr "<arr>" >
<!entity   arc "<arc>" >

<!shortref arrmap
"&#RE;" space
"@" arr
"|" arc
"_" thinsp
"~" nbsp
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub >

<!usemap   arrmap   ar >


Columns can be separated using the | character; rows with the @ character.

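For example, the matrix

$$\begin{array}{clcr} a+b+c & uv & x-y & 27 \\ a+b & u+v & z & 134 \\ a & 3u+vw & xyz & 2,978 \end{array}$$

was typed as: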

<ar ca=clcr>
a+b+c  |  uv     |  x-y  |  27     @
a+b    |  u+v    |  z    |  134    @
a      |  3u+vw  |  xyz  |  2,978
</ar>


The column alignment of an array must be specified using the ca attribute, as shown in the example. For each column in the array, there is a letter in the ca attribute. There are three alternatives: 1) c for centered; 2) l for flush left; and 3) r for flush right.
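
As a smaller sketch, a two-by-two identity matrix with both columns centered might be typed inside a displayed formula like this. Wrapping the array in dm is an assumption of this example; any formula element should do:

<dm>
<ar ca=cc>
1  |  0  @
0  |  1
</ar>
</dm>
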

There remain a few miscellaneous math elements to describe. sup and inf, for superscripts and subscripts, were mentioned above. unl and ovl can be used to underline or overline formulas. rf is used for identifiers, such as function names (e.g. cos or sin) within formulas. Similarly, phr is used to delimit phrases of ordinary text within formulas. (Both of these are necessary, as strings of characters within formulas denote sequences of variables, not words.) The v tag can be used to denote a vector, as in $\vec{x}$. Calligraphic characters, such as $\mathcal{L}$, can be denoted using the fi tag. Finally, line breaks can be inserted into formulas using the tu element.

<!element  sup      - - ((%fbutxt;)*) -(tu) >
<!element  inf      - - ((%fbutxt;)*) -(tu) >
<!element  unl - - ((%fbutxt;)*) >
<!element  ovl - - ((%fbutxt;)*) >
<!element  rf  - o (#pcdata) >
<!element  phr - o ((%fphtxt;)*) >
<!element  v   - o ((%fcstxt;)*)
-(tu|%limits;|%fbu;|%fph;) >
<!element  fi  - o (#pcdata) >
<!element  tu  - o empty >

<!usemap global (rf,phr)>
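
A short sketch of rf and phr in use. The formulas themselves are only illustrations; the first marks sin and cos as identifiers, the second sets the word "if" as ordinary text:

[<rf/sin/<sup>2</> x + <rf/cos/<sup>2</> x = 1]

[<rf/f/(x) = 0 <phr/ if / x = 0]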


3.1.10 Definitions, Lemmas and Theorems

There are a number of elements useful for representing definitions (def), propositions (prop), lemmas (lemma), corollaries (coroll), proofs (proof), and theorems (theorem).

<!element def - - (thtag?, p+) >
<!element prop - - (thtag?, p+) >
<!element lemma - - (thtag?, p+) >
<!element coroll - - (thtag?, p+) >
<!element proof - - (p+) >
<!element theorem - - (thtag?, p+) >
<!element thtag - - (%inline)>

<!usemap global (def,prop,lemma,coroll,proof,theorem)>
<!usemap oneline thtag>


With the exception of proof, these all have the same structure: an optional thtag followed by some paragraph level elements. Here is an example:

Alexander's Theorem

Let $\mathcal{G}$ be a set of nontrivially achievable subgoals and $<$ an order on $\mathcal{G}$. $<$ is abstractly indicative if and only if it is a linearization of $<_{\mathcal{G}}^{*}$.

This was typed as:

<theorem><thtag>Alexander's Theorem</>

Let [<fi/G/] be a set of nontrivially achievable
subgoals and &lt an order on [<fi/G/].  &lt
is abstractly indicative if and only if it is a
linearization of
[<lim>&lt <ll> <fi/G/ <ul> &ast </lim>].

</theorem>
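
The other elements are used in the same way. As a sketch, with made-up content, a lemma followed by its proof (which takes no thtag) might be typed as:

<lemma><thtag>Symmetry</>

If [a = b], then [b = a].

</lemma>

<proof>

This follows directly from the definition of equality.

</proof>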


3.1.11 The global Short Reference Map

The global short reference map, which is the default map in effect within qwertz documents, allows the " symbol to be used to start a short quote (sq) and [ to start a formula (f). Also, ~ is used for non-breaking spaces. The rest of the short references just serve to hide any special meaning TeX gives these characters, allowing them to be directly typed without having to use entity references.

<!entity   qtag     '<sq>' >

<!shortref global
"&#RS;B" null  -- delete leading blanks --
'"' qtag
"[" ftag
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>

<!usemap global qwertz>


3.2 Cross References

Places within a document can be marked using the label element. Labels have an id attribute for naming the label. The SGML parser will check that these identifiers are unique within the document, and that they are referenced. That is, the parser will complain if there is no reference to a label. For this reason, labels should probably be created on demand, rather than in anticipation of the need for a reference to the element.

There are two kinds of references: ref for references to the number of some element, such as a section, figure or theorem, and pageref, for references to the number of the page on which the text around the label occurs when the document is printed. Both types of references have an id attribute for stating the identifier of the label being referenced. The number of the element or page will be printed at the place of the ref or pageref.

<!element label - o empty>
<!attlist label id cdata #required>

<!element ref - o empty>
<!attlist ref
id cdata #required>

<!element pageref - o empty>
<!attlist pageref
id cdata #required>


For example, a reference to the section on miscellaneous elements of this manual, section 3.3, would be typed as:

... section <ref id=misc>, would be ...


The label itself was typed as:

<sect><heading><label id="misc">
Miscellaneous Elements</>
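
A page reference to the same label is typed analogously. A sketch, with a made-up sentence around it:

... see section <ref id=misc> on page <pageref id=misc> ...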


3.3 Miscellaneous Elements

There are just a couple of general-purpose elements remaining to be discussed, which don't seem to have found a suitable home elsewhere in this manual.

Editorial comments and reminders to oneself can be marked with the comment tag. These comments will be printed using a different type style than the body of the text. In the qwertz mapping into TeX, they are printed using the slanted type style.

If you do not want the comment to be printed, use the standard SGML notation for comments instead: <!-- … -->.

Finally, there is an "escape" element, allowing you to include raw formatting code at any place in your document, the x element. This code will be passed on to the formatter, such as TeX, inline, at the point it appears in your document. Of course, this "feature" should be used judiciously, as it limits the formatter independence of the document.

<!element comment - - (%inline)>
<!element x - - ((#pcdata | mc)*) >
<!usemap   #empty   x >


Notice that math character (mc) elements may appear within x elements. This allows you to use SGML entity references for math characters, to help avoid having to remember both the SGML and the formatter's names for these symbols. Other entities may also be used, so long as they expand to character data.
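
As a sketch of both elements, here is a printed editorial comment followed by a piece of raw formatter code. The comment text is made up, and \bigskip is simply one TeX command chosen for illustration:

<comment>Check these numbers before the next draft.</comment>

<x>\bigskip</x>
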

3.4 Articles, Reports and Books

Articles, reports and books are structurally very similar. They may be formatted differently, of course, but this is of little importance during the writing phase of primary interest to authors. Seen abstractly, each type of document consists of a title page, for such information as the title of the document, the names of the authors and so on, followed perhaps by an abstract, and then by a sequence of chapters or sections. There may be citations, which are references to documents listed at the end, in a bibliography. Perhaps there are one or more appendices. Finally, these documents may also contain footnotes.

Let us first precisely describe the overall structure of these document types, before moving on to describe their various components. The article element is defined as:

<!element article - -
(titlepag, header?, abstract?, toc?, lof?, lot?, p*, sect*,
(appendix, sect+)?, biblio?) +(footnote)>

<!attlist article
opts cdata "null">


The options attribute (opts) of article provides a place to state formatting options, which are passed on to LaTeX. The particular options available depend on the installation of LaTeX being used, but the following should always be available:

11pt, 12pt.

Set the "normal" font size to eleven, or twelve, point, instead of the default 10 point size.

twoside.

Formats the document for printing on both sides of a page.

twocolumn.

Formats the document with two columns per page, as is common in the proceedings of scientific conferences, for example.

titlepage.

Causes the title page and abstract to be printed on a separate page.

Other options which may be supported include:

dina4.

Formats the document for printing on DIN A4 size paper. (As this is the size paper used at our installation, this option is included automatically during the translation.)

german.

Causes the TeX hyphenation algorithm to "think German", and sections, bibliographies and such to be labelled using the appropriate German terms.

times, bookman, palatino …

Causes the "main" font to be the selected PostScript font, instead of the standard TeX font, Computer Modern, and maps all other type faces to some suitable PostScript font or type style.

For example, the starting tag for some article might be:

<article opts="bookman,11pt">


Reports are just like articles, except that they consist of a sequence of chapters (chapt), instead of sections (sect):

<!element report - -
(titlepag, header?, abstract?, toc?, lof?, lot?, p*,
chapt*, (appendix, chapt+)?, biblio?) +(footnote)>

<!attlist report
opts cdata "null">


Books are similar to reports, except that they may not include an abstract:

<!element book  - -
(titlepag, header?, toc?, lof?, lot?, p*, chapt*,
(appendix, chapt+)?, biblio?) +(footnote) >

<!attlist book
opts cdata "null">


The options attribute (opts) for report and book elements is the same as that for articles, just described, except for the titlepage option, which is applicable only to articles.

The rest of this chapter describes the common elements of articles, reports and books, starting with title pages.

3.4.1 Title Pages

A title page (titlepag) consists of a title, a number of authors (author) and an optional date (date). The title may refer to a footnote and may also include a subtitle. If the date element is omitted, today's date will be printed by default. To avoid having a date printed, include an empty date element.

<!element titlepag o o (title, author, date?)>
<!element title - o (%inline, subtitle?) +(newline)>
<!element subtitle - o (%inline)>
<!usemap oneline titlepag>


The author element includes the name and, optionally, institution (inst) of the author. If there are multiple authors, these are separated with the and tag. Also, acknowledgements can be expressed using the thanks element. These are formatted by LaTeX as footnotes on the title page.

<!element author - o (name, thanks?, inst?,
(and, name, thanks?, inst?)*)>
<!element name o o (%inline) +(newline)>
<!element and - o empty>
<!element thanks - o (%inline)>
<!element inst - o (%inline) +(newline)>
<!element date - o (#pcdata)>
<!usemap global thanks>


Within the titlepag, the title, subtitle, author and inst elements can be broken into multiple lines using the newline element or, if you prefer, the nl entity.


<!element newline - o empty >
<!entity nl "<newline>">


The title page of this manual was typed as:

<title>The <tt/qwertz/ SGML Document Types
<subtitle>(Version 1.1 Reference Manual)
<author>Tom Gordon
<inst> Institute for Applied Information Technology (F3) &nl&nl
German National Research Center  &nl
for Computer Science (GMD)


Notice the titlepag tags are optional. The simplest title page would include a title and author:

<title> A Very Short Title Page
<author> Snoopy
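
A somewhat fuller sketch, with two authors, an institution, an acknowledgement and an empty date element to suppress the printed date. All names and texts here are made up for illustration, and the element order follows the author content model given above:

<title>A Slightly Longer Title Page
<author>Snoopy
<inst>Daisy Hill Puppy Farm
<and>Woodstock
<thanks>Thanks to Charlie Brown for his comments.
<date></date>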


3.4.2 Abstracts

Articles and reports, but not books, may have an abstract, which consists of one or more paragraphs, including the various kinds of lists, mathematical formulas and elements for literate programming:

<!element abstract - - (p+)>


There are three elements for stating whether or not a table of contents, list of figures or list of tables should be included in the document. These tables and lists are generated by LaTeX. Therefore the content of these elements is empty. They are only used to specify that the list or table should be included.

<!element toc - o empty>
<!element lof - o empty>
<!element lot - o empty>


A header element specifies what should be printed at the top of each page. It consists of a left heading (lhead) and a right heading (rhead). Both elements are required, if a heading is used at all, but either may be left empty, so that the effect of having only a left or right heading can be achieved easily enough.

<!element header - - (lhead, rhead) >


As we will see, an initial header can be given after the title page. Afterwards, a new header can be given for each new chapter or section. The header printed on a page is the one in effect at the end of that page, so the header will be that of the last section starting on the page.

3.4.5 Sectioning

The naming scheme we have adopted for sections is a bit different from that of LaTeX, because the names of SGML identifiers may be at most eight characters long. But we think the scheme we have chosen has its advantages. In books and reports, the top-level sectional unit is the chapter (chapt). In articles, it is the section (sect). The lower sectional units are sect1, sect2, sect3, and sect4, in that order.

Each section (or chapter) consists of a heading, followed by an optional header, a number of paragraphs (including such things as graphics), and then sections of the next lower level.

<!entity % sect "heading, header?, p* " >
<!element chapt - o (%sect, sect*) +(footnote)>
<!element sect  - o (%sect, sect1*) +(footnote)>
<!element sect1 - o (%sect, sect2*)>
<!element sect2 - o (%sect, sect3*)>
<!element sect3 - o (%sect, sect4*)>
<!element sect4 - o (%sect)>
<!usemap oneline (chapt,sect,sect1,sect2,sect3,sect4)>


Don't confuse the headers with headings. The heading is just the text printed at the point where the section begins, naming the section. The header changes the text printed at the top of each page.

If there are cross references to the section, put the label in the heading. For example, you could type:

<sect><heading><label id=mysect>My First Section</>


If a label isn't required, you can leave the heading tag implicit:

<sect>My First Section
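
Lower sectional units nest in the same way. As a sketch, with made-up headings and text, and assuming the end tags of paragraphs may be omitted as in the other examples of this manual, a section with two subsections might be typed as:

<sect>Results
<p>
An overview of the results.
<sect1>First Experiment
<p>
Details of the first experiment.
<sect1>Second Experiment
<p>
Details of the second experiment.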


The appendix element marks the beginning of a sequence of appendices. These are chapters or sections, depending on whether the document is an article, report or book, and differ from ordinary chapters or sections only in the way they are numbered, and of course their placement at the end of the document.

<!element appendix - o empty >


3.4.6 Footnotes

The tag for footnotes is, simply enough, footnote. To make sure the marker for the footnote is formatted properly, do not leave a space between the character after which the footnote marker is to appear and the beginning of the footnote element itself.

<!element footnote - - (%inline)>
<!usemap global footnote>


Footnotes can appear anywhere within a section (or chapter). The usemap declaration is required to cancel the oneline map used in title pages.
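
For example, a footnote attached directly to a word, with no intervening space, might be typed as in this sketch (the sentence is made up):

SGML<footnote>Standard Generalized Markup Language, ISO 8879.</footnote> is an international standard.
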

3.4.7 Citation

Literature references can be made using the cite and ncite elements. The only difference between them is that ncite allows a short note to be included in the reference, for such things as page numbers.

<!element cite - o empty>
<!attlist cite
id cdata #required>

<!element ncite - o empty>
<!attlist ncite
id cdata #required
note cdata #required>


For example, one might type

        <ncite id="Bryan88" note="pg.68">

to refer to page 68 of Martin Bryan's book on SGML. This would appear, using LaTeX, as [Bryan88, pg. 68] in the printed document.

The id attribute of a cite or ncite is a reference to an identifier of a BibTeX bibliography file. There is a qwertz SGML document type for creating such bibliographies, described below.

The bibliography itself, or list of references, is generated by including a biblio element at the end of the document, after any appendices.

<!element biblio - o empty>
<!attlist biblio
style cdata "qwertz"
files cdata "">


The files attribute of biblio is a list of the names of the bibliographies used, separated by commas. The names should not include any file suffixes, such as ".bib" or ".sgml". For example, to cite publications on artificial intelligence and cognitive science, where the bibliographies are maintained in two files, ai.sgml and cogsci.sgml, you would type:

<biblio files="ai,cogsci">


The style attribute determines how the bibliography is formatted. Five styles are supported:

plain

Entries are sorted alphabetically and labeled with numbers.

unsrt

The same as plain except the entries are ordered as they appear in the document, rather than alphabetically.

alpha

The same as plain, except that labels are made from the author's name and the year of publication.

abbrv

The same as plain except that first names, month names, and journal names are abbreviated.

qwertz

The same as plain except that all words of the entry are capitalized exactly as they appear in the source file of the bibliography. The plain style applies capitalization rules which are inappropriate, e.g., for German titles.
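
Putting the two attributes together, a sketch of a citation and the matching biblio element, here selecting the alpha style instead of the default. It assumes, for illustration only, that an entry named Lamport86 exists in one of the two bibliography files:

... as described by Lamport <cite id="Lamport86"> ...

<biblio style="alpha" files="ai,cogsci">
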

3.5 Slides

The slides element is for making a series of slides or, more commonly, overhead transparencies. Although you may often prefer to use some other program for preparing presentations, this approach has its advantages when you want to include parts of an existing article or book on your transparencies. You can just "cut and paste" the SGML source from an article onto a slide. You may also prefer this approach if your presentation includes mathematical formulas, to be able to take advantage of TeX's excellent mathematics typesetting.

<!element slides - - (slide*) >

<!attlist slides
opts cdata "null">


Each slide consists of an optional title, followed by one or more paragraphs (p elements):

<!element slide - o (title?, p+) >


Notice that not every element available in an article or book is also available here. In particular, there are no sectioning elements, cross references, footnotes or a bibliography. Our translation into TeX does not use SliTeX, so as to allow slides to include tables and figures.

The title element will be centered on the line. You can break up the title into multiple lines with newline elements. The various type style elements, such as em and bf, can also be used here; indeed anywhere on a slide.
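
A sketch of a short presentation with two slides, the first with a two-line title and the second with none. The texts are made up, and the omission of the slide and paragraph end tags is an assumption based on the declarations and examples shown in this manual:

<slides>
<slide>
<title>Formulas &nl; on Slides
<p>
A fraction is typed as [<fr><nu>12<de>37</fr>].
<slide>
<p>
This slide has <em>no</em> title.
</slides>
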

3.6 Letters and Electronic Messages

The letter element is for making letters and e-mail messages. Just how a letter is formatted may depend on whether it is a business or personal letter. If it is a business letter, it may be printed to appear as if the company's letterhead stationery had been used.

The structure of a letter can be quite complex, but most of the elements to be described here are optional. Using an example from [Lamport86], a simple letter would be typed like this:

<letter>
<from>
R. (Ma) Dillo
Gnu York, G.Y. 56789
<to>
Dr.~G. Nathaniel Picking
33 Swat Street &nl
Hometown, Illinois 62301

<cc> Jimmy Carter &nl
Richard M. Nixon

<opening> Dear Nat,
<p>
I'm afraid that the armadillo problem is still
with us.  I did everything ...

... and I hope we can get rid of the nasty beasts
this time.

<closing> Best regards,

</letter>


The from and to elements are for the sender's and receiver's names and addresses, respectively. The address may be either a street address, using address, or an electronic mail address, using email, or both. You may also include a telephone number, using the phone element. (If you are using your company's letterhead stationery, it may be that you should type only your extension, rather than your complete telephone number.) Finally, a telefax number can be provided, using the fax element.

Notice that in the closing you must type a comma yourself, if you want one. Also, do not type your name again after the closing; the name of the sender will be printed after the closing as expected.

There are several optional elements which may be of interest:

subject

For the purpose or, well, subject of the letter. If you would like this subject line to appear as "re: …", for example, you must type the "re: " yourself, as part of the subject.

sref, rref, rdate

These are tags for the sender's reference, receiver's reference and receiver's date, where you can include whatever code is used by your, or the recipient's, company or institution to uniquely identify letters. For example, if this letter is a response to some other letter, you may use the rref and rdate elements to identify the original letter. There is no sdate tag, as the date this letter is printed will be included in the letter at some appropriate place by the formatter.

cc

This used to be an acronym for "carbon copies", which were to be sent to persons other than the principal recipient of the letter. The cc tag can be used to list these other recipients, even though the copies they receive today are perhaps printed by a laser printer on recycled paper. As in the above example, you can separate the names of these recipients with newline elements (using the nl entity if you prefer).

encl

Use this tag to list enclosures. These can also be separated with newline elements, or simply with commas, if you prefer.

ps

A postscript, not to be confused with PostScript, can be included with this tag. Any kind of element which can appear in the body of the letter (i.e. sectpar elements) can also be used here.

To summarize, here are the relevant SGML declarations:

<!entity  % addr "(address?, email?, phone?, fax?)" >

<!element letter - -
(from, %addr, to, %addr, cc?, subject?, sref?, rref?,
rdate?, opening, p+, closing, encl?, ps?)>

<!attlist letter
opts cdata "null">

<!element from          - o (#pcdata) >
<!element to            - o (#pcdata) >

<!usemap oneline (from,to)>

<!element address       - o (#pcdata) +(newline) >
<!element email         - o (#pcdata) >
<!element phone         - o (#pcdata) >
<!element fax           - o (#pcdata) >

<!element subject       - o (%inline;) >
<!element sref          - o (#pcdata) >
<!element rref          - o (#pcdata) >
<!element rdate         - o (#pcdata) >

<!element opening       - o (%inline;) >
<!usemap oneline opening>

<!element closing - o (%inline;) >
<!element cc - o (%inline;) +(newline) >
<!element encl - o (%inline;) +(newline) >

<!element ps - o (p+) >
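
As a sketch of some of the optional elements, here is a variation of the letter above with a subject line, an enclosure and a postscript. The texts are made up, and the placement of subject after the addresses is an assumption of this example:

<letter>
<from>
R. (Ma) Dillo
<to>
Dr.~G. Nathaniel Picking
<subject>re: Armadillos
<opening> Dear Nat,
<p>
The report you asked for is enclosed.
<closing> Best regards,
<encl> Armadillo report
<ps>
<p>
Please return the report when you are finished with it.
</letter>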


3.7 Telefax Messages

The structure of a telefax message is the same as for letters and e-mail messages, except that the fax number of the recipient is, of course, required, rather than optional.

<!element telefax - -
phone?, fax, cc?, subject?,
sref?, rref?, rdate?,
opening, p+, closing, ps?)>

<!attlist telefax
opts cdata "null"
length cdata "2">


3.8 Notes

The notes element is a new top-level document "style", like articles, books and letters. It is useful for miscellaneous purposes, such as jotting down notes to oneself, where the complex structure of the other styles is unnecessary. Notes are simply a sequence of section paragraphs (i.e. paragraphs, lists, comments, long quotations, figures, tables, displayed mathematical formulas, and program code). An optional title is also available. The contents of a notes document can be copied and pasted into a section or chapter of a book or article.

<!element notes - - (title?, p+) >
<!attlist notes
opts cdata "null" >
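
A minimal sketch of a notes document, with a made-up title and contents:

<notes>
<title>Things to Remember
<p>
Order a copy of the SGML handbook.
<p>
Proofread the chapter on formulas.
</notes>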


3.9 Manual Pages

The manpage element is for Unix manual pages. Here we see again an advantage of SGML. Using this element, the very same manual page can be viewed on just about every terminal, using nroff, or be included as a section of an article, report or book to be formatted by TeX.

<!element manpage - - (sect1*)
-(sect2 | f | %mathpar | figure | tabular |
table | %xref | %thrm )>

<!attlist manpage
opts cdata "null"
title cdata ""
sectnum cdata "1" >


A manpage consists of a sequence of sections. There are two SGML attributes, for the command name and manual section number, respectively. Each section of the manual page is delimited by a sect1 element. Notice that these sections may not contain further subsections. Sections are represented as sect1 elements, rather than sect, to allow the manual page to be easily cut and pasted into a sect section of an article, report or book. (Of course, if the manual page is to be used as a chapter of a book, then these sections of the manual page will need to be replaced with sect elements.)

Notice that many elements, such as tables, figures and mathematical formulas, cannot be used within manual pages, because of limitations of ASCII terminals, or the Unix man macro package for nroff.

There is a short reference map in effect within the scope of the manpage. With the exception of [, which is not used here to start formulas, this map has the same effect as the global map.

<!shortref manpage
"&#RS;B" null
'"' qtag
"[" lsqb
"~" nbsp
"_" lowbar
"#" num
"%" percnt
"^" circ
"{" lcub
"}" rcub
"|" verbar>

<!usemap manpage  manpage >


3.9.1 Manual Page Conventions

For detailed information about the conventions for Unix manual pages, see your Unix documentation. But here is a brief summary. The typical manual page has the following sections, in this order:

NAME.

The name, or list of names, by which the command or function is called, followed by a dash and then a one-line summary of its purpose.

SYNOPSIS.

For the syntax of the command and its arguments. (The Sun documentation suggests that literals be formatted using boldface type, and that variables be formatted using italic type. Use the tt and em elements, respectively, here for this purpose.)

DESCRIPTION.

An overview of the command or function's purpose, effects and use.

OPTIONS.

A list and description of all command-line options.

FILES.

A list of files associated with the command which may be of interest to users.

SEE ALSO.

A comma-separated list of related Unix commands, and references to other relevant publications.

DIAGNOSTICS.

A list and explanation of any diagnostic messages the command may write to the standard error output file.

BUGS.

A description of any known bugs, problems, or limitations.

Some of you may be asking yourselves why manpage wasn't designed so that each of these conventional sections of a manual page is represented by its own SGML element. That certainly would have been possible, but on the other hand the approach taken has the advantage that users can simply cut and paste sections between manual pages and articles, reports and books. Of course it would have been easy to write a filter to convert between these formats, but it was felt that the benefits of a special manpage format would be too small to warrant even this limited effort. After all, unless one is using an SGML structure editor, users must refer to the SGML document type definition to know what is expected in the manual page. It is just as easy to check this documentation to see what sections conventionally appear in manual pages. There is also a file which can be used as a template or form for writing manual pages. See the Unix Commands chapter for details.

The only reason there is a manpage document type, instead of just another translation of, say, the article document type into nroff is that the man macros used for the Unix documentation are not powerful enough to format all of the features available in our latex document type. Having this separate manpage document type provides a means of checking whether the manual page can be formatted by nroff using these man macros. Again, as this document type is designed to be a subset of the latex document type, the sections of a manual page can also be included within instances of the latex document type.

3.9.2 Manual Page Example

Here is how the manual page for the cd command could have been typed using this document type definition:

<manpage title="CD">

<sect1> NAME

<p>cd &mdash change working directory

<sect1> SYNOPSIS

<p> cd [ <em>directory</> ]

<sect1> DESCRIPTION

<p> <em>directory</> becomes the new working directory.  The process
must have execute (search) permission in <em>directory</>.  If cd is

...

<p> csh(1), pwd(1), sh(1)
</manpage>


This is the end of the qwertz document type definition.

<!-- end of qwertz dtd -->


References

• [Lamport86] Leslie Lamport. LaTeX: A Document Preparation System. Addison-Wesley, 1986.
• [Knuth84] Donald E. Knuth. Literate Programming. The Computer Journal, 27(2):97-111, 1984.
• [daphne89] A. Scheller, C. Smith, C. Fuhrhop, and E. Wilde. DAPHNE: Document Application Processing in a Heterogeneous Network Environment. Technical report, Deutsches Forschungsnetz (DFN), April 1989.
• [Smith88] Joan M. Smith and Robert Stutely. SGML: the user's guide to ISO 8879. Ellis Horwood, 1988.
• [Bryan88] Martin Bryan. SGML, An Author's Guide to the Standard Generalized Markup Language. Addison-Wesley, 1988.