charconv - Converter for Extended Character Sets

Author: Burkhard Kirste

Introduction

CHARCONV is a program (or filter) that converts text from one encoding of an extended character set (e.g., ISO Latin-1) to another (e.g., MS-DOS, Macintosh). Note that umlauts, diphthongs, and letters with diacritics are encoded quite differently in ISO Latin-1 (Unix, MS Windows), MS-DOS (code page 437), and on the Apple Macintosh.
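To illustrate why a conversion table is needed: the letter ä is byte 0xE4 in ISO Latin-1, 0x84 in code page 437, and 0x8A on the Macintosh. The C fragment below is a minimal sketch of a table-driven Latin-1 to code page 437 conversion. It is not taken from the CHARCONV sources; the names latin1_to_cp437, init_latin1_to_cp437 and convert_latin1_to_cp437 are hypothetical, and only the German umlauts and ß are filled in.

```c
#include <stddef.h>

/* Hypothetical sketch: 256-entry lookup table from ISO Latin-1 bytes
 * to MS-DOS code page 437 bytes. */
static unsigned char latin1_to_cp437[256];

static void init_latin1_to_cp437(void)
{
    /* Start with the identity mapping. This is only correct for 7-bit
     * ASCII; a complete table would cover all of 0x80-0xFF. */
    for (int i = 0; i < 256; i++)
        latin1_to_cp437[i] = (unsigned char)i;

    latin1_to_cp437[0xC4] = 0x8E;  /* A umlaut */
    latin1_to_cp437[0xD6] = 0x99;  /* O umlaut */
    latin1_to_cp437[0xDC] = 0x9A;  /* U umlaut */
    latin1_to_cp437[0xE4] = 0x84;  /* a umlaut */
    latin1_to_cp437[0xF6] = 0x94;  /* o umlaut */
    latin1_to_cp437[0xFC] = 0x81;  /* u umlaut */
    latin1_to_cp437[0xDF] = 0xE1;  /* sharp s */
}

/* Convert a buffer in place, one byte at a time. */
static void convert_latin1_to_cp437(unsigned char *buf, size_t len)
{
    for (size_t i = 0; i < len; i++)
        buf[i] = latin1_to_cp437[buf[i]];
}
```

Because each character is a single byte in both encodings, one table lookup per byte suffices; characters without a counterpart in the target set would need a separate policy (e.g., a transcription or a placeholder).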

Moreover, the program handles the transcriptions used in TeX, HTML (HyperText Markup Language), and SGML (Standard Generalized Markup Language). Internally, it uses a font description similar to TeX code (but without math mode). Tags and macros are removed from HTML, SGML, and TeX input. (For TeX input, the resulting plain text file is similar to that produced by utilities such as "detex" or "unretex", but umlauts are handled as well.)
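As a rough illustration of a transcription, a Latin-1 ü can be written in plain TeX as \"u and ß as {\ss}. The sketch below transcribes a Latin-1 string into plain TeX; it is a hypothetical example (the function names latin1_to_tex and transcribe_to_tex are invented here), covers only the German umlauts and ß, and is not the program's actual code. De-texifying would run the mapping in the opposite direction.

```c
#include <stddef.h>

/* Hypothetical sketch: plain-TeX transcription of a Latin-1 byte,
 * or NULL if the byte needs no transcription. */
static const char *latin1_to_tex(unsigned char c)
{
    switch (c) {
    case 0xC4: return "\\\"A";    /* A umlaut -> \"A */
    case 0xD6: return "\\\"O";
    case 0xDC: return "\\\"U";
    case 0xE4: return "\\\"a";    /* a umlaut -> \"a */
    case 0xF6: return "\\\"o";
    case 0xFC: return "\\\"u";
    case 0xDF: return "{\\ss}";   /* sharp s -> {\ss} */
    default:   return NULL;
    }
}

/* Transcribe the NUL-terminated Latin-1 string in into out (outsz bytes).
 * Output is truncated if out is too small; it is always NUL-terminated.
 * Returns the number of bytes written, excluding the terminating NUL. */
static size_t transcribe_to_tex(const char *in, char *out, size_t outsz)
{
    size_t n = 0;
    if (outsz == 0)
        return 0;
    for (; *in != '\0'; in++) {
        const char *t = latin1_to_tex((unsigned char)*in);
        char one[2] = { *in, '\0' };
        if (t == NULL)
            t = one;                  /* copy the byte unchanged */
        for (; *t != '\0'; t++) {
            if (n + 1 >= outsz) {     /* keep room for the NUL */
                out[n] = '\0';
                return n;
            }
            out[n++] = *t;
        }
    }
    out[n] = '\0';
    return n;
}
```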

Furthermore, the program allows conversion between different end-of-line markers (Unix: LF, DOS: CRLF, Mac: CR).
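The end-of-line conversion can be sketched as follows. This is a hypothetical helper (convert_eol and enum eol are invented names), not CHARCONV's code: it treats CRLF, a lone CR, and a lone LF each as one line break and rewrites every break in the requested style. It assumes the whole text is in one buffer, so a CRLF pair split across two reads is not a concern.

```c
#include <stddef.h>

/* Hypothetical sketch: target end-of-line styles. */
enum eol { EOL_UNIX, EOL_DOS, EOL_MAC };   /* LF, CRLF, CR */

/* Rewrite every line break in in[0..len) as the requested style.
 * out must hold at least 2*len bytes, since in the worst case every
 * LF or CR becomes CRLF. Returns the number of bytes written. */
static size_t convert_eol(const char *in, size_t len, char *out, enum eol style)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        char c = in[i];
        if (c == '\r' || c == '\n') {
            if (c == '\r' && i + 1 < len && in[i + 1] == '\n')
                i++;                   /* CRLF counts as one line break */
            if (style != EOL_UNIX)
                out[n++] = '\r';       /* DOS and Mac start with CR */
            if (style != EOL_MAC)
                out[n++] = '\n';       /* Unix and DOS end with LF */
        } else {
            out[n++] = c;
        }
    }
    return n;
}
```

Normalizing all three input conventions first means the same routine works regardless of where the file came from.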

However, the output text is not formatted. Note also that the conversions may not be perfect; in unfavorable cases, some text might be lost.

Usage

A short usage note is displayed when calling

     charconv -h

charconv [-d|-m|-u] [-f from_table] [-t to_table] [[-i] input_file [-o] output_file]

  -d - create MS DOS end-of-line (CRLF)
  -m - create Macintosh end-of-line (CR)
  -u - create Unix end-of-line (LF)
  -f, -t - 'from'/'to' character table, one of:

  a - ASCII (7 bit)
  c - transcription
  d - MS-DOS code page 437
  e - EBCDIC (only for ASCII <-> EBCDIC!)
  g - German LaTeX (cf. TeX)
  h - HTML (hypertext)
  H - HTML (keep < & >)
  l - ISO Latin 1 (Unix, ANSI, MS Windows)
  L - LaTeX long (\"{a}) (cf. TeX)
  m - Apple Macintosh
  r - RTF (Rich Text Format) (output only!)
  s - SGML (Standard Generalized Markup Language)
  S - Symbol font
  t - TeX
  z - Atari ST

CHARCONV can be used as a filter (reading from standard input) or by naming an input file. (Note that the -i option should be used when converting Macintosh files.)

The H table (selected with -tH for the output) is useful for generating HTML files from ISO Latin-1 source files that already contain tags.
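A minimal sketch of how the h and H output tables might differ, assuming (from the table description "keep < & >") that h turns <, > and & into entities while H passes them through so that existing tags survive. The function name latin1_to_html and its keep_markup parameter are hypothetical; umlauts and ß become named entities in both cases.

```c
#include <stddef.h>

/* Hypothetical sketch: HTML output for one Latin-1 byte.
 * keep_markup == 0 mimics table h, keep_markup != 0 mimics table H.
 * Returns NULL when the byte should be copied unchanged. */
static const char *latin1_to_html(unsigned char c, int keep_markup)
{
    switch (c) {
    case '<':  return keep_markup ? NULL : "&lt;";
    case '>':  return keep_markup ? NULL : "&gt;";
    case '&':  return keep_markup ? NULL : "&amp;";
    case 0xC4: return "&Auml;";
    case 0xD6: return "&Ouml;";
    case 0xDC: return "&Uuml;";
    case 0xE4: return "&auml;";
    case 0xF6: return "&ouml;";
    case 0xFC: return "&uuml;";
    case 0xDF: return "&szlig;";
    default:   return NULL;
    }
}
```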

See also the man page of charconv.

Examples

           charconv -ft myfile.tex
              (de-texify file, using umlauts of current system)

           charconv -ft -th myfile.tex myfile.html
              (convert from TeX to HTML)

           charconv -m -fd -tm dos.txt mac.txt
              (convert from "DOS" to "Macintosh")

           cat myfile.html | charconv -fh | less -r
              (strip HTML tags and view the result with less)

Source Code

The source code of CHARCONV is available from ftp.fu-berlin.de in directory /unix/tools/charconv, file charconv.tar.gz.

(Current version no.: 1.12, 1996/06/14)


Burkhard Kirste, 1996/06/14