aboutsummaryrefslogtreecommitdiff
path: root/src/libutf/utf.7
diff options
context:
space:
mode:
Diffstat (limited to 'src/libutf/utf.7')
-rw-r--r--src/libutf/utf.791
1 files changed, 0 insertions, 91 deletions
diff --git a/src/libutf/utf.7 b/src/libutf/utf.7
deleted file mode 100644
index 97b7b1e7..00000000
--- a/src/libutf/utf.7
+++ /dev/null
@@ -1,91 +0,0 @@
-.TH UTF 7
-.SH NAME
-UTF, Unicode, ASCII, rune \- character set and format
-.SH DESCRIPTION
-The Plan 9 character set and representation are
-based on the Unicode Standard and on the ISO multibyte
-.SM UTF-8
-encoding (Universal Character
-Set Transformation Format, 8 bits wide).
-The Unicode Standard represents its characters in 16
-bits;
-.SM UTF-8
-represents such
-values in an 8-bit byte stream.
-Throughout this manual,
-.SM UTF-8
-is shortened to
-.SM UTF.
-.PP
-In Plan 9, a
-.I rune
-is a 16-bit quantity representing a Unicode character.
-Internally, programs may store characters as runes.
-However, any external manifestation of textual information,
-in files or at the interface between programs, uses a
-machine-independent, byte-stream encoding called
-.SM UTF.
-.PP
-.SM UTF
-is designed so the 7-bit
-.SM ASCII
-set (values hexadecimal 00 to 7F),
-appear only as themselves
-in the encoding.
-Runes with values above 7F appear as sequences of two or more
-bytes with values only from 80 to FF.
-.PP
-The
-.SM UTF
-encoding of the Unicode Standard is backward compatible with
-.SM ASCII\c
-:
-programs presented only with
-.SM ASCII
-work on Plan 9
-even if not written to deal with
-.SM UTF,
-as do
-programs that deal with uninterpreted byte streams.
-However, programs that perform semantic processing on
-.SM ASCII
-graphic
-characters must convert from
-.SM UTF
-to runes
-in order to work properly with non-\c
-.SM ASCII
-input.
-See
-.IR rune (2).
-.PP
-Letting numbers be binary,
-a rune x is converted to a multibyte
-.SM UTF
-sequence
-as follows:
-.PP
-01. x in [00000000.0bbbbbbb] → 0bbbbbbb
-.br
-10. x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
-.br
-11. x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
-.br
-.PP
-Conversion 01 provides a one-byte sequence that spans the
-.SM ASCII
-character set in a compatible way.
-Conversions 10 and 11 represent higher-valued characters
-as sequences of two or three bytes with the high bit set.
-Plan 9 does not support the 4, 5, and 6 byte sequences proposed by X-Open.
-When there are multiple ways to encode a value, for example rune 0,
-the shortest encoding is used.
-.PP
-In the inverse mapping,
-any sequence except those described above
-is incorrect and is converted to rune hexadecimal 0080.
-.SH "SEE ALSO"
-.IR ascii (1),
-.IR tcs (1),
-.IR rune (3),
-.IR "The Unicode Standard" .