utf(7) - Plan 9 from User Space

From adc93f6097615f16d57e8a24a256302f2144ec4e Mon Sep 17 00:00:00 2001
From: rsc
Date: Fri, 14 Jan 2005 17:37:50 +0000
Subject: cut out the html - they're going to cause diffing problems.

---
 man/man7/utf.html | 96 -------------------------------------------------------
 1 file changed, 96 deletions(-)
 delete mode 100644 man/man7/utf.html

(limited to 'man/man7/utf.html')

diff --git a/man/man7/utf.html b/man/man7/utf.html
deleted file mode 100644
index a1e767ec..00000000
--- a/man/man7/utf.html
+++ /dev/null
@@ -1,96 +0,0 @@
-
-utf(7) - Plan 9 from User Space
-
-
-
-
-

UTF(7)

UTF(7) -

-
-

NAME
- -
- - UTF, Unicode, ASCII, rune – character set and format
- -
-

DESCRIPTION
- -
-
- The Plan 9 character set and representation are based on the Unicode
- Standard and on the ISO multibyte UTF-8 encoding (Universal Character
- Set Transformation Format, 8 bits wide). The Unicode Standard
- represents its characters in 16 bits; UTF-8 represents such values
- in an 8-bit byte stream. Throughout this
- manual, UTF-8 is shortened to UTF.
-
-
- In Plan 9, a rune is a 16-bit quantity representing a Unicode
- character. Internally, programs may store characters as runes.
- However, any external manifestation of textual information, in
- files or at the interface between programs, uses a machine-independent,
- byte-stream encoding called UTF.
-
-
- UTF is designed so the 7-bit ASCII set (values hexadecimal 00
- to 7F), appear only as themselves in the encoding. Runes with
- values above 7F appear as sequences of two or more bytes with
- values only from 80 to FF.
-
-
- The UTF encoding of the Unicode Standard is backward compatible
- with ASCII: programs presented only with ASCII work on Plan 9
- even if not written to deal with UTF, as do programs that deal
- with uninterpreted byte streams. However, programs that perform
- semantic processing on ASCII graphic characters must convert
- from UTF to runes in order to work properly with non-ASCII input.
- See rune(3).
-
-
- Letting numbers be binary, a rune x is converted to a multibyte
- UTF sequence as follows:
-
-
- 01. x in [00000000.0bbbbbbb] → 0bbbbbbb
- 10. x in [00000bbb.bbbbbbbb] → 110bbbbb, 10bbbbbb
- 11. x in [bbbbbbbb.bbbbbbbb] → 1110bbbb, 10bbbbbb, 10bbbbbb
-
-
-
- Conversion 01 provides a one-byte sequence that spans the ASCII
- character set in a compatible way. Conversions 10 and 11 represent
- higher-valued characters as sequences of two or three bytes with
- the high bit set. Plan 9 does not support the 4, 5, and 6 byte
- sequences proposed by X-Open. When there are
- multiple ways to encode a value, for example rune 0, the shortest
- encoding is used.
-
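
As an illustration of conversions 01, 10 and 11 described above, here is a minimal C sketch of the encoding step, assuming a 16-bit rune as the page states. The names `Rune16` and `rune_to_utf` are hypothetical, chosen for this example; they are not the library interface documented in rune(3).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint16_t Rune16;  /* hypothetical name for the 16-bit rune described above */

/*
 * Encode one rune into buf, which must hold at least 3 bytes.
 * Returns the number of bytes written (1, 2 or 3).
 * The shortest applicable conversion is always chosen.
 */
int rune_to_utf(unsigned char *buf, Rune16 x)
{
    assert(buf != NULL);

    if (x <= 0x7F) {
        /* conversion 01: 0bbbbbbb */
        buf[0] = (unsigned char)x;
        return 1;
    }
    if (x <= 0x7FF) {
        /* conversion 10: 110bbbbb, 10bbbbbb */
        buf[0] = (unsigned char)(0xC0 | (x >> 6));
        buf[1] = (unsigned char)(0x80 | (x & 0x3F));
        return 2;
    }
    /* conversion 11: 1110bbbb, 10bbbbbb, 10bbbbbb */
    buf[0] = (unsigned char)(0xE0 | (x >> 12));
    buf[1] = (unsigned char)(0x80 | ((x >> 6) & 0x3F));
    buf[2] = (unsigned char)(0x80 | (x & 0x3F));
    return 3;
}
```

For example, `rune_to_utf(buf, 0x20AC)` stores the bytes E2 82 AC in `buf` and returns 3, while any rune up to 7F is stored unchanged as a single byte.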
-
- In the inverse mapping, any sequence except those described above
- is incorrect and is converted to rune hexadecimal 0080.
- -
-
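
The inverse mapping described above might be sketched in C as follows. This is an illustrative reading, not the implementation in rune(3): the names `utf_to_rune` and `RUNE_ERROR` are hypothetical, consuming exactly one byte on an incorrect sequence is an assumption of this sketch, and rejecting longer-than-necessary encodings follows from the page's statement that the shortest encoding is used.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

enum { RUNE_ERROR = 0x80 };  /* hypothetical name for rune hexadecimal 0080 */

/*
 * Decode one rune from the n bytes at s (n must be at least 1).
 * Stores the rune in *r and returns the number of bytes consumed.
 * Any incorrect sequence yields RUNE_ERROR and consumes one byte
 * (the one-byte step is an assumption of this sketch).
 */
int utf_to_rune(uint16_t *r, const unsigned char *s, size_t n)
{
    assert(r != NULL && s != NULL && n > 0);

    unsigned c = s[0];
    if (c < 0x80) {                       /* conversion 01 */
        *r = (uint16_t)c;
        return 1;
    }
    if (n >= 2 && (s[1] & 0xC0) == 0x80) {
        unsigned c1 = s[1] & 0x3F;
        if ((c & 0xE0) == 0xC0) {         /* conversion 10 */
            unsigned x = ((c & 0x1F) << 6) | c1;
            if (x >= 0x80) {              /* reject non-shortest forms */
                *r = (uint16_t)x;
                return 2;
            }
        } else if ((c & 0xF0) == 0xE0 && n >= 3 && (s[2] & 0xC0) == 0x80) {
            /* conversion 11 */
            unsigned x = ((c & 0x0F) << 12) | (c1 << 6) | (s[2] & 0x3F);
            if (x >= 0x800) {             /* reject non-shortest forms */
                *r = (uint16_t)x;
                return 3;
            }
        }
    }
    *r = RUNE_ERROR;                      /* anything else is incorrect */
    return 1;
}
```

Under these assumptions the bytes E2 82 AC decode to rune 20AC, while a lone continuation byte such as 80, or the two-byte form C0 80 of rune 0, decodes to the error rune 0080.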

SEE ALSO
- -
- - ascii(1), tcs(1), rune(3), The Unicode Standard.
- -
- -

- - -

		-
	- - - -

- - -- cgit v1.2.3