.TH TCS 1
.SH NAME
tcs \- translate character sets
.SH SYNOPSIS
.B tcs
[
.B -slcv
]
[
.B -f
.I ics
]
[
.B -t
.I ocs
]
[
.I file ...
]
.SH DESCRIPTION
.I Tcs
interprets the named
.I file(s)
(standard input default) as a stream of characters from the
.I ics
character set or format, converts them to runes,
and then converts them into a stream of characters from the
.I ocs
character set or format on the standard output.
The default value for
.I ics
and
.I ocs
is
.BR utf ,
the
.SM UTF
encoding described in
.IR utf (7).
The
.B -l
option lists the character sets known to
.IR tcs .
Processing continues in the face of conversion errors (the
.B -s
option prevents reporting of these errors).
The
.B -c
option forces the output to contain only correctly converted characters;
otherwise,
.B 0x80
characters will be substituted for
.SM UTF
encoding errors and
.B 0xFFFD
characters will be substituted for unknown characters.
.PP
The
.B -v
option generates various diagnostic and summary information on standard error,
or makes the
.B -l
output more verbose.
.PP
.I Tcs
recognizes an ever-changing list of character sets.
In particular, it supports a variety of Russian and Japanese encodings.
Some of the supported encodings are
.TF jis-kanji
.TP
.B utf
The Plan 9
.SM UTF
encoding, known by ISO as UTF-8
.TP
.B utf1
The deprecated original
.SM UTF
encoding from ISO 10646
.TP
.B ascii
7-bit ASCII
.TP
.B 8859-1
Latin-1 (Western European)
.TP
.B 8859-2
Latin-2 (Czech .. Slovak)
.TP
.B 8859-3
Latin-3 (Dutch .. Turkish)
.TP
.B 8859-4
Latin-4 (Scandinavian)
.TP
.B 8859-5
Part 5 (Cyrillic)
.TP
.B 8859-6
Part 6 (Arabic)
.TP
.B 8859-7
Part 7 (Greek)
.TP
.B 8859-8
Part 8 (Hebrew)
.TP
.B 8859-9
Latin-5 (Finnish .. Portuguese)
.TP
.B koi8
KOI-8 (GOST 19769-74)
.TP
.B jis-kanji
ISO 2022-JP
.TP
.B ujis
EUC-JX: JIS 0208
.TP
.B ms-kanji
Microsoft, or Shift-JIS
.TP
.B jis
(from only) guesses among ISO 2022-JP, EUC, or Shift-JIS
.TP
.B gb
Chinese national standard (GB2312-80)
.TP
.B big5
Big 5 (HKU version)
.TP
.B unicode
Unicode Standard 1.0
.TP
.B tis
Thai character set plus
.SM ASCII
(TIS 620-1986)
.TP
.B msdos
IBM PC: CP 437
.TP
.B atari
Atari-ST character set
.SH EXAMPLES
.TP
.B tcs -f 8859-1
Convert 8859-1 (Latin-1) characters into
.SM UTF
format.
.TP
.B tcs -s -f jis
Convert characters encoded in one of several JIS encodings (ISO 2022-JP, EUC, or Shift-JIS) into
.SM UTF
format.
Unknown Kanji will be converted into
.B 0xFFFD
characters.
.TP
.B tcs -lv
Print an up-to-date list of the supported character sets.
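.PP
As a further sketch, the
.B -f
and
.B -t
options can be combined to convert between two non-\c
.SM UTF
character sets.
The file names below are hypothetical and serve only as an illustration:
.IP
.EX
tcs -f koi8 -t 8859-5 letter.koi8 > letter.8859-5
.EE
.PP
Here the input is read as KOI-8, converted to runes, and written as 8859-5;
characters with no 8859-5 equivalent would be handled as described for the
.B -c
option above.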
.SH SOURCE
.B \*9/src/cmd/tcs
.SH SEE ALSO
.IR ascii (1),
.IR rune (3),
.IR utf (7).
