aboutsummaryrefslogtreecommitdiff
diff options
context:
space:
mode:
authorMichael Teichgräber <mt4swm@googlemail.com>2009-08-18 02:40:26 -0400
committerMichael Teichgräber <mt4swm@googlemail.com>2009-08-18 02:40:26 -0400
commite7c5e5ed94e02e969d45d8ab74876cac53694195 (patch)
tree91479665e533094c878f1c41c7ab9aa65eb84799
parent0829f75bba8a5eb60a7f46e21aa266736b4c4bab (diff)
downloadplan9port-e7c5e5ed94e02e969d45d8ab74876cac53694195.tar.gz
plan9port-e7c5e5ed94e02e969d45d8ab74876cac53694195.tar.bz2
plan9port-e7c5e5ed94e02e969d45d8ab74876cac53694195.zip
awk: add awk(1)
http://codereview.appspot.com/104086
-rw-r--r--man/man1/awk.1560
1 files changed, 560 insertions, 0 deletions
diff --git a/man/man1/awk.1 b/man/man1/awk.1
new file mode 100644
index 00000000..df28f145
--- /dev/null
+++ b/man/man1/awk.1
@@ -0,0 +1,560 @@
+.TH AWK 1
+.SH NAME
+awk \- pattern-directed scanning and processing language
+.SH SYNOPSIS
+.B awk
+[
+.B -F
+.I fs
+]
+[
+.B -d
+]
+[
+.BI -mf
+.I n
+]
+[
+.B -mr
+.I n
+]
+[
+.B -safe
+]
+[
+.B -v
+.I var=value
+]
+[
+.B -f
+.I progfile
+|
+.I prog
+]
+[
+.I file ...
+]
+.SH DESCRIPTION
+.I Awk
+scans each input
+.I file
+for lines that match any of a set of patterns specified literally in
+.I prog
+or in one or more files
+specified as
+.B -f
+.IR progfile .
+With each pattern
+there can be an associated action that will be performed
+when a line of a
+.I file
+matches the pattern.
+Each line is matched against the
+pattern portion of every pattern-action statement;
+the associated action is performed for each matched pattern.
+The file name
+.L -
+means the standard input.
+Any
+.IR file
+of the form
+.I var=value
+is treated as an assignment, not a file name,
+and is executed at the time it would have been opened if it were a file name.
+The option
+.B -v
+followed by
+.I var=value
+is an assignment to be done before the program
+is executed;
+any number of
+.B -v
+options may be present.
+.B -F
+.IR fs
+option defines the input field separator to be the regular expression
+.IR fs .
+.PP
+An input line is normally made up of fields separated by white space,
+or by regular expression
+.BR FS .
+The fields are denoted
+.BR $1 ,
+.BR $2 ,
+\&..., while
+.B $0
+refers to the entire line.
+If
+.BR FS
+is null, the input line is split into one field per character.
+.PP
+To compensate for inadequate implementation of storage management,
+the
+.B -mr
+option can be used to set the maximum size of the input record,
+and the
+.B -mf
+option to set the maximum number of fields.
+.PP
+The
+.B -safe
+option causes
+.I awk
+to run in
+``safe mode,''
+in which it is not allowed to
+run shell commands or open files
+and the environment is not made available
+in the
+.B ENVIRON
+variable.
+.PP
+A pattern-action statement has the form
+.IP
+.IB pattern " { " action " }
+.PP
+A missing
+.BI { " action " }
+means print the line;
+a missing pattern always matches.
+Pattern-action statements are separated by newlines or semicolons.
+.PP
+An action is a sequence of statements.
+A statement can be one of the following:
+.PP
+.EX
+.ta \w'\fLdelete array[expression]'u
+if(\fI expression \fP)\fI statement \fP\fR[ \fPelse\fI statement \fP\fR]\fP
+while(\fI expression \fP)\fI statement\fP
+for(\fI expression \fP;\fI expression \fP;\fI expression \fP)\fI statement\fP
+for(\fI var \fPin\fI array \fP)\fI statement\fP
+do\fI statement \fPwhile(\fI expression \fP)
+break
+continue
+{\fR [\fP\fI statement ... \fP\fR] \fP}
+\fIexpression\fP #\fR commonly\fP\fI var = expression\fP
+print\fR [ \fP\fIexpression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
+printf\fI format \fP\fR[ \fP,\fI expression-list \fP\fR] \fP\fR[ \fP>\fI expression \fP\fR]\fP
+return\fR [ \fP\fIexpression \fP\fR]\fP
+next #\fR skip remaining patterns on this input line\fP
+nextfile #\fR skip rest of this file, open next, start at top\fP
+delete\fI array\fP[\fI expression \fP] #\fR delete an array element\fP
+delete\fI array\fP #\fR delete all elements of array\fP
+exit\fR [ \fP\fIexpression \fP\fR]\fP #\fR exit immediately; status is \fP\fIexpression\fP
+.EE
+.DT
+.PP
+Statements are terminated by
+semicolons, newlines or right braces.
+An empty
+.I expression-list
+stands for
+.BR $0 .
+String constants are quoted \&\fL"\ "\fR,
+with the usual C escapes recognized within.
+Expressions take on string or numeric values as appropriate,
+and are built using the operators
+.B + \- * / % ^
+(exponentiation), and concatenation (indicated by white space).
+The operators
+.B
+! ++ \-\- += \-= *= /= %= ^= > >= < <= == != ?:
+are also available in expressions.
+Variables may be scalars, array elements
+(denoted
+.IB x [ i ] )
+or fields.
+Variables are initialized to the null string.
+Array subscripts may be any string,
+not necessarily numeric;
+this allows for a form of associative memory.
+Multiple subscripts such as
+.B [i,j,k]
+are permitted; the constituents are concatenated,
+separated by the value of
+.BR SUBSEP .
+.PP
+The
+.B print
+statement prints its arguments on the standard output
+(or on a file if
+.BI > file
+or
+.BI >> file
+is present or on a pipe if
+.BI | cmd
+is present), separated by the current output field separator,
+and terminated by the output record separator.
+.I file
+and
+.I cmd
+may be literal names or parenthesized expressions;
+identical string values in different statements denote
+the same open file.
+The
+.B printf
+statement formats its expression list according to the format
+(see
+.IR fprintf (3)) .
+The built-in function
+.BI close( expr )
+closes the file or pipe
+.IR expr .
+The built-in function
+.BI fflush( expr )
+flushes any buffered output for the file or pipe
+.IR expr .
+If
+.IR expr
+is omitted or is a null string, all open files are flushed.
+.PP
+The mathematical functions
+.BR exp ,
+.BR log ,
+.BR sqrt ,
+.BR sin ,
+.BR cos ,
+and
+.BR atan2
+are built in.
+Other built-in functions:
+.TF length
+.TP
+.B length
+If its argument is a string, the string's length is returned.
+If its argument is an array, the number of subscripts in the array is returned.
+If no argument, the length of
+.B $0
+is returned.
+.TP
+.B rand
+random number on (0,1)
+.TP
+.B srand
+sets seed for
+.B rand
+and returns the previous seed.
+.TP
+.B int
+truncates to an integer value
+.TP
+.B utf
+converts its numerical argument, a character number, to a
+.SM UTF
+string
+.TP
+.BI substr( s , " m" , " n\fL)
+the
+.IR n -character
+substring of
+.I s
+that begins at position
+.IR m
+counted from 1.
+.TP
+.BI index( s , " t" )
+the position in
+.I s
+where the string
+.I t
+occurs, or 0 if it does not.
+.TP
+.BI match( s , " r" )
+the position in
+.I s
+where the regular expression
+.I r
+occurs, or 0 if it does not.
+The variables
+.B RSTART
+and
+.B RLENGTH
+are set to the position and length of the matched string.
+.TP
+.BI split( s , " a" , " fs\fL)
+splits the string
+.I s
+into array elements
+.IB a [1]\f1,
+.IB a [2]\f1,
+\&...,
+.IB a [ n ]\f1,
+and returns
+.IR n .
+The separation is done with the regular expression
+.I fs
+or with the field separator
+.B FS
+if
+.I fs
+is not given.
+An empty string as field separator splits the string
+into one array element per character.
+.TP
+.BI sub( r , " t" , " s\fL)
+substitutes
+.I t
+for the first occurrence of the regular expression
+.I r
+in the string
+.IR s .
+If
+.I s
+is not given,
+.B $0
+is used.
+.TP
+.B gsub
+same as
+.B sub
+except that all occurrences of the regular expression
+are replaced;
+.B sub
+and
+.B gsub
+return the number of replacements.
+.TP
+.BI sprintf( fmt , " expr" , " ...\fL)
+the string resulting from formatting
+.I expr ...
+according to the
+.I printf
+format
+.I fmt
+.TP
+.BI system( cmd )
+executes
+.I cmd
+and returns its exit status
+.TP
+.BI tolower( str )
+returns a copy of
+.I str
+with all upper-case characters translated to their
+corresponding lower-case equivalents.
+.TP
+.BI toupper( str )
+returns a copy of
+.I str
+with all lower-case characters translated to their
+corresponding upper-case equivalents.
+.PD
+.PP
+The ``function''
+.B getline
+sets
+.B $0
+to the next input record from the current input file;
+.B getline
+.BI < file
+sets
+.B $0
+to the next record from
+.IR file .
+.B getline
+.I x
+sets variable
+.I x
+instead.
+Finally,
+.IB cmd " | getline
+pipes the output of
+.I cmd
+into
+.BR getline ;
+each call of
+.B getline
+returns the next line of output from
+.IR cmd .
+In all cases,
+.B getline
+returns 1 for a successful input,
+0 for end of file, and \-1 for an error.
+.PP
+Patterns are arbitrary Boolean combinations
+(with
+.BR "! || &&" )
+of regular expressions and
+relational expressions.
+Regular expressions are as in
+.IR regexp (7).
+Isolated regular expressions
+in a pattern apply to the entire line.
+Regular expressions may also occur in
+relational expressions, using the operators
+.BR ~
+and
+.BR !~ .
+.BI / re /
+is a constant regular expression;
+any string (constant or variable) may be used
+as a regular expression, except in the position of an isolated regular expression
+in a pattern.
+.PP
+A pattern may consist of two patterns separated by a comma;
+in this case, the action is performed for all lines
+from an occurrence of the first pattern
+though an occurrence of the second.
+.PP
+A relational expression is one of the following:
+.IP
+.I expression matchop regular-expression
+.br
+.I expression relop expression
+.br
+.IB expression " in " array-name
+.br
+.BI ( expr , expr,... ") in " array-name
+.PP
+where a
+.I relop
+is any of the six relational operators in C,
+and a
+.I matchop
+is either
+.B ~
+(matches)
+or
+.B !~
+(does not match).
+A conditional is an arithmetic expression,
+a relational expression,
+or a Boolean combination
+of these.
+.PP
+The special patterns
+.B BEGIN
+and
+.B END
+may be used to capture control before the first input line is read
+and after the last.
+.B BEGIN
+and
+.B END
+do not combine with other patterns.
+.PP
+Variable names with special meanings:
+.TF FILENAME
+.TP
+.B CONVFMT
+conversion format used when converting numbers
+(default
+.BR "%.6g" )
+.TP
+.B FS
+regular expression used to separate fields; also settable
+by option
+.BI \-F fs\f1.
+.TP
+.BR NF
+number of fields in the current record
+.TP
+.B NR
+ordinal number of the current record
+.TP
+.B FNR
+ordinal number of the current record in the current file
+.TP
+.B FILENAME
+the name of the current input file
+.TP
+.B RS
+input record separator (default newline)
+.TP
+.B OFS
+output field separator (default blank)
+.TP
+.B ORS
+output record separator (default newline)
+.TP
+.B OFMT
+output format for numbers (default
+.BR "%.6g" )
+.TP
+.B SUBSEP
+separates multiple subscripts (default 034)
+.TP
+.B ARGC
+argument count, assignable
+.TP
+.B ARGV
+argument array, assignable;
+non-null members are taken as file names
+.TP
+.B ENVIRON
+array of environment variables; subscripts are names.
+.PD
+.PP
+Functions may be defined (at the position of a pattern-action statement) thus:
+.IP
+.L
+function foo(a, b, c) { ...; return x }
+.PP
+Parameters are passed by value if scalar and by reference if array name;
+functions may be called recursively.
+Parameters are local to the function; all other variables are global.
+Thus local variables may be created by providing excess parameters in
+the function definition.
+.SH EXAMPLES
+.TP
+.L
+length($0) > 72
+Print lines longer than 72 characters.
+.TP
+.L
+{ print $2, $1 }
+Print first two fields in opposite order.
+.PP
+.EX
+BEGIN { FS = ",[ \et]*|[ \et]+" }
+ { print $2, $1 }
+.EE
+.ns
+.IP
+Same, with input fields separated by comma and/or blanks and tabs.
+.PP
+.EX
+ { s += $1 }
+END { print "sum is", s, " average is", s/NR }
+.EE
+.ns
+.IP
+Add up first column, print sum and average.
+.TP
+.L
+/start/, /stop/
+Print all lines between start/stop pairs.
+.PP
+.EX
+BEGIN { # Simulate echo(1)
+ for (i = 1; i < ARGC; i++) printf "%s ", ARGV[i]
+ printf "\en"
+ exit }
+.EE
+.SH SOURCE
+.B \*9/src/cmd/awk
+.SH SEE ALSO
+.IR sed (1),
+.IR regexp (7),
+.br
+A. V. Aho, B. W. Kernighan, P. J. Weinberger,
+.I
+The AWK Programming Language,
+Addison-Wesley, 1988. ISBN 0-201-07981-X
+.SH BUGS
+There are no explicit conversions between numbers and strings.
+To force an expression to be treated as a number add 0 to it;
+to force it to be treated as a string concatenate
+\&\fL""\fP to it.
+.br
+The scope rules for variables in functions are a botch;
+the syntax is worse.
+.br
+UTF is not always dealt with correctly,
+though
+.I awk
+does make an attempt to do so.
+The
+.I split
+function with an empty string as final argument now copes
+with UTF in the string being split.