author | Russ Cox <rsc@swtch.com> | 2020-05-04 18:34:19 -0400 |
---|---|---|
committer | Russ Cox <rsc@swtch.com> | 2020-05-04 23:41:15 -0400 |
commit | 47d4646eebac34c0b94951cfcf1b81ed2ca513e1 (patch) | |
tree | fff188620bba3491f209e6268fdb87b816c3b236 /src/cmd/rc/syn.y | |
parent | c1c1b5267fd5e14be531a4b22ed0124b35d427cb (diff) | |
rc: add recursive descent parser
The old yacc-based parser is available with the -Y flag,
which will probably be removed at some point.
The new -D flag dumps a parse tree of the input,
without executing it. This allows comparing the output
of rc -D and rc -DY on different scripts to see that the
two parsers behave the same.
The rc paper ends by saying:
    It is remarkable that in the four most recent editions of the UNIX
    system programmer’s manual the Bourne shell grammar described in the
    manual page does not admit the command who|wc. This is surely an
    oversight, but it suggests something darker: nobody really knows what
    the Bourne shell’s grammar is. Even examination of the source code is
    little help. The parser is implemented by recursive descent, but the
    routines corresponding to the syntactic categories all have a flag
    argument that subtly changes their operation depending on the context.
    Rc’s parser is implemented using yacc, so I can say precisely what the
    grammar is.
The new recursive descent parser here has no such flags.
It is a straightforward translation of the yacc.
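
To illustrate what translating a yacc rule into recursive descent can look like, here is a minimal sketch in C for the left-recursive rule `first: comword | first '^' word` from syn.y. It is not the code added by this commit. The `struct parser` cursor, the single-letter `comword`, the reuse of `comword` where the grammar says `word`, and the textual dump format are all assumptions made to keep the example small.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Toy stand-in for rc's tree nodes: each parse result is rendered as text. */
enum { MAXOUT = 256 };

/* Hypothetical cursor over the input; rc's real lexer is not modeled here. */
struct parser {
	const char *s;	/* remaining input */
};

/* comword: a single lowercase letter in this sketch. Returns 0 on failure. */
static int
parse_comword(struct parser *p, char *out)
{
	if(*p->s < 'a' || *p->s > 'z')
		return 0;
	out[0] = *p->s++;
	out[1] = '\0';
	return 1;
}

/*
 * yacc:  first: comword | first '^' word
 * The left recursion becomes a loop: parse one comword, then keep
 * folding "'^' word" onto the left operand, so a^b^c groups as (a^b)^c.
 * (This sketch parses the right operand as a comword, not a full word.)
 */
static int
parse_first(struct parser *p, char *out)
{
	char left[MAXOUT], right[MAXOUT], tmp[MAXOUT];

	assert(out != NULL);
	if(!parse_comword(p, left))
		return 0;
	while(*p->s == '^'){
		p->s++;
		if(!parse_comword(p, right))
			return 0;
		snprintf(tmp, sizeof tmp, "(^ %s %s)", left, right);
		memcpy(left, tmp, sizeof left);
	}
	memcpy(out, left, MAXOUT);
	return 1;
}

/* Convenience wrapper: parse a whole string; out must hold MAXOUT bytes. */
int
first_dump(const char *input, char *out)
{
	struct parser p;

	p.s = input;
	return parse_first(&p, out) && *p.s == '\0';
}
```

Calling `first_dump("a^b^c", out)` leaves the left-nested dump `(^ (^ a b) c)` in `out`, which is the grouping the left-recursive yacc rule produces. No routine takes a context flag; each one corresponds to a single grammar rule.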
The new parser will make it easier to handle free carats
in more generality as well as potentially allow the use of
unquoted = as a word character.
Going through this exercise has highlighted a few
dark corners here as well. For example, I was surprised to
find that
    x >f | y
    >f x | y
are different commands (the latter redirects y's output).
It is similarly surprising that
    a=b x | y
sets a during the execution of y.
It is also a bit counter-intuitive that
    x | y | z
    x | if(c) y | z
are not both 3-phase pipelines.
These are certainly not things we should change, but they
are not entirely obvious from the man page description,
undercutting the quoted claim a bit.
On the other hand, who | wc is clearly accepted by the grammar
in the manual page, and the new parser still handles that test case.
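
The difference between `x >f | y` and `>f x | y` noted above can be made concrete with a toy parser. The sketch below is an assumption-laden illustration, not rc's parser: it handles only single-word commands, `>file` redirections, and `|`, and it assumes that a prefix redirection wraps the whole command that follows it while a postfix redirection wraps only the simple command it trails. That grouping is one that is consistent with the observation that the second form redirects y's output. All names (`dump`, `parse_cmd`, `parse_simple`) and the dump format are hypothetical.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

enum { MAXOUT = 256, MAXTOK = 32 };

/* Hypothetical token cursor; tokens are split on spaces only. */
struct parser {
	char buf[MAXOUT];
	char *tok[MAXTOK];
	int ntok, pos;
};

static void
init(struct parser *p, const char *input)
{
	char *t;

	snprintf(p->buf, sizeof p->buf, "%s", input);
	p->ntok = p->pos = 0;
	for(t = strtok(p->buf, " "); t != NULL && p->ntok < MAXTOK; t = strtok(NULL, " "))
		p->tok[p->ntok++] = t;
}

/* Current token, or "" at end of input. */
static const char*
peek(struct parser *p)
{
	return p->pos < p->ntok ? p->tok[p->pos] : "";
}

static int isredir(const char *t) { return t[0] == '>'; }
static int ispipe(const char *t) { return strcmp(t, "|") == 0; }

/* simple: word { redir }. Postfix redirections wrap this command only. */
static void
parse_simple(struct parser *p, char *out)
{
	char tmp[MAXOUT];

	snprintf(out, MAXOUT, "%s", peek(p));
	if(p->pos < p->ntok)
		p->pos++;
	while(isredir(peek(p))){
		snprintf(tmp, sizeof tmp, "(%s %s)", peek(p), out);
		memcpy(out, tmp, MAXOUT);
		p->pos++;
	}
}

/* cmd: redir cmd | simple { '|' cmd-or-simple }. No error reporting. */
static void
parse_cmd(struct parser *p, char *out)
{
	char left[MAXOUT], right[MAXOUT], tmp[MAXOUT];
	const char *r;

	if(isredir(peek(p))){
		/* Assumed grouping: a prefix redirection applies to everything after it. */
		r = p->tok[p->pos++];
		parse_cmd(p, right);
		snprintf(out, MAXOUT, "(%s %s)", r, right);
		return;
	}
	parse_simple(p, left);
	while(ispipe(peek(p))){
		p->pos++;
		if(isredir(peek(p)))
			parse_cmd(p, right);
		else
			parse_simple(p, right);
		snprintf(tmp, sizeof tmp, "(| %s %s)", left, right);
		memcpy(left, tmp, sizeof left);
	}
	memcpy(out, left, MAXOUT);
}

/* Parse input and write a parenthesized dump to out (MAXOUT bytes). */
void
dump(const char *input, char *out)
{
	struct parser p;

	init(&p, input);
	parse_cmd(&p, out);
}
```

With this sketch, `dump("x >f | y", out)` yields `(| (>f x) y)` and `dump(">f x | y", out)` yields `(>f (| x y))`. Comparing textual dumps like these is the same idea as comparing rc -D against rc -DY, though the real dump format is not shown here.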
Diffstat (limited to 'src/cmd/rc/syn.y')
-rw-r--r-- | src/cmd/rc/syn.y | 10 |
1 files changed, 5 insertions, 5 deletions
diff --git a/src/cmd/rc/syn.y b/src/cmd/rc/syn.y
index c7de3531..5c98ef80 100644
--- a/src/cmd/rc/syn.y
+++ b/src/cmd/rc/syn.y
@@ -1,5 +1,5 @@
 %term FOR IN WHILE IF NOT TWIDDLE BANG SUBSHELL SWITCH FN
-%term WORD REDIR DUP PIPE SUB
+%term WORD REDIR REDIRW DUP PIPE SUB
 %term SIMPLE ARGLIST WORDS BRACE PAREN PCMD PIPEFD /* not used in syntax */
 /* operator priorities -- lowest first */
 %left IF WHILE FOR SWITCH ')' NOT
@@ -19,7 +19,7 @@
 %type<tree> line paren brace body cmdsa cmdsan assign epilog redir
 %type<tree> cmd simple first word comword keyword words
 %type<tree> NOT FOR IN WHILE IF TWIDDLE BANG SUBSHELL SWITCH FN
-%type<tree> WORD REDIR DUP PIPE
+%type<tree> WORD REDIR REDIRW DUP PIPE
 %%
 rc: { return 1;}
 | line '\n' {return !compile($1);}
@@ -45,7 +45,7 @@ cmd: {$$=0;}
 | IF NOT {skipnl();} cmd {$$=mung1($2, $4);}
 | FOR '(' word IN words ')' {skipnl();} cmd
 /*
- * if ``words'' is nil, we need a tree element to distinguish between
+ * if ``words'' is nil, we need a tree element to distinguish between
  * for(i in ) and for(i), the former being a loop over the empty set
  * and the latter being the implicit argument loop. so if $5 is nil
  * (the empty set), we represent it as "()". don't parenthesize non-nil
@@ -73,7 +73,7 @@ cmd: {$$=0;}
 simple: first
 | simple word {$$=tree2(ARGLIST, $1, $2);}
 | simple redir {$$=tree2(ARGLIST, $1, $2);}
-first: comword
+first: comword
 | first '^' word {$$=tree2('^', $1, $3);}
 word: keyword {lastword=1; $1->type=WORD;}
 | comword
@@ -85,7 +85,7 @@ comword: '$' word {$$=tree1('$', $2);}
 | WORD
 | '`' brace {$$=tree1('`', $2);}
 | '(' words ')' {$$=tree1(PAREN, $2);}
-| REDIR brace {$$=mung1($1, $2); $$->type=PIPEFD;}
+| REDIRW brace {$$=mung1($1, $2); $$->type=PIPEFD;}
 keyword: FOR|IN|WHILE|IF|NOT|TWIDDLE|BANG|SUBSHELL|SWITCH|FN
 words: {$$=(struct tree*)0;}
 | words word {$$=tree2(WORDS, $1, $2);}