[antlr-interest] The behavior.....
John B. Brodie
jbb at acm.org
Tue Jan 27 14:12:20 PST 2009
Meena Vinod asked:
> I have a new problem now. My application's .g is as follows:
>
> CMD_CD : 'cd';
> CMD_SHOW: 'show';
> CMD_RESET: 'reset';
> CMD_SET: 'set';
> CMD_OPTION: (' -all' | ' -help' | ' -version' | ' -display')*;
> TSEP: ' ' { SKIP(); };
>
> The parser is defined as:
> cmd_validate: (cmd)(CMD_OPTION) EOF;
>
> cmd: CMD_CD | CMD_SHOW | CMD_RESET | CMD_SET;
>
> A valid input to my application is "cd -help -version -display". I can
> achieve this with the " " (space) prefixed for each of the CMD_OPTION
> value. However, if I enter "cd-help", then my application hangs on
> ANTLRWorks and even my C code hangs.
>
> I would want to ensure that there is a space between the "cmd" and the
> "cmd_option" (if there is a cmd_option value).
>
> So, my application takes values as follows:
> 1. cd -help -version -display
> 2. cd -version
> 3. cd
> 4. show -help -version -display....... et al
>
> What should I do so that it throws an exception when I enter "cd-help" or
> "show-version"?
Your CMD_OPTION lexer rule accepts the empty string as a valid possible input.
This is generally a very bad thing and is the cause of your problem here.
When given the input sentence "cd-help" your lexer will happily consume the
first and second characters as a CMD_CD token.
Next it will encounter the hyphen ('-') character. But your lexer rules have
all specified that a blank (' ') must appear before any valid hyphen. So your
lexer realizes that this particular hyphen is an unrecognizable character. But
first, as you have specified, a CMD_OPTION token is valid as the empty string,
so your lexer recognizes the empty string after the 'cd' and before the '-' as
a CMD_OPTION token and emits it into the token stream.
Now your lexer is still looking at the hyphen ('-') character that follows the
'cd' CMD_CD token and it realizes that this particular hyphen is an
unrecognizable character. But first, as you have specified, a CMD_OPTION token
is valid as the empty string, so your lexer recognizes the empty string after
the 'cd' and before the '-' as a CMD_OPTION token and emits it into the token
stream.
Now your lexer is still looking at the hyphen ('-') character that follows the
'cd' CMD_CD token and it realizes that this particular hyphen is an
unrecognizable character. But first, as you have specified, a CMD_OPTION token
is valid as the empty string, so your lexer recognizes the empty string after
the 'cd' and before the '-' as a CMD_OPTION token and emits it into the token
stream.
Now your lexer is still looking at the hyphen ('-') character that follows the
'cd' CMD_CD token and it realizes that this particular hyphen is an
unrecognizable character. But first, as you have specified, a CMD_OPTION token
is valid as the empty string, so your lexer recognizes the empty string after
the 'cd' and before the '-' as a CMD_OPTION token and emits it into the token
stream.
And as you can tell from the above discussion, your lexer is stuck in an
infinite loop (i.e. hung), emitting empty CMD_OPTION tokens while trying to
handle all valid lexer rules before throwing an unrecognizable character error.
(Actually, in my test rig it runs out of memory fairly quickly and dies.)
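To make the loop concrete, here is a rough Python sketch of the situation. It is a toy longest-match tokenizer, not ANTLR itself, and it simplifies how a generated lexer picks a rule; the names RULES, tokenize and max_tokens are made up for this illustration. The max_tokens cap exists only so the demonstration terminates.

```python
import re

# Toy model of the original lexer rules (only CMD_CD of the four commands).
# The '*' on CMD_OPTION means the pattern can match the empty string.
RULES = [
    ("CMD_CD", re.compile(r"cd")),
    ("CMD_OPTION", re.compile(r"(?: -all| -help| -version| -display)*")),
    ("TSEP", re.compile(r" ")),
]

def tokenize(text, rules, max_tokens=10):
    """Return (name, lexeme) pairs, giving up after max_tokens tokens."""
    pos, tokens = 0, []
    while pos < len(text) and len(tokens) < max_tokens:
        best = None
        for name, pattern in rules:
            m = pattern.match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"unrecognizable character {text[pos]!r}")
        tokens.append(best)
        pos += len(best[1])  # an empty lexeme advances by 0: no progress
    return tokens

for token in tokenize("cd-help", RULES):
    print(token)
```

On "cd-help" the sketch emits one CMD_CD token and then nothing but empty CMD_OPTION tokens until the cap is hit, because the empty match always succeeds at the hyphen and the error branch is never reached. Without the cap it would never stop.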
As a temporary fix: remove the * operator from the CMD_OPTION lexer rule and
put it on the CMD_OPTION phrase in the parser cmd_validate rule. Thus:
CMD_OPTION: ' -all' | ' -help' | ' -version' | ' -display';
cmd_validate: cmd CMD_OPTION* EOF;
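As a sanity check of the idea, the revised rules can be modelled with a small Python sketch. Again this is a toy, not generated ANTLR code: the four command tokens are folded into a single hypothetical CMD token, TSEP is dropped from the token list to mimic SKIP(), and the names LEXER_RULES, tokenize and cmd_validate are assumptions made for the illustration.

```python
import re

# Toy model of the revised lexer: CMD_OPTION no longer has '*',
# so no rule can match the empty string.
LEXER_RULES = [
    ("CMD", re.compile(r"cd|show|reset|set")),
    ("CMD_OPTION", re.compile(r" -all| -help| -version| -display")),
    ("TSEP", re.compile(r" ")),
]

def tokenize(text):
    """Return (name, lexeme) pairs; raise on an unrecognizable character."""
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern in LEXER_RULES:
            m = pattern.match(text, pos)
            # Longest match wins; on a tie the earlier rule is kept.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group())
        if best is None:
            raise ValueError(f"unrecognizable character {text[pos]!r} at offset {pos}")
        if best[0] != "TSEP":  # skipped blanks never reach the parser
            tokens.append(best)
        pos += len(best[1])  # every match is non-empty, so this always advances
    return tokens

def cmd_validate(text):
    """Model of the parser rule: cmd CMD_OPTION* EOF."""
    names = [name for name, _ in tokenize(text)]
    return bool(names) and names[0] == "CMD" and all(n == "CMD_OPTION" for n in names[1:])

print(cmd_validate("cd -help -version -display"))
print(cmd_validate("cd"))
try:
    cmd_validate("cd-help")
except ValueError as err:
    print("error:", err)
```

Because every rule now consumes at least one character, the hyphen in "cd-help" falls through to the error branch instead of being preceded by an endless run of empty tokens.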
I think this change solves this specific problem (at least it does in my test
rig). But I believe you have a deeper issue here because your TSEP lexer
rule says that blanks should be ignored and yet you insist on having blanks
between elements of your command; so blanks are significant. Which way is it
supposed to be? I think that trying to skip blanks sometimes and require blanks
other times, by adding a blank to the text of each of the options, is in the
long run a big mistake. Just my 2 cents.
Hope this helps...
---
-jbb