[antlr-interest] The behavior.....

John B. Brodie jbb at acm.org
Tue Jan 27 14:12:20 PST 2009


Meena Vinod asked:
> I have a new problem now. My application's .g is as follows:
>
>     CMD_CD : 'cd';
>     CMD_SHOW: 'show';
>     CMD_RESET: 'reset';
>     CMD_SET: 'set';
>     CMD_OPTION: (' -all' | ' -help' | ' -version' | ' -display')*;
>     TSEP: ' ' { SKIP(); };
>
> The parser is defined as:
> cmd_validate: (cmd)(CMD_OPTION) EOF;
>
> cmd: CMD_CD | CMD_SHOW | CMD_RESET | CMD_SET;
>
> A valid input to my application is "cd -help -version -display". I can
> achieve this with the " " (space) prefixed for each of the CMD_OPTION
> value. However, if I enter "cd-help", then my application hangs on
> ANTLRWorks and even my C code hangs.
>
> I would want to ensure that there is a space between the "cmd" and the
> "cmd_option" (if there is a cmd_option value).
>
> So, my application takes values as follows:
> 1. cd -help -version -display
> 2. cd -version
> 3. cd
> 4. show -help -version -display....... et al
>
> What should I do so that it throws an exception when I enter "cd-help" or
> "show-version"?

Your CMD_OPTION lexer rule accepts the empty string as a valid possible input.

This is generally a very bad thing and is the cause of your problem here.

When given the input sentence "cd-help" your lexer will happily consume the 
first and second characters as a CMD_CD token.

Next it will encounter the hyphen ('-') character. But your lexer rules have 
all specified that a blank (' ') must appear before any valid hyphen. So your 
lexer realizes that this particular hyphen is an unrecognizable character. But 
first, as you have specified, a CMD_OPTION token is valid as the empty string, 
so your lexer recognizes the empty string after the 'cd' and before the '-' as 
a CMD_OPTION token and emits it into the token stream.

Now your lexer is still looking at the hyphen ('-') character that follows the 
'cd' CMD_CD token and it realizes that this particular hyphen is an 
unrecognizable character. But first, as you have specified, a CMD_OPTION token 
is valid as the empty string, so your lexer recognizes the empty string after 
the 'cd' and before the '-' as a CMD_OPTION token and emits it into the token 
stream.

Now your lexer is still looking at the hyphen ('-') character that follows the 
'cd' CMD_CD token and it realizes that this particular hyphen is an 
unrecognizable character. But first, as you have specified, a CMD_OPTION token 
is valid as the empty string, so your lexer recognizes the empty string after 
the 'cd' and before the '-' as a CMD_OPTION token and emits it into the token 
stream.

Now your lexer is still looking at the hyphen ('-') character that follows the 
'cd' CMD_CD token and it realizes that this particular hyphen is an 
unrecognizable character. But first, as you have specified, a CMD_OPTION token 
is valid as the empty string, so your lexer recognizes the empty string after 
the 'cd' and before the '-' as a CMD_OPTION token and emits it into the token 
stream.

and as you can tell from the above discussion, your lexer is stuck in an 
infinite loop (i.e. hung), emitting empty CMD_OPTION tokens as it tries every 
valid lexer rule before throwing an unrecognizable character error. 
(Actually, in my test rig it runs out of memory fairly quickly and dies.)
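
To make the loop concrete, here is a minimal sketch in Python of a toy 
longest-match lexer over the same rules. It is only a model, not ANTLR's 
actual prediction algorithm; the tokenize function and its max_tokens cap are 
assumptions added so that the demonstration terminates instead of hanging.

```python
import re

# Toy model of the original lexer rules (not ANTLR's real algorithm).
RULES = [
    ("CMD_CD", re.compile(r"cd")),
    ("CMD_SHOW", re.compile(r"show")),
    ("CMD_RESET", re.compile(r"reset")),
    ("CMD_SET", re.compile(r"set")),
    # The trailing * lets this rule match the empty string.
    ("CMD_OPTION", re.compile(r"(?: -all| -help| -version| -display)*")),
    ("TSEP", re.compile(r" ")),
]


def tokenize(text, max_tokens=10):
    """Emit (name, text) tokens; stop after max_tokens (hypothetical guard)."""
    tokens = []
    pos = 0
    while pos < len(text) and len(tokens) < max_tokens:
        best = None  # (rule name, end offset) of the longest match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"unrecognizable character {text[pos]!r} at {pos}")
        name, end = best
        if name != "TSEP":  # TSEP is skipped, as in the grammar
            tokens.append((name, text[pos:end]))
        pos = end  # an empty match leaves pos unchanged: no progress
    return tokens


if __name__ == "__main__":
    for token in tokenize("cd-help"):
        print(token)
```

In this model, "cd-help" yields one CMD_CD token followed by empty CMD_OPTION 
tokens until the cap is reached: the error branch is never taken at the 
hyphen, because the empty match always succeeds first.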

As a temporary fix: remove the * operator from the CMD_OPTION lexer rule and 
put it on the CMD_OPTION phrase in the parser cmd_validate rule. Thus:

CMD_OPTION: ' -all' | ' -help' | ' -version' | ' -display';
cmd_validate: cmd CMD_OPTION* EOF;
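
The effect of this change can be sketched with a toy Python model of the 
revised rules. Again this is an illustration under assumptions, not ANTLR's 
implementation; tokenize and validate are hypothetical helper names.

```python
import re

# Toy model of the revised rules; not ANTLR's real lexer or parser.
RULES = [
    ("CMD_CD", re.compile(r"cd")),
    ("CMD_SHOW", re.compile(r"show")),
    ("CMD_RESET", re.compile(r"reset")),
    ("CMD_SET", re.compile(r"set")),
    # No trailing *: every CMD_OPTION token consumes at least one character.
    ("CMD_OPTION", re.compile(r" -all| -help| -version| -display")),
    ("TSEP", re.compile(r" ")),
]
COMMANDS = {"CMD_CD", "CMD_SHOW", "CMD_RESET", "CMD_SET"}


def tokenize(text):
    """Longest-match tokenizer; raises ValueError when no rule matches."""
    tokens = []
    pos = 0
    while pos < len(text):
        best = None  # (rule name, end offset) of the longest match so far
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            if m and (best is None or m.end() > best[1]):
                best = (name, m.end())
        if best is None:
            raise ValueError(f"unrecognizable character {text[pos]!r} at {pos}")
        name, end = best
        if name != "TSEP":  # TSEP is skipped, as in the grammar
            tokens.append((name, text[pos:end]))
        pos = end
    return tokens


def validate(text):
    """Model of the parser rule: cmd CMD_OPTION* EOF."""
    tokens = tokenize(text)
    if not tokens or tokens[0][0] not in COMMANDS:
        return False
    return all(name == "CMD_OPTION" for name, _ in tokens[1:])


if __name__ == "__main__":
    print(validate("cd -help -version -display"))
    try:
        validate("cd-help")
    except ValueError as err:
        print("rejected:", err)
```

Because no rule can match the empty string any more, the model makes progress 
on every token, and "cd-help" fails at the hyphen with an error instead of 
looping.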

I think this change solves this specific problem (at least it does in my test 
rig). But I believe you have a deeper issue here, because your TSEP lexer 
rule says that blanks should be ignored and yet you insist on having blanks 
between elements of your command; so blanks are significant. Which way is it 
supposed to be? I think that trying to skip blanks sometimes and require blanks 
other times, by adding a blank to the text of each of the options, is, in the 
long run, a big mistake. Just my 2 cents.
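
One way to resolve the question, sketched here as a toy Python model and not 
as a tested ANTLR grammar, is to treat blanks as significant everywhere: the 
options lose their leading blank, whitespace becomes an ordinary token, and 
the parser rule requires it between elements. The token names CMD, OPTION and 
WS, and the rule shape CMD (WS OPTION)* EOF, are assumptions made for this 
illustration.

```python
import re

# Hypothetical alternative rule set: options carry no leading blank, and
# whitespace is an ordinary token that the parser must see.
TOKEN_RE = re.compile(
    r"(?P<CMD>cd|show|reset|set)"
    r"|(?P<OPTION>-all|-help|-version|-display)"
    r"|(?P<WS> +)"
)


def tokenize(text):
    """Return the list of token names; raise ValueError on unknown input."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unrecognizable character {text[pos]!r} at {pos}")
        tokens.append(m.lastgroup)  # name of the group that matched
        pos = m.end()
    return tokens


def is_valid(text):
    """Model of a parser rule of the form: CMD (WS OPTION)* EOF."""
    try:
        tokens = tokenize(text)
    except ValueError:
        return False
    if not tokens or tokens[0] != "CMD":
        return False
    rest = tokens[1:]
    if len(rest) % 2 != 0:
        return False
    # Every option must be preceded by a whitespace token.
    return all(
        rest[i] == "WS" and rest[i + 1] == "OPTION"
        for i in range(0, len(rest), 2)
    )


if __name__ == "__main__":
    for line in ("cd -help -version -display", "cd", "cd-help", "show-version"):
        print(repr(line), is_valid(line))
```

In this model "cd-help" lexes cleanly as a command followed by an option, and 
it is the parser rule that rejects it because the whitespace token is 
missing. The decision about blanks then lives in one place instead of being 
split between a skipped TSEP and the option literals.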

Hope this helps...
---
   -jbb

