[antlr-interest] Ambiguous lexer rules

Sun Jul 3 22:56:37 PDT 2011

Hi Stephen,

Thank you! Semantic predicates work for me. In fact, lexer::xstrMode and
lexer::inArray in the following code are used to mimic the parser logic. We need
them because the lexer runs before the parser. Is it possible for ANTLR to merge
the parser and the lexer into a single pass? For example: when a parser rule is
selected, ANTLR would use that rule to choose the correct lexer rule (just as it
selects among parser sub-rules) to tokenize the raw input, so that we can avoid
introducing parser logic into the lexer.
---------------------------------------------------------------------------------

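grammar test;

@lexer::members {
    boolean inArray = false;  // true between ARRAY_START and ARRAY_END
    int xstrMode = 0;         // 0: normal, 1: XSTR_TAG seen, 2: XSTRING expected
}

// '[' opens an array only when no xstr is in progress.
ARRAY_START : {xstrMode == 0}? => '[' { inArray = true; } ;

// ']' closes an array only if one is open.
ARRAY_END   : {inArray}? => ']' { inArray = false; } ;

// The 'xstr' keyword is recognized only outside an array.
XSTR_TAG    : {!inArray}? => 'xstr' { xstrMode = 1; } ;

// A single space or tab after 'xstr' switches the lexer to XSTRING mode.
XSTR_BEGIN  : {xstrMode == 1}? => (' ' | '\t') { xstrMode = 2; } ;

STRING      : ('a'..'z' | 'A'..'Z' | '0'..'9')+ ;

// Any run of printable ASCII, enabled only right after XSTR_BEGIN.
XSTRING     : {xstrMode == 2}? => ('\u0021'..'\u007e')+ { xstrMode = 0; } ;

array : ARRAY_START STRING+ ARRAY_END ;
xstr  : XSTR_TAG XSTR_BEGIN XSTRING ;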
---------------------------------------------------------------------------------

Thanks,
Fussi

----- Original Message ----
From: Stephen Tuttlebee <themightystephen at googlemail.com>
To: antlr-interest at antlr.org
Sent: Sun, July 3, 2011 10:55:23 PM
Subject: Re: [antlr-interest] Ambiguous lexer rules

I see your problem. The lexer is independent of the parser: it doesn't matter
what the parser is expecting. Even though the parser is expecting a '[' followed
by STRING+, the lexer just sees the character sequence '[aaa' and groups it into
a single XSTRING token rather than a '[' token followed by a STRING token with
value 'aaa'. The lexer produces tokens based only on the incoming characters and
the lexer rules (plus other criteria such as choosing the longest match, the
first lexer rule that appears, etc., of which I can't remember all the
details).
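
To make the effect concrete, here is a rough sketch in plain Java (not
ANTLR-generated code) of a simplified longest-match lexer with no parser
feedback. The rule names LBRACK and RBRACK, the character sets, and the
"longest match, ties go to the earliest rule" policy are assumptions made for
illustration; ANTLR's real prediction logic is more involved, so treat this as
a toy model only.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class LongestMatchDemo {
    // Hypothetical rule set, kept in declaration order (earlier wins a tie).
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("LBRACK", Pattern.compile("\\["));
        RULES.put("RBRACK", Pattern.compile("\\]"));
        RULES.put("STRING", Pattern.compile("[a-zA-Z0-9]+"));
        RULES.put("XSTRING", Pattern.compile("[\\x21-\\x7e]+"));
    }

    // Tokenizes input using only the characters and the rules above.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++;                       // whitespace only separates tokens
                continue;
            }
            String bestName = null;
            int bestEnd = pos;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input)
                                .region(pos, input.length());
                // Strictly longer matches replace the current best,
                // so on a tie the earlier rule is kept.
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule.getKey();
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("no rule matches at " + pos);
            }
            tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("[aaa]"));
        System.out.println(tokenize("[ aaa ]"));
    }
}
```

In this model, tokenize("[aaa]") produces a single XSTRING token because
XSTRING offers the longest match at the '[' character, whereas
tokenize("[ aaa ]") produces LBRACK, STRING, RBRACK only because the spaces
happen to stop XSTRING from swallowing the brackets.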

One possible solution could be to use semantic predicates. There's an example of
this at http://www.antlr.org/wiki/display/ANTLR3/1.+Lexer where, in the lexing
of XML, a tagMode boolean variable is updated whenever the tag delimiters ('<'
and '>') are seen. Other lexer rules can then have (gated) semantic predicates
that enable or disable them depending on whether the predicate (tagMode) is
true or false, respectively.
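
The gating idea can be sketched in plain Java as a hand-written lexer with a
single mode flag. This is only loosely modeled on the XML example; the token
names (TAG_START, TAG_END, ID, PCDATA) and their character sets are assumptions
for illustration, and the code is not what ANTLR would generate.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class TagModeLexer {
    private static final Pattern ID = Pattern.compile("[a-zA-Z]+");
    private static final Pattern PCDATA = Pattern.compile("[^<]+");

    // Plays the role of the tagMode lexer member: true between '<' and '>'.
    private boolean tagMode = false;

    List<String> tokenize(String input) {
        tagMode = false;
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (!tagMode && c == '<') {          // TAG_START: only outside a tag
                tokens.add("TAG_START");
                tagMode = true;
                pos++;
            } else if (tagMode && c == '>') {    // TAG_END: only inside a tag
                tokens.add("TAG_END");
                tagMode = false;
                pos++;
            } else if (tagMode && Character.isWhitespace(c)) {
                pos++;                           // skip whitespace inside tags
            } else if (tagMode) {                // ID: enabled when tagMode
                pos = emit(tokens, "ID", ID, input, pos);
            } else {                             // PCDATA: enabled when !tagMode
                pos = emit(tokens, "PCDATA", PCDATA, input, pos);
            }
        }
        return tokens;
    }

    // Matches p at pos, records the token, and returns the new position.
    private static int emit(List<String> tokens, String name, Pattern p,
                            String input, int pos) {
        Matcher m = p.matcher(input).region(pos, input.length());
        if (!m.lookingAt()) {
            throw new IllegalArgumentException("no enabled rule matches at " + pos);
        }
        tokens.add(name + "(" + m.group() + ")");
        return m.end();
    }

    public static void main(String[] args) {
        System.out.println(new TagModeLexer().tokenize("<b>a > b<i>"));
    }
}
```

For the input "<b>a > b<i>", the '>' inside the text ends up in the PCDATA
token rather than becoming a TAG_END, because the TAG_END branch is disabled
while tagMode is false.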

You could try the same thing for your lexer rules for '[' and ']' (currently it
doesn't look like you have explicit lexer rules for '[' and ']'; you would need
them to use this technique), and then give the XSTRING rule a semantic predicate
of {!inArray}?=> (assuming your boolean is called 'inArray'). One of the last
chapters in the ANTLR book has some good material on semantic predicates (in the
context of parsers, but you can use them in lexers too).

An alternative *workaround* to your problem would be to change your XSTRING 
lexer rule to exclude the ASCII characters '[' and ']'. I think that would work.
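
As a quick sanity check of that idea, the narrowed character set can be tried
out with a plain Java regex (a sketch only; the range 0x21..0x7e is assumed to
be what XSTRING covers, and the class and method names are hypothetical):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class XstringWithoutBrackets {
    // Printable ASCII 0x21..0x7e minus '[' and ']' (character-class intersection).
    static final Pattern XSTRING =
            Pattern.compile("[\\x21-\\x7e&&[^\\[\\]]]+");

    // Length of the XSTRING match starting at pos, or 0 if it cannot start there.
    static int matchLength(String input, int pos) {
        Matcher m = XSTRING.matcher(input).region(pos, input.length());
        return m.lookingAt() ? m.end() - pos : 0;
    }

    public static void main(String[] args) {
        System.out.println(matchLength("[aaa]", 0));
        System.out.println(matchLength("[aaa]", 1));
    }
}
```

With this set, XSTRING can no longer start at '[' and stops before ']', so the
brackets are left for their own rules. The price is that an XSTRING can never
contain a bracket, which is why this is only a workaround.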

Hope that helps,
Stephen
