[antlr-interest] Ambiguous lexer rules
Fs Cc
reginfo_ar at ymail.com
Sun Jul 3 22:56:37 PDT 2011
Hi Stephen,
Thank you! Semantic predicates work for me. The lexer members xstrMode and
inArray in the following code exist only to mimic the parser's logic; we need
them because the lexer runs before the parser. Is it possible for ANTLR to merge
the parser and the lexer into a single pass? For example, when a parser rule is
selected, ANTLR could deduce which lexer rules are valid at that point and pick
the correct one (just as it selects parser sub-rules) to tokenize the raw input,
so that we can avoid putting parser logic into the lexer.
---------------------------------------------------------------------------------
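grammar test;

@lexer::members {
    // Lexer-side state that mirrors what the parser expects next.
    boolean inArray = false; // true between '[' and ']'
    int xstrMode = 0;        // 0 = normal, 1 = seen 'xstr', 2 = XSTRING expected
}

ARRAY_START
    : {xstrMode == 0}?=> '[' { inArray = true; }
    ;
ARRAY_END
    : {inArray}?=> ']' { inArray = false; }
    ;
XSTR_TAG
    : {!inArray}?=> 'xstr' { xstrMode = 1; }
    ;
// The blank after 'xstr' switches the lexer into XSTRING mode.
XSTR_BEGIN
    : {xstrMode == 1}?=> (' ' | '\t') { xstrMode = 2; }
    ;
STRING
    : ('a'..'z' | 'A'..'Z' | '0'..'9')+
    ;
// Any run of printable, non-space ASCII; only enabled right after XSTR_BEGIN.
XSTRING
    : {xstrMode == 2}?=> ('\u0021'..'\u007e')+ { xstrMode = 0; }
    ;

array
    : ARRAY_START STRING+ ARRAY_END
    ;
xstr
    : XSTR_TAG XSTR_BEGIN XSTRING
    ;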
---------------------------------------------------------------------------------
Thanks,
Fussi
----- Original Message ----
From: Stephen Tuttlebee <themightystephen at googlemail.com>
To: antlr-interest at antlr.org
Sent: Sun, July 3, 2011 10:55:23 PM
Subject: Re: [antlr-interest] Ambiguous lexer rules
I see your problem. The lexer is independent of the parser: it doesn't matter
what the parser is expecting. Even though the parser expects a '[' followed by
STRING+, the lexer just sees the character sequence '[aaa' and groups it into a
single XSTRING token rather than a '[' token followed by a STRING token with the
value 'aaa'. The lexer produces tokens based solely on the incoming characters
and the lexer rules (plus other criteria such as choosing the longest match, the
first lexer rule that appears, etc., of which I can't remember all the details).
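
To make that concrete, here is a rough Java sketch of a parser-agnostic
tokenizer. It is a simplified "longest match, first rule wins ties" model and is
only an assumption about the general behaviour described above; ANTLR 3's real
prediction algorithm differs in its details. The class name, the LBRACK/RBRACK
rule names and the regular expressions are hypothetical stand-ins for the
grammar's rules.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LongestMatchDemo {
    // Hypothetical rule table, listed in "grammar order".
    static final String[] NAMES = {"LBRACK", "RBRACK", "STRING", "XSTRING"};
    static final Pattern[] RULES = {
        Pattern.compile("\\["),
        Pattern.compile("\\]"),
        Pattern.compile("[a-zA-Z0-9]+"),
        Pattern.compile("[\\x21-\\x7e]+"), // any printable, non-space ASCII
    };

    // Tokenizes without any knowledge of what a parser would expect.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            if (Character.isWhitespace(input.charAt(pos))) {
                pos++; // whitespace is simply skipped in this sketch
                continue;
            }
            int best = -1;
            int bestLen = 0;
            for (int i = 0; i < RULES.length; i++) {
                Matcher m = RULES[i].matcher(input).region(pos, input.length());
                // Strict '>' means the earliest rule wins when lengths tie.
                if (m.lookingAt() && m.end() - pos > bestLen) {
                    best = i;
                    bestLen = m.end() - pos;
                }
            }
            if (best < 0) {
                throw new IllegalArgumentException("no rule matches at " + pos);
            }
            tokens.add(NAMES[best] + "(" + input.substring(pos, pos + bestLen) + ")");
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        // XSTRING swallows the bracket because it matches the longest prefix.
        System.out.println(tokenize("[aaa")); // prints [XSTRING([aaa)]
    }
}
```

In this model the bracket only gets its own token when XSTRING cannot match
further than one character, for example when a space follows the '['.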
One possible solution could be to use semantic predicates. There's an example of
this at http://www.antlr.org/wiki/display/ANTLR3/1.+Lexer where, in the lexing of
XML, a tagMode boolean variable is updated whenever opening and closing tags
('<' and '>') are seen. Other lexer rules can then have (gated) semantic
predicates that enable or disable those rules depending on whether the predicate
(tagMode) is true or false, respectively.
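
As a rough illustration of the idea (hand-written Java, not ANTLR-generated
code), the sketch below keeps a tagMode flag and only lets a rule fire when its
guard holds. The class name and the token names TAG_OPEN, TAG_CLOSE, ID and TEXT
are hypothetical and are not taken from the wiki page.

```java
import java.util.ArrayList;
import java.util.List;

class GatedLexerDemo {
    // Each branch plays the role of a lexer rule; the 'tagMode' checks play
    // the role of gated semantic predicates that switch rules on and off.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean tagMode = false;
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == '<') {
                tokens.add("TAG_OPEN");
                tagMode = true;  // action: enter tag mode
                pos++;
            } else if (tagMode && c == '>') {
                tokens.add("TAG_CLOSE");
                tagMode = false; // action: leave tag mode
                pos++;
            } else if (tagMode) {
                // ID is enabled only inside a tag.
                int start = pos;
                while (pos < input.length()
                        && Character.isLetterOrDigit(input.charAt(pos))) {
                    pos++;
                }
                if (pos == start) {
                    pos++; // skip characters this sketch does not model ('/', spaces)
                    continue;
                }
                tokens.add("ID(" + input.substring(start, pos) + ")");
            } else {
                // TEXT is enabled only outside a tag: everything up to the next '<'.
                int start = pos;
                while (pos < input.length() && input.charAt(pos) != '<') {
                    pos++;
                }
                tokens.add("TEXT(" + input.substring(start, pos) + ")");
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("<b>hi b</b>"));
    }
}
```

The same letter 'b' becomes an ID inside the tag but is absorbed into TEXT
outside it, purely because of the flag; the parser is never consulted.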
You could try the same thing for your lexer rules for '[' and ']' (currently it
doesn't look like you have explicit lexer rules for '[' and ']' -- you would
need them if you use this technique), and then ensure the XSTRING rule has a
semantic predicate of {!inArray}?=> (assuming your boolean was called
'inArray'). One of the last chapters in the ANTLR book has some good stuff on
semantic predicates (in the context of parsers, but you can use them in lexers
too).
An alternative *workaround* to your problem would be to change your XSTRING
lexer rule to exclude the ASCII characters '[' and ']'. I think that would work.
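
A small Java sketch of what that exclusion amounts to, using a regular
expression as a stand-in for the XSTRING rule (the class and method names are
hypothetical, and this only models the character set, not ANTLR's rule
selection):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class XstringWithoutBrackets {
    // Printable, non-space ASCII (0x21..0x7e) minus '[' and ']'.
    static final Pattern XSTRING = Pattern.compile("[\\x21-\\x7e&&[^\\[\\]]]+");

    // Length of the XSTRING match at the start of the input, or 0 if none.
    static int matchLength(String input) {
        Matcher m = XSTRING.matcher(input);
        return m.lookingAt() ? m.end() : 0;
    }

    public static void main(String[] args) {
        System.out.println(matchLength("[aaa"));  // prints 0: cannot start at '['
        System.out.println(matchLength("a!b]c")); // prints 3: stops before ']'
    }
}
```

The trade-off is that an XSTRING value can then never contain a bracket, and a
plain alphanumeric run such as 'aaa' still matches both STRING and XSTRING.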
Hope that helps,
Stephen