[antlr-interest] Overlapping tokens
David Maxwell
david at crlf.net
Wed Oct 5 13:46:32 PDT 2005
Hi all,
Thanks to everyone who replied (on topic ;-) to my C++ beginner
questions. That did help me get further.
Now I have a more specific query.
In a lex/yacc example, I could do something like this:
"FooBar"     { printf ("Found a FOOBAR lex token\n");
               strcpy(yylval.stval,yytext);
               return FOOBAR; }
[a-zA-Z_]*   { printf("Found a ID lex token\n");
               strcpy(yylval.stval,yytext);
               return ID; }
If the input text is:
=====
FooBar
=====
The lexer will pass a FOOBAR token to the parser, which then either
accepts it, or not, based on the current position in the grammar.
Any other text of the form [a-zA-Z_]* that doesn't match "FooBar" will
result in an ID token being returned to the parser.
In lex/yacc, that works even for strings such as "Foo", which is a prefix
of the keyword.
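To make the lex behaviour concrete, here is a minimal C++ sketch of the rule lex applies to the two patterns above: the longest match wins, and on a tie the rule listed first wins. It is only an illustration, not lex's actual implementation; the names Token, matchFooBar, matchId and nextToken are made up for this example.

```cpp
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical token kinds mirroring the two lex rules above.
enum class Token { FOOBAR, ID, NONE };

// Length matched by the literal "FooBar" at the start of `in` (0 if none).
std::size_t matchFooBar(const std::string& in) {
    const std::string lit = "FooBar";
    return in.compare(0, lit.size(), lit) == 0 ? lit.size() : 0;
}

// Length matched by [a-zA-Z_]* at the start of `in`.
std::size_t matchId(const std::string& in) {
    std::size_t n = 0;
    while (n < in.size()) {
        const char c = in[n];
        const bool ok = (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
        if (!ok) break;
        ++n;
    }
    return n;
}

// Try both rules and keep the longest match; a tie goes to the rule
// listed first (FOOBAR). Returns the token kind and the matched length.
std::pair<Token, std::size_t> nextToken(const std::string& in) {
    const std::size_t kw = matchFooBar(in);
    const std::size_t id = matchId(in);
    if (kw == 0 && id == 0) return {Token::NONE, 0};
    if (kw >= id) return {Token::FOOBAR, kw};
    return {Token::ID, id};
}
```

With this rule, "FooBar" comes back as FOOBAR, while "Foo" and "FooBarBaz" both come back as ID, because the ID pattern matches at least as much input and the keyword either fails or matches less.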
In Antlr, the input "Foo" produces a run-time error, even with
k > length(FooBar) > length(Foo):
Parse exception: <cin>:1:4: expecting ''B'', found '' ''
So, what I'm confused about is this: If I was writing a language without
reserved keywords, I would expect to have to match every piece of
textual input and check it against a list of keywords, and make sure the
parser could use it as a keyword token if appropriate, or an ID if
appropriate. In that case, the 'ID' token matcher would be the only
entry in the lexer...
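The keyword-list approach described above might look roughly like this in C++. This is a sketch under assumptions: the lexer has a single ID-style rule that hands over each matched word, and the parser can say whether a keyword is acceptable at the current position. TokenKind, classifyWord and the keywordAllowedHere flag are hypothetical names, not part of Antlr or lex.

```cpp
#include <string>
#include <unordered_map>

// Hypothetical token kinds for the example.
enum class TokenKind { FOOBAR, ID };

// Classify a word already matched by the single [a-zA-Z_]* rule.
// `keywordAllowedHere` stands in for whatever the parser knows about
// the current position in the grammar (an assumption of this sketch).
TokenKind classifyWord(const std::string& word, bool keywordAllowedHere) {
    static const std::unordered_map<std::string, TokenKind> keywords = {
        {"FooBar", TokenKind::FOOBAR},
    };
    if (keywordAllowedHere) {
        const auto it = keywords.find(word);
        if (it != keywords.end()) return it->second;
    }
    // Anything that is not a keyword, or is not wanted as one here, is an ID.
    return TokenKind::ID;
}
```

Because the whole word is matched before the table is consulted, a prefix such as "Foo" never gets mistaken for the start of a keyword. For a language with reserved keywords, the flag would simply always be true.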
However, in a language with reserved keywords, the above seems like a
reasonable way to write the lexer patterns, but every prefix of the
reserved keywords ends up being reserved (in effect) too.
Why does Antlr demand that the rest of the token must be 'ooBar' once it
sees the 'F' - when it has another valid token to use - even when given
enough 'k' to tell the difference?
Thanks again,
David