[antlr-interest] Overlapping tokens

Wed Oct 5 14:03:49 PDT 2005

Sounds to me as though you are looking for the tokens{} section of the
grammar.

class MyLexer extends Lexer;
options
{
  k=1; // Just to highlight the point
}
tokens
{
  FOOBAR = "FooBar";
}

// '+' rather than '*', so that ID can never match the empty string
ID : ('a'..'z' | 'A'..'Z' | '_')+;

That should do the trick.
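
To illustrate the idea: as far as I understand it, a lexer built this way
matches the whole identifier with the ID rule first, and only then checks the
matched text against the literals table filled from tokens{}. The Python
below is only a rough sketch of that lookup, not the code ANTLR generates;
next_token, LITERALS and IDENT are hypothetical names made up for the example.

```python
import re

# Hypothetical token type names, mirroring the grammar above.
ID = "ID"
FOOBAR = "FOOBAR"

# Stand-in for the literals table that the tokens{} section populates.
LITERALS = {"FooBar": FOOBAR}

# Same character set as the ID rule.
IDENT = re.compile(r"[A-Za-z_]+")


def next_token(text, pos=0):
    """Match one identifier at pos, then retype it if it is a known literal."""
    m = IDENT.match(text, pos)
    if m is None:
        raise ValueError(f"unexpected character at offset {pos}")
    lexeme = m.group(0)
    # Only an exact, whole-lexeme hit in the table becomes a keyword token;
    # prefixes such as "Foo" fall through and stay plain IDs.
    return LITERALS.get(lexeme, ID), lexeme
```

Because the keyword test happens after the identifier has been matched in
full, "Foo" and "FooBarBaz" both come back as ID and only "FooBar" itself
comes back as FOOBAR.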

Hope this helps,

Oliver

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of David Maxwell
Sent: 05 October 2005 21:47
To: antlr-interest at antlr.org
Subject: [antlr-interest] Overlapping tokens

Hi all,

Thanks to everyone who replied (on topic ;-) to my C++ beginner
questions.  That did help me get further.

Now I have a more specific query.

In a lex/yacc example, I could do something like this:

"FooBar"                { printf ("Found a FOOBAR lex token\n");
                          strcpy(yylval.stval,yytext);
                          return FOOBAR; }

[a-zA-Z_]*              { printf("Found a ID lex token\n");
                          strcpy(yylval.stval,yytext);
                          return ID; }

If the input text is:
=====
FooBar
=====

The lexer will pass a FOOBAR token to the parser, which then either
accepts it, or not, based on the current position in the grammar.

Any text of the form [a-zA-Z_]* that doesn't match "FooBar" will result
in an ID token being returned to the parser.

In lex/yacc, that is valid for strings such as "Foo".
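
The lex behaviour described above comes from its matching rule: try every
pattern at the current position, keep the longest match, and break ties in
favour of the rule listed first. A minimal Python sketch of that rule
follows; RULES and lex_token are hypothetical names for illustration, and
the ID pattern uses + instead of * so the sketch never returns an empty
match.

```python
import re

# Rules in the same order as the lex file: the keyword first, then ID.
RULES = [
    ("FOOBAR", re.compile(r"FooBar")),
    ("ID", re.compile(r"[a-zA-Z_]+")),
]


def lex_token(text, pos=0):
    """Return (token_name, lexeme) for the longest match at pos, or None."""
    best = None
    for name, pattern in RULES:
        m = pattern.match(text, pos)
        # A later rule replaces the current best only if its match is
        # strictly longer, so the earlier rule wins when lengths are equal.
        if m and (best is None or len(m.group(0)) > len(best[1])):
            best = (name, m.group(0))
    return best
```

With these rules "FooBar" matches both patterns at length 6 and the earlier
rule wins, giving FOOBAR, while "Foo" matches only the ID pattern.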

In Antlr, a run-time error is produced, even with
k > length(FooBar) > length(Foo):

Parse exception: <cin>:1:4: expecting ''B'', found '' ''

So, what I'm confused about is this: If I was writing a language without
reserved keywords, I would expect to have to match every piece of
textual input and check it against a list of keywords, and make sure the
parser could use it as a keyword token if appropriate, or an ID if
appropriate. In that case, the 'ID' token matcher would be the only
entry in the lexer...

However, in a language with reserved keywords, the above seems like a
reasonable way to write the lexer patterns, but every substring of the
reserved keywords ends up being reserved (in effect) too.

Why does Antlr demand that the rest of the token must be 'ooBar' once it
sees the 'F' - when it has another valid token to use - even when given
enough 'k' to tell the difference?

Thanks again,

							David
