[antlr-interest] Lookahead problem

Ilia Kantor ilia at obnovlenie.ru
Sun May 15 11:16:41 PDT 2005


I want to create a language with functions like ~func{}, variables like #var,
and arrays like #var{1}{~func{}}.
Curly braces are also allowed as plain text: ~func{my list: {1,2,3} }

Here is a simple lexer/parser for part of the task:

-----------------------------
class SimpleTaskLexer extends Lexer;
LCURL: '{';
RCURL: '}' ;
ANY: (~('~' | '{' | '}' | '#'))+;
 
protected NAME: ('A'..'Z' | 'a'..'z' | '0'..'9' | '_')+;
VARIABLE: '#' NAME;
FUNCTION: '~' NAME;
 
class SimpleTaskParser extends Parser;
expr:	function EOF;
function: FUNCTION curly_text;
curly_text: LCURL entries RCURL;
entries: entry entries |;
entry: ANY | function | curly_text;
---------------------------
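To make the token stream concrete, here is a minimal Java sketch that approximates SimpleTaskLexer by hand. The class LexSketch and its tokenize method are hypothetical, not ANTLR-generated code. The sketch assumes that one character of lookahead picks the rule, which holds because the rules above start with distinct characters. Braces used as plain text and braces used as array indices lex to the same LCURL/RCURL tokens, so any disambiguation has to happen in the parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written approximation of SimpleTaskLexer (not ANTLR output).
// Tokens are returned as "TYPE" or "TYPE(text)" strings for easy inspection.
final class LexSketch {
    // NAME: ('A'..'Z' | 'a'..'z' | '0'..'9' | '_')+;
    private static boolean isNameChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    static List<String> tokenize(String s) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i + 1; // end of the current token (exclusive)
            if (c == '{') {
                out.add("LCURL");
            } else if (c == '}') {
                out.add("RCURL");
            } else if (c == '#' || c == '~') {
                // VARIABLE: '#' NAME;   FUNCTION: '~' NAME;
                while (j < s.length() && isNameChar(s.charAt(j))) j++;
                if (j == i + 1) throw new IllegalArgumentException("missing NAME at offset " + i);
                out.add((c == '#' ? "VARIABLE(" : "FUNCTION(") + s.substring(i, j) + ")");
            } else {
                // ANY: longest run of characters other than '~', '{', '}', '#'
                while (j < s.length() && "~{}#".indexOf(s.charAt(j)) < 0) j++;
                out.add("ANY(" + s.substring(i, j) + ")");
            }
            i = j;
        }
        return out;
    }
}
```

For example, tokenize("~func{my list: {1,2,3} }") returns [FUNCTION(~func), LCURL, ANY(my list: ), LCURL, ANY(1,2,3), RCURL, ANY( ), RCURL], and tokenize("#var{1}{~func{}}") returns [VARIABLE(#var), LCURL, ANY(1), RCURL, LCURL, FUNCTION(~func), LCURL, RCURL, RCURL].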

The problem is that I can't add an array rule to read #var{...}{...},
where the curly braces after #var denote array indices.

The natural approach would be:
1. entry: ANY | function | curly_text | variable;
2. variable: VARIABLE (curly_text)*;

But rule 2 leads to this warning:
warning:nondeterminism upon
k==1:LCURL
between alt 1 and exit branch of block


I guess that's because, after the parser reads #var, it cannot tell where
the array indices end and ordinary curly text begins.
The logical answer is simple: treat as many {} groups following #var as
possible as array indices.

How can I implement that? I tried a lookahead (syntactic predicate) like this:
entry: ANY | function | curly_text |
	(VARIABLE (curly_text)*) => VARIABLE (curly_text)*
	| VARIABLE;
But it did not work.
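The greedy reading described above (take as many {} groups after #var as possible) can be sketched as a hand-written recursive-descent parser in Java, the default ANTLR 2 target. This is a minimal sketch under stated assumptions, not ANTLR output: the class GreedySketch, its tokenizer and the fn(...)/var(...)/[...] output notation are hypothetical, and lexing is assumed to follow the lexer rules above. In variable(), the loop continues while the next token is LCURL, so {x} written directly after #v{1} becomes a second index, while any other token (for example a space, which lexes as ANY) ends the index list. If memory of the ANTLR 2 manual's section on subrule options serves, the grammar-level counterpart is to mark the loop explicitly greedy, variable: VARIABLE ( options { greedy = true; } : curly_text )*; which keeps ANTLR's default choice of staying in the loop and suppresses the warning; this should be checked against the ANTLR version in use. The predicate tried above presumably does not help because (curly_text)* can match zero groups, so the guess always succeeds, and the ambiguity sits in the loop's own exit decision rather than in the choice between entry alternatives.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written parser for the grammar in this post plus the
// proposed `variable` rule. Not generated by ANTLR.
final class GreedySketch {
    enum Type { LCURL, RCURL, ANY, VARIABLE, FUNCTION, EOF }

    static final class Token {
        final Type type;
        final String text;
        Token(Type type, String text) { this.type = type; this.text = text; }
    }

    private static boolean isNameChar(char c) {
        return (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z')
            || (c >= '0' && c <= '9') || c == '_';
    }

    // Approximates the lexer rules LCURL, RCURL, ANY, VARIABLE and FUNCTION.
    private static List<Token> tokenize(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int j = i + 1;
            if (c == '{') {
                out.add(new Token(Type.LCURL, "{"));
            } else if (c == '}') {
                out.add(new Token(Type.RCURL, "}"));
            } else if (c == '#' || c == '~') {
                while (j < s.length() && isNameChar(s.charAt(j))) j++;
                if (j == i + 1) throw new IllegalArgumentException("missing NAME at offset " + i);
                out.add(new Token(c == '#' ? Type.VARIABLE : Type.FUNCTION, s.substring(i, j)));
            } else {
                while (j < s.length() && "~{}#".indexOf(s.charAt(j)) < 0) j++;
                out.add(new Token(Type.ANY, s.substring(i, j)));
            }
            i = j;
        }
        out.add(new Token(Type.EOF, ""));
        return out;
    }

    private final List<Token> tokens;
    private int pos = 0;

    private GreedySketch(List<Token> tokens) { this.tokens = tokens; }

    private Type la() { return tokens.get(pos).type; } // one token of lookahead (k == 1)

    private Token match(Type expected) {
        Token t = tokens.get(pos);
        if (t.type != expected) throw new IllegalStateException("expected " + expected + " but found " + t.type);
        pos++;
        return t;
    }

    // expr: function EOF;
    static String parse(String input) {
        GreedySketch p = new GreedySketch(tokenize(input));
        String tree = p.function();
        p.match(Type.EOF);
        return tree;
    }

    // function: FUNCTION curly_text;
    private String function() {
        String name = match(Type.FUNCTION).text;
        return "fn(" + name + ", " + curlyText() + ")";
    }

    // curly_text: LCURL entries RCURL;   entries: (entry)*;
    private String curlyText() {
        match(Type.LCURL);
        List<String> items = new ArrayList<>();
        while (la() != Type.RCURL) items.add(entry());
        match(Type.RCURL);
        return "[" + String.join(", ", items) + "]";
    }

    // entry: ANY | function | curly_text | variable;
    private String entry() {
        switch (la()) {
            case ANY: return "\"" + match(Type.ANY).text + "\"";
            case FUNCTION: return function();
            case LCURL: return curlyText();
            case VARIABLE: return variable();
            default: throw new IllegalStateException("unexpected " + la());
        }
    }

    // variable: VARIABLE (curly_text)*;
    // Greedy loop: every LCURL right after the variable (or after a previous
    // index) starts another index; any other token ends the index list.
    private String variable() {
        StringBuilder sb = new StringBuilder("var(").append(match(Type.VARIABLE).text);
        while (la() == Type.LCURL) sb.append(", ").append(curlyText());
        return sb.append(")").toString();
    }
}
```

For instance, parse("~f{#v{1}{x}}") yields fn(~f, [var(#v, ["1"], ["x"])]), whereas parse("~f{#v{1} {x}}") yields fn(~f, [var(#v, ["1"]), " ", ["x"]]). The original plain-text example still works: parse("~func{my list: {1,2,3} }") yields fn(~func, ["my list: ", ["1,2,3"], " "]).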



