[antlr-interest] newbie lookahead question

Lance Gutteridge lance at thinkingworks.com
Fri Apr 21 12:41:08 PDT 2006


Hi,
I'm a user of ANTLR but I'm nowhere close to being a real pro with it. 
I've just learnt what I had to in order to implement my script language. 
So please excuse me if this is a dumb question.

I saw the question regarding lookahead and it is something I have been 
wrestling with as well.

I have a large number of keywords in my language.

For the purpose of example, here are six:

ON:        "on";
OF:        "of";
OFF:       "off";
OFFICE:    "office";
ACTIVATE:  "activate";
ACTIVATED: "activated";

Disambiguating ACTIVATE from ACTIVATED requires k = 9, because the two 
share an eight-character prefix.

That seems inefficient, although it works.
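
The k = 9 figure can be checked mechanically. The following is only a 
minimal Java sketch, assuming the required depth is one more than the 
longest prefix shared by any two keywords (which is what the 
ACTIVATE/ACTIVATED case suggests). The class and method names are 
hypothetical and are not part of ANTLR.

```java
import java.util.Arrays;
import java.util.List;

public class LookaheadDepth {
    // Length of the longest prefix shared by a and b.
    public static int commonPrefixLength(String a, String b) {
        int n = Math.min(a.length(), b.length());
        int i = 0;
        while (i < n && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return i;
    }

    // Assumed rule of thumb: longest shared prefix over all pairs, plus one.
    public static int requiredK(List<String> keywords) {
        int longest = 0;
        for (int i = 0; i < keywords.size(); i++) {
            for (int j = i + 1; j < keywords.size(); j++) {
                int shared = commonPrefixLength(keywords.get(i), keywords.get(j));
                longest = Math.max(longest, shared);
            }
        }
        return longest + 1;
    }

    public static void main(String[] args) {
        List<String> keywords = Arrays.asList(
            "on", "of", "off", "office", "activate", "activated");
        System.out.println(requiredK(keywords)); // prints 9
    }
}
```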

On the other hand, Martin's solution of matching the leading part of the 
word and then using $setType seems difficult. If I want k=1, I would 
need one rule for each group of words that share a leading letter (e.g. 
ON and OF and OFF and OFFICE).

Handling the ON, OF, OFF, and OFFICE case in the manner Martin 
suggests would take a fairly complicated rule, because it has to say that 
the token is OF if it matches "of" followed by whitespace, to disambiguate 
it from OFF and OFFICE. Then the same has to be done to decide between 
OFF and OFFICE. (BTW, I would be grateful for an example of such a rule, 
as I have had trouble constructing one that works for this kind of 
situation.)
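
To make the shape of such a rule concrete, here is a rough sketch in 
plain Java rather than ANTLR grammar. It descends through the letters 
one at a time, the way a left-factored k=1 rule would, and settles on a 
token type only once the word ends. The class name, the method name, and 
the use of null for "not a keyword" are assumptions made for 
illustration. An actual lexer rule would set the type with $setType 
instead of returning a string.

```java
public class PrefixDescent {
    // Classifies one whole word by looking at a single character at a time.
    // Returns the token type name, or null if the word is not one of
    // "on", "of", "off", "office".
    public static String classify(String word) {
        if (word.length() < 2 || word.charAt(0) != 'o') {
            return null;
        }
        char second = word.charAt(1);
        if (second == 'n') {
            return word.length() == 2 ? "ON" : null;   // "on"
        }
        if (second != 'f') {
            return null;
        }
        if (word.length() == 2) {
            return "OF";                               // "of" and nothing more
        }
        if (word.charAt(2) != 'f') {
            return null;
        }
        if (word.length() == 3) {
            return "OFF";                              // "off" and nothing more
        }
        // Only "ice" may follow "off", and the word must end there.
        boolean isOffice = word.length() == 6 && word.startsWith("ice", 3);
        return isOffice ? "OFFICE" : null;
    }

    public static void main(String[] args) {
        for (String w : new String[] {"on", "of", "off", "office", "offer"}) {
            System.out.println(w + " -> " + classify(w));
        }
    }
}
```

Every keyword added under the same leading letter needs another branch 
in this descent, which is exactly the maintenance burden described above.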

Is there no way to handle this in general, other than setting k to be as 
long as the longest prefix that two keywords have in common?  I would 
really like a technique where, every time I add a new keyword, I don't 
have to figure out all the ones it could conflict with and write some 
kind of long statement that descends through the letters and peels off 
ambiguities.
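
One general technique that avoids per-keyword rules is to match the 
whole word first and classify it afterwards by looking it up in a table. 
The Java below is only a sketch of that idea, not ANTLR's own code. The 
class name, the IDENT fallback type, and the choice to skip every 
non-letter character are assumptions. ANTLR 2's literals table, enabled 
with the testLiterals option, appears to rest on the same idea.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class KeywordTable {
    // Keyword text -> token type name. Adding a keyword is one more entry.
    private static final Map<String, String> KEYWORDS = new HashMap<String, String>();
    static {
        String[] words = {"on", "of", "off", "office", "activate", "activated"};
        for (String w : words) {
            KEYWORDS.put(w, w.toUpperCase(Locale.ROOT));
        }
    }

    // Returns the token type of each word in the input, in order.
    // Characters that are not letters are treated as separators and skipped.
    public static List<String> tokenTypes(String input) {
        List<String> types = new ArrayList<String>();
        int i = 0;
        while (i < input.length()) {
            if (!Character.isLetter(input.charAt(i))) {
                i++;
                continue;
            }
            int start = i;
            // Consume the longest run of letters; only one character of
            // lookahead is needed, however long the keywords are.
            while (i < input.length() && Character.isLetter(input.charAt(i))) {
                i++;
            }
            String type = KEYWORDS.get(input.substring(start, i));
            types.add(type != null ? type : "IDENT");
        }
        return types;
    }

    public static void main(String[] args) {
        // prints [ACTIVATED, OF, OFFICE, IDENT]
        System.out.println(tokenTypes("activated of office offer"));
    }
}
```

With this arrangement the scanning loop never changes. A new keyword 
cannot conflict with an existing one, because the decision is made on 
the complete word rather than on a prefix.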


Thanks,
Lance



Martin Probst wrote:

> Hi,
>
> you can either increase your lookahead (which is not advisable in  
> this case), or rather do it manually:
>
> CALIBRATION_THINGY:
>   "CALIBRATION_"
>   ( "METHOD" { $setType(CAL_METHOD); }
>   | "HANDLE" { $setType(CAL_HANDLE); }
>   );
>
> This parses the CALIBRATION_ part and then decides afterwards which  
> token type it is. You may want to add "CAL_METHOD" and  
> "CAL_HANDLE" to the tokens section of your grammar because they  
> aren't declared automatically if used like this - you can use that to  
> give them a proper help message later on.
>
> Martin
>
>
> On 17.04.2006 at 06:30, Lucien Stals wrote:
>
>> Hi,
>>
>> I've been learning ANTLR for about two weeks and want to be able to  
>> parse (then transform into XML) an input file in a specific markup  
>> language (ASAP2). I have not worked with parsers before and I feel  
>> like I'm in a little over my head. It's sink or swim time for me.
>>
>> I have some basic stuff working, but I'm getting lots of warnings  
>> about ambiguity.
>>
>> Part of a sample input file might look like:
>>
>> ...
>> /begin CALIBRATION_METHOD
>>     "Slewing"
>>     1
>>     /begin CALIBRATION_HANDLE
>>         0x1ffbf8
>>         0x400
>>         0
>>         AllSlews
>>     /end CALIBRATION_HANDLE
>> /end CALIBRATION_METHOD
>> ...
>>
>>
>> My question is about look ahead.
>> In my parser, I have rules like:
>>
>>
>> calibrationMethod:BEGIN CAL_METH
>>             (calibrationHandle)*
>>             END CAL_METH;
>>            
>> calibrationHandle:BEGIN CAL_HAND
>>             END CAL_HAND;
>>
>> Where my lexer rules are:
>>
>> protected
>> SLASH        :'/';
>>
>> BEGIN        :SLASH "begin";
>>
>> END        :SLASH "end";
>>
>> CAL_METH    :"CALIBRATION_METHOD";
>>
>> CAL_HAND    :"CALIBRATION_HANDLE";
>>
>>
>> (I'm just dealing with the tag structure first. Parsing the actual  
>> data is my next step. I have filter=true for now so I can ignore  
>> what I can't parse yet)
>>
>> And I'm getting ambiguity warnings. Should I set my lookahead to  
>> something silly like 13 just to look past "CALIBRATION_"? (I read  
>> that bigger lookaheads are performance killers) Or is there a  
>> smarter way to do this? Should I use predicates like:
>>
>> calibrationMethod:BEGIN CAL_METH {this.inCalMeth=true;}
>>             (calibrationHandle)*
>>             END CAL_METH {this.inCalMeth=false;};
>>            
>> calibrationHandle:{this.inCalMeth}?
>>         BEGIN CAL_HAND
>>         END CAL_HAND;
>>
>> Perhaps I'm completely off base. If it looks like I'm really missing  
>> something, you might be right. Feel free to let me know.
>>
>> This is only one of the problems I'm having, but I'll just keep it  
>> to one question per post ;)
>>
>> BTW, if anyone is aware of a similar grammar which I can get  
>> inspiration from, can you let me know?
>>
>> Thanks,
>>
>> Lucien.
>>
>
>

