[antlr-interest] lexing tips needed... (or, could I use predicate in lexer rule?)

Lloyd Dupont ld at galador.net
Sun Jul 15 20:18:39 PDT 2007


Hi Richard,

Hey, you've done all the work!
Thanks for your suggestion; it looks good. I'll take a closer look at it
tonight (it's for a home project this time).

----- Original Message ----- 
From: "Richard Clark" <rdclark at gmail.com>
To: "Lloyd Dupont" <ld at galador.net>
Cc: <antlr-interest at antlr.org>
Sent: Monday, July 16, 2007 12:48 PM
Subject: Re: [antlr-interest] lexing tips needed... (or, could I use predicate in lexer rule?)


> I'm starting to think that questions of the form "how do I use a
> semantic predicate in the Lexer for..." should be a FAQ with the
> answer "let the parser deal with the semantics".
>
> In your case, I'd do something like this:
>
> expr : atom (methodCall)* ; /* Will need to add options for arithmetic
> expressions */
>
> atom: number
>       | STRING
>       | object=ID
>       ;
>
> methodCall: '.' method=ID '(' paramList? ')' ; /* paramList optional so "2.ToString()" matches */
>
> paramList : expr (',' expr)* ;
>
> number: '-'? INT ('.' INT ('e' '-'? INT)?)? ;
>
> ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')* ;
>
> INT: '0'..'9'+;
>
> /* and so on... */
>
> ...Richard
>
>
>
>
> On 7/15/07, Lloyd Dupont <ld at galador.net> wrote:
>>
>>
>> Yep, I'm starting to think I should use the parser to distinguish between
>> these cases.
>> Glad someone confirmed!
>> It's just... unusual, so I didn't dare go this way to begin with.
>>
>> ----- Original Message -----
>> From: Micheal J
>> To: antlr-interest at antlr.org
>> Sent: Sunday, July 15, 2007 10:11 PM
>> Subject: Re: [antlr-interest] lexing tips needed... (or, could I use predicate in lexer rule?)
>>
>>
>> Hi,
>>
>> I was really after these:
>>
>> ".1"
>> --> DOT NUM_INT["1"] ?
>> --> NUM_FLOAT[".1"] ?
>>
>> ".1 1"
>>
>> --> DOT NUM_INT["1"] WHITESPACE NUM_INT["1"] ?
>> --> NUM_FLOAT[".1"] WHITESPACE NUM_INT["1"] ?
>>
>> As for your examples, "1." is ambiguous at the lexing stage, but the parser
>> has more context and the ambiguity disappears. If you followed my
>> suggestion, you would just emit a NUM_FLOAT_DOT["1."] in the lexer, and the
>> parser can treat it either as a float value or as the prefix of a
>> memberAccess (i.e. the "<object><.>" part of an "<object><.><member>"
>> operation).
>>
>> float
>>     : NUM_FLOAT    // "1.e3"
>>     | NUM_FLOAT_DOT // "1."
>>     .... ;
>>
>> memberAccessExpr
>>     : ID DOT ID
>>     | NUM_FLOAT_DOT ID  // "1.echo", "1.e"
>>     .... ;
>>
>> Micheal
>>
>> -----------------------
>> The best way to contact me is via the list/forum. My time is very limited.
>>
>>
>> -----Original Message-----
>> From: Lloyd Dupont [mailto:ld at galador.net]
>> Sent: 15 July 2007 12:11
>> To: Micheal J; antlr-interest at antlr.org
>> Subject: Re: [antlr-interest] lexing tips needed... (or, could I use predicate in lexer rule?)
>>
>>
>> The problem is that the result really depends on what comes after:
>>
>> 1. is ambiguous
>>
>> 1.e is ambiguous
>> 1.e3 is a float
>> 1.echo is INT DOT ID
>>
>>
>> ----- Original Message -----
>> From: Micheal J
>> To: antlr-interest at antlr.org
>> Sent: Sunday, July 15, 2007 11:36 AM
>> Subject: Re: [antlr-interest] lexing tips needed... (or, could I use predicate in lexer rule?)
>>
>>
>> Hi, forgot to ask:
>>
>> What should the lexer return for these values (note the spaces)?
>> ".1"
>> "1 .1"
>> "1 ."
>>
>> Micheal
>>
>>
>>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of
>> Lloyd Dupont
>> Sent: 15 July 2007 02:02
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] lexing tips needed... (or, could I use predicate in lexer rule?)
>>
>>
>> For my parser I ripped off the lexer rules from the Java example.
>> In particular, it has this lexer token:
>> NUM_FLOAT
>>  : DIGITS '.' (DIGITS)? (EXPONENT_PART)? (FLOAT_TYPE_SUFFIX)?
>>  | ............ other match
>>  ;
>>
>> Now there is a problem with that! In the language I target, everything
>> is an object.
>>
>> So:
>>
>> "2."           is NUM_FLOAT["2."]
>> "2.1"          is NUM_FLOAT["2.1"]
>> "2.ToString()" is NUM_INT["2"] DOT["."] ID["ToString"] LPAREN["("] RPAREN[")"]
>>
>> So I tried to disambiguate the lexer with a construct like this (using a
>> predicate in the lexer):
>> NUM_FLOAT
>>  : (
>>    ( DIGITS '.' EXPONENT_PART )=> DIGITS '.' (EXPONENT_PART) (FLOAT_TYPE_SUFFIX)?
>>   | ( DIGITS '.' ID )=> DIGITS
>>   | DIGITS '.' (DIGITS)? (EXPONENT_PART)? (FLOAT_TYPE_SUFFIX)?
>>   )
>>   |    ..................... other match
>> ;
>>
>> but this doesn't seem to work:
>> 1. it told me the NUM_INT rule is no longer accessible (I guess the
>> NUM_FLOAT rule absorbs it with my second alternative);
>> 2. more importantly, an input such as "2.a" returns "2." followed by a
>> MismatchTokenException.
>>
>> Can I use a predicate in a lexer rule?
>> How could I disambiguate the NUM_FLOAT lexer rule, i.e. be able to read
>> both "2.3" (NUM_FLOAT) and "2.ToString()" (NUM_INT DOT ID LPAREN RPAREN)?
>>
> 
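Richard's "let the parser deal with it" grammar can be illustrated with a small hand-written Python sketch. This is not ANTLR output. The lexer emits only INT, ID, STRING and single-character tokens and never decides what a dot means. The parser uses two tokens of lookahead: a dot continues a number only when an INT follows it. The `Parser` class and the `tokenize` and `parse` functions are hypothetical names. The exponent part of `number` is left out because the ID pattern (copied from Richard's ID rule) would lex "e3" as a single identifier. The same interaction probably needs attention in the real grammar.

```python
import re

# Hypothetical lexer for Richard's grammar: no decision is made about dots.
TOKEN_RE = re.compile(
    r'(?P<WS>\s+)|(?P<INT>[0-9]+)|(?P<ID>[A-Za-z_][A-Za-z_0-9]*)'
    r'|(?P<STRING>"[^"]*")|(?P<PUNCT>[-.(),])'
)


def tokenize(src):
    """Return (kind, text) pairs; punctuation uses its own text as its kind."""
    tokens, pos = [], 0
    while pos < len(src):
        m = TOKEN_RE.match(src, pos)
        if not m:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
        kind, text = m.lastgroup, m.group()
        if kind == "PUNCT":
            kind = text
        if kind != "WS":
            tokens.append((kind, text))
        pos = m.end()
    return tokens


class Parser:
    def __init__(self, tokens):
        self.tokens, self.pos = tokens, 0

    def la(self, k=1):
        """Kind of the k-th lookahead token (1-based), or 'EOF'."""
        i = self.pos + k - 1
        return self.tokens[i][0] if i < len(self.tokens) else "EOF"

    def match(self, kind):
        if self.la() != kind:
            raise SyntaxError(f"expected {kind!r}, got {self.la()!r}")
        text = self.tokens[self.pos][1]
        self.pos += 1
        return text

    # expr : atom (methodCall)* ;
    def expr(self):
        node = self.atom()
        while self.la() == ".":
            node = self.method_call(node)
        return node

    # atom : number | STRING | ID ;
    def atom(self):
        if self.la() in ("INT", "-"):
            return self.number()
        if self.la() == "STRING":
            return ("str", self.match("STRING")[1:-1])
        return ("id", self.match("ID"))

    # number : '-'? INT ('.' INT)? ;  (exponent omitted in this sketch)
    # LL(2): the dot belongs to the number only if an INT follows it.
    def number(self):
        sign = ""
        if self.la() == "-":
            sign = self.match("-")
        text = sign + self.match("INT")
        if self.la() == "." and self.la(2) == "INT":
            self.match(".")
            return ("float", float(text + "." + self.match("INT")))
        return ("int", int(text))

    # methodCall : '.' ID '(' paramList? ')' ;  paramList : expr (',' expr)* ;
    def method_call(self, target):
        self.match(".")
        name = self.match("ID")
        self.match("(")
        args = []
        if self.la() != ")":
            args.append(self.expr())
            while self.la() == ",":
                self.match(",")
                args.append(self.expr())
        self.match(")")
        return ("call", target, name, args)


def parse(src):
    """Parse one expr and require that all input was consumed."""
    parser = Parser(tokenize(src))
    node = parser.expr()
    if parser.la() != "EOF":
        raise SyntaxError(f"unexpected {parser.la()!r} after expression")
    return node
```

For example, `parse("2.ToString()")` gives a call on the integer 2 with no arguments. `parse("2.5.ToString()")` first builds the float 2.5 and then applies the call to it.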
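Micheal's NUM_FLOAT_DOT suggestion can be sketched in Python as well. This is an illustration under stated assumptions, not ANTLR code. A hand-written lexer emits NUM_FLOAT when the dot is followed by fraction digits or a complete exponent. It emits NUM_FLOAT_DOT for a bare "1.". The parser then looks at the next token to decide between a float literal and a member access. Token names follow the thread, but the regular expressions, the `tokenize` and `parse` functions and the tuple-shaped AST are hypothetical. Following Lloyd's cases, "1.e3" is treated as a float, while "1.e" and "1.echo" become a member access on the integer 1. Only empty argument lists are accepted in this sketch.

```python
import re

# Token patterns, tried in order; the first one that matches wins.
# Names follow the thread; the patterns themselves are assumptions.
TOKEN_SPEC = [
    ("WS", r"\s+"),
    # "2.1", "2.1e5", "1.e3": dot followed by fraction digits and/or an exponent
    ("NUM_FLOAT", r"[0-9]+\.(?:[0-9]+(?:[eE][-+]?[0-9]+)?|[eE][-+]?[0-9]+)"),
    # "1." with neither digits nor a full exponent after the dot: parser decides
    ("NUM_FLOAT_DOT", r"[0-9]+\."),
    ("NUM_INT", r"[0-9]+"),
    ("ID", r"[A-Za-z_][A-Za-z_0-9]*"),
    ("DOT", r"\."),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
]
COMPILED = [(kind, re.compile(pattern)) for kind, pattern in TOKEN_SPEC]


def tokenize(src):
    """Return a list of (kind, text) pairs, skipping whitespace."""
    tokens, pos = [], 0
    while pos < len(src):
        for kind, regex in COMPILED:
            m = regex.match(src, pos)
            if m:
                if kind != "WS":
                    tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise SyntaxError(f"unexpected character {src[pos]!r} at {pos}")
    return tokens


def parse(tokens):
    """Parse `primary (DOT member)*` into a nested-tuple AST."""
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else "EOF"

    def take(kind):
        nonlocal pos
        if peek() != kind:
            raise SyntaxError(f"expected {kind}, got {peek()}")
        text = tokens[pos][1]
        pos += 1
        return text

    def member_tail(target):
        # The dot is already consumed: ID, optionally followed by "()".
        name = take("ID")
        if peek() == "LPAREN":
            take("LPAREN")
            take("RPAREN")  # only empty argument lists in this sketch
            return ("call", target, name)
        return ("member", target, name)

    kind = peek()
    if kind == "NUM_FLOAT_DOT":
        text = take("NUM_FLOAT_DOT")
        if peek() == "ID":
            # "2.ToString()", "1.echo", "1.e": the dot starts a member access
            node = member_tail(("int", int(text[:-1])))
        else:
            # "1." on its own: a float literal
            node = ("float", float(text))
    elif kind == "NUM_FLOAT":
        node = ("float", float(take("NUM_FLOAT")))
    elif kind == "NUM_INT":
        node = ("int", int(take("NUM_INT")))
    else:
        node = ("id", take("ID"))

    # Further ".member" or ".method()" suffixes, e.g. "2.5.ToString()".
    while peek() == "DOT":
        take("DOT")
        node = member_tail(node)

    if peek() != "EOF":
        raise SyntaxError(f"unexpected {peek()}")
    return node
```

With this split, "2.ToString()" lexes as NUM_FLOAT_DOT ID LPAREN RPAREN and parses as a call on the integer 2. "2." alone parses as the float 2.0. The lexer still makes one choice itself: a complete exponent such as "e3" after the dot is taken as part of a float.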

