[antlr-interest] lexing tips needed... (or, could I use predicate in lexer rule?)

Richard Clark rdclark at gmail.com
Sun Jul 15 19:48:05 PDT 2007


I'm starting to think that questions of the form "how do I use a
semantic predicate in the Lexer for..." should be a FAQ with the
answer "let the parser deal with the semantics".

In your case, I'd do something like this:

expr : atom (methodCall)* ; /* Will need to add options for arithmetic
expressions */

atom: number
       | STRING
       | object=ID
       ;

methodCall: '.' method=ID '(' paramList ')' ;

paramList : expr (',' expr)* ;

number: '-'? INT ('.' INT ('e' '-'? INT)?)? ;

ID : ('a'..'z' | 'A'..'Z' | '_') ('a'..'z' | 'A'..'Z' | '_' | '0'..'9')* ;

INT: '0'..'9'+;

/* and so on... */
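
To show the idea outside ANTLR, here is a minimal hand-written sketch in
Python. It is an illustration under stated assumptions, not generated ANTLR
output: the names tokenize and Parser and the tuple-shaped result nodes are
made up for this example, the exponent part of number is left out, and an
empty parameter list is accepted so that "2.ToString()" parses. The lexer
only ever emits INT, ID, STRING and punctuation; the parser decides what a
'.' means by looking at the token that follows it.

```python
import re

# The lexer knows nothing about floats: it only emits INT, ID, STRING, punctuation.
TOKEN_RE = re.compile(r"""
    (?P<INT>[0-9]+)
  | (?P<ID>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<STRING>"[^"]*")
  | (?P<PUNCT>[-.(),])
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Split text into (kind, value) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            # Punctuation uses its own text as the token kind, e.g. (".", ".").
            kind = m.group() if m.lastgroup == "PUNCT" else m.lastgroup
            tokens.append((kind, m.group()))
        pos = m.end()
    tokens.append(("EOF", ""))
    return tokens


class Parser:
    def __init__(self, text):
        self.tokens = tokenize(text)
        self.i = 0

    def peek(self, offset=0):
        # Kind of the token `offset` positions ahead, clamped to EOF.
        return self.tokens[min(self.i + offset, len(self.tokens) - 1)][0]

    def take(self, kind):
        tok_kind, value = self.tokens[self.i]
        if tok_kind != kind:
            raise SyntaxError(f"expected {kind}, got {tok_kind} {value!r}")
        self.i += 1
        return value

    def parse(self):
        tree = self.expr()
        self.take("EOF")
        return tree

    # expr : atom (methodCall)* ;
    def expr(self):
        node = self.atom()
        while self.peek() == ".":
            node = self.method_call(node)
        return node

    # atom : number | STRING | ID ;
    def atom(self):
        if self.peek() in ("-", "INT"):
            return self.number()
        if self.peek() == "STRING":
            return ("string", self.take("STRING")[1:-1])
        return ("id", self.take("ID"))

    # number : '-'? INT ('.' INT)? ;   (exponent omitted in this sketch)
    def number(self):
        text = ""
        if self.peek() == "-":
            text += self.take("-")
        text += self.take("INT")
        # The dot belongs to the number only when an INT follows it.
        if self.peek() == "." and self.peek(1) == "INT":
            self.take(".")
            text += "." + self.take("INT")
            return ("float", float(text))
        return ("int", int(text))

    # methodCall : '.' ID '(' paramList? ')' ;   (empty list allowed: assumption)
    def method_call(self, target):
        self.take(".")
        name = self.take("ID")
        self.take("(")
        args = []
        if self.peek() != ")":
            args.append(self.expr())
            while self.peek() == ",":
                self.take(",")
                args.append(self.expr())
        self.take(")")
        return ("call", target, name, args)
```

With this sketch, Parser("2.3").parse() gives ("float", 2.3) and
Parser("2.ToString()").parse() gives ("call", ("int", 2), "ToString", []).
One side effect of deciding in the parser: because whitespace is dropped
before parsing, "2 . 3" is also read as the number 2.3.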

...Richard




On 7/15/07, Lloyd Dupont <ld at galador.net> wrote:
>
>
> Yep, I'm starting to think I should use the parser to distinguish between
> these cases.
> Glad someone confirmed!
> It's just... unusual, so I didn't dare go this way to start with.
>
> ----- Original Message -----
> From: Micheal J
> To: antlr-interest at antlr.org
>
> Sent: Sunday, July 15, 2007 10:11 PM
> Subject: Re: [antlr-interest] lexing tips needed... (or,could I use
> predicate in lexer rule?)
>
>
> Hi,
>
> I was really after these:
>
> ".1"
> --> DOT NUM_INT["1"] ?
> --> NUM_FLOAT[".1"] ?
>
> ".1 1"
>
> --> DOT NUM_INT["1"] WHITESPACE NUM_INT["1"] ?
> --> NUM_FLOAT[".1"] WHITESPACE NUM_INT["1"] ?
>
> As for your examples, "1." is ambiguous at the lexing stage, but the parser
> has more context and the ambiguity disappears. If you followed my
> suggestion, you just emit a NUM_FLOAT_DOT["1."] in the lexer and the parser
> can deal with it as a float value or as the prefix of a memberAccess (i.e. the
> "<object><.>" bit of an "<object><.><member>" operation).
>
> float
>     : NUM_FLOAT    // "1.e3"
>     | NUM_FLOAT_DOT // "1."
>     .... ;
>
> memberAccessExpr
>     : ID DOT ID
>     | NUM_FLOAT_DOT ID  // "1.echo", "1.e"
>     .... ;
>
> Micheal
>
> -----------------------
> The best way to contact me is via the list/forum. My time is very limited.
>
>
> -----Original Message-----
> From: Lloyd Dupont [mailto:ld at galador.net]
> Sent: 15 July 2007 12:11
> To: Micheal J; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] lexing tips needed... (or,could I use
> predicate in lexer rule?)
>
>
> The problem is: the result really depends on what comes after.
>
> 1. is ambiguous
>
> 1.e is ambiguous
> 1.e3 is a float
> 1.echo is INT DOT ID
>
>
> ----- Original Message -----
> From: Micheal J
> To: antlr-interest at antlr.org
> Sent: Sunday, July 15, 2007 11:36 AM
> Subject: Re: [antlr-interest] lexing tips needed... (or,could I use
> predicate in lexer rule?)
>
>
> Hi, Forgot to ask:
>
> What should the lexer return for these values (note the spaces)?
> ".1"
> "1 .1"
> "1 ."
>
> Micheal
>
>
>
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of
> Lloyd Dupont
> Sent: 15 July 2007 02:02
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] lexing tips needed... (or,could I use predicate in
> lexer rule?)
>
>
> For my parser I reused the lexer rules from the Java example.
> In particular, it has this lexer rule:
> NUM_FLOAT
>  : DIGITS '.' (DIGITS)? (EXPONENT_PART)? (FLOAT_TYPE_SUFFIX)?
>  | ............ other match
>  ;
>
> Now there is a problem with that:
> in the language I target, everything is an object.
>
> So:
>
> "2."  is NUM_FLOAT["2."]
> "2.1"  is NUM_FLOAT["2.1"]
> "2.ToString()" is NUM_INT["2"] DOT["."] ID["ToString"] LPAREN["("] RPAREN[")"]
>
> So I was trying to disambiguate the lexer with a construct like this (trying
> a predicate in the lexer):
> NUM_FLOAT
>  : (
>    ( DIGITS '.' EXPONENT_PART )=> DIGITS '.' (EXPONENT_PART) (FLOAT_TYPE_SUFFIX)?
>   | ( DIGITS '.' ID )=> DIGITS
>   | DIGITS '.' (DIGITS)? (EXPONENT_PART)? (FLOAT_TYPE_SUFFIX)?
>   )
>   |    ..................... other match
> ;
>
> But this doesn't seem to work:
> 1. It tells me the NUM_INT rule is no longer reachable (I guess the
> NUM_FLOAT rule absorbs it with my second alternative).
> 2. More importantly, an input such as "2.a" returns "2." followed by a
> MismatchedTokenException.
>
> Can I use a predicate in a lexer rule?
> How could I disambiguate the NUM_FLOAT lexer rule,
> i.e. be able to read both "2.3" (NUM_FLOAT) and "2.ToString()" (NUM_INT
> DOT ID LPAREN RPAREN)?
>

