[antlr-interest] Ambiguity between floating point literal and method call

Jim Idle jimi at temporal-wave.com
Wed Nov 2 09:48:57 PDT 2011


You can browse it here. Try not to depart from the lexer until you have
this working.



http://kenai.com/projects/openjfx-compiler/sources/jfx-debug/show/src/share/classes/com/sun/tools/javafx/antlr?rev=6727



Jim



*From:* Ross Bamford [mailto:roscoml at gmail.com]
*Sent:* Wednesday, November 02, 2011 5:18 AM
*To:* Jim Idle
*Cc:* antlr-interest at antlr.org
*Subject:* Re: [antlr-interest] Ambiguity between floating point literal
and method call



Thanks, Jim. I'd seen that FAQ page before and had tried integrating that
approach into my grammar, but I still can't get it to work. Parsing input
such as "1.foo()" results in the 1 and its period being matched together
(outputting '1.'), so my parser never sees the INTEGER DOT ID production
and I get NoViableAlt exceptions. Interestingly, after integrating the
changes you suggested, method calls on hex literals also no longer work,
although they do with my "normal" literal lexing.
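
The '1.' symptom can be reproduced outside ANTLR with ordinary regular
expressions. This is only an illustrative Python sketch, not ANTLR code.
GREEDY_FLOAT mirrors the first alternative of FLOATING_POINT_LITERAL in the
grammar quoted further down, and STRICT_FLOAT is a hypothetical variant that
requires a digit after the dot. Whether the FAQ-based lexer fails for this
same reason is an assumption, since that lexer is not shown in this thread.

```python
import re

# Mirrors the first alternative of FLOATING_POINT_LITERAL in the quoted
# grammar: digits, a dot, then ZERO or more digits.
GREEDY_FLOAT = re.compile(r"[0-9]+\.[0-9]*")

# Hypothetical stricter variant (an assumption, not from the thread):
# the dot only belongs to the number when a digit follows it.
STRICT_FLOAT = re.compile(r"[0-9]+\.[0-9]+")


def matched_prefix(pattern, text):
    """Return the prefix of text that pattern matches, or None."""
    m = pattern.match(text)
    return m.group(0) if m else None


# The permissive rule swallows the dot, so no DOT token is left over.
greedy = matched_prefix(GREEDY_FLOAT, "1.foo()")

# The strict rule refuses "1." and leaves the dot for a separate token.
strict = matched_prefix(STRICT_FLOAT, "1.foo()")

# A real float followed by a method call still lexes as a float.
strict_float = matched_prefix(STRICT_FLOAT, "1.5.foo()")

print(greedy, strict, strict_float)
```

If the float rule permits an empty fraction, the lexer has consumed the dot
before the parser ever gets a chance to match INTEGER DOT ID.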



I would very much like to look at the JavaFX source and see how it's done
over there. Unfortunately though I have very limited Internet service here
(I live in a very rural area) and I wonder if you know if it's browseable
online rather than having to download the source tree?



Thanks again,

Ross

On Thu, Oct 27, 2011 at 12:02 AM, Jim Idle <jimi at temporal-wave.com> wrote:

Please see the FAQ:

http://www.antlr.org/wiki/display/ANTLR3/Lexer+grammar+for+floating+point%2C+dot%2C+range%2C+time+specs

You can modify that for your purpose and then add INTEGER DOT ID in your
parser. If you download the source code for the JavaFX compiler, you will
see that it supports that exact syntax.
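
The general idea (let the lexer decide whether a dot belongs to the number,
then let the parser match INTEGER DOT ID) can be sketched as a tiny
hand-written tokenizer. This is a minimal Python illustration under stated
assumptions, not the FAQ grammar or the JavaFX lexer: the token names, the
TOKEN_SPEC table, and the tokenize function are all hypothetical, and the
rule "a dot is part of a float only when a digit follows" is an assumption
about the desired language.

```python
import re

# Token patterns, tried in order. Names are illustrative only.
TOKEN_SPEC = [
    ("HEX",     r"0[xX][0-9a-fA-F]+"),
    # A float needs at least one digit AFTER the dot, so "1." is not a float.
    ("FLOAT",   r"[0-9]+\.[0-9]+(?:[eE][+-]?[0-9]+)?"),
    ("INTEGER", r"[0-9]+"),
    ("ID",      r"[A-Za-z_$][A-Za-z_$0-9]*"),
    ("DOT",     r"\."),
    ("LPAREN",  r"\("),
    ("RPAREN",  r"\)"),
    ("WS",      r"[ \t]+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))


def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = MASTER.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


for source in ("1.foo()", "1.5", "0x1F.foo()"):
    print(source, "->", tokenize(source))
```

With the dot left as its own token, a parser rule of the shape
INTEGER DOT ID has something to match, and hex literals followed by a
method call are tokenized the same way.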


Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Ross Bamford
> Sent: Wednesday, October 26, 2011 3:37 PM
> To: antlr-interest at antlr.org
> Subject: [antlr-interest] Ambiguity between floating point literal and
> method call

>
> Hi all,
>
> Have posted here recently, and thanks again for all your help in
> getting my various problems fixed. I'm implementing a basic scripting
> language for use in embedded systems, and I've come across another
> problem that, after much googling and tinkering, I still can't seem to
> fix. In this language, numbers are first-class objects, and I need to
> be able to call methods on them, in the standard way, e.g. 1.foo() .
> However, I'm coming up against a problem whereby the parser can't
> distinguish between this and floating point literals. I've tried
> various combinations of predicates and the like, but just don't seem to
> be able to get it working. Any help would be much appreciated!
>
> Thanks in advance,
> Ross Bamford
>
> /* ** GRAMMAR FOLLOWS ** */
> grammar BasicLang;
>
> options {
>     output=AST;
>     ASTLabelType=CommonTree;
>     backtrack=true;
>     memoize=true;
> }
>
> tokens {
>   ASSIGN;
>   METHOD_CALL;
>   ARGS;
>   BLOCK;
>   ORBLOCK;
>   SELF;
>   ASSIGN_RECEIVER;
>   ASSIGN_LOCAL;
>   FIELD_ACCESS;
>   LVALUE;
> }
>
> start_rule
>   :   script
>   ;
>
> script
>   :   statement+
>   |   EOF!
>   ;
>
> statement
>   :   expr terminator!
>   ;
>
> expr
>   :   assign_expr
>   |   math_expr
>   ;
>
> assign_expr
> @init {boolean explicitReceiver=false;}
>   :   (rec=IDENTIFIER DOT {explicitReceiver=true;})? id=IDENTIFIER ASSIGN expr
>       -> {explicitReceiver}? ^(ASSIGN ASSIGN_RECEIVER[$rec.getText()] LVALUE[$id.getText()] expr)
>       -> ^(ASSIGN ASSIGN_LOCAL LVALUE[$id.getText()] expr)
>   ;
>
> math_expr
>   :   mult_expr ((ADD^|SUB^) mult_expr)*
>   ;
>
> mult_expr
>   :   pow_expr ((MUL^|DIV^|MOD^) pow_expr)*
>   ;
>
> pow_expr
>   :   unary_expr ((POW^) unary_expr)*
>   ;
>
> unary_expr
>   :   NOT? atom
>   ;
>
> meth_call
> @init {boolean explicitReceiver=false;}
>   :   (IDENTIFIER DOT {explicitReceiver=true;})? func_call_expr
>       -> {explicitReceiver}? ^(METHOD_CALL IDENTIFIER func_call_expr)
>       -> ^(METHOD_CALL SELF func_call_expr)
>   |   literal DOT func_call_expr
>       -> ^(METHOD_CALL literal func_call_expr)
>   ;
>
> fragment
> func_call_expr
>   :   IDENTIFIER^ argument_list block? orblock?
>   ;
>
> fragment
> block
>   :   LCURLY TERMINATOR? statement* RCURLY -> ^(BLOCK statement*)
>   ;
>
> fragment
> orblock
>   :   OR LCURLY TERMINATOR? statement* RCURLY -> ^(ORBLOCK statement*)
>   ;
>
> fragment
> argument_list
>   :   LPAREN (expr (COMMA expr)*)? RPAREN -> ^(ARGS expr expr*)?
>   ;
>
> class_identifier
>   :     rec=IDENTIFIER DOT id=IDENTIFIER -> ^(FIELD_ACCESS $rec $id)
>   ;
>
> literal
>   :     DECIMAL_LITERAL
>   |     OCTAL_LITERAL
>   |     HEX_LITERAL
>   |     FLOATING_POINT_LITERAL
>   |     STRING_LITERAL
>   |     CHARACTER_LITERAL
>   ;
>
> atom
>   :     literal
>   |     meth_call
>   |     IDENTIFIER
>   |     class_identifier
>   |     LPAREN! expr RPAREN!
>   ;
>
> terminator
>   :     TERMINATOR
>   |     EOF
>   ;
>
> OR  :   'or';
>
> POW :   '^' ;
> MOD :   '%' ;
> ADD :   '+' ;
> SUB :   '-' ;
> DIV :   '/' ;
> MUL :   '*' ;
> NOT :   '!' ;
>
> ASSIGN
>     :   '='
>     ;
>
> LPAREN
>     :   '('
>     ;
>
> RPAREN
>     :   ')'
>     ;
>
> LCURLY
>     :   '{'
>     ;
>
> RCURLY
>     :   '}'
>     ;
>
> COMMA
>     :   ','
>     ;
>
> DOT :   '.' ;
>
> IDENTIFIER
>   : ID_LETTER (ID_LETTER|'0'..'9')*
>   ;
>
> fragment
> ID_LETTER
>   : '$'
>   | 'A'..'Z'
>   | 'a'..'z'
>   | '_'
>   ;
>
> CHARACTER_LITERAL
>     :   '\'' ( EscapeSequence | ~('\''|'\\') ) '\''
>     ;
>
> STRING_LITERAL
>     :  '"' ( EscapeSequence | ~('\\'|'"') )* '"'
>     ;
>
> HEX_LITERAL : '0' ('x'|'X') HexDigit+ IntegerTypeSuffix? ;
>
> DECIMAL_LITERAL : ('0' | '1'..'9' '0'..'9'*) IntegerTypeSuffix? ;
>
> OCTAL_LITERAL : '0' ('0'..'7')+ IntegerTypeSuffix? ;
>
> fragment
> HexDigit : ('0'..'9'|'a'..'f'|'A'..'F') ;
>
> fragment
> IntegerTypeSuffix
>   : ('l'|'L')
>   | ('u'|'U')  ('l'|'L')?
>   ;
>
> FLOATING_POINT_LITERAL
>     :   ('0'..'9')+ '.' ('0'..'9')* Exponent? FloatTypeSuffix?
>     |   '.' ('0'..'9')+ Exponent? FloatTypeSuffix?
>     |   ('0'..'9')+ Exponent? FloatTypeSuffix?
>   ;
>
> fragment
> Exponent : ('e'|'E') ('+'|'-')? ('0'..'9')+ ;
>
> fragment
> FloatTypeSuffix : ('f'|'F'|'d'|'D') ;
>
> fragment
> EscapeSequence
>     :   '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\'|'/')
>     |   OctalEscape
>     |   UnicodeEscape
>     ;
>
> fragment
> OctalEscape
>     :   '\\' ('0'..'3') ('0'..'7') ('0'..'7')
>     |   '\\' ('0'..'7') ('0'..'7')
>     |   '\\' ('0'..'7')
>     ;
>
> fragment
> UnicodeEscape
>     :   '\\' 'u' HexDigit HexDigit HexDigit HexDigit
>     ;
> COMMENT
>     :   '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
>     ;
>
> LINE_COMMENT
>     : '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
>     ;
>
> TERMINATOR
>   : '\r'? '\n'
>   | ';'
>   ;
>
> WS  :  (' '|'\r'|'\t'|'\u000C') {$channel=HIDDEN;}
>     |  '...' '\r'? '\n'  {$channel=HIDDEN;}
>     ;
>

> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
