[antlr-interest] Expression embedded in arbitrary Text

Joachim Rosskopf antlr at b0nz0.de
Tue Apr 1 04:45:29 PDT 2008


Hello Dmitry,

that was the approach I was using previously: I parsed the expressions
solely with regexes. But that was getting pretty ugly, and it doesn't work
well with nested expressions or with several expressions in succession.
So I searched for something like:

statement
    :    ( options { greedy=false; } :  . )+
    |    ( options { greedy=true; }  :   EXPRESSION_OPEN! expression EXPRESSION_CLOSE!  )+
    ;

But the dot '.' in the above example stands for any token (i.e. any
defined lexer rule) and not for any character, as I would like.
Is that possible with ANTLR?
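
To make the goal concrete, here is a minimal sketch of the splitting I have
in mind. It is plain Python rather than ANTLR, so it only illustrates the
desired behaviour and is not a grammar solution. The names split_template,
"TEXT" and "EXPR" are made up for this example. It assumes that expressions
open with '#{', close with '}', may be nested, and that single-quoted
strings with backslash escapes (as in STRING_LITERAL) may contain braces.

```python
from typing import List, Tuple

OPEN = "#{"   # same text as EXPRESSION_OPEN in the grammar
CLOSE = "}"   # same text as EXPRESSION_CLOSE in the grammar


def _skip_string(text: str, i: int) -> int:
    """Return the index just past the single-quoted string starting at text[i]."""
    j = i + 1
    while j < len(text):
        if text[j] == "\\":
            j += 2              # skip the backslash and the escaped character
        elif text[j] == "'":
            return j + 1        # closing quote found
        else:
            j += 1
    raise ValueError("unterminated string literal at offset %d" % i)


def split_template(text: str) -> List[Tuple[str, str]]:
    """Split text into ("TEXT", chunk) and ("EXPR", body) segments.

    Hypothetical helper: nested '#{' ... '}' pairs stay inside the body of
    the enclosing expression, so the expression parser can deal with them.
    """
    segments: List[Tuple[str, str]] = []
    literal: List[str] = []     # characters of the current literal chunk
    i = 0
    while i < len(text):
        if not text.startswith(OPEN, i):
            literal.append(text[i])
            i += 1
            continue
        if literal:             # flush the literal text collected so far
            segments.append(("TEXT", "".join(literal)))
            literal = []
        start = i + len(OPEN)
        depth, j = 1, start
        while j < len(text) and depth > 0:
            if text.startswith(OPEN, j):
                depth += 1      # nested expression opens
                j += len(OPEN)
            elif text[j] == "'":
                j = _skip_string(text, j)   # braces in strings do not count
            elif text[j] == CLOSE:
                depth -= 1
                j += 1
            else:
                j += 1
        if depth != 0:
            raise ValueError("unclosed expression at offset %d" % i)
        segments.append(("EXPR", text[start:j - 1]))  # body without '#{' and '}'
        i = j
    if literal:
        segments.append(("TEXT", "".join(literal)))
    return segments
```

Each "EXPR" body would then be handed to the expression parser, while the
"TEXT" chunks are passed through unchanged. A single regex cannot do the
depth counting, which is why my regex version broke on nested expressions.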

Regards
---
Joachim

Dmitry Gusev schrieb:
> I'd recommend using regular expressions to extract the "#{bla}"
> things.
>
> Then you'll be able to use these match results as an input to your Parser.
>
>
> On Tue, Apr 1, 2008 at 1:21 PM, Joachim Rosskopf <antlr at b0nz0.de>
> wrote:
>
>     Hello List,
>
>     currently I'm working on a small grammar to build an expression
>     language for an ETL tool. This works very nicely for the expression
>     itself (e.g. #{foo.bar('test')} ); it gets parsed to the desired AST.
>
>     But I'm not able to figure out lexer/parser rules that make it
>     possible to embed the expression in arbitrary text (e.g. a URI such
>     as http://www.dom.com/#{res.uri()} ). Every character not consumed
>     by the expression should end up in one rule.
>
>     Can someone please give me a hint? I attached the grammar.
>     Thank you in advance.
>
>     Best regards
>     ---
>     Joachim
>
>     grammar el;
>
>     options {
>            backtrack=true;
>            output=AST;
>            ASTLabelType=CommonTree;
>            language=CSharp;
>     }
>
>     tokens {
>            OBJECT_IDENTIFIER;
>            LOGICAL_EXPRESSION;
>            FUNCTIONAL_EXPRESSION;
>            VALUE_EXPRESSION;
>            ARGUMENT_LIST;
>     }
>
>     @lexer::namespace {
>            DataPumper.AntlrExpressionLanguage
>     }
>
>     @parser::namespace {
>            DataPumper.AntlrExpressionLanguage
>     }
>
>     statement
>            :       ( options { greedy=true; }  :    EXPRESSION_OPEN! expression EXPRESSION_CLOSE! )+
>            ;
>
>     expression
>            :       functionalExpression            -> ^( FUNCTIONAL_EXPRESSION functionalExpression )
>            |       valueExpression                 -> ^( VALUE_EXPRESSION valueExpression )
>            |       literal
>            ;
>
>     valueExpression
>            :       objectIdentifier
>            ;
>
>
>     functionalExpression
>            :       objectIdentifier BRACE_OPEN! (argumentList)? BRACE_CLOSE!
>            ;
>
>
>     argumentList
>            :       argument (SEMICOLON argument )*         -> ^( ARGUMENT_LIST argument+ )
>            ;
>
>     argument
>            :        ( literal | statement )
>            ;
>
>
>     objectIdentifier
>            :       IDENTIFIER ( '.' IDENTIFIER )* -> ^( OBJECT_IDENTIFIER IDENTIFIER+ )
>            ;
>
>     fragment
>     literal
>            :       HEX_LITERAL             -> ^( HEX_LITERAL )
>            |       DECIMAL_LITERAL         -> ^( DECIMAL_LITERAL )
>            |       OCTAL_LITERAL           -> ^( OCTAL_LITERAL )
>            |       FLOATING_POINT_LITERAL  -> ^( FLOATING_POINT_LITERAL )
>            |       STRING_LITERAL          -> ^( STRING_LITERAL )
>            ;
>
>     IDENTIFIER
>            :       LETTER ( LETTER | '0'..'9')*
>            ;
>
>     fragment
>     LETTER
>            :       'A'..'Z'
>            |       'a'..'z'
>            ;
>
>     HEX_LITERAL
>            :       '0' ('x'|'X') HEX_DIGIT+
>            ;
>
>     DECIMAL_LITERAL
>            :       ('0' | '1'..'9' '0'..'9'*)
>            ;
>
>     OCTAL_LITERAL
>            :       '0' ('0'..'7')+
>            ;
>
>     fragment
>     HEX_DIGIT
>            :       ('0'..'9' | 'a'..'f' | 'A'..'F')
>            ;
>
>
>     FLOATING_POINT_LITERAL
>            :       ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
>            |       '.' ('0'..'9')+ EXPONENT?
>            |       ('0'..'9')+ EXPONENT?
>            ;
>
>     fragment
>     EXPONENT
>            :       ('e'|'E') ('+'|'-')? ('0'..'9')+
>            ;
>
>
>     STRING_LITERAL
>            :       '\'' STRING '\''
>            ;
>
>     fragment
>     STRING
>            :       ( ESCAPESEQ | ~('\'' | '\\') )*
>            ;
>
>     fragment
>     ESCAPESEQ
>            :       '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
>            ;
>
>
>     WS
>            :       (' '|'\r'|'\t'|'\u000C'|'\n') { channel=99; }
>            ;
>
>     SEMICOLON
>            :       ','
>            ;
>
>     EXPRESSION_OPEN
>            :       '#{'
>            ;
>
>     EXPRESSION_CLOSE
>            :       '}'
>            ;
>
>     BRACE_OPEN
>            :       '('
>            ;
>
>     BRACE_CLOSE
>            :       ')'
>            ;
>
>     COMMENT
>            :       '/*' ( options {greedy=false;} : . )* '*/' { channel=99; }
>            ;
>
>     LINE_COMMENT
>            :       '//' ~('\n'|'\r')* '\r'? '\n' { channel=99; }
>            ;
>
>
> --
> Dmitry Gusev


