[antlr-interest] Expression embedded in arbitrary Text

Tue Apr 1 04:55:17 PDT 2008

On Tue, Apr 1, 2008 at 3:45 PM, Joachim Rosskopf <antlr at b0nz0.de> wrote:

> Hello Dmitry,
>
> that was the approach I was using previously. I parsed the expressions
> solely with regex. But that was getting pretty ugly and doesn't work well
> with nested or consecutive expressions. So I searched for something
> like:
>
> statement
>    :    ( options { greedy=false; } : . )+
>    |    ( options { greedy=true; } : EXPRESSION_OPEN! expression EXPRESSION_CLOSE! )+
>    ;
>
> But the dot '.' in the above example stands for any token defined by a
> lexer rule, and not for any character, as I would like.
> Is that possible with ANTLR?

You can always declare a token that matches any char you want and refer to
that token instead.
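
To illustrate the idea outside of ANTLR, here is a minimal Python sketch
(not ANTLR output, and not code from this thread). It assumes a lexer that
tries its rules in order, with a hypothetical catch-all ANY_CHAR rule listed
last, and a hypothetical helper split_statement that groups everything
outside #{ ... } into text parts. String literals containing '}' are
deliberately not handled here.

```python
import re

# Ordered token rules. ANY_CHAR is a hypothetical catch-all token; it is
# listed last so that the more specific rules are tried first.
TOKEN_RULES = [
    ("EXPRESSION_OPEN", r"#\{"),
    ("EXPRESSION_CLOSE", r"\}"),
    ("ANY_CHAR", r"."),
]
LEXER = re.compile(
    "|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_RULES),
    re.DOTALL,  # let ANY_CHAR match newlines too
)


def tokenize(source):
    """Yield (token_name, lexeme) pairs; ANY_CHAR ensures no character is skipped."""
    for match in LEXER.finditer(source):
        yield match.lastgroup, match.group()


def split_statement(source):
    """Split source into ("TEXT", ...) and ("EXPRESSION", ...) parts."""
    parts = []
    buffer = []
    depth = 0  # nesting depth of #{ ... }

    def flush(kind):
        # Emit the buffered lexemes as one part, if there are any.
        if buffer:
            parts.append((kind, "".join(buffer)))
            buffer.clear()

    for name, lexeme in tokenize(source):
        if name == "EXPRESSION_OPEN":
            if depth == 0:
                flush("TEXT")          # text before the expression ends here
            else:
                buffer.append(lexeme)  # nested opener stays in the expression
            depth += 1
        elif name == "EXPRESSION_CLOSE" and depth > 0:
            depth -= 1
            if depth == 0:
                flush("EXPRESSION")    # outermost expression is complete
            else:
                buffer.append(lexeme)
        else:
            buffer.append(lexeme)      # ANY_CHAR, or a '}' outside an expression

    if depth > 0:
        raise ValueError("unterminated #{ ... } expression")
    flush("TEXT")
    return parts


print(split_statement("http://www.dom.com/#{res.uri()}"))
# prints [('TEXT', 'http://www.dom.com/'), ('EXPRESSION', 'res.uri()')]
```

Each EXPRESSION part could then be handed to the expression parser, while
the TEXT parts are passed through unchanged.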

>
> Regards
> ---
> Joachim
>
> Dmitry Gusev schrieb:
> > I'd recommend using regular expressions to extract the "#{bla}"
> > things.
> >
> > Then you'll be able to use these match results as input to your parser.
> >
> >
> > On Tue, Apr 1, 2008 at 1:21 PM, Joachim Rosskopf <antlr at b0nz0.de> wrote:
> >
> >     Hello List,
> >
> >     currently I'm working on a small grammar to build an expression
> >     language for an ETL tool. This works very well for the expression
> >     itself (e.g. #{foo.bar('test')}). It gets parsed to the desired AST.
> >
> >     But I'm not able to figure out lexer/parser rules that make it
> >     possible to embed the expression in arbitrary text (e.g. a URI,
> >     http://www.dom.com/#{res.uri()} ). So every character not consumed
> >     by the expression should end up in one rule.
> >
> >     Can someone please give me a hint? I attached the grammar.
> >     Thank you in advance.
> >
> >     Best regards
> >     ---
> >     Joachim
> >
> >     grammar el;
> >
> >     options {
> >            backtrack=true;
> >            output=AST;
> >            ASTLabelType=CommonTree;
> >            language=CSharp;
> >     }
> >
> >     tokens {
> >            OBJECT_IDENTIFIER;
> >            LOGICAL_EXPRESSION;
> >            FUNCTIONAL_EXPRESSION;
> >            VALUE_EXPRESSION;
> >            ARGUMENT_LIST;
> >     }
> >
> >     @lexer::namespace {
> >            DataPumper.AntlrExpressionLanguage
> >     }
> >
> >     @parser::namespace {
> >            DataPumper.AntlrExpressionLanguage
> >     }
> >
> >     statement
> >            :       ( options { greedy=true; } : EXPRESSION_OPEN! expression EXPRESSION_CLOSE! )+
> >            ;
> >
> >     expression
> >            :       functionalExpression            -> ^( FUNCTIONAL_EXPRESSION functionalExpression )
> >            |       valueExpression                 -> ^( VALUE_EXPRESSION valueExpression )
> >            |       literal
> >            ;
> >
> >     valueExpression
> >            :       objectIdentifier
> >            ;
> >
> >
> >     functionalExpression
> >            :       objectIdentifier BRACE_OPEN! (argumentList)? BRACE_CLOSE!
> >            ;
> >
> >
> >     argumentList
> >            :       argument (SEMICOLON argument )*         -> ^( ARGUMENT_LIST argument+ )
> >            ;
> >
> >     argument
> >            :        ( literal | statement )
> >            ;
> >
> >
> >     objectIdentifier
> >            :       IDENTIFIER ( '.' IDENTIFIER )* -> ^( OBJECT_IDENTIFIER IDENTIFIER+ )
> >            ;
> >
> >     fragment
> >     literal
> >            :       HEX_LITERAL             -> ^( HEX_LITERAL )
> >            |       DECIMAL_LITERAL         -> ^( DECIMAL_LITERAL )
> >            |       OCTAL_LITERAL           -> ^( OCTAL_LITERAL )
> >            |       FLOATING_POINT_LITERAL  -> ^( FLOATING_POINT_LITERAL )
> >            |       STRING_LITERAL          -> ^( STRING_LITERAL )
> >            ;
> >
> >     IDENTIFIER
> >            :       LETTER ( LETTER | '0'..'9')*
> >            ;
> >
> >     fragment
> >     LETTER
> >            :       'A'..'Z'
> >            |       'a'..'z'
> >            ;
> >
> >     HEX_LITERAL
> >            :       '0' ('x'|'X') HEX_DIGIT+
> >            ;
> >
> >     DECIMAL_LITERAL
> >            :       ('0' | '1'..'9' '0'..'9'*)
> >            ;
> >
> >     OCTAL_LITERAL
> >            :       '0' ('0'..'7')+
> >            ;
> >
> >     fragment
> >     HEX_DIGIT
> >            :       ('0'..'9' | 'a'..'f' | 'A'..'F')
> >            ;
> >
> >
> >     FLOATING_POINT_LITERAL
> >            :       ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
> >            |       '.' ('0'..'9')+ EXPONENT?
> >            |       ('0'..'9')+ EXPONENT    // exponent required, so plain integers remain DECIMAL_LITERAL
> >            ;
> >
> >     fragment
> >     EXPONENT
> >            :       ('e'|'E') ('+'|'-')? ('0'..'9')+
> >            ;
> >
> >
> >     STRING_LITERAL
> >            :       '\'' STRING '\''
> >            ;
> >
> >     fragment
> >     STRING
> >            :       ( ESCAPESEQ | ~('\'' | '\\') )*
> >            ;
> >
> >     fragment
> >     ESCAPESEQ
> >            :       '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
> >            ;
> >
> >
> >     WS
> >            :       (' '|'\r'|'\t'|'\u000C'|'\n') { channel=99; }
> >            ;
> >
> >     SEMICOLON
> >            :       ','
> >            ;
> >
> >     EXPRESSION_OPEN
> >            :       '#{'
> >            ;
> >
> >     EXPRESSION_CLOSE
> >            :       '}'
> >            ;
> >
> >     BRACE_OPEN
> >            :       '('
> >            ;
> >
> >     BRACE_CLOSE
> >            :       ')'
> >            ;
> >
> >     COMMENT
> >            :       '/*' ( options {greedy=false;} : . )* '*/' { channel=99; }
> >            ;
> >
> >     LINE_COMMENT
> >            :       '//' ~('\n'|'\r')* '\r'? '\n' { channel=99; }
> >            ;
> >
> >
> > --
> > Dmitry Gusev
>
>