[antlr-interest] Expression embedded in arbitrary Text
Joachim Rosskopf
antlr at b0nz0.de
Tue Apr 1 04:45:29 PDT 2008
Hello Dmitry,
that was the approach I was using previously. I parsed the expressions
solely with regex. But that was getting pretty ugly and doesn't work well
with nested or consecutive expressions. So I searched for something
like:
statement
: ( options { greedy=false; } : . )+
| ( options { greedy=true; } : EXPRESSION_OPEN! expression
EXPRESSION_CLOSE! )+
;
But the dot '.' in the above example, used in a parser rule, matches any
token (any defined lexer rule) and not any character, as I would like it to.
Is that possible with ANTLR?
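
To make the desired behaviour concrete, here is a minimal sketch of the
splitting outside ANTLR, in plain Python. It is not an ANTLR solution and not
part of the attached grammar; the function name split_segments and the
segment labels TEXT and EXPR are made up for illustration. It assumes the
delimiters '#{' and '}' from the grammar, and it does not look inside string
literals, so a '}' within a quoted string would be miscounted.

```python
def split_segments(text, open_tok="#{", close_tok="}"):
    """Split text into ("TEXT", s) and ("EXPR", s) pairs.

    Nested open_tok/close_tok pairs stay inside the enclosing EXPR,
    which is the case a single regular expression handles poorly.
    """
    segments = []
    pos = 0
    while pos < len(text):
        start = text.find(open_tok, pos)
        if start == -1:
            # No further expression: the rest is plain text.
            segments.append(("TEXT", text[pos:]))
            break
        if start > pos:
            segments.append(("TEXT", text[pos:start]))
        # Scan forward, tracking nesting depth, until the matching close.
        depth = 1
        i = start + len(open_tok)
        while i < len(text) and depth > 0:
            if text.startswith(open_tok, i):
                depth += 1
                i += len(open_tok)
            elif text.startswith(close_tok, i):
                depth -= 1
                i += len(close_tok)
            else:
                i += 1
        if depth != 0:
            raise ValueError("unterminated expression at offset %d" % start)
        # Store the expression body without its delimiters.
        segments.append(("EXPR", text[start + len(open_tok):i - len(close_tok)]))
        pos = i
    return segments


if __name__ == "__main__":
    for kind, value in split_segments("http://www.dom.com/#{res.uri()}"):
        print(kind, repr(value))
```

Each EXPR body could then be handed to the expression parser on its own,
while the TEXT segments are passed through unchanged.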
Regards
---
Joachim
Dmitry Gusev wrote:
> I'd recommend using regular expressions to extract the "#{bla}"
> things.
>
> Then you'll be able to use these match results as an input to your Parser.
>
>
> On Tue, Apr 1, 2008 at 1:21 PM, Joachim Rosskopf <antlr at b0nz0.de> wrote:
>
> Hello List,
>
> currently I'm working on a small grammar to build an expression
> language for an ETL tool. This works very nicely for the expression
> itself (e.g. #{foo.bar('test')}). It gets parsed to the desired AST.
>
> But I'm not able to figure out lexer/parser rules that make it
> possible to embed the expression in arbitrary text (e.g. a URI,
> http://www.dom.com/#{res.uri()} ). So every character not consumed by
> the expression should end up in one rule.
>
> Can someone please give me a hint? I attached the grammar.
> Thank you in advance.
>
> Best regards
> ---
> Joachim
>
> grammar el;
>
> options {
> backtrack=true;
> output=AST;
> ASTLabelType=CommonTree;
> language=CSharp;
> }
>
> tokens {
> OBJECT_IDENTIFIER;
> LOGICAL_EXPRESSION;
> FUNCTIONAL_EXPRESSION;
> VALUE_EXPRESSION;
> ARGUMENT_LIST;
> }
>
> @lexer::namespace {
> DataPumper.AntlrExpressionLanguage
> }
>
> @parser::namespace {
> DataPumper.AntlrExpressionLanguage
> }
>
> statement
> : ( options { greedy=true; } : EXPRESSION_OPEN!
> expression EXPRESSION_CLOSE! )+
> ;
>
> expression
> : functionalExpression -> ^(
> FUNCTIONAL_EXPRESSION functionalExpression )
> | valueExpression -> ^(
> VALUE_EXPRESSION valueExpression )
> | literal
> ;
>
> valueExpression
> : objectIdentifier
> ;
>
>
> functionalExpression
> : objectIdentifier BRACE_OPEN! (argumentList)?
> BRACE_CLOSE!
> ;
>
>
> argumentList
> : argument (SEMICOLON argument )* -> ^(
> ARGUMENT_LIST argument+ )
> ;
>
> argument
> : ( literal | statement )
> ;
>
>
> objectIdentifier
> : IDENTIFIER ( '.' IDENTIFIER )* -> ^(
> OBJECT_IDENTIFIER IDENTIFIER+ )
> ;
>
> fragment
> literal
> : HEX_LITERAL -> ^( HEX_LITERAL )
> | DECIMAL_LITERAL -> ^( DECIMAL_LITERAL )
> | OCTAL_LITERAL -> ^( OCTAL_LITERAL )
> | FLOATING_POINT_LITERAL -> ^( FLOATING_POINT_LITERAL )
> | STRING_LITERAL -> ^( STRING_LITERAL )
> ;
>
> IDENTIFIER
> : LETTER ( LETTER | '0'..'9')*
> ;
>
> fragment
> LETTER
> : 'A'..'Z'
> | 'a'..'z'
> ;
>
> HEX_LITERAL
> : '0' ('x'|'X') HEX_DIGIT+
> ;
>
> DECIMAL_LITERAL
> : ('0' | '1'..'9' '0'..'9'*)
> ;
>
> OCTAL_LITERAL
> : '0' ('0'..'7')+
> ;
>
> fragment
> HEX_DIGIT
> : ('0'..'9' | 'a'..'f' | 'A'..'F')
> ;
>
>
> FLOATING_POINT_LITERAL
> : ('0'..'9')+ '.' ('0'..'9')* EXPONENT?
> | '.' ('0'..'9')+ EXPONENT?
> | ('0'..'9')+ EXPONENT?
> ;
>
> fragment
> EXPONENT
> : ('e'|'E') ('+'|'-')? ('0'..'9')+
> ;
>
>
> STRING_LITERAL
> : '\'' STRING '\''
> ;
>
> fragment
> STRING
> : ( ESCAPESEQ | ~('\'' | '\\') )*
> ;
>
> fragment
> ESCAPESEQ
> : '\\' ('b'|'t'|'n'|'f'|'r'|'\"'|'\''|'\\')
> ;
>
>
> WS
> : (' '|'\r'|'\t'|'\u000C'|'\n') { channel=99; }
> ;
>
> SEMICOLON
> : ','
> ;
>
> EXPRESSION_OPEN
> : '#{'
> ;
>
> EXPRESSION_CLOSE
> : '}'
> ;
>
> BRACE_OPEN
> : '('
> ;
>
> BRACE_CLOSE
> : ')'
> ;
>
> COMMENT
> : '/*' ( options {greedy=false;} : . )* '*/' {
> channel=99; }
> ;
>
> LINE_COMMENT
> : '//' ~('\n'|'\r')* '\r'? '\n' { channel=99; }
> ;
>
>
> --
> Dmitry Gusev