[antlr-interest] Parsing Strings with placeholders

Gavin Lambert antlr at mirality.co.nz
Wed Feb 25 11:04:30 PST 2009


At 23:16 25/02/2009, Joern Gebhardt wrote:
>a = "wonderful"
>b = "The weather is ${a}."
>
>The usage of these placeholder variables is only allowed inside 
>of STRING expressions.
[...]
>My question is now, how do I define the lexer/parser rules in an 
>intelligent way so that I can easily replace the placeholders by 
>their content?

Personally, I wouldn't bother altering the rules -- just lex it 
exactly as you did before (as a monolithic string) and then at 
parse or tree-walk time put in some custom code to find and 
replace the placeholders.
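
For illustration only, here is a minimal sketch of that find-and-replace
step in Java (ANTLR's default target). It assumes the surrounding quotes
have already been stripped from the STRING token's text and that the
variable values are available in a Map. The names PlaceholderExpander and
expand are made up for this example, and it accepts the digits 0-9 in
variable names, which is an assumption on my part. It does not handle
escaping of '$' at all.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of ANTLR: expands $name and ${name}
// inside the text of an already-lexed string.
final class PlaceholderExpander {
    // Group 1 captures the name in ${name}, group 2 the name in $name.
    // The identifier shape (letter or '_' first, then letters, digits, '_')
    // is an assumption modelled on the IDENTIFIER rule.
    private static final Pattern PLACEHOLDER = Pattern.compile(
        "\\$\\{([A-Za-z_][A-Za-z_0-9]*)\\}|\\$([A-Za-z_][A-Za-z_0-9]*)");

    static String expand(String text, Map<String, String> vars) {
        Matcher m = PLACEHOLDER.matcher(text);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String name = m.group(1) != null ? m.group(1) : m.group(2);
            String value = vars.get(name);
            // Unknown variables are left untouched; a real implementation
            // might prefer to report an error here instead.
            String replacement = value != null ? value : m.group();
            // quoteReplacement stops '$' and '\' in the value from being
            // interpreted as group references.
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        Map<String, String> vars = new HashMap<String, String>();
        vars.put("a", "wonderful");
        // prints: The weather is wonderful.
        System.out.println(expand("The weather is ${a}.", vars));
    }
}
```

Something like expand() could be called from a parser action or a tree
walker once the value of each variable is known, which keeps the lexer
rules exactly as they were.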

>I tried this:
>
>IDENTIFIER
>     : ('_' | 'a'..'z' | 'A'..'Z' ) ( '_' | 'a'..'z' | 'A'..'Z' | '1'..'9' )*
>   ;
>
>STRING
>     :    '"' ( LITERAL | PLACEHOLDER )* '"'
>     ;
>
>LITERAL
>     :    (  EscapeSequence | ~( '\\' | '"' | '\r' | '\n'  )  )*
>     ;
>
>
>fragment
>EscapeSequence
>     :   '\\' ( 'b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' | '\\'
>              | ('0'..'3') ('0'..'7') ('0'..'7')
>              | ('0'..'7') ('0'..'7')
>              | ('0'..'7') )
>     ;
>
>PLACEHOLDER
>     :    '$' IDENTIFIER
>     |    '${' IDENTIFIER '}'
>     ;
>
>However, now the Lexer has no idea that a "LITERAL" can only 
>exist inside a STRING and the matching for the above rules is not 
>unambiguous any more.

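Your LITERAL and PLACEHOLDER rules should be fragments as well (like 
EscapeSequence), since you don't want them being matched as tokens at 
the top level. As fragments they can only be matched when another 
lexer rule such as STRING refers to them.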


