[antlr-interest] Parsing Strings with placeholders
Johannes Luber
JALuber at gmx.de
Wed Feb 25 03:47:30 PST 2009
> Hi,
>
> my language is able to define and use some variables/placeholders,
> similar to UNIX shell scripts:
>
> a = "wonderful"
> b = "The weather is ${a}."
>
> The usage of these placeholder variables is only allowed inside of STRING
> expressions.
>
> My question is now: how do I define the lexer/parser rules in an
> intelligent way so that I can easily replace the placeholders by their
> content?
I think an island grammar may be the solution. See <http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control> as well as the example mentioned there.
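Independent of how the grammar ends up being split, the substitution step itself can be kept quite small. Below is a rough sketch in plain Java (not ANTLR-generated code) that treats the body of an already-matched STRING token as its own little "island" and scans it for $name and ${name}. The class name PlaceholderExpander, the method expand, and the choice to throw on an undefined variable are all made up for illustration. It also assumes digits 0-9 are allowed after the first identifier character, whereas the posted IDENTIFIER rule lists '1'..'9'.

```java
import java.util.Map;

// Hypothetical helper, not part of ANTLR: expands placeholders inside
// the text of a STRING token whose surrounding quotes were already removed.
public final class PlaceholderExpander {
    private PlaceholderExpander() {}

    public static String expand(String body, Map<String, String> vars) {
        StringBuilder out = new StringBuilder();
        int i = 0;
        while (i < body.length()) {
            char c = body.charAt(i);
            if (c == '\\' && i + 1 < body.length()) {
                // Copy the backslash and the following character untouched;
                // decoding escape sequences is left to a separate step.
                out.append(c).append(body.charAt(i + 1));
                i += 2;
            } else if (c == '$') {
                boolean braced = i + 1 < body.length() && body.charAt(i + 1) == '{';
                int start = braced ? i + 2 : i + 1;
                int end = start;
                while (end < body.length() && isIdentChar(body.charAt(end), end == start)) {
                    end++;
                }
                boolean closed = !braced || (end < body.length() && body.charAt(end) == '}');
                if (end > start && closed) {
                    String name = body.substring(start, end);
                    String value = vars.get(name);
                    if (value == null) {
                        // Assumption: an undefined variable is an error.
                        throw new IllegalArgumentException("undefined placeholder: " + name);
                    }
                    out.append(value);
                    i = braced ? end + 1 : end;
                } else {
                    // A '$' that does not start a well-formed placeholder stays literal.
                    out.append(c);
                    i++;
                }
            } else {
                out.append(c);
                i++;
            }
        }
        return out.toString();
    }

    // Letters and '_' anywhere; digits only after the first character.
    private static boolean isIdentChar(char c, boolean first) {
        boolean letter = c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z');
        return letter || (!first && c >= '0' && c <= '9');
    }
}
```

With the variables from your example, expand("The weather is ${a}.", vars) would be called from the action that handles a STRING token, with vars holding the assignments seen so far. The same scan could instead live in a second lexer invoked on the string body, which is closer to what the wiki page describes.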
Johannes
>
> Without the placeholders my STRING lexer rule looks like this:
>
> STRING
> : '"' ( EscapeSequence | ~( '\\' | '"' | '\r' | '\n' ) )* '"'
> ;
>
> fragment
> EscapeSequence
> : '\\' ( 'b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' |
> '\\' | ('0'..'3') ('0'..'7') ('0'..'7') | ('0'..'7') ('0'..'7') |
> ('0'..'7') )
> ;
>
> Can anybody please give me a hint how I get the placeholders inside of
> that?
>
> I tried this:
>
> IDENTIFIER
> : ( '_' | 'a'..'z' | 'A'..'Z' ) ( '_' | 'a'..'z' | 'A'..'Z' | '1'..'9' )*
> ;
>
> STRING
> : '"' ( LITERAL | PLACEHOLDER )* '"'
> ;
>
> LITERAL
> : ( EscapeSequence | ~( '\\' | '"' | '\r' | '\n' ) )*
> ;
>
>
> fragment
> EscapeSequence
> : '\\' ( 'b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' |
> '\\' | ('0'..'3') ('0'..'7') ('0'..'7') | ('0'..'7') ('0'..'7') |
> ('0'..'7') )
> ;
>
> PLACEHOLDER
> : '$' IDENTIFIER
> | '${' IDENTIFIER '}'
> ;
>
> However, now the lexer has no idea that a "LITERAL" can only exist inside
> a STRING, and the matching for the above rules is no longer unambiguous.
>
> Thanks in advance for any useful hints,
> Joe