[antlr-interest] Parsing Strings with placeholders

Joern Gebhardt joern.gebhardt at gmail.com
Wed Feb 25 02:16:51 PST 2009


Hi,

My language lets you define and use variables (placeholders), similar to
UNIX shell scripts:

a = "wonderful"
b = "The weather is ${a}."

These placeholder variables may be used only inside STRING
expressions.
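
To make the intended result concrete, here is a minimal sketch of the
substitution, written in Python purely for illustration. It is not part of
the grammar; the names PLACEHOLDER and expand, and the regular expression
for identifiers, are assumptions made for this example only.

```python
import re

# Matches ${name} or $name, using a conventional identifier shape
# (letter or underscore first, then letters, digits, or underscores).
PLACEHOLDER = re.compile(
    r"\$(?:\{([_A-Za-z][_A-Za-z0-9]*)\}|([_A-Za-z][_A-Za-z0-9]*))"
)

def expand(text, variables):
    """Replace each placeholder in text with its value from variables."""
    def lookup(match):
        # Group 1 is the braced form, group 2 the bare form.
        name = match.group(1) or match.group(2)
        return variables[name]  # raises KeyError for an undefined variable
    return PLACEHOLDER.sub(lookup, text)

# Mirrors the two assignments from the example above.
variables = {"a": "wonderful"}
variables["b"] = expand("The weather is ${a}.", variables)
print(variables["b"])
```

After the second assignment, b should hold the text with ${a} replaced by
the value of a.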

My question is: how do I define the lexer/parser rules so that I can easily
replace the placeholders with their contents?

Without the placeholders, my STRING lexer rule looks like this:

STRING
    :   '"' (  EscapeSequence | ~( '\\' | '"' | '\r' | '\n'  )  )*  '"'
    ;

fragment
EscapeSequence
    :   '\\' ( 'b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' | '\\'
             | ('0'..'3') ('0'..'7') ('0'..'7')
             | ('0'..'7') ('0'..'7')
             | ('0'..'7')
             )
    ;

Can anybody please give me a hint on how to fit the placeholders into that?

I tried this:

IDENTIFIER
    : ('_' | 'a'..'z' | 'A'..'Z' ) ( '_' | 'a'..'z' | 'A'..'Z' | '0'..'9' )*
    ;

STRING
    :    '"' ( LITERAL | PLACEHOLDER )* '"'
    ;

LITERAL
    :    (  EscapeSequence | ~( '\\' | '"' | '\r' | '\n'  )  )*
    ;


fragment
EscapeSequence
    :   '\\' ( 'b' | 't' | 'n' | 'f' | 'r' | '\"' | '\'' | '\\'
             | ('0'..'3') ('0'..'7') ('0'..'7')
             | ('0'..'7') ('0'..'7')
             | ('0'..'7')
             )
    ;

PLACEHOLDER
    :    '$' IDENTIFIER
    |    '${' IDENTIFIER '}'
    ;

However, the lexer now has no idea that a LITERAL can only occur inside a
STRING, so the rules above are no longer unambiguous.
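
The context dependence can be illustrated outside ANTLR. The following
Python sketch is only an illustration, not a lexer rule: it assumes the
whole string has already been matched as a single STRING token and that
its body (the text between the quotes) is split afterwards, so literal text
is never tried as a top-level token. The names PART and split_string_body
are hypothetical.

```python
import re

# Hypothetical helper, not ANTLR: one alternative per kind of string part.
# A lone '$' that does not start a placeholder is kept as literal text.
PART = re.compile(
    r"\$\{(?P<braced>[_A-Za-z][_A-Za-z0-9]*)\}"
    r"|\$(?P<bare>[_A-Za-z][_A-Za-z0-9]*)"
    r"|(?P<literal>(?:\\.|[^\\$])+|\$)"
)

def split_string_body(body):
    """Split a string body into ("literal", text) and ("placeholder", name)."""
    parts = []
    for m in PART.finditer(body):
        if m.group("literal") is not None:
            parts.append(("literal", m.group("literal")))
        else:
            parts.append(("placeholder", m.group("braced") or m.group("bare")))
    return parts

print(split_string_body("The weather is ${a}."))
```

Because the splitting only ever runs on the inside of a string, the literal
pattern cannot compete with IDENTIFIER or other tokens elsewhere in the
input.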

Thanks in advance for any useful hints,
Joe