[antlr-interest] Scanning Perl-style quoted strings q{foo{bar}quux}?!
Ralf S. Engelschall
rse+antlr-interest at engelschall.com
Thu Jul 30 01:13:18 PDT 2009
On Wed, Jul 29, 2009, David-Sarah Hopwood wrote:
> [...]
> (I'm assuming, without knowing Perl very well, that only the delimiters
> that appear on the "outside" have to nest, e.g. q{foo[{bar}quux} is valid.)
Yes, exactly. Only the delimiter "X" in qX...X has to nest correctly;
anything in between is taken as-is.
> > Remains the question: what is the best way to implement this in ANTLR 3?
>
> Remember that lexer rules can be recursive, so you don't have to explicitly
> keep track of nesting depth. The following approach (untested) is more
> declarative, and incidentally avoids the problem you encountered:
>
> QSTRING
> : 'q' ( AngleQS | BraceQS | BrackQS | ParenQS | SlashQS | BangQS ) ;
>
> fragment AngleQS
> : '<' ( AngleQS | ~('<' | '>') )* '>' ;
>
> fragment BraceQS
> : '{' ( BraceQS | ~('{' | '}') )* '}' ;
>
> fragment BrackQS
> : '[' ( BrackQS | ~('[' | ']') )* ']' ;
>
> fragment ParenQS
> : '(' ( ParenQS | ~('(' | ')') )* ')' ;
>
> fragment SlashQS
> : '/' ( ~'/' )* '/' ;
>
> fragment BangQS
> : '!' ( ~'!' )* '!' ;
Hmmmm.... interesting approach. Many thanks for the hint about the
recursion possibility in lexer rules.
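For illustration, here is a minimal Python sketch (not ANTLR-generated code) of what the recursive bracket fragments do: recursion on the opening delimiter replaces an explicit depth counter, and only the delimiter kind that opened the group nests. The names PAIRS, match_nested and scan_qstring are hypothetical, and only the four bracketing forms are covered.

```python
# Bracketing delimiter pairs (mirrors AngleQS, BraceQS, BrackQS, ParenQS).
PAIRS = {"<": ">", "{": "}", "[": "]", "(": ")"}


def match_nested(text: str, pos: int) -> int:
    """Match one bracketed group whose opening delimiter is text[pos].

    Returns the index just past the matching closing delimiter, or raises
    ValueError if the input ends first. Only the delimiter kind that opened
    the group nests; every other character is taken as-is.
    """
    open_ch = text[pos]
    close_ch = PAIRS[open_ch]
    pos += 1  # consume the opening delimiter
    while pos < len(text):
        ch = text[pos]
        if ch == open_ch:
            # Nested group: recurse, just as BraceQS refers to BraceQS.
            pos = match_nested(text, pos)
        elif ch == close_ch:
            return pos + 1  # consume the closing delimiter
        else:
            pos += 1  # ~(open | close): any other character
    raise ValueError(f"unterminated {open_ch}...{close_ch} group")


def scan_qstring(text: str) -> str:
    """Return the bracketed q-string token at the start of text."""
    if len(text) < 2 or text[0] != "q" or text[1] not in PAIRS:
        raise ValueError("not a bracketed q-string")
    return text[:match_nested(text, 1)]
```

For example, scan_qstring("q{foo[{bar}quux} rest") yields "q{foo[{bar}quux}": the "[" is ordinary text, while the inner "{...}" is matched by the recursive call.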
One problem remains: although the bracketing delimiters are a fixed set
of four pairs, the "/" and "!" were just examples. Actually, any other
punctuation character can be used as well, for instance q%...% or
q=...=. But here, I think, semantic predicates can help again. My
current solution is now:
/* Perl-style quoted string */
QSTRING : 'q' (QS_ANGLE | QS_BRACE | QS_BRACK | QS_PAREN | QS_OTHER);
// A backslash escapes either the opening or the closing delimiter.
fragment QS_ANGLE : '<' (('\\' ('<' | '>')) => '\\' ('<' | '>') | QS_ANGLE | ~('<' | '>'))* '>';
fragment QS_BRACE : '{' (('\\' ('{' | '}')) => '\\' ('{' | '}') | QS_BRACE | ~('{' | '}'))* '}';
fragment QS_BRACK : '[' (('\\' ('[' | ']')) => '\\' ('[' | ']') | QS_BRACK | ~('[' | ']'))* ']';
fragment QS_PAREN : '(' (('\\' ('(' | ')')) => '\\' ('(' | ')') | QS_PAREN | ~('(' | ')'))* ')';
fragment QS_OTHER_CH: ~('<'|'>'|'{'|'}'|'['|']'|'('|')'|'a'..'z'|'A'..'Z'|'0'..'9');
fragment QS_OTHER : delimiter=QS_OTHER_CH
( '\\' { input.LT(1) == $delimiter.text.charAt(0) }? => .
| { input.LT(1) != $delimiter.text.charAt(0) }? => .
)*
{ input.LT(1) == $delimiter.text.charAt(0) }? => .;
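The QS_OTHER rule remembers the delimiter in the label and compares every lookahead character against it. A minimal Python sketch of the same idea follows; it is an illustration, not the generated lexer, and the names EXCLUDED, is_other_delimiter and scan_other are hypothetical. Like the rule above, it treats a backslash as special only when the delimiter follows it.

```python
# Characters that QS_OTHER_CH excludes besides ASCII letters and digits.
EXCLUDED = set("<>{}[]()")


def is_other_delimiter(ch: str) -> bool:
    # Mirrors QS_OTHER_CH: anything except brackets and ASCII alphanumerics.
    return ch not in EXCLUDED and not (ch.isascii() and ch.isalnum())


def scan_other(text: str, pos: int) -> int:
    """Scan a qX...X body whose delimiter X sits at text[pos].

    Returns the index just past the closing X. A backslash followed by X
    is consumed as a unit; any other character that is not X is consumed
    singly; the first unescaped X ends the string.
    """
    delim = text[pos]  # plays the role of the $delimiter label
    if not is_other_delimiter(delim):
        raise ValueError(f"{delim!r} is not a QS_OTHER delimiter")
    pos += 1
    while pos < len(text):
        ch = text[pos]
        if ch == "\\" and pos + 1 < len(text) and text[pos + 1] == delim:
            pos += 2  # escaped delimiter: '\\' X
        elif ch != delim:
            pos += 1  # ordinary character
        else:
            return pos + 1  # closing delimiter
    raise ValueError(f"unterminated q{delim}...{delim} string")
```

Because the delimiter is only known at run time, it cannot be written into the grammar as a literal; both the predicates above and this loop compare against a stored value instead.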
This already correctly recognizes all qX...X constructs. Now I just
have to filter out the escape sequences and remove the leading qX and
the trailing X.
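That post-processing step could look roughly like the following Python sketch. It assumes, as in the escape alternatives of the grammar, that a backslash only escapes the delimiter characters and every other backslash is kept literally; the function name qstring_value is hypothetical.

```python
def qstring_value(token: str) -> str:
    """Turn a matched QSTRING token into its contents.

    Drops the leading qX and the trailing X, and replaces a backslash
    followed by a delimiter character with just that delimiter.
    """
    if len(token) < 3 or token[0] != "q":
        raise ValueError("not a q-string token")
    open_ch = token[1]
    # Bracketing delimiters close with their partner; others close with themselves.
    close_ch = {"<": ">", "{": "}", "[": "]", "(": ")"}.get(open_ch, open_ch)
    if token[-1] != close_ch:
        raise ValueError("token does not end with its closing delimiter")
    body = token[2:-1]  # drop the leading qX and the trailing X
    out = []
    i = 0
    while i < len(body):
        if body[i] == "\\" and i + 1 < len(body) and body[i + 1] in (open_ch, close_ch):
            out.append(body[i + 1])  # escaped delimiter: keep only the delimiter
            i += 2
        else:
            out.append(body[i])  # everything else, including other backslashes
            i += 1
    return "".join(out)
```

For example, qstring_value("q{foo{bar}quux}") returns "foo{bar}quux", since nested delimiters are part of the contents.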
Ralf S. Engelschall
rse at engelschall.com
www.engelschall.com