[antlr-interest] Scanning Perl-style quoted strings q{foo{bar}quux}?!

Ralf S. Engelschall rse+antlr-interest at engelschall.com
Wed Jul 29 14:07:45 PDT 2009


To a small ANTLR-based expression language I would like to add
Perl-style quoted strings:

    q{foo{bar}quux}
    q(foo(bar)quux)
    q!foo/bar/quux!

For those who don't know these constructs: they are a variant of
non-interpolating strings where one doesn't have to quote the quote
character. And with one of the open/close pairs of quote characters
("(" + ")", "<" + ">", "[" + "]" and "{" + "}") one can even nest them
without escaping (as long as the nesting is correct, i.e., the open
and close characters are balanced).
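To make the rule above concrete, here is a minimal sketch in plain Java,
independent of ANTLR. The class PerlQuote and its methods closerFor and
body are hypothetical names chosen for illustration only. The sketch
assumes the literal is already available as a complete string and, as a
simplification, accepts any character as the delimiter.

```java
// Hypothetical helper, not part of ANTLR: it only illustrates the quoting rule.
final class PerlQuote {
    private PerlQuote() {}

    /** Closing delimiter for an opening one; non-bracket delimiters close themselves. */
    static char closerFor(char open) {
        switch (open) {
            case '(': return ')';
            case '<': return '>';
            case '[': return ']';
            case '{': return '}';
            default:  return open;
        }
    }

    /**
     * Returns the body of a literal such as q{foo{bar}quux}, or null if the
     * text is not one complete, correctly nested literal.
     * Assumption: any character after 'q' is accepted as the delimiter.
     */
    static String body(String literal) {
        if (literal.length() < 3 || literal.charAt(0) != 'q') {
            return null;
        }
        char open = literal.charAt(1);
        char close = closerFor(open);
        boolean paired = open != close;   // only bracket pairs can nest
        int depth = 0;
        for (int i = 2; i < literal.length(); i++) {
            char c = literal.charAt(i);
            if (paired && c == open) {
                depth++;                  // nested opener
            } else if (c == close) {
                if (depth == 0) {
                    // The terminator must be the last character of the literal.
                    return i == literal.length() - 1 ? literal.substring(2, i) : null;
                }
                depth--;                  // closes a nested opener
            }
        }
        return null;                      // ran out of input before the terminator
    }
}
```

With this sketch, PerlQuote.body("q{foo{bar}quux}") yields the body
foo{bar}quux, while an unbalanced literal such as q{foo{bar} yields null.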

The question remains: what is the best way to implement this in ANTLR 3?

Although my general knowledge about parsing is good, I'm not an ANTLR
expert. Hence, my naive but best attempt so far with the current ANTLR
(trying to leverage ANTLR semantic predicates) is:

QSTRING    @init { int n = 0; }
           : 'q'
             ( options { greedy=false; }:
               open=('<'|'{'|'['|'('|'/'|'!')
               ( { (   $open == '<' && input.LT(1) == '<'
                    || $open == '{' && input.LT(1) == '{'
                    || $open == '[' && input.LT(1) == '['
                    || $open == '(' && input.LT(1) == '(')
                 }? . { n++; }
               | { (   $open == '<' && input.LT(1) == '>'
                    || $open == '{' && input.LT(1) == '}'
                    || $open == '[' && input.LT(1) == ']'
                    || $open == '(' && input.LT(1) == ')') && n > 0
                 }? . { n--; }
               | { (   $open == '<' && input.LT(1) != '>'
                    || $open == '{' && input.LT(1) != '}'
                    || $open == '[' && input.LT(1) != ']'
                    || $open == '(' && input.LT(1) != ')'
                    ||                 input.LT(1) != $open)
                 }? .
               )*
               ( { $open == '<' }?         '>'
               | { $open == '{' }?         '}'
               | { $open == '[' }?         ']'
               | { $open == '(' }?         ')'
               | { input.LT(1) == $open }? .
               )
             )
           ;

The ANTLR 3.1.3 generation process is happy about this, but the
resulting Java code cannot be compiled because some symbols are not
available:

| $ make
| [generate] SCLLexer.java SCLParser.java <- SCL.g
| [compile] SCLLexer.class <- SCLLexer.java
| SCLLexer.java:2210: cannot find symbol
| symbol  : variable open
| location: class SCLLexer.DFA11
|                         if ( (( (   open == '<' && input.LT(1) == '<'
|                                     ^
| SCLLexer.java:2210: cannot find symbol
| symbol  : method LT(int)
| location: interface org.antlr.runtime.IntStream
|                         if ( (( (   open == '<' && input.LT(1) == '<'
|                                                         ^
| SCLLexer.java:2211: cannot find symbol

The reason seems to be that ANTLR moves some of the code into a
separate Java class (for a sub-DFA?), which cannot access the "open"
variable or the LT() method of "input". Is there a workaround?

But OK, perhaps my whole attempt is on the wrong track. Perhaps those
ANTLR semantic predicates would not even work the way I expect them to.
Perhaps there is a much easier approach to scanning and recognizing
those Perl-style quoted strings. Does anybody have any hints? Perhaps
scanning the input with an embedded Java-only loop construct?
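One possible shape for such a hand-written loop, as a sketch only: the
interface CharSource below is a hypothetical stand-in that mimics the
LA(int)/consume() style of lookahead, and StringSource and
QStringScanner are likewise made-up names. Whether and how this could
be wired into an ANTLR 3 lexer rule is not verified here.

```java
/** Hypothetical minimal character stream with LA(int)/consume() style lookahead. */
interface CharSource {
    int EOF = -1;
    int LA(int i);    // i-th upcoming character (1-based), or EOF
    void consume();   // advance by one character
}

/** Hypothetical in-memory implementation, used only to exercise the loop. */
final class StringSource implements CharSource {
    private final String text;
    private int pos;
    StringSource(String text) { this.text = text; }
    public int LA(int i) {
        int p = pos + i - 1;
        return p < text.length() ? text.charAt(p) : EOF;
    }
    public void consume() { if (pos < text.length()) pos++; }
    int position() { return pos; }
}

final class QStringScanner {
    private QStringScanner() {}

    /**
     * Tries to consume one q-string at the current position.
     * Returns true if a complete literal was consumed. On false the stream may
     * already be advanced (unterminated literal); a real lexer would have to
     * report an error or rewind at that point.
     */
    static boolean scan(CharSource in) {
        if (in.LA(1) != 'q') return false;
        int open = in.LA(2);
        int close;
        switch (open) {                   // same delimiter set as the QSTRING rule
            case '<': close = '>'; break;
            case '{': close = '}'; break;
            case '[': close = ']'; break;
            case '(': close = ')'; break;
            case '/':
            case '!': close = open; break;
            default:  return false;
        }
        in.consume();                     // 'q'
        in.consume();                     // opening delimiter
        int depth = 0;
        while (true) {
            int c = in.LA(1);
            if (c == CharSource.EOF) return false;   // unterminated literal
            in.consume();
            if (c == close) {             // checked first so '/' and '!' terminate
                if (depth == 0) return true;
                depth--;
            } else if (c == open) {
                depth++;                  // nested opener of a bracket pair
            }
        }
    }
}
```

For the input "q(foo(bar)quux) rest" the loop stops right after the
final ")", so the remaining text is left for the following rules.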
                                       Ralf S. Engelschall
                                       rse at engelschall.com
                                       www.engelschall.com


