[antlr-interest] my code is marked with start/end tokens

java nagila java.nagila at gmail.com
Fri Oct 10 12:42:51 PDT 2008


hi all,
i'm new to antlr, and my first real language has this thing where the code
can be split into many parts embedded in arbitrary text.

here is an example input, where the code (java in this case) comes in parts
that are marked by start/end tokens ( '(|' and '|)' ):

********
bla bla bla
********
(|
import foo.bar.*;
|)
bla etc.
(| import com.example.*; |)
comment comment arbitrary text bla bla
(|
/* My Class C */
class C {
    // one line comment
}
|)
bla bla bla

i've tried many ways to hide the text around the actual code, a la the
non-greedy comment rule, but couldn't find the right one.

note that:
    1. i _really_ don't care what's outside of the '(|' .* '|)' code blocks.
    it could be another language with my code in a comment, for all i care
    (e.g. "<a><b><!-- (| class C {} |) --></b></a>").
    this also means i'd like the lexer to hide it rather than have a parser
    rule gather it.

    2. my code can be split into these parts only at well-known places
    (parser rule boundaries), so, in contrast to #1 above, i really care
    about where the '(|' and '|)' tokens are (and so i guess that my parser
    should be aware of them).
    if my language were java, as in the example above, i'd allow only
    importDeclarations or classDefinitions between the start/end tokens.

    3. other than that, my language is quite normal - i hide WS as usual, i
    have only single-line comments, and multiple files are loaded with an
    'import' (and not an '#include'). oh, i have #IFDEFs, but they can't
    span more than one '(|...|)' part.
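
to make #1 and #2 concrete, here is a rough sketch of the token stream i'd
like to end up with. it is plain python, not antlr, and only illustrates the
idea: the names START, CODE, END and island_tokens are made up for this
example, and the CODE token stands in for whatever my real lexer would
produce between the markers.

```python
import re

# Non-greedy match of one '(|' ... '|)' block; DOTALL lets it span lines.
# Assumption: '|)' never appears inside the code itself (e.g. in a string).
ISLAND = re.compile(r"\(\|(.*?)\|\)", re.DOTALL)


def island_tokens(text):
    """Return (kind, text) pairs; everything outside the blocks is dropped."""
    tokens = []
    for match in ISLAND.finditer(text):
        tokens.append(("START", "(|"))                   # visible to the parser
        tokens.append(("CODE", match.group(1).strip()))  # placeholder for real tokens
        tokens.append(("END", "|)"))                     # visible to the parser
    return tokens


if __name__ == "__main__":
    sample = "bla bla\n(| import foo.bar.*; |)\n<b><!-- (| class C {} |) --></b>"
    for token in island_tokens(sample):
        print(token)
```

the surrounding text never shows up, but the start/end markers do, so a
parser could still insist that only certain rules appear between them. an
unterminated '(|' is silently ignored in this sketch, which i probably
wouldn't want in the real thing.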

help anyone?

thanks, asaf :-)