[antlr-interest] Handling explicit continuation characters
Gavin Lambert
antlr at mirality.co.nz
Tue Jan 13 12:35:08 PST 2009
At 04:49 14/01/2009, Brisard, Fred D wrote:
>I'm not concerned about the line count - in fact, I want to know
>which physical line a token is located for subsequent regeneration
>of the source. I'm using this for a "syntax directed" editor. I
>just want to absorb the continuations quietly.
That's kinda what I was getting at though. I'm not sure whether
it's the lexer or the stream that maintains the line count -- if
it's the stream, then a stream replacement solution is definitely
the way to go, I think. If it's the lexer, though, then using a
stream replacement will mean that you'll get the line numbers
*after continuations are removed*, which will be different than
the lines in the source file (and hence useless).
>I still can't figure out how to handle the case where continuation
>characters (- and +) are embedded prior to the end of line. A +
>or - is only a continuation if the following character is an end of
>line. If this isn't true, then the + or - is a valid character in
>a token.
[...]
>The - at the end of the verylongparm is absorbed as part of the ID
>token.
>
>The above works OK if there's WS between the last token and the -,
>but that's not the syntax I have to conform to.
As Johannes said, using a modified stream is definitely the
easiest way to go here.
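To make the stream idea concrete while still keeping the physical line numbers, here is a rough standalone sketch in plain Java. It is not ANTLR API: the class name ContinuationStripper, its Result type and the strip method are all made up for illustration. It assumes a continuation is a '+' or '-' immediately followed by "\n" or "\r\n", and that the two physical lines are simply joined with nothing in between, which may not match the real syntax. It drops each continuation and records, for every character it keeps, the 1-based line that character came from in the original source.

```java
import java.util.ArrayList;
import java.util.List;

public class ContinuationStripper {
    /** Stripped text plus, for each kept character, its 1-based line in the original source. */
    public static final class Result {
        public final String text;
        public final int[] originalLine;

        Result(String text, int[] originalLine) {
            this.text = text;
            this.originalLine = originalLine;
        }
    }

    /** Length of the line ending that starts at index i: 1 for "\n", 2 for "\r\n", 0 if none. */
    static int newlineLength(String s, int i) {
        if (i < s.length() && s.charAt(i) == '\n') return 1;
        if (i + 1 < s.length() && s.charAt(i) == '\r' && s.charAt(i + 1) == '\n') return 2;
        return 0;
    }

    public static Result strip(String src) {
        StringBuilder out = new StringBuilder();
        List<Integer> lines = new ArrayList<>();
        int line = 1;
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            int nl = (c == '+' || c == '-') ? newlineLength(src, i + 1) : 0;
            if (nl > 0) {
                // Continuation: drop the marker and the line ending, but keep counting lines.
                i += 1 + nl;
                line++;
                continue;
            }
            out.append(c);
            lines.add(line);
            if (c == '\n') line++;
            i++;
        }
        int[] map = new int[lines.size()];
        for (int k = 0; k < map.length; k++) map[k] = lines.get(k);
        return new Result(out.toString(), map);
    }
}
```

The lexer would then be fed Result.text, and a token's start offset in that text can be looked up in Result.originalLine to recover the physical line it started on.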
Otherwise, you'll need to modify your rules such that they refuse
to match a - or + if it's followed by a newline. The issue is
that once ANTLR is *inside* a lexer rule, it will continue
consuming characters as long as that rule alone is happy to do so
-- it doesn't consider the possibility of stopping earlier and
matching some other rule instead. (Which at times like this can
be annoying, but it's also what allows comments and island
grammars to work.)
One way of doing this would be to modify your rules like so:
    // A '+' or '-' immediately before a line ending is a continuation; hide it.
    CONTINUEPLUS:  '+' '\r'? '\n' { $channel = HIDDEN; };
    CONTINUEMINUS: '-' '\r'? '\n' { $channel = HIDDEN; };

    fragment
    Special
        : '_' | '='
        // Allow '+' or '-' here only when the next character is not a line ending.
        | { input.LA(2) != '\r' && input.LA(2) != '\n' }? => ('+' | '-')
        | '/' | '\\'
        | ':' | ';'
        | '<' | '>'
        | '.' | ',' | '?' | '!'
        | '~' | '%' | '^' | '&' | '*'
        | '{' | '}' | '[' | ']' | '|'
        ;
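For anyone who wants to see the lookahead check outside of ANTLR, the same decision can be mimicked with a small hand-written scanner in plain Java. This is only an illustration: the class ContinuationAwareScanner and its methods are hypothetical, and since the real ID rule isn't shown in this thread, the identifier characters here (letters, digits, '_' and '=') are an assumption. The point is the isIdChar test, which refuses a '+' or '-' when the next character is '\r' or '\n', so the identifier loop stops early and the continuation can be skipped on its own.

```java
import java.util.ArrayList;
import java.util.List;

public class ContinuationAwareScanner {
    /** True if a line ending ("\n" or "\r\n") starts at index i. */
    static boolean newlineAt(String s, int i) {
        if (i >= s.length()) return false;
        if (s.charAt(i) == '\n') return true;
        return s.charAt(i) == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n';
    }

    /** Assumed ID character test: '+' and '-' count only when the next char is not '\r' or '\n'. */
    static boolean isIdChar(String s, int i) {
        char c = s.charAt(i);
        if (c == '+' || c == '-') {
            char next = (i + 1 < s.length()) ? s.charAt(i + 1) : '\0';
            return next != '\r' && next != '\n';
        }
        return Character.isLetterOrDigit(c) || c == '_' || c == '=';
    }

    /** Returns the visible tokens; whitespace and continuations are skipped, like a hidden channel. */
    public static List<String> scan(String src) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < src.length()) {
            char c = src.charAt(i);
            if ((c == '+' || c == '-') && newlineAt(src, i + 1)) {
                // Continuation marker: skip it together with its line ending.
                i += (src.charAt(i + 1) == '\r') ? 3 : 2;
            } else if (isIdChar(src, i)) {
                int start = i;
                while (i < src.length() && isIdChar(src, i)) i++;
                tokens.add(src.substring(start, i));
            } else {
                i++; // whitespace or anything else this sketch does not model
            }
        }
        return tokens;
    }
}
```

Scanning "verylongparm-" followed by a newline and "next a-b" gives the tokens verylongparm, next and a-b: the trailing - is left out of the first token, while the embedded - in a-b is kept. Without the check in isIdChar, the first token would swallow the trailing - as described above.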