[antlr-interest] Handling explicit continuation characters

Tue Jan 13 12:35:08 PST 2009

At 04:49 14/01/2009, Brisard, Fred D wrote:
 >I'm not concerned about the line count - in fact, I want to know
 >which physical line a token is located for subsequent regeneration
 >of the source.  I'm using this for a "syntax directed" editor.  I
 >just want to absorb the continuations quietly.

That's kinda what I was getting at though.  I'm not sure whether 
it's the lexer or the stream that maintains the line count -- if 
it's the stream, then a stream replacement solution is definitely 
the way to go, I think.  If it's the lexer, though, then using a 
stream replacement will mean that you'll get the line numbers 
*after continuations are removed*, which will be different than 
the lines in the source file (and hence useless).
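
To make the line-number concern concrete, here is a minimal sketch in plain Java with no ANTLR classes involved.  It drops each +/- that is immediately followed by an end of line and remembers the physical line of every character that survives.  The names ContinuationStripper, stripped and lineAt are made up for this illustration, and wiring the result into a custom CharStream is left out.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper, not part of ANTLR: strips continuations, keeps physical lines. */
class ContinuationStripper {
    private final StringBuilder text = new StringBuilder();
    private final List<Integer> lines = new ArrayList<>();

    ContinuationStripper(String source) {
        int line = 1;   // physical line in the original source, 1-based
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            int eol = eolLength(source, i + 1);
            if ((c == '+' || c == '-') && eol > 0) {
                // Continuation: skip the sign and the line break, but still count the line.
                i += 1 + eol;
                line++;
                continue;
            }
            text.append(c);
            lines.add(line);          // physical line of this surviving character
            if (c == '\n') line++;
            i++;
        }
    }

    /** Length of the line break starting at pos ("\n" or "\r\n"), or 0 if there is none. */
    private static int eolLength(String s, int pos) {
        if (pos < s.length() && s.charAt(pos) == '\n') return 1;
        if (pos + 1 < s.length() && s.charAt(pos) == '\r' && s.charAt(pos + 1) == '\n') return 2;
        return 0;
    }

    /** The source text with continuations removed. */
    public String stripped() { return text.toString(); }

    /** Physical line (1-based) of the character at the given offset in stripped(). */
    public int lineAt(int offset) { return lines.get(offset); }
}
```

Whatever ends up feeding characters to the lexer could report lineAt(offset) rather than counting newlines in the stripped text, so tokens would keep the line numbers of the original file.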

 >I still can't figure out how to handle the case where continuation
 >characters (- and +) are embedded prior to the end of line.  A +
 >or - is only a continuation if the following character is an end of
 >line.  If this isn't true, then the + or - is a valid character in
 >a token.
[...]
 >The - at the end of the verylongparm is absorbed as part of the ID
 >token.
 >
 >The above works OK if there's WS between the last token and the -,
 >but that's not the syntax I have to conform to.

As Johannes said, using a modified stream is definitely the 
easiest way to go here.

Otherwise, you'll need to modify your rules such that they refuse 
to match a - or + if it's followed by a newline.  The issue is 
that once ANTLR is *inside* a lexer rule, it will continue 
consuming characters as long as that rule alone is happy to do so 
-- it doesn't consider the possibility of stopping earlier and 
matching some other rule instead.  (Which at times like this can 
be annoying, but it's also what allows comments and island 
grammars to work.)

One way of doing this would be to modify your rules like so:

CONTINUEPLUS: '+' '\r'? '\n' { $channel = HIDDEN; };
CONTINUEMINUS: '-' '\r'? '\n' { $channel = HIDDEN; };

fragment
Special
  : '_' | '='
  // '+' or '-' counts as Special only if the next character is not an end of line
  | { input.LA(2) != '\r' && input.LA(2) != '\n' }? => ('+' | '-')
  | '/' | '\\'
  | ':' | ';'
  | '<' | '>'
  | '.' | ',' | '?' | '!'
  | '~' | '%' | '^' | '&' | '*'
  | '{' | '}' | '[' | ']' | '|'
  ;
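
The lookahead test in Special can also be read as ordinary code.  The following is a plain Java sketch of the same idea, not ANTLR-generated output; IdScanner and scanId are hypothetical names, and for brevity it treats only letters, digits, '_', '+' and '-' as ID characters.

```java
/** Hypothetical hand-written scanner that mimics the predicate in Special. */
class IdScanner {
    /** Returns the end offset (exclusive) of the ID that starts at 'start'. */
    static int scanId(String input, int start) {
        int p = start;
        while (p < input.length()) {
            char c = input.charAt(p);
            if (Character.isLetterOrDigit(c) || c == '_') {
                p++;
                continue;
            }
            if (c == '+' || c == '-') {
                // Same idea as input.LA(2) in the grammar: peek at the character after the sign.
                char next = (p + 1 < input.length()) ? input.charAt(p + 1) : '\0';
                if (next == '\r' || next == '\n') {
                    break;   // continuation marker, so the ID ends before the sign
                }
                p++;         // embedded sign, still part of the ID
                continue;
            }
            break;           // any other character ends the ID
        }
        return p;
    }
}
```

With this check the scan of "verylongparm-" followed by a newline stops before the '-', which leaves the sign and the line break for a separate continuation rule to pick up.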