[antlr-interest] Handling explicit continuation characters

Brisard, Fred D Fred.Brisard at ca.com
Tue Jan 13 07:49:02 PST 2009


Thanks for the suggestions on this issue.

I'm not concerned about the line count - in fact, I want to know which physical line a token is located on, so that I can regenerate the source later.  I'm using this for a "syntax directed" editor.  I just want to absorb the continuations quietly.

I still can't figure out how to handle the case where the continuation characters (- and +) appear inside a token, before the end of the line.  A + or - is a continuation only if the character that follows it is an end of line.  Otherwise, the + or - is a valid character in a token.

My lexer rules look like this --

/*
	LEXER RULES
*/

ID	: Any+
	| Quote (Any | Blank)* Quote
	;

fragment
Blank	: ' '
	;

fragment
Any	:( AlphaNum | Special | NATL  )
	;

fragment
Quote	:	'\''
	;
fragment
Special :	'_' | '-' | '=' | '+'
	|	'/' | '\\'
	|	':' | ';'
	|	'<' | '>'
	|	'.' | ',' | '?' | '!'
	|	'~' | '%' | '^' | '&' | '*'
	|	'{' | '}' | '[' | ']' | '|'
	;

fragment
AlphaNum:	ALPHA|DIGIT;


fragment
DIGIT   : 	('0'..'9');

fragment
ALPHA
	: 	('a'..'z'|'A'..'Z')
        ;

fragment
NATL 	:  	( '$' | '#' | '@')
        ;

EOS	:
	(	'\r'
	|	'\n'
	)+
	;

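// The parentheses make the action apply to all three alternatives;
// without them it is attached only to the last one.
CONTINUEMINUS
	:	(	'-\r'
		|	'-\n'
		|	'-\r\n'
		)
	{ $channel=HIDDEN; }
	;

CONTINUEPLUS
	:	(	'+\r'
		|	'+\n'
		|	'+\r\n'
		)
	{ $channel=HIDDEN; }
	;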

WS  	:
	(   	' '
        |   	'\t'
        )+
        { $channel=HIDDEN; }
    	;

COMMENT
	: '/*' (options {greedy=false;} : . )* '*/'
	{ $channel=HIDDEN; }
	;

I have a problem when I have a statement like the following --

Cmd parm1 parm2 verylong-
parm

The - at the end of "verylong" is absorbed as part of the ID token instead of being recognized as a continuation.

The above works OK if there's WS between the last token and the -, but that's not the syntax I have to conform to.
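
For reference, the rule described above (a + or - counts as a continuation only when an end of line follows it immediately) can be sketched in plain Java, outside ANTLR. This is only an illustration under assumptions: the class and method names are hypothetical, quoted strings and EOS tokens are ignored, and it assumes the two halves of a continued word should be joined into one token.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-alone sketch; not generated by or tied to ANTLR.
class ContinuationSketch {

    /** True if s[i] is '+' or '-' and the very next character is '\r' or '\n'. */
    static boolean isContinuation(String s, int i) {
        char c = s.charAt(i);
        if (c != '+' && c != '-') return false;
        if (i + 1 >= s.length()) return false; // assumption: a marker at end of input is ordinary
        char next = s.charAt(i + 1);
        return next == '\r' || next == '\n';
    }

    /** Splits the input into whitespace-separated words, joining words across continuations. */
    static List<String> words(String s) {
        List<String> out = new ArrayList<String>();
        StringBuilder cur = new StringBuilder();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (isContinuation(s, i)) {
                i++; // skip the marker; s[i] is now '\r' or '\n'
                if (s.charAt(i) == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n') {
                    i++; // treat "\r\n" as one line terminator
                }
                i++; // skip the line terminator
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                if (cur.length() > 0) {
                    out.add(cur.toString());
                    cur.setLength(0);
                }
                i++;
            } else {
                cur.append(c); // includes '+' and '-' that are not continuations
                i++;
            }
        }
        if (cur.length() > 0) out.add(cur.toString());
        return out;
    }
}
```

With the statement above, words("Cmd parm1 parm2 verylong-\nparm") yields Cmd, parm1, parm2, verylongparm, while a-b stays a single word because its - is not followed by an end of line.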

Thanks for any additional feedback.

-----Original Message-----
From: Johannes Luber [mailto:JALuber at gmx.de] 
Sent: Tuesday, January 13, 2009 9:13 AM
To: Gavin Lambert; Brisard, Fred D; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Handling explicit continuation characters

> At 21:05 13/01/2009, Johannes Luber wrote:
>  >Wouldn't it be easier to create an own StringStream (derived from
>  >ANTLRStringStream) which silently swallows the + and - as well as
>  >the following newline? Then both lexer and parser are cleaner.
> 
> That's certainly a possibility (and perhaps a good one), but that'd
> end up screwing up the line numbering, wouldn't it?

I don't see why swallowing the two characters would prevent the line count from increasing. Tokens receive their line number from the stream itself, not from the lexer counting newlines.
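
A rough sketch of that idea in plain Java follows. It is not the ANTLR API: ContinuationStream is a hypothetical stand-alone class, not a subclass of ANTLRStringStream, and hooking the same logic into the real runtime is assumed rather than shown. It swallows a + or - together with the line terminator that follows it, but still increments the line counter for the swallowed terminator, so the line reported afterwards is the physical one.

```java
// Hypothetical stand-alone sketch; it only mimics what a custom character stream could do.
class ContinuationStream {
    private final String data;
    private int p = 0;    // index of the next visible character
    private int line = 1; // physical line of the next visible character

    ContinuationStream(String data) {
        this.data = data;
        skipContinuations();
    }

    /** Physical line (1-based) of the next visible character. */
    int getLine() { return line; }

    /** Next visible character, or -1 at end of input. */
    int peek() { return p < data.length() ? data.charAt(p) : -1; }

    /** Consumes one visible character, then swallows any continuation that follows it. */
    void consume() {
        if (p >= data.length()) return;
        char c = data.charAt(p++);
        // Count "\r\n" once: on '\n', or on a '\r' that is not followed by '\n'.
        if (c == '\n' || (c == '\r' && peek() != '\n')) line++;
        skipContinuations();
    }

    /** Skips every "+EOL" or "-EOL" pair at the current position, still counting the line. */
    private void skipContinuations() {
        while (p + 1 < data.length()
                && (data.charAt(p) == '+' || data.charAt(p) == '-')
                && (data.charAt(p + 1) == '\r' || data.charAt(p + 1) == '\n')) {
            p++; // the marker; data[p] is now '\r' or '\n'
            if (data.charAt(p) == '\r' && p + 1 < data.length() && data.charAt(p + 1) == '\n') {
                p++; // treat "\r\n" as one line terminator
            }
            p++;    // the line terminator
            line++; // the swallowed newline still advances the line count
        }
    }

    /** Demo helper: returns the text a lexer reading this stream would see. */
    static String visibleText(String s) {
        ContinuationStream in = new ContinuationStream(s);
        StringBuilder sb = new StringBuilder();
        while (in.peek() != -1) {
            sb.append((char) in.peek());
            in.consume();
        }
        return sb.toString();
    }
}
```

For the input "verylong-\nparm", a consumer of this stream sees verylongparm, and getLine() reports 2 once the p of parm is the next character.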

Johannes