[antlr-interest] Handling explicit continuation characters

Johannes Luber JALuber at gmx.de
Tue Jan 13 08:09:38 PST 2009


> Thanks for the suggestions on this issue.
> 
> I'm not concerned about the line count - in fact, I want to know which
> physical line a token is located on, for subsequent regeneration of the source. 
> I'm using this for a "syntax directed" editor.  I just want to absorb the
> continuations quietly.
> 
> I still can't figure out how to handle the case where continuation
> characters (- and +) appear before the end of a line.  A + or - is only a
> continuation if the following character is an end of line.  If this isn't
> true, then the + or - is a valid character in a token.  
> 
> My lexer rules look like this --
> 
> /*
> 	LEXER RULES
> */
> 
> ID	: Any+
> 	| Quote (Any | Blank)* Quote
> 	;
> 
> fragment
> Blank	: ' '
> 	;
> 
> fragment
> Any	:( AlphaNum | Special | NATL  )
> 	;
> 
> fragment
> Quote	:	'\''
> 	;
> fragment
> Special :	'_' | '-' | '=' | '+'
> 	|	'/' | '\\'
> 	|	':' | ';'
> 	|	'<' | '>'
> 	|	'.' | ',' | '?' | '!'
> 	|	'~' | '%' | '^' | '&' | '*'
> 	|	'{' | '}' | '[' | ']' | '|'
> 	;
> 
> fragment
> AlphaNum:	ALPHA|DIGIT;
> 
> 
> fragment
> DIGIT   : 	('0'..'9');
> 
> fragment
> ALPHA
> 	: 	('a'..'z'|'A'..'Z')
>         ;
> 
> fragment
> NATL 	:  	( '$' | '#' | '@')
>         ;
> 
> EOS	:
> 	(	'\r'
> 	|	'\n'
> 	)+
> 	;
> 
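> CONTINUEMINUS
> 	:	// parentheses so the action applies to every alternative,
> 		// not only to the last one
> 	(	'-\r'
> 	|	'-\n'
> 	|	'-\r\n'
> 	)
> 	{ $channel=HIDDEN; }
> 	;
> 
> CONTINUEPLUS
> 	:	// same grouping as CONTINUEMINUS
> 	(	'+\r'
> 	|	'+\n'
> 	|	'+\r\n'
> 	)
> 	{ $channel=HIDDEN; }
> 	;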
> 
> WS  	:
> 	(   	' '
>         |   	'\t'
>         )+
>         { $channel=HIDDEN; }
>     	;
> 
> COMMENT
> 	: '/*' (options {greedy=false;} : . )* '*/'
> 	{ $channel=HIDDEN; }
> 	;
> 
> I have a problem when I have a statement like the following --
> 
> Cmd parm1 parm2 verylong-
> parm
> 
> The - at the end of the first line is absorbed as part of the ID token. 
> 
> The above works OK if there's WS between the last token and the -, but
> that's not the syntax I have to conform to.
> 
> Thanks for any additional feedback.

IMO, only a modified stream can make this work right. The alternative is to merge or modify the tokens in the parser, which is more work and more bug-prone than the stream approach. The stream approach is also easier to debug if you add a program option that swaps the modified stream class for the normal stream class, so you can see whether a bug is in the stream or in the grammar.
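
To make the idea concrete, here is a minimal sketch in plain Java. It is a simplified variant, not the ANTLRStringStream subclass itself: it filters the text before any stream sees it, and it records the physical line of every character that survives. The names ContinuationFilter, Result, strip and physicalLine are made up for this example and are not part of ANTLR.

```java
import java.util.Arrays;

/** Hypothetical helper (not part of ANTLR): drops '+' / '-' line continuations. */
final class ContinuationFilter {

    /** Filtered text plus the 1-based physical line of each kept character. */
    static final class Result {
        final String text;
        final int[] physicalLine;

        Result(String text, int[] physicalLine) {
            this.text = text;
            this.physicalLine = physicalLine;
        }
    }

    static Result strip(String source) {
        StringBuilder out = new StringBuilder(source.length());
        int[] lines = new int[source.length()];
        int line = 1;
        int i = 0;
        while (i < source.length()) {
            char c = source.charAt(i);
            int eol = eolLength(source, i + 1);
            if ((c == '+' || c == '-') && eol > 0) {
                // Continuation: swallow the marker and the line break,
                // but still count the physical line.
                i += 1 + eol;
                line++;
                continue;
            }
            lines[out.length()] = line;
            out.append(c);
            i++;
            // A kept line break ends the physical line; "\r\n" counts once, at the '\n'.
            boolean crBeforeLf = c == '\r' && i < source.length() && source.charAt(i) == '\n';
            if (c == '\n' || (c == '\r' && !crBeforeLf)) {
                line++;
            }
        }
        return new Result(out.toString(), Arrays.copyOf(lines, out.length()));
    }

    /** Length of the line break starting at pos: 0 (none), 1 ("\r" or "\n") or 2 ("\r\n"). */
    private static int eolLength(String s, int pos) {
        if (pos >= s.length()) {
            return 0;
        }
        if (s.charAt(pos) == '\r') {
            return (pos + 1 < s.length() && s.charAt(pos + 1) == '\n') ? 2 : 1;
        }
        return s.charAt(pos) == '\n' ? 1 : 0;
    }
}
```

With the example statement from the question, strip("Cmd parm1 parm2 verylong-\nparm\n") yields the text "Cmd parm1 parm2 verylongparm\n", and the physicalLine entry for the p after verylong is 2. The filtered text could then be passed to new ANTLRStringStream(result.text), with a token's start index used to look up its physical line. That wiring is an assumption and is not shown here. Note that the sketch also swallows a + or - at the end of a line inside quotes or comments, which may or may not be what the language requires.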

Johannes
> 
> -----Original Message-----
> From: Johannes Luber [mailto:JALuber at gmx.de] 
> Sent: Tuesday, January 13, 2009 9:13 AM
> To: Gavin Lambert; Brisard, Fred D; antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Handling explicit continuation characters
> 
> > At 21:05 13/01/2009, Johannes Luber wrote:
> >  >Wouldn't it be easier to create your own StringStream (derived from
> >  >ANTLRStringStream) which silently swallows the + and - as well as the
> >  >following newline? Then both lexer and parser are cleaner.
> > 
> > That's certainly a possibility (and perhaps a good one), but 
> > that'd end up screwing up the line numbering, wouldn't it?
> 
> I don't see why swallowing the two characters would prevent increasing
> the line count. Tokens receive the line number from the stream itself, not
> because the lexer counts newlines.
> 
> Johannes
