[antlr-interest] Handling explicit continuation characters
Brisard, Fred D
Fred.Brisard at ca.com
Tue Jan 13 07:49:02 PST 2009
Thanks for the suggestions on this issue.
I'm not concerned about the line count. In fact, I want to know which physical line each token is located on so that I can regenerate the source later. I'm using this for a "syntax directed" editor. I just want to absorb the continuations quietly.
I still can't figure out how to handle the case where the continuation characters (- and +) appear before the end of a line. A + or - is a continuation only if the character that follows it is an end of line. Otherwise, the + or - is a valid character in a token.
My lexer rules look like this --
/*
  LEXER RULES
*/
ID  : Any+
    | Quote (Any | Blank)* Quote
    ;

fragment
Blank : ' ' ;

fragment
Any : ( AlphaNum | Special | NATL ) ;

fragment
Quote : '\'' ;

fragment
Special
    : '_' | '-' | '=' | '+'
    | '/' | '\\'
    | ':' | ';'
    | '<' | '>'
    | '.' | ',' | '?' | '!'
    | '~' | '%' | '^' | '&' | '*'
    | '{' | '}' | '[' | ']' | '|'
    ;

fragment
AlphaNum : ALPHA | DIGIT ;

fragment
DIGIT : ('0'..'9') ;

fragment
ALPHA : ('a'..'z' | 'A'..'Z') ;

fragment
NATL : ( '$' | '#' | '@' ) ;

EOS : ( '\r' | '\n' )+ ;
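// The alternatives are parenthesized so that the action applies to all of
// them; without the parentheses it attaches only to the last alternative.
CONTINUEMINUS
    : ( '-\r'
      | '-\n'
      | '-\r\n'
      )
      { $channel=HIDDEN; }
    ;

CONTINUEPLUS
    : ( '+\r'
      | '+\n'
      | '+\r\n'
      )
      { $channel=HIDDEN; }
    ;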
WS  : ( ' '
      | '\t'
      )+
      { $channel=HIDDEN; }
    ;

COMMENT
    : '/*' (options {greedy=false;} : . )* '*/'
      { $channel=HIDDEN; }
    ;
I have a problem when I have a statement like the following --
Cmd parm1 parm2 verylong-
parm
The - at the end of "verylong" is absorbed as part of the ID token, because - is one of the Special characters that Any accepts.
The above works OK if there's whitespace between the last token and the -, but that's not the syntax I have to conform to.
Thanks for any additional feedback.
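To make the rule concrete, here is a minimal sketch in plain Java of the lookahead test described above. It is independent of ANTLR, and the class and method names (ContinuationCheck, continuationLength) are made up for illustration. It only decides whether a + or - at a given position is a continuation, and how many characters the continuation spans.

```java
final class ContinuationCheck {
    /**
     * Returns the number of characters in the continuation sequence that
     * starts at index i (the marker plus its line terminator), or 0 if the
     * character at i is not a continuation marker.
     */
    static int continuationLength(String s, int i) {
        if (i < 0 || i + 1 >= s.length()) {
            return 0; // no room for a marker followed by a line terminator
        }
        char c = s.charAt(i);
        if (c != '+' && c != '-') {
            return 0; // not a candidate marker
        }
        char next = s.charAt(i + 1);
        if (next == '\n') {
            return 2; // marker + LF
        }
        if (next == '\r') {
            boolean crlf = i + 2 < s.length() && s.charAt(i + 2) == '\n';
            return crlf ? 3 : 2; // marker + CRLF, or marker + lone CR
        }
        return 0; // ordinary '+' or '-' inside a token
    }
}
```

For the statement above, the - after "verylong" yields a non-zero length, while the + in something like "a+b" yields 0 and stays part of the token.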
-----Original Message-----
From: Johannes Luber [mailto:JALuber at gmx.de]
Sent: Tuesday, January 13, 2009 9:13 AM
To: Gavin Lambert; Brisard, Fred D; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Handling explicit continuation characters
> At 21:05 13/01/2009, Johannes Luber wrote:
> >Wouldn't it be easier to create an own StringStream (derived from
> >ANTLRStringStream) which silently swallows the + and - as well as the
> >following newline? Then both lexer and parser are cleaner.
>
> That's certainly a possibility (and perhaps a good one), but
> that'd end up screwing up the line numbering, wouldn't it?
I don't see why swallowing the two characters would prevent the line count from increasing. Tokens receive their line number from the stream itself, not from the lexer counting newlines.
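As a rough illustration of the idea, here is a self-contained Java sketch of a character stream that swallows a continuation marker together with its line terminator while still advancing its own line counter. It deliberately does not extend ANTLRStringStream; the class name ContinuationSkippingStream and its methods peek(), line() and consume() are hypothetical. In a real subclass the same logic would presumably have to be applied in consume(), and lookahead beyond one character and mark/rewind would need similar care.

```java
final class ContinuationSkippingStream {
    private final String data;
    private int pos = 0;   // index of the next visible character
    private int line = 1;  // 1-based physical line of that character

    ContinuationSkippingStream(String data) {
        this.data = data;
        skipContinuations(); // the input may start with a continuation
    }

    /** Next visible character without consuming it, or -1 at end of input. */
    int peek() {
        return pos < data.length() ? data.charAt(pos) : -1;
    }

    /** Physical line of the character that peek() would return. */
    int line() {
        return line;
    }

    /** Consumes one visible character, then hides any continuation after it. */
    void consume() {
        if (pos >= data.length()) {
            return;
        }
        char c = data.charAt(pos++);
        boolean crOfCrLf = c == '\r' && pos < data.length() && data.charAt(pos) == '\n';
        if (c == '\n' || (c == '\r' && !crOfCrLf)) {
            line++; // count CRLF once, when its LF is consumed
        }
        skipContinuations();
    }

    /** Skips every "+ or - followed by a line terminator" at the current position. */
    private void skipContinuations() {
        while (pos + 1 < data.length()
                && (data.charAt(pos) == '+' || data.charAt(pos) == '-')
                && (data.charAt(pos + 1) == '\r' || data.charAt(pos + 1) == '\n')) {
            pos++; // the marker
            if (data.charAt(pos) == '\r' && pos + 1 < data.length()
                    && data.charAt(pos + 1) == '\n') {
                pos++; // the CR of a CRLF pair
            }
            pos++;  // the LF or lone CR
            line++; // the swallowed terminator still ends a physical line
        }
    }
}
```

With this arrangement a consumer reading "verylong-", newline, "parm" sees the characters of "verylongparm" with no gap, and line() reports 2 once it reaches the p, so a token's physical position is still recoverable.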
Johannes