[antlr-interest] onerous lex pattern
Bryan Ewbank
ewbank at gmail.com
Mon Jan 23 05:58:21 PST 2006
Hi Jeff,
How about changing the way you think about this multi-line token, so that
it starts with a "|" in column 1 and continues up to the first newline that
is not followed by a "|" character? It requires k=2, but that shouldn't be a
problem...
I'm not too good with ANTLR lexer rules - I just use lex - but it would look
something like this:
MULTILINESTRING:
    ( {inputState.guessing != 0 || getColumn() == 1}?
      '|'!
      ( options {greedy=true;}: ~('\r' | '\n') )*
      ( options {greedy=true;}:
          NL
          '|'!
          ( options {greedy=true;}: ~('\r' | '\n') )*
      )*
    )
    ;
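In case the intent of that rule is hard to read, here is a rough hand-written
equivalent in Python. It is only a sketch of the idea, not ANTLR output: the
names scan_multiline_string and newline_length are made up for illustration,
NL is assumed to match "\r\n", "\r" or "\n", and the newline that ends the
last "|" line is assumed NOT to be part of the token.

```python
def newline_length(text, i):
    """Length of the newline sequence at text[i], or 0 if there is none.

    Assumption: NL matches "\r\n", "\r" or "\n".
    """
    if text.startswith("\r\n", i):
        return 2
    if i < len(text) and text[i] in "\r\n":
        return 1
    return 0


def scan_multiline_string(text, pos=0):
    """Scan one '|'-prefixed multi-line string starting at text[pos].

    The caller is assumed to have checked that pos is at column 1.
    Returns (value, end), where end is the index just past the token,
    or None if text[pos] is not '|'.
    """
    if pos >= len(text) or text[pos] != "|":
        return None
    parts = []
    i = pos
    while True:
        i += 1  # skip the leading '|' (dropped from the text, like '|'!)
        start = i
        while i < len(text) and text[i] not in "\r\n":
            i += 1
        parts.append(text[start:i])
        # Lookahead: a newline continues the token only if a '|' follows it.
        nl = newline_length(text, i)
        if nl and i + nl < len(text) and text[i + nl] == "|":
            parts.append(text[i:i + nl])  # the inner newline stays in the text
            i += nl
            continue
        return "".join(parts), i


value, end = scan_multiline_string("|first line\n|second line\nnext token")
print(repr(value), end)
```

A blank line, or any line that does not start with "|", ends the token, and
the scanner stops in front of the newline rather than consuming it.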
Is the final NL of the last line starting with "|" considered part of the
token? I'd assume "no", right?
Note that there is a difference between what you described and the rule that
you wrote:
> Rose serializes strings that have a quote or a newline
> in them by starting them at column 1 and beginning
> each line of the string with a '|'. So my lexer rule
> looks like this:
> MULTILINESTRING:
>     ({inputState.guessing != 0 || getColumn() == 1}?
>      '|'!)
>     ( options { greedy = false; }:
>         ~('\r' | '\n')
>     )*
>     (NL)+
>     ;
The description requires every line in the string to have a leading "|", but
the rule allows blank lines to be part of the token. Is this desired, rather
than requiring a "|" between adjacent newlines?
E.g.
|this is the question - one string or two?
|is this the same string?
|description says no, rule says yes...
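To make the two readings concrete, here is a small Python sketch that groups
"|" lines into strings both ways. It is an illustration only: the function
name split_strings and its flag are hypothetical, the sample assumes a blank
line between the first and second "|" lines, and for simplicity the lenient
reading drops the blank line from the value instead of keeping the extra
newline in it.

```python
def split_strings(text, blank_lines_break):
    """Group '|'-prefixed lines into strings.

    blank_lines_break=True  : the description's reading, where every line of
                              a string must start with '|', so a blank line
                              ends the current string.
    blank_lines_break=False : the quoted rule's reading, where (NL)+ swallows
                              blank lines, so they do not end the string.
    """
    strings = []
    current = None
    for line in text.split("\n"):
        if line.startswith("|"):
            if current is None:
                current = []
            current.append(line[1:])  # drop the leading '|'
        elif line == "" and not blank_lines_break and current is not None:
            continue  # blank line absorbed; the string goes on
        elif current is not None:
            strings.append("\n".join(current))
            current = None
    if current is not None:
        strings.append("\n".join(current))
    return strings


# Assumed input: a blank line after the first '|' line.
sample = (
    "|this is the question - one string or two?\n"
    "\n"
    "|is this the same string?\n"
    "|description says no, rule says yes..."
)

print(len(split_strings(sample, blank_lines_break=True)))
print(len(split_strings(sample, blank_lines_break=False)))
```

Under the description's reading the sample comes out as two strings; under
the quoted rule's reading it comes out as one.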