[antlr-interest] onerous lex pattern

Bryan Ewbank ewbank at gmail.com
Mon Jan 23 05:58:21 PST 2006


Hi Jeff,

How about if you change the way you think about this multi-line token so that
it starts with a "|" in col 1, and continues up to the first newline that is
not followed by a "|" char?  It requires k=2, but that shouldn't be a problem...

I'm not too good with ANTLR lexer rules - I just use lex - but it would look
something like this:

MULTILINESTRING:
    ( {inputState.guessing != 0 || getColumn() == 1}?
        '|'!
        ( options {greedy=true;}: ~('\r' | '\n') )*
        ( options {greedy=true;}:
            NL
            '|'!
            ( options {greedy=true;}: ~('\r' | '\n') )*
        )*
    )
    ;

Is the final NL of the last line starting with "|" considered part of the
token?  I'd assume "no", right?
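
To pin down the intended behaviour outside ANTLR, here is a minimal Python
sketch of that token definition.  It is only an illustration, not what the
generated lexer does internally.  The name scan_multiline_string is made up,
the caller is assumed to have checked that pos is at column 1, and the sketch
assumes the answer to the question above is "no": the final newline is left
in the input.

```python
def scan_multiline_string(text, pos=0):
    """Return (token_text, end_pos) for a '|'-prefixed string at pos.

    Hypothetical helper: assumes pos is at column 1.  Returns None when
    text[pos] is not '|'.  The leading '|' of each line is dropped (like
    the '|'! in the rule); newlines between lines are kept.
    """
    if pos >= len(text) or text[pos] != "|":
        return None
    parts = []
    i = pos
    while True:
        i += 1  # skip the leading '|'
        start = i
        while i < len(text) and text[i] not in "\r\n":
            i += 1
        parts.append(text[start:i])
        # Measure the newline at i, if any: "\r\n" counts as one newline.
        if text.startswith("\r\n", i):
            nl = 2
        elif i < len(text):
            nl = 1
        else:
            nl = 0
        # Continue only when the newline is followed by another '|'.
        if nl and text.startswith("|", i + nl):
            parts.append(text[i:i + nl])
            i += nl
        else:
            break  # final newline (if any) is not consumed
    return "".join(parts), i
```

The returned end_pos points at the unconsumed newline (or at end of input),
so a separate NL rule can pick it up.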

Note that there is a difference between what you described and the rule that
you wrote:

> Rose serializes strings that have a quote or a newline
> in them by starting them at column 1 and beginning
> each line of the string with a '|'. So my lexer rule
> looks like this:

> MULTILINESTRING:
>     ({inputState.guessing != 0 || getColumn() == 1}?
> '|'!)
>     ( options { greedy = false; }:
>         ~('\r' | '\n')
>         )*
>         (NL)+
> ;

The description requires every line in the string to have a leading "|", but
the rule's trailing (NL)+ allows blank lines to be part of the token.  Is this
desired, rather than requiring a "|" at the start of every line?

E.g.
    |this is the question - one string or two?

    |is this the same string?
    |description says no, rule says yes...
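
For comparison, here is a rough Python check of how the description (every
line must start with "|") would split that sample.  The regular expression
is my own approximation of the description, not a translation of either
ANTLR rule; it keeps the leading "|" characters and assumes the lines start
in column 1 rather than being indented as they are in this mail.

```python
import re

# A '|' at the start of a line, the rest of that line, then any number of
# (newline + '|' + rest of line) continuations.  A blank line ends the match.
STRICT = re.compile(
    r"^\|[^\r\n]*(?:(?:\r\n|\r|\n)\|[^\r\n]*)*",
    re.MULTILINE,
)

sample = (
    "|this is the question - one string or two?\n"
    "\n"
    "|is this the same string?\n"
    "|description says no, rule says yes...\n"
)

tokens = [m.group(0) for m in STRICT.finditer(sample)]
for t in tokens:
    print(repr(t))
```

Under that reading the blank line ends the first string, so the sample
comes out as two strings rather than one.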

