[antlr-interest] Reading a string of fixed size

Thomas Brandon tbrandonau at gmail.com
Mon Aug 27 09:41:52 PDT 2007


On 8/28/07, Alexandre Hamez <alexandre.hamez at lip6.fr> wrote:
> Thanks for your interest. But the fact is that I want to create a
> Token with exactly NUMBER characters. The characters that follow will
> be matched by other tokens. It's not an error if there are characters
> which follow. It means that I can have something like this:
>
>         CAMI_STRING (',' CAMI_STRING)*
>
> ( for newcomers: CAMI_STRING : NUMBER ':' STRING where the size of
> STRING is given by NUMBER).
>
>
> Moreover, as a strange side effect of the following code, newlines
> make the parser completely mad:
>
> > CAMI_STRING
> >       :
> >       NUMBER ':'
> >       {
> >               // Get the current position in stream
> >               int start  = input.getCharPositionInLine();
> >               // Computing the position of the last character of the STRING to be read
> >               int end = start + Integer.parseInt($NUMBER.text) - 1;
> >               // Set the value of the returned value to STRING
> >               setText(input.substring(start,end));
> >               // Update the position in the stream
> >               input.seek(end+1);
> >       }
> >       ;
seek() takes an absolute index into the stream, so you should get the
location from input.index() rather than input.getCharPositionInLine(),
which is only the offset within the current line. That is why newlines
break things: the column resets to 0 after each one.
Also, you are going to get exceptions calling substring if the
specified length extends past the end of the stream. You could call
seek, which won't seek past the end of the stream, and then check
input.index() to determine how many characters were actually
available. Or repeatedly call consume and check for EOF.
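
To illustrate the difference between an absolute index and a
line-relative column, here is a minimal plain-Java sketch. It does not
use the ANTLR runtime: a String stands in for the character stream, and
the names FixedLengthRead, columnOf and readFixed are made up for this
example. It only shows why the column is the wrong starting point after
a newline, and how clamping the end avoids running off the input.

```java
class FixedLengthRead {
    // Hypothetical stand-in for getCharPositionInLine():
    // 0-based offset of idx within its own line.
    static int columnOf(String s, int idx) {
        int lastNewline = s.lastIndexOf('\n', idx - 1);
        return idx - (lastNewline + 1);
    }

    // Reads up to len characters starting at the absolute index start,
    // clamped so it never reads past the end of the input.
    static String readFixed(String s, int start, int len) {
        int end = Math.min(start + len, s.length());
        return s.substring(start, end);
    }

    public static void main(String[] args) {
        String in = "3:abc\n2:de";
        int start = in.indexOf("2:") + 2;   // absolute index of 'd'

        // Absolute index and column differ once a newline has been seen.
        System.out.println(start + " vs " + columnOf(in, start)); // prints "8 vs 2"

        // Reading from the absolute index gives the expected text.
        System.out.println(readFixed(in, start, 2));              // prints "de"

        // A declared length that is too long yields fewer characters,
        // which the caller can detect by comparing lengths.
        System.out.println(readFixed(in, start, 10).length());    // prints "2"
    }
}
```

In a real lexer action the same comparison (characters obtained versus
characters requested) would decide whether to report an error.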
A better solution might be to use semantic predicates to handle the
matching. Something like:
CAMI_STRING
    :   NUMBER ':' fs=FIXED_LENGTH_STRING[Integer.parseInt($NUMBER.text)]
        { setText($fs.text); }
    ;

fragment
FIXED_LENGTH_STRING[int len]
    :   ( { len-- > 0 }?=> .)+ { len == 0 }?
    ;
This should work. Alternatively, you may want to replace the second
predicate in FIXED_LENGTH_STRING with code that records an error if
not all characters could be matched, rather than handling the
resulting FailedPredicateException.
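
As a rough idea of what "record an error instead of failing" could look
like, here is a small plain-Java sketch. It is not ANTLR code: the class
FixedLengthMatcher, its errors list and its match method are
hypothetical, and a String again stands in for the input stream. The
loop plays the role of the counted loop in FIXED_LENGTH_STRING, and the
final check replaces the trailing predicate with an error record.

```java
import java.util.ArrayList;
import java.util.List;

class FixedLengthMatcher {
    // Collected error messages; a real lexer would report these
    // through its own error-reporting mechanism instead.
    final List<String> errors = new ArrayList<>();

    // Consumes up to len characters starting at pos, one at a time,
    // stopping early if the input ends.
    String match(String input, int pos, int len) {
        StringBuilder text = new StringBuilder();
        int remaining = len;
        while (remaining > 0 && pos < input.length()) {
            text.append(input.charAt(pos++));
            remaining--;
        }
        // Record an error, rather than throwing, when the input was short.
        if (remaining != 0) {
            errors.add("expected " + len + " characters, found only "
                    + text.length());
        }
        return text.toString();
    }

    public static void main(String[] args) {
        FixedLengthMatcher m = new FixedLengthMatcher();
        System.out.println(m.match("abcdef", 1, 3)); // prints "bcd"
        System.out.println(m.match("abc", 0, 5));    // prints "abc"
        System.out.println(m.errors.size());         // prints "1"
    }
}
```

The short match still returns whatever text was available, so the
caller can decide whether to keep the partial token or discard it.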

Tom.

