[antlr-interest] recombining tokens
Johannes Luber
jaluber at gmx.de
Mon Jul 28 03:13:42 PDT 2008
Davyd Madeley wrote:
> Hi all,
>
> I'm currently writing a grammar in which '/' is used to append a
> qualifier to a token. Unfortunately it is also used in path parameters.
>
> I am trying to figure out how I can recombine tokens in the case where I
> determine I'm reading a path.
>
> e.g.
>
> // these are my token delimiters
> TOKEN
> : ~(','|'>'|'*'|'/'|'('|')'|LINE_TERMINATOR)+
> ;
>
> At one point in the state machine, I expect to be able to start reading
> parameters ('LINE' is a special token at the start of the file, but
> after that is just a regular token):
>
> parameter
> : a=TOKEN -> PARAMETER[$a]
> | a='LINE' -> PARAMETER[$a]
> | path -> ^(PATH path)
> ;
>
> path
> : ('/' TOKEN)+
> ;
>
> Every so often, a path will be provided. Currently this will be
> tokenised around the '/', which is undesirable.
>
> e.g.
> PATH (9) .......................... PATH
> '/' (20) ........................ /
> TOKEN (11) ...................... path
> '/' (20) ........................ /
> TOKEN (11) ...................... to
> '/' (20) ........................ /
> TOKEN (11) ...................... my.file
>
> What I want to do is be able to recombine this into a
> PARAMETER["/path/to/my.file"].
>
> Someone spoke about a concatenation operator, but I can't find any info
> about it.
>
> Regards,
> --davyd
>
The root cause of the problem is that the lexer runs independently of
the parser, so without extra code in the lexer you can't decide whether
a '/' belongs to a qualifier or to a path. Adding that code amounts to
writing a mini-parser inside the lexer, and it may need more context
information than a pure lexer can provide. It may be easier to
recognize paths in a first pass as a series of tokens and then rewrite
each series into a single token. That approach requires an AST (tree)
grammar.

Another question is whether you truly need a single PATH token, or
whether you can use the $path.text attribute instead. Depending on your
needs, this may still perform better than the two approaches above.
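
As an untested sketch of the $path.text idea, assuming ANTLR 3 with
output=AST and PARAMETER declared as an imaginary token, the path
alternative of your parameter rule might become the following.
$path.text is the text from the first to the last token matched by
path, including any off-channel tokens in between.

```antlr
// Sketch only: assumes output=AST and that PARAMETER is declared in
// the tokens { ... } section of the grammar.
parameter
    : a=TOKEN  -> PARAMETER[$a]
    | a='LINE' -> PARAMETER[$a]
    // Build one PARAMETER node whose text is the whole matched path,
    // e.g. "/path/to/my.file", instead of a PATH subtree.
    | path     -> PARAMETER[$path.text]
    ;

path
    : ('/' TOKEN)+
    ;
```

The lexer still emits separate '/' and TOKEN tokens here; only the tree
node carries the recombined text.
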
Johannes