[antlr-interest] ambiguous lexer tokens
Torsten Curdt
tcurdt at vafer.org
Thu Jun 28 02:54:20 PDT 2007
On 28.06.2007, at 11:13, Wincent Colaiuta wrote:
> On 27/6/2007, at 22:44, Torsten Curdt wrote:
>
>> I would like to write a grammar for the following output:
>>
>> drwxr-xr-x 23 tcurdt tcurdt 782 Jun 24 22:54 ..
>> -rw-r--r-- 1 tcurdt tcurdt 18545 Nov 1 2006 ASMContentHandler.Rule.html
>>
>> Of course that means that the lexer tokens
>> (TYPE/MODS/INT/NAME/HOUR/YEAR) are ambiguous.
>> What should such a grammar look like? Any pointers?
>
> I think you have a number of options:
>
> 1. Given that many of the tokens look the same, don't try to
> differentiate between them in the lexer. Instead handle everything
> in the parser.
OK
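
A minimal sketch of option 1, written in Python rather than as an ANTLR grammar: the lexer only distinguishes two coarse token kinds (INT and WORD), and the "parser" decides what each token means from its position in the line. The function names and the field layout are assumptions taken from the two sample lines above, not from any real grammar.

```python
import re

# Coarse token classes only: the lexer does not try to tell a year from a
# size, or an owner name from a file name.
TOKEN_RE = re.compile(r"(?P<INT>\d+(?=\s|$))|(?P<WORD>\S+)")

def lex_coarse(line):
    """Return (kind, text) pairs without any knowledge of columns."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(line)]

def parse_by_position(line):
    """Give the coarse tokens a meaning based on where they appear."""
    tokens = lex_coarse(line)
    if len(tokens) < 9:
        raise ValueError("too few fields: %r" % line)
    texts = [text for _, text in tokens]
    mode, links, owner, group, size, month, day, when = texts[:8]
    # Only here, where the position is known, is an INT read as a year
    # and anything else as a time of day.
    when_key = "year" if tokens[7][0] == "INT" else "hour"
    return {
        "type": mode[0],
        "mods": mode[1:],
        "links": int(links),
        "owner": owner,
        "group": group,
        "size": int(size),
        "month": month,
        "day": int(day),
        when_key: when,
        # Simplification: runs of spaces inside a file name collapse to one.
        "name": " ".join(texts[8:]),
    }
```

The lexer stays trivial and unambiguous; all the knowledge about the listing format lives in one place, the parser.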
> 2. Use predicates in the lexer to turn alternatives on and off
> depending on which "column" you're in (i.e. make a context-sensitive
> lexer).
Could you give an example of what that would look like?
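
The idea behind option 2 can be illustrated outside ANTLR. The following is a hand-written Python sketch, not ANTLR predicate syntax: the lexer tracks which position of the line it is in and only tries the token kinds that are switched on there, which is the role the predicates would play in a grammar. The table of positions and the character classes are assumptions based on the two sample lines in this thread.

```python
import re

# For each position in the line, the token kinds that are enabled there.
# Alternatives within one position are tried in order.
POSITIONS = [
    [("TYPE", re.compile(r"[-dl]"))],
    [("MODS", re.compile(r"[-rwxsStT]{9}"))],
    [("INT", re.compile(r"\d+"))],            # link count
    [("NAME", re.compile(r"\S+"))],           # owner
    [("NAME", re.compile(r"\S+"))],           # group
    [("INT", re.compile(r"\d+"))],            # size
    [("MONTH", re.compile(r"[A-Z][a-z]{2}"))],
    [("INT", re.compile(r"\d+"))],            # day of month
    [("HOUR", re.compile(r"\d{1,2}:\d{2}")),
     ("YEAR", re.compile(r"\d{4}"))],
    [("NAME", re.compile(r".+"))],            # file name: rest of the line
]

def lex_by_position(line):
    """Tokenize one listing line, enabling token kinds per position."""
    tokens, pos = [], 0
    for index, alternatives in enumerate(POSITIONS):
        # Skip the spaces separating this token from the previous one.
        while pos < len(line) and line[pos] == " ":
            pos += 1
        for kind, pattern in alternatives:
            m = pattern.match(line, pos)
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
                break
        else:
            raise ValueError(
                "position %d: unexpected input at offset %d" % (index, pos))
    return tokens
```

Because "2006" is only ever tested against HOUR and YEAR, and "782" only against INT, the same digits never compete for more than one meaning.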
> 3. Don't use ANTLR for this task. The input is so limited and
> regular that it may be quicker to just write something by hand.
I was tempted, as it should be easy to do with just a regular
expression. But I wanted to see whether ANTLR would be suitable for it too.
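
For comparison, a rough sketch of the regular-expression route in Python. The group names mirror the token names from the original question; the exact character classes and the helper name parse_ls_line are assumptions derived from the two sample lines, so other `ls` variants (ACL markers, locale-specific dates) would need adjustments.

```python
import re

# One pattern for a whole listing line. In VERBOSE mode the layout
# whitespace below is ignored; only the explicit \s+ separators match.
LS_LINE = re.compile(r"""
    (?P<type>[-dl])
    (?P<mods>[-rwxsStT]{9})    \s+
    (?P<links>\d+)             \s+
    (?P<owner>\S+)             \s+
    (?P<group>\S+)             \s+
    (?P<size>\d+)              \s+
    (?P<month>[A-Z][a-z]{2})   \s+
    (?P<day>\d{1,2})           \s+
    (?: (?P<hour>\d{1,2}:\d{2}) | (?P<year>\d{4}) )  \s+
    (?P<name>.+)
""", re.VERBOSE)

def parse_ls_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    m = LS_LINE.fullmatch(line.rstrip("\n"))
    return m.groupdict() if m else None
```

Exactly one of `hour` and `year` is set for a matching line; the other is None.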
> I personally would go with "3" in this case because I think you are
> much more likely to come up with a correct parser by hand; ANTLR is
> a very complex tool and it can deviate from your expectations in
> incredibly subtle and hard-to-understand ways.
I used v2 before, but in that case the lexing was much more obvious.
Thanks a lot for your input.
cheers
--
Torsten