[antlr-interest] First grammar a simple string template language

Kevin J. Cummings cummings at kjchome.homeip.net
Fri Sep 24 06:36:58 PDT 2010


On 09/24/2010 07:56 AM, Daniel Lidström wrote:
> From: "Kevin J. Cummings" <cummings at kjchome.homeip.net>
>> On 09/23/2010 09:50 AM, Daniel Lidström wrote:
>>> How do I "capture" the dash and colon? Here's my grammar:
>>
>> If the "-" and ":" are a part of your format string, then they are not a
>> part of your input, are they?  I would think that outputting them would
>> be a function of how you handle your format string (which you included
>> above).  It looks like you are outputting just the "variable" part of
>> your format string and not the "constant" part....
>>
>> When you parse your format string, you will need to save the constant
>> parts verbatim.
>>
>> Perhaps you can use the "dot notation" (of the lexer) to save anything
>> that isn't one of your tokens listed below, and output them verbatim.
>> So, you need another token type to catch "anything else".
> 
> Thanks for the suggestion. I have taken a step back and my grammar now
> looks like this:
> 
> program
>     : statement*
>     ;
> 
> statement
>     : TEXT
>     | variable
>     ;
> 
> variable
>     : '[' LETTERS ']'
>     ;
> 
> fragment LETTER : 'a'..'z' | 'A'..'Z' ;
> LETTERS : LETTER+ ;
> TEXT : ~('[' | ']')+ ;
> 
> Using ANTLR IDE within Eclipse I can see that this parses something like
> "set status [yyyy]-[M]-[d] [H]:[m]"
> correctly (every [...] is treated as a variable and the rest as TEXT).
> My TEXT lexer rule seems to be working fine. If I try to use the dot, I
> get an error:
> 
> TEXT : .+ ;
> 
> error(201): /TemplateCommand/src/com/gpsgate/TemplateCommand.g:31:8: The
> following alternatives can never be matched: 1
> |---> TEXT : .+ ;
> 
> Is there a way to use dot or should I just be fine with the TEXT lexer
> as is?

You've done fine.  Watch for conflicts between LETTERS and TEXT: any run
of letters can match both rules.  Remember that the lexer tries to match
as long a token as it can.  I seem to remember that TEXT should be one of
your last rules so that the more specific rules take precedence over it.
If you run into specific conflicts, you might be able to disambiguate
them with syntactic predicates in your lexer.  Jim Idle always points to
the floating point number/integer/date/range example as the best
illustration of how to do that.
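
To make the LETTERS/TEXT overlap concrete, here is a rough Python sketch
of a "longest match, earlier rule wins ties" tokenizer.  It is not ANTLR
and not how ANTLR's generated lexer actually decides (that is based on
lookahead prediction), so treat it only as a model of the rule of thumb
above.  The rule names mirror your grammar; tokenize(), the two rule
tables, and the single VARIABLE token in the second table are my own
assumptions for illustration, not something from the thread.

```python
import re

# Model of the posted rules: brackets, LETTERS, then TEXT last.
# Order only breaks ties between matches of equal length.
OVERLAPPING_RULES = [
    ("LBRACK", re.compile(r"\[")),
    ("RBRACK", re.compile(r"\]")),
    ("LETTERS", re.compile(r"[A-Za-z]+")),
    ("TEXT", re.compile(r"[^\[\]]+")),
]

# Hypothetical alternative: match the whole "[name]" as one token,
# so LETTERS never competes with TEXT.
VARIABLE_RULES = [
    ("VARIABLE", re.compile(r"\[[A-Za-z]+\]")),
    ("TEXT", re.compile(r"[^\[\]]+")),
]


def tokenize(source, rules):
    """Split source into (rule_name, text) pairs.

    At each position every rule is tried; the longest match wins and,
    when two matches have the same length, the earlier rule wins.
    """
    tokens = []
    pos = 0
    while pos < len(source):
        best_name, best_match = None, None
        for name, pattern in rules:
            match = pattern.match(source, pos)
            # Strict ">" keeps the earlier rule on equal-length matches.
            if match and (best_match is None or match.end() > best_match.end()):
                best_name, best_match = name, match
        if best_match is None:
            raise ValueError("no rule matches at offset %d" % pos)
        tokens.append((best_name, best_match.group()))
        pos = best_match.end()
    return tokens


if __name__ == "__main__":
    for rules in (OVERLAPPING_RULES, VARIABLE_RULES):
        print(tokenize("status[d]", rules))
```

With the first table, "set status [d]" starts with one long TEXT token
because TEXT matches further than LETTERS, but "status[d]" starts with a
LETTERS token outside any brackets, which the statement rule would not
accept.  With the second table the same input comes out as TEXT followed
by VARIABLE, at the cost of rejecting a stray "[" or "]".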

Have fun!

> Daniel

-- 
Kevin J. Cummings
kjchome at rcn.com
cummings at kjchome.homeip.net
cummings at kjc386.framingham.ma.us
Registered Linux User #1232 (http://counter.li.org)

