[antlr-interest] Match a repetition of characters

Robin diabeteman at gmail.com
Fri Jun 24 14:01:01 PDT 2011


The thing is that I'd like something more generic (I don't want to repeat
the code for each special character, as there are about 30 of them that can
match).

Here is what I'm trying to match, so you have the background:
http://docutils.sourceforge.net/docs/ref/rst/restructuredtext.html#sections

The minimum length of the underline is driven by the length of the title
text (the underline must be at least as long as the title). I was thinking
about something like this:

https://raw.github.com/robin-jarry/rst4eclipse/master/src/main/java/org/diabeteman/rst4eclipse/Rst.g

But it doesn't work...
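One way to sidestep the grammar problem is to check the underline in ordinary code instead of in a parser rule. The Java below is a minimal sketch under stated assumptions, not the rst4eclipse implementation: the class `UnderlineCheck`, the method `underlineSymbol`, and the adornment set (assumed here to be all printable non-alphanumeric ASCII characters) are hypothetical, and the title width is taken as `String.length()`, which ignores wide or combining characters.

```java
import java.util.Optional;

// Hypothetical helper: checks a reStructuredText section underline outside the grammar.
final class UnderlineCheck {
    // Assumption: adornment characters are the printable non-alphanumeric ASCII characters.
    private static final String ADORNMENT_CHARS = "!\"#$%&'()*+,-./:;<=>?@[\\]^_`{|}~";

    // Returns the repeated character if `underline` is a single adornment character
    // repeated at least as many times as `title` is long; otherwise Optional.empty().
    // Assumption: title width is String.length(), ignoring wide/combining characters.
    static Optional<Character> underlineSymbol(String title, String underline) {
        if (underline.isEmpty() || underline.length() < title.length()) {
            return Optional.empty(); // missing or too short
        }
        char symbol = underline.charAt(0);
        if (ADORNMENT_CHARS.indexOf(symbol) < 0) {
            return Optional.empty(); // not a punctuation character
        }
        for (int i = 1; i < underline.length(); i++) {
            if (underline.charAt(i) != symbol) {
                return Optional.empty(); // mixes different characters
            }
        }
        return Optional.of(symbol);
    }
}
```

For example, `underlineSymbol("Title", "=====")` yields `Optional[=]`, while `"==="` (too short) and `"==-=="` (mixed characters) yield `Optional.empty`. The returned character is what enclosing rules would need to compare section levels, and one check covers all 30 or so characters without an alternative per character.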


On Fri, Jun 24, 2011 at 7:40 PM, Jim Idle <jimi at temporal-wave.com> wrote:

> Don't try to do this in the lexer or parser; you will just get syntax
> errors that are difficult to interpret. You want to generate semantic
> errors with more context. However, if you must distinguish 4 or more
> from singles, you would do something like this:
>
> fragment UNDERSCORES;
> UNDERSCORE: '_'
>             (    ('___')=> '_'+ {$type = UNDERSCORES;}
>                 |
>              )
> ;
>
> But this:
>
> UNDERSCORES: '_'+;
>
> and then
>
> prule: UNDERSCORES { if (countem($UNDERSCORES) < 4) { /* semantic error */ } } ;
>
>
> is probably a better approach.
>
> Jim
>
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Douglas Godfrey
> > Sent: Friday, June 24, 2011 8:39 AM
> > To: Robin
> > Cc: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Match a repetition of characters
> >
> > underline returns [char symbol]
> >  : underlineAtom {$symbol=$underlineAtom.text} {$symbol}+ LINE_BREAK  ;
> >
> > underlineAtom
> >  : ( UNDERSCORE UNDERSCORE UNDERSCORE UNDERSCORE+ )
> >  | ( STAR STAR STAR STAR+ )
> >  | ( PIPE PIPE PIPE PIPE+ )
> >  | ( BACKTICK BACKTICK BACKTICK BACKTICK+ )
> >  | ( COLUMN COLUMN COLUMN COLUMN+ )
> >  | ( SPECIAL_CHAR SPECIAL_CHAR SPECIAL_CHAR SPECIAL_CHAR+ )
> >  ;
> >
> >
> >
> > On Fri, Jun 24, 2011 at 6:02 AM, Robin <diabeteman at gmail.com> wrote:
> >
> > > Hello everyone,
> > >
> > > I'm trying to write a rule that matches a repetition (4 or more) of
> > > the same special character.
> > >
> > > For example:
> > >
> > > "^^^^^^^^^^^^^^^^^^^^"
> > >
> > > or
> > >
> > > "________________"
> > >
> > > I have these lexer rules:
> > >
> > > UNDERSCORE : '_';
> > > BACKTICK : '`';
> > > STAR : '*';
> > > PIPE : '|';
> > > COLUMN : ':';
> > > SPECIAL_CHAR :
> > >     ('!'|'"'|'#'|'$'|'%'|'&'|'\''|'('|')'|'+'|','|'.'|'/'|';'|'<'|'='|'>'|
> > >      '?'|'@'|'['|'\\'|']'|'^'|'{'|'}'|'~');
> > > LINE_BREAK : '\u000C'?'\r'?'\n';
> > >
> > > And I'd like to write a parser rule named "underline" that only
> > > matches if this is a repetition of *the same character*, and that
> > > returns this character so that enclosing rules can use it.
> > >
> > > For now I wrote this:
> > >
> > > underline returns [char symbol]
> > >  : underlineAtom {$symbol=$underlineAtom.text} {$symbol}+ LINE_BREAK
> > > ;
> > >
> > > underlineAtom
> > >  : UNDERSCORE
> > >  | STAR
> > >  | PIPE
> > >  | BACKTICK
> > >  | COLUMN
> > >  | SPECIAL_CHAR
> > >  ;
> > >
> > > But my grammar does not compile...
> > >
> > > Can someone help me with this? :)
> > >
> > > Thanks
> > >
> > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > Unsubscribe:
> > > http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> > >
> >
>
>
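Jim's preferred approach (lex a whole run as one token, then enforce the count as a semantic check) can be sketched in plain Java, independent of ANTLR. The sketch splits a line into maximal runs of identical characters, which covers every punctuation character without one rule per character, and then reports a contextual message instead of a syntax error. `RunSemanticCheck`, `runs`, `checkUnderline`, and the message wording are hypothetical; `MIN_RUN = 4` comes from Robin's original "4 or more" requirement. The nested `record` needs Java 16 or later.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: treat a run of one repeated character as a single token,
// then enforce the "4 or more" rule as a semantic check with a readable message.
final class RunSemanticCheck {
    // Minimum run length, taken from the original "4 or more" requirement.
    static final int MIN_RUN = 4;

    // One maximal run of identical characters, as a lexer might emit it.
    record Run(char symbol, int length, int column) {}

    // Splits a line into maximal runs of identical characters (columns are 1-based).
    static List<Run> runs(String line) {
        List<Run> result = new ArrayList<>();
        int start = 0;
        while (start < line.length()) {
            char c = line.charAt(start);
            int end = start;
            while (end < line.length() && line.charAt(end) == c) {
                end++;
            }
            result.add(new Run(c, end - start, start + 1));
            start = end;
        }
        return result;
    }

    // Returns null if the line is a single run of at least MIN_RUN characters,
    // otherwise an error message describing what was actually found.
    static String checkUnderline(String line, int lineNumber) {
        List<Run> found = runs(line);
        if (found.isEmpty()) {
            return "line " + lineNumber + ": empty underline";
        }
        if (found.size() > 1) {
            Run second = found.get(1);
            return "line " + lineNumber + ": '" + found.get(0).symbol() + "' changes to '"
                    + second.symbol() + "' at column " + second.column();
        }
        Run run = found.get(0);
        if (run.length() < MIN_RUN) {
            return "line " + lineNumber + ": underline of '" + run.symbol() + "' has "
                    + run.length() + " characters; at least " + MIN_RUN + " are required";
        }
        return null;
    }
}
```

With this split, `checkUnderline("___", 3)` reports that 3 underscores were found where at least 4 are required, and `checkUnderline("==--", 3)` points at column 3, which is the kind of context a bare syntax error from a `'_'+` lexer rule would not give.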

