[antlr-interest] Reading block of arbitrary text delimited by curly braces

Wed Jul 18 12:25:22 PDT 2012

No, the warning is only saying that the next part of the rule can match
that character too; the generated lexer will still do the right thing.

You can get rid of the warning with a syntactic predicate:

            (
                ('{')=>'{'
              | { error("Missing opening brace for BLOCK"); }
            )

And you can do that with any other warnings in the rule.
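
For example, an untested sketch of the whole rule with the same predicate
on the closing brace as well. It assumes error() is whatever reporting
helper your lexer supplies (it is not an ANTLR built-in), and like the
rule quoted below it does not skip whitespace between 'BLOCK' and '{':

    BLOCK
        : 'BLOCK'
          (
              ('{')=> '{'
            | { error("Missing opening brace for BLOCK"); }
          )
          (~'}')*    // body: everything up to the first '}'
          (
              ('}')=> '}'
            | { error("Missing closing brace"); }
          )
        ;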

I use this technique all the time.

Jim

> -----Original Message-----
> From: Burton Samograd [mailto:burton.samograd at markit.com]
> Sent: Wednesday, July 18, 2012 11:44 AM
> To: Jim Idle
> Cc: antlr-interest at antlr.org
> Subject: RE: [antlr-interest] Reading block of arbitrary text delimited
> by curly braces
>
> Good idea but giving the ( '{' | ... ) alternative gives me multiple
> alternative warnings/errors, possibly because we already have LCURLY
> defined as a lexer token:
>
> warning(200): SDL.g:869:35: Decision can match input such as "'{'"
> using multiple alternatives: 1, 2
>
> --
> Burton Samograd
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
> Sent: Wednesday, July 18, 2012 11:34 AM
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Reading block of arbitrary text delimited
> by curly braces
>
> You will have to handle this in the lexer - you are trying to perform
> syntax driven lexing and this requires context and communication
> between the parser and the lexer and is either not going to work at
> all, or will fail in seemingly strange ways.
>
>
> BLOCK: 'BLOCK'
>        (
>            (
>                '{'
>              | { error("Missing opening brace for BLOCK"); }
>            )
>
> { /* Could set token start here */ }
>
>               (~'}')*
>
> { /* Could set token end here by calling emit(); */ }
>
>                  (   '}'  // Good
>                    | { error("Missing closing brace"); }
>                  )
>        )
> ;
>
> You might need to tweak the above for your needs, but you are not going
> to make this work correctly from the parser. You could fake lexer
> states so that you get more than one token in the stream, but your
> errors are so simple, that I personally would not bother.
>
> Jim
>
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Burton Samograd
> > Sent: Wednesday, July 18, 2012 9:50 AM
> > To: Stephen Siegel
> > Cc: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Reading block of arbitrary text
> > delimited by curly braces
> >
> > To clarify why pulling in the block as a whole token was not ideal: we
> > did have it working that way, but an issue came up where we would like
> > to give a better error message when the curlies are forgotten.
> > Initially I tried to create another block-matching rule that started
> > with 'BLOCK' and searched for any character that was not a {, and used
> > that as an alternative match rule, but it caused issues in other parts
> > of the parser that made little sense.  This is why I am looking to
> > break the block rule out of its single lexer token implementation, if
> > that is possible.
> >
> > --
> > Burton Samograd
> >
> > -----Original Message-----
> > From: Stephen Siegel [mailto:siegel at udel.edu]
> > Sent: Wednesday, July 18, 2012 10:15 AM
> > To: Burton Samograd
> > Cc: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Reading block of arbitrary text
> > delimited by curly braces
> >
> > Yeah, but maybe it can't work.  I think the fundamental problem is
> > that what you consider to be a token depends on the state of the
> > parser, so some communication has to take place from the parser to the
> > lexer, which is weird.  It makes more sense to make the whole "BLOCK
> > {...}" one token, as Mike wrote.  Here is a complete grammar which I
> > ran on some examples and works fine:
> >
> > grammar exp;
> >
> > file    :       BLOCK* EOF;
> >
> > BLOCK   :       'BLOCK' WS* LCURLY ( options {greedy=false;} : . )* RCURLY
> >         ;
> >
> > LCURLY  :       '{';
> > RCURLY  :       '}';
> >
> > WS  :  (' '|'\r'|'\t'|'\u000C'|'\n') {$channel=HIDDEN;}
> >     ;
> >
> >
> > The "BLOCK {" and "}" do appear in the token text but there is
> > probably some way to get rid of them.
> >
> > On Jul 18, 2012, at 10:55 AM, Burton Samograd wrote:
> >
> > > Is this what you are suggesting?
> > >
> > > // Global
> > > bool inBlockData = false;
> > >
> > > // Parser
> > > block
> > >    : BLOCK LCURLY { inBlockData = true; }  BLOCK_DATA RCURLY { inBlockData = false; }
> > >        -> ^(BLOCK BLOCK_DATA)
> > >    ;
> > >
> > > // Lexer
> > > BLOCK : 'BLOCK' ;
> > > BLOCK_DATA : { inBlockData }?=> (~'}')* ;
> > >
> > > Using this technique gets me a bit further, but I am getting a
> > > recognition exception when I hit the BLOCK_DATA, as if it is still
> > > going through my original lexer/parser and not collecting the
> > > BLOCK_DATA like I would like it to.
> > >
> > > I did some reading on semantic predicates, but the literature just gave
> > > examples for parser rules, so I am not sure if I applied the concept
> > > to lexer rules properly.
> > >
> > > --
> > > Burton Samograd
> > >
> > > -----Original Message-----
> > > From: Stephen Siegel [mailto:siegel at udel.edu]
> > > Sent: Tuesday, July 17, 2012 6:35 PM
> > > To: Burton Samograd
> > > Cc: antlr-interest at antlr.org
> > > Subject: Re: [antlr-interest] Reading block of arbitrary text
> > > delimited by curly braces
> > >
> > > You could use a boolean variable added to the lexer, "inBlock".
> > > Initially it is false.  Set it to true just after matching the LCURLY
> > > and back to false after matching RCURLY in the block rule.  Then you
> > > could define the BLOCK_DATA token using inBlock as a guard (I think
> > > that's called a "semantic predicate").  BLOCK_DATA should match
> > > anything EXCEPT RCURLY (I'm assuming you don't want to allow RCURLY in
> > > the block data, or how would you know when the block ends? -- just
> > > like a comment in C, for example.)
> > > -Steve
> > >
> > > On Jul 17, 2012, at 3:57 PM, Burton Samograd wrote:
> > >
> > >> Hello,
> > >>
> > >> We have a requirement where we need to include a block of arbitrary
> > >> text in a block, like so:
> > >>
> > >> BLOCK { some arbitrary text here }
> > >>
> > >> We initially got around this by making the whole block a token, like:
> > >>
> > >> BLOCK : 'BLOCK' (' '|'\t'|'\r'|'\n')* '{' (~'}')*  '}' ;
> > >>
> > >> but this is less than ideal.  I am thinking that we would use
> > >> something like:
> > >>
> > >> block : BLOCK LCURLY BLOCK_DATA RCURLY
> > >>
> > >> with BLOCK : 'BLOCK' and LCURLY/RCURLY as { and }.
> > >>
> > >> I am stuck on specifying BLOCK_DATA which is basically .* to the
> > >> lexer.  I have attempted to access the input stream from the parser
> > >> RECOGNIZER but have not been able to come up with a solution.
> > >>
> > >> I am looking to basically hijack the input stream after seeing a
> > >> BLOCK token so I can read the arbitrary text; I can parse out the { }
> > >> as needed.
> > >>
> > >> Is this possible?
> > >>
> > >> --
> > >> Burton Samograd
> > >>
> > >> ________________________________
> > >> This e-mail, including accompanying communications and attachments,
> > >> is strictly confidential and only for the intended recipient. Any
> > >> retention, use or disclosure not expressly authorised by Markit is
> > >> prohibited. This email is subject to all waivers and other terms at
> > >> the following link:
> > >> http://www.markit.com/en/about/legal/email-disclaimer.page
> > >>
> > >> Please visit http://www.markit.com/en/about/contact/contact-us.page?
> > >> for contact information on our offices worldwide.
> > >>
> > >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > >> Unsubscribe:
> > >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address