[antlr-interest] How to handle blackslashes correctly?

Wed Jan 19 08:54:57 PST 2011

I think you are expecting the parser to tell the lexer which token it
should return. It will not do that. The language you are trying to parse is
broken, really, because the meaning of a backslash depends on where it
appears. You may be able to keep state in your lexer, setting flags so that
it creates different tokens at different points. Alternatively, you might
specify the keyword BrowserMatch as consuming everything after it up to the
newline and extract the path later (this is what I usually do).
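
To illustrate the second option, here is a minimal sketch in Python rather
than ANTLR, since the grammar itself has not been posted. It assumes the
keyword and the rest of its line are captured as one unit and split up
afterwards. The function name parse_browser_match and the tuple layout are
invented for this example, and quoted arguments containing spaces are
deliberately ignored.

```python
import re

# Hypothetical rule: the keyword plus everything up to the newline is
# matched as ONE unit, so backslashes in the arguments never reach the
# general-purpose token rules.
BROWSER_MATCH = re.compile(r"^BrowserMatch[ \t]+(?P<rest>[^\n]*)$", re.MULTILINE)


def parse_browser_match(line):
    """Return (pattern, [(negated, name), ...]) for one line, or None."""
    m = BROWSER_MATCH.match(line)
    if m is None:
        return None
    fields = m.group("rest").split()
    if not fields:
        return None
    pattern = fields[0]  # kept verbatim, backslashes included
    settings = []
    for field in fields[1:]:
        negated = field.startswith("!")
        settings.append((negated, field.lstrip("!")))
    return pattern, settings


example = r"BrowserMatch \bMSIE !no-gzip !gzip-only-text/html"
result = parse_browser_match(example)
```

The point of the design is that the post-processing step sees the raw text
of the line, so it can apply whatever backslash rules that directive needs
without involving the rest of the lexer.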

However, unless you post both the lexer and the parser parts that you want
to work, we can't help you specifically. Think of the lexer as running first
and making ALL the tokens, and the parser as running only afterwards.
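
As a rough illustration of both points (the lexer finishing before the
parser starts, and a flag changing which token is produced), here is a small
hand-written tokenizer in Python. It is only a sketch: the rules WORD,
ESCAPE and RAW_VALUE, the token names, and the expect_raw flag are all
assumptions made for the example, not taken from the poster's grammar.

```python
import re

# Hypothetical rules; the poster's real lexer was not included in the thread.
WORD = re.compile(r"[A-Za-z0-9_/.-]+")   # ordinary value characters, no backslash
ESCAPE = re.compile(r"\\.")              # backslash plus one character
RAW_VALUE = re.compile(r"[^\s!]\S*")     # one whitespace-delimited word, backslashes included


def tokenize(text, stateful=True):
    """Build the complete token list before any parsing happens."""
    tokens = []
    pos = 0
    expect_raw = False  # the flag: set after BrowserMatch, cleared after one value
    while pos < len(text):
        ch = text[pos]
        if ch.isspace():
            if ch == "\n":
                expect_raw = False  # a directive ends at the newline
            pos += 1
        elif ch == "!":
            tokens.append(("NOT", ch))
            pos += 1
        elif stateful and expect_raw:
            m = RAW_VALUE.match(text, pos)
            tokens.append(("VALUE", m.group()))
            pos, expect_raw = m.end(), False
        else:
            m = WORD.match(text, pos)
            if m:
                kind = "KEYWORD" if m.group() == "BrowserMatch" else "VALUE"
                expect_raw = kind == "KEYWORD"
            else:
                m = ESCAPE.match(text, pos)
                kind = "ESCAPE" if m else "UNKNOWN"
            if m:
                tokens.append((kind, m.group()))
                pos = m.end()
            else:
                tokens.append((kind, ch))
                pos += 1
    return tokens


line = r"BrowserMatch \bMSIE !no-gzip !gzip-only-text/html"
with_flag = tokenize(line)
without_flag = tokenize(line, stateful=False)
```

Without the flag, the backslash rule claims \b on its own and MSIE becomes a
separate value, which resembles the split described in the original
question. With the flag, the word after BrowserMatch comes out as the single
value \bMSIE. In neither case does anything downstream get a say in how the
characters were grouped.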

Jim

> -----Original Message-----
> From: antlr-interest-bounces at antlr.org [mailto:antlr-interest-
> bounces at antlr.org] On Behalf Of Hiran Chaudhuri
> Sent: Wednesday, January 19, 2011 6:32 AM
> To: Pop Qvarnström
> Cc: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] How to handle blackslashes correctly?
>
> I will add the lexer part of my grammar. The rest is longish and
> probably not relevant here (I hope).
>
> ======8<====== [lexer grammar not preserved in the archive] ======>8======
>
>
> -----Original Message-----
> From: "Pop Qvarnström"
> Sent: Jan 18, 2011 6:14:01 PM
> To: "Hiran Chaudhuri"
> Subject: Re: [antlr-interest] How to handle blackslashes correctly?
>
> >Could you provide your grammar, or relevant parts of it?
> >
> >Cheers,
> >Pop
> >
> >On Tue, Jan 18, 2011 at 5:10 PM, Hiran Chaudhuri wrote:
> >
> >> Hello everybody.
> >>
> >> I've got input files in which backslashes have different meanings
> >> depending on where they appear. As a result, my lexer does not really
> >> know how to generate the tokens, and the parser does not do what I
> >> want it to do. Maybe someone can help me check this?
> >>
> >> - A backslash before a linefeed means the linefeed is just
> >>   whitespace, whereas elsewhere a linefeed is not.
> >> - A backslash in some regions of the file is meant to be part of a
> >>   file path (Windows).
> >> - A backslash in some regions of the file is part of a regular
> >>   expression. I'm not interested in parsing that, so it should be
> >>   handled like a string value.
> >> - A backslash before a quote inside a quoted string means the quote
> >>   does not terminate the string.
> >>
> >> I've created a grammar that can handle all cases from my point of
> >> view. Now let's look at one fragment:
> >>
> >> BrowserMatch \bMSIE !no-gzip !gzip-only-text/html
> >>
> >> This should be parsed as
> >> Keyword BrowserMatch
> >> value   \bMSIE
> >> not     !
> >> value   no-gzip
> >> not     !
> >> value   gzip-only-text/html
> >>
> >> but it is parsed as
> >> Keyword BrowserMatch
> >> unknown \b
> >> value   MSIE
> >> ...
> >>
> >> My expression for value allows backslashes and the necessary letters,
> >> yet the parser does not recognize this input as a value.
> >> What could be the reason for that?
> >>
> >> Hiran
> >> List: http://www.antlr.org/mailman/listinfo/antlr-interest
> >> Unsubscribe:
> >> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> >>
> >