[antlr-interest] Fwd: Mismatched token problem
Johannes Luber
JALuber at gmx.de
Thu Jan 15 09:01:05 PST 2009
> Just realized that when I hit "Reply" it sent the message to Kevin
> instead of the list. Can't we change the mailman configuration to set
> the Reply-To: header in the email to always be to the mailing list?
There have already been several discussions about changing this behaviour. All of them ended with the same result: it is ultimately up to Ter, who likes the current behaviour.
Johannes
>
> Rich
>
> ---------- Forwarded message ----------
> From: Richard Wallace <rwallace at thewallacepack.net>
> Date: Wed, Jan 14, 2009 at 7:19 PM
> Subject: Re: [antlr-interest] Mismatched token problem
> To: "Kevin J. Cummings" <cummings at kjchome.homeip.net>
>
>
> So, rather than continuing to talk about it all in the abstract and
> showing you just bits, I've put the project I'm working on up on Google
> Code <http://code.google.com/p/cssselectors/>. It's a library for
> using CSS selectors to get elements out of XML documents. I'm hoping
> to be able to use it in integration tests of web applications rather
> than having to use XPath, which I've never really liked. The ANTLR
> grammar can be found at
> <http://code.google.com/p/cssselectors/source/browse/trunk/src/main/antlr/com/threelevers/css/CssSelectors.g>.
>
> On Wed, Jan 14, 2009 at 4:51 PM, Kevin J. Cummings
> <cummings at kjchome.homeip.net> wrote:
> > Richard Wallace wrote:
> >
> >> Ok, I'm feeling really dense right now. I put the rules in as follows:
> >>
> >> fragment IDENTFRAGMENT
> >> : ('_' | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' )
> >> ;
> >>
> >> fragment IDENTNUMFRAGMENT
> >> : IDENTFRAGMENT | '0' .. '9'
> >> ;
> >>
> >> IDENT
> >> : IDENTFRAGMENT ( DASH | IDENTNUMFRAGMENT )*
> >> ;
> >>
> >> DASH
> >> : '-' ( options{greedy=true;} : IDENTFRAGMENT { $type = IDENT; } )?
> >> ;
> >>
> >> And I even understand what it means (I think), but I'm still running
> >> into the problem that in the expression 4n-1, n-1 is still being
> >> considered an expression. I had to change protected to fragment to
> >
> > Sorry I thought you were using Antlr 2.7.7, that must of been someone
> else I
> > was chatting with, yes, fragment is correct for Antlr 3.x
> >
> >> get the lexer to stop trying to match 4 as an IDENTNUMFRAGMENT and the
> >> IDENT rule to match the language. But I don't think that should stop
> >> this from working, should it? I must be missing something. Any ideas?
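To see why n-1 survives as a single token, it can help to trace the rules above by hand. The Java class below is a hypothetical, hand-written approximation of IDENTFRAGMENT, IDENTNUMFRAGMENT, IDENT and DASH, not the code ANTLR generates. It assumes NUMBER is just ('0'..'9')+, since the real NUMBER rule is not quoted in this thread, and it leaves out DASH's optional promotion to IDENT.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical approximation of the quoted lexer rules (not ANTLR output).
// Assumption: NUMBER is ('0'..'9')+ because the real rule is not shown.
public class MaximalMunchDemo {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Mirrors IDENTFRAGMENT: '_', a-z, A-Z, or U+0100..U+FFFE
    static boolean isIdentStart(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= 0x0100 && c <= 0xFFFE);
    }

    // Mirrors IDENTNUMFRAGMENT: IDENTFRAGMENT | '0'..'9'
    static boolean isIdentPart(char c) {
        return isIdentStart(c) || isDigit(c);
    }

    // Each branch consumes as many characters as its rule allows (maximal munch),
    // and tokenize() runs over the whole input before any parser would see it.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (isDigit(c)) {
                // Assumed NUMBER : ('0'..'9')+ ;
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("NUMBER:" + input.substring(start, i));
            } else if (isIdentStart(c)) {
                // IDENT : IDENTFRAGMENT ( DASH | IDENTNUMFRAGMENT )* ;
                // The loop accepts '-' and digits, so "n-1" becomes one IDENT.
                i++;
                while (i < input.length()
                        && (input.charAt(i) == '-' || isIdentPart(input.charAt(i)))) {
                    i++;
                }
                tokens.add("IDENT:" + input.substring(start, i));
            } else if (c == '-') {
                // A stand-alone DASH; its optional promotion to IDENT is omitted here.
                i++;
                tokens.add("DASH:-");
            } else {
                // Every other character is emitted as a one-character token.
                i++;
                tokens.add("CHAR:" + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("4n-1")); // [NUMBER:4, IDENT:n-1]
        System.out.println(tokenize("4n+1")); // [NUMBER:4, IDENT:n, CHAR:+, NUMBER:1]
        System.out.println(tokenize("#n-1")); // [CHAR:#, IDENT:n-1]
    }
}
```

Nothing in the IDENT loop knows about :nth-child(), so 4n-1 and #n-1 produce the same IDENT, while 4n+1 splits apart because '+' is not part of IDENT.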
> >
> > In your expr rule you specify S* as possible whitespace separators. Also, if
> > you need to match n-1 as an IDENT, then it's possible that you need
> > another fragment to catch the 'n' and what follows as an IDENT.
> >
>
> Sorry, in this case I don't want n-1 to be an IDENT. It should be in
> most cases, but here, inside a :nth-child() function, it
> shouldn't be considered an IDENT. In CSS it is perfectly valid to
> have something like
> #n-1
> where n-1 is the id of the element we want to find.
>
> The reason I include whitespace explicitly in some places rather than
> ignoring it is that it is significant in one context in CSS. In the
> selector
> #a .b
> the space between #a and .b is significant because it indicates
> that we are looking for an element with a class of "b" that is a
> descendant of an element with an id of "a". I couldn't figure out a
> way to make the spaces everywhere else be ignored but still have this
> one be recognized properly. If the space isn't recognized properly,
> "#a .b" is treated the same as "#a.b" (a single element with id "a"
> and class "b"), which has a completely different meaning.
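One way to keep the single significant space without mentioning whitespace throughout the parser is to let the lexer emit whitespace and then decide, in one place, what it means. The Java sketch below is hypothetical and works on an already-split token list (the token spellings and the DESCENDANT name are assumptions): a run of whitespace becomes a DESCENDANT token only when it separates the end of one compound selector from the start of the next, and it is dropped everywhere else, for example around '>'.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical post-lexing pass: whitespace tokens are dropped unless they sit
// between the end of one compound selector and the start of the next, where
// they are replaced by an explicit DESCENDANT combinator token.
public class DescendantCombinator {

    static boolean isName(String tok) {
        char c = tok.charAt(0);
        return Character.isLetterOrDigit(c) || c == '_';
    }

    // Tokens that can end a compound selector such as "#a", "div" or "[x]".
    static boolean endsCompound(String tok) {
        return isName(tok) || tok.equals(")") || tok.equals("]") || tok.equals("*");
    }

    // Tokens that can start a compound selector such as ".b", "#a" or ":first-child".
    static boolean startsCompound(String tok) {
        return isName(tok) || tok.equals("#") || tok.equals(".") || tok.equals(":")
                || tok.equals("[") || tok.equals("*");
    }

    static boolean isSpace(String tok) {
        return tok.trim().isEmpty();
    }

    static List<String> resolveWhitespace(List<String> tokens) {
        List<String> out = new ArrayList<>();
        int i = 0;
        while (i < tokens.size()) {
            if (!isSpace(tokens.get(i))) {
                out.add(tokens.get(i));
                i++;
                continue;
            }
            // Skip the whole run of whitespace tokens.
            int next = i;
            while (next < tokens.size() && isSpace(tokens.get(next))) {
                next++;
            }
            boolean betweenSelectors = !out.isEmpty() && next < tokens.size()
                    && endsCompound(out.get(out.size() - 1))
                    && startsCompound(tokens.get(next));
            if (betweenSelectors) {
                out.add("DESCENDANT");
            }
            i = next;
        }
        return out;
    }

    public static void main(String[] args) {
        // [#, a, DESCENDANT, ., b]
        System.out.println(resolveWhitespace(List.of("#", "a", " ", ".", "b")));
        // [#, a, ., b]
        System.out.println(resolveWhitespace(List.of("#", "a", ".", "b")));
        // [#, a, >, ., b]
        System.out.println(resolveWhitespace(List.of("#", "a", " ", ">", " ", ".", "b")));
    }
}
```

With this pass in front of the parser, "#a .b" and "#a.b" reach the parser as different token sequences, while spaces around '>' carry no meaning.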
>
> > By default, ANTLR does greedy matching of tokens. In other words, it tries
> > to match as much as possible based on your rules. It also tokenizes before
> > it parses. So, if you don't want 4n-1 to be NUMBER IDENT, then you need to
> > have a lexer rule that catches something different. Does it help if you try
> > a lexer rule that catches NUMBER 'n' as a single token, and then use *that*
> > in your expr rule?
> >
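Because the lexer runs before the parser, the parser's context cannot change how n-1 is split; the lexer needs some context of its own. The Java sketch below is hypothetical (it is not the cssselectors grammar, and NTH_CHILD, AN, RPAREN and S are made-up token names): after it sees ":nth-child(" it switches into a mode in which NUMBER? 'n' becomes a single AN token, and it switches back at ')'. Outside that mode n-1 is still an IDENT, so #n-1 keeps working. In an ANTLR 3 grammar the same idea would typically be expressed with a boolean lexer member and gated semantic predicates, which is not shown here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical context-sensitive lexer sketch (not the cssselectors grammar).
// Inside ":nth-child( ... )" it emits NUMBER? 'n' as one AN token; everywhere
// else "n-1" is still lexed as an IDENT, so "#n-1" keeps working.
public class NthChildLexer {

    private static final String NTH_CHILD = ":nth-child(";

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    static boolean isIdentStart(char c) {
        return c == '_' || (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
                || (c >= 0x0100 && c <= 0xFFFE);
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        boolean inNth = false; // lexer state: between ":nth-child(" and ')'
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (!inNth && input.startsWith(NTH_CHILD, i)) {
                tokens.add("NTH_CHILD");
                i += NTH_CHILD.length();
                inNth = true;
            } else if (inNth && c == ')') {
                tokens.add("RPAREN");
                i++;
                inNth = false;
            } else if (isDigit(c)) {
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                if (inNth && i < input.length() && input.charAt(i) == 'n') {
                    i++; // NUMBER 'n' becomes a single AN token, e.g. "4n"
                    tokens.add("AN:" + input.substring(start, i));
                } else {
                    tokens.add("NUMBER:" + input.substring(start, i));
                }
            } else if (inNth && c == 'n') {
                i++; // a bare 'n' inside the argument is AN, never IDENT
                tokens.add("AN:n");
            } else if (isIdentStart(c)) {
                i++;
                while (i < input.length() && (input.charAt(i) == '-'
                        || isIdentStart(input.charAt(i)) || isDigit(input.charAt(i)))) {
                    i++;
                }
                tokens.add("IDENT:" + input.substring(start, i));
            } else if (c == '-') {
                i++;
                tokens.add("DASH:-");
            } else if (Character.isWhitespace(c)) {
                while (i < input.length() && Character.isWhitespace(input.charAt(i))) {
                    i++;
                }
                tokens.add("S");
            } else {
                i++;
                tokens.add("CHAR:" + c);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // [IDENT:li, NTH_CHILD, AN:4n, DASH:-, NUMBER:1, RPAREN]
        System.out.println(tokenize("li:nth-child(4n-1)"));
        // [CHAR:#, IDENT:n-1]
        System.out.println(tokenize("#n-1"));
    }
}
```

The trade-off is that the lexer now has to know a little about selector syntax, in exchange for the parser receiving 4n-1 already split the way it needs.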
>
> I'm not sure exactly what you mean here. I've looked at a bunch of
> examples and can't figure it out. I tried adding a
>
> tokens {
> MAGN;
> }
>
> but then I'm not sure where to put the lexer rule. I tried
>
> ATERM : ( NUMBER? 'n' ) -> MAGN
>
> but ANTLR claims MAGN is an unexpected token, so obviously I'm doing
> something wrong.
>
> > Also, when I code expression parsers that don't care about whitespace, I
> > just set whitespace to be ignored in the lexer. ANTLR will still end a
> > token when it reaches whitespace. So, in general, I never reference
> > whitespace in the parser. You need to fix your token stream so that the
> > parser does the right thing with what it finds.
> >
> > Make a lexer rule for DASH? NUMBER? 'n', or maybe just for NUMBER 'n'.
> >
>
> I tried a rule called ATERM that looked like
>
> ATERM : DASH? NUMBER? 'n' ;
>
> and tried putting that in the nth_child_expr as
>
> nth_child_expr : ATERM S* ('+' | DASH) S* NUMBER ;
>
> and that didn't help either.
>
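Once the lexer hands the parser the :nth-child() argument in pieces it can recognize, the values still have to be turned into numbers. As a hypothetical sketch (not the cssselectors code, and independent of ANTLR), the class below works on the raw argument text: it reads a and b from forms such as 4n-1, n-1, -n+3, odd and even, and tests a 1-based sibling position against the Selectors Level 3 meaning of an+b, under which an element matches if its position equals a*n + b for some n >= 0 (odd is 2n+1, even is 2n).

```java
import java.util.Arrays;

// Hypothetical helper (not part of cssselectors or ANTLR): turns the text of
// an :nth-child() argument into the pair (a, b) of the an+b notation and tests
// a 1-based sibling position against it.
public class AnPlusB {

    // Returns {a, b}. Accepts forms such as "4n-1", "n-1", "-n+3", "2n", "3",
    // "odd" and "even"; malformed input throws NumberFormatException.
    static int[] parse(String expr) {
        String s = expr.replaceAll("\\s+", "").toLowerCase();
        if (s.equals("odd")) {
            return new int[] {2, 1};
        }
        if (s.equals("even")) {
            return new int[] {2, 0};
        }
        int nPos = s.indexOf('n');
        if (nPos < 0) {
            return new int[] {0, Integer.parseInt(s)}; // only b, e.g. "3"
        }
        String aPart = s.substring(0, nPos);
        int a;
        if (aPart.isEmpty() || aPart.equals("+")) {
            a = 1;
        } else if (aPart.equals("-")) {
            a = -1;
        } else {
            a = Integer.parseInt(aPart);
        }
        String bPart = s.substring(nPos + 1);
        // parseInt accepts a leading sign, e.g. "+3" or "-1"
        int b = bPart.isEmpty() ? 0 : Integer.parseInt(bPart);
        return new int[] {a, b};
    }

    // True if position == a*n + b for some integer n >= 0.
    static boolean matches(int a, int b, int position) {
        if (a == 0) {
            return position == b;
        }
        int diff = position - b;
        return diff % a == 0 && diff / a >= 0;
    }

    public static void main(String[] args) {
        int[] ab = parse("4n-1");
        System.out.println(Arrays.toString(ab)); // [4, -1]
        for (int pos = 1; pos <= 8; pos++) {
            if (matches(ab[0], ab[1], pos)) {
                System.out.println(pos); // prints 3, then 7
            }
        }
    }
}
```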
> > Sorry for being vague, but I hope it's helpful.
> >
>
> Hopefully, now that my full grammar is out there, you can take a better
> look at it and see what's going on. I appreciate all the help; it's
> been really valuable and I'm learning a lot (mostly how much I have to
> learn about ANTLR ;)).
>
> >> Rich
> >
> > --
> > Kevin J. Cummings
> > kjchome at rcn.com
> > cummings at kjchome.homeip.net
> > cummings at kjc386.framingham.ma.us
> > Registered Linux User #1232 (http://counter.li.org)
> >
>