[antlr-interest] Fwd: Mismatched token problem

Thu Jan 15 09:01:05 PST 2009

> Just realized that when I hit "Reply" it sent the message to Kevin
> instead of the list.  Can't we change the mailman configuration to set
> the Reply-To: header in the email to always be to the mailing list?

There have been several discussions about changing this behaviour already. They all ended with the conclusion that it is ultimately up to Ter, who likes the current behaviour.

Johannes

> 
> Rich
> 
> ---------- Forwarded message ----------
> From: Richard Wallace <rwallace at thewallacepack.net>
> Date: Wed, Jan 14, 2009 at 7:19 PM
> Subject: Re: [antlr-interest] Mismatched token problem
> To: "Kevin J. Cummings" <cummings at kjchome.homeip.net>
> 
> 
> So, rather than continuing to talk about this in the abstract and
> showing you only fragments, I put the project I'm working on up on
> Google Code <http://code.google.com/p/cssselectors/>.  It's a library for
> using CSS selectors to get elements out of XML documents.  I'm hoping
> to be able to use it in integration tests of web applications rather
> than having to use XPath which I've never really liked.  The ANTLR
> grammar can be found at
> <http://code.google.com/p/cssselectors/source/browse/trunk/src/main/antlr/com/threelevers/css/CssSelectors.g>.
> 
> On Wed, Jan 14, 2009 at 4:51 PM, Kevin J. Cummings
> <cummings at kjchome.homeip.net> wrote:
> > Richard Wallace wrote:
> >
> >> Ok, I'm feeling really dense right now.  I put the rules in as follows:
> >>
> >> fragment IDENTFRAGMENT
> >>    : ('_' | 'a'..'z'| 'A'..'Z' | '\u0100'..'\ufffe' )
> >>    ;
> >>
> >> fragment IDENTNUMFRAGMENT
> >>    : IDENTFRAGMENT | '0' .. '9'
> >>    ;
> >>
> >> IDENT
> >>    : IDENTFRAGMENT ( DASH | IDENTNUMFRAGMENT )*
> >>    ;
> >>
> >> DASH
> >>    : '-' ( options{greedy=true;} : IDENTFRAGMENT { $type = IDENT; } )?
> >>    ;
> >>
> >> And I even understand what it means (I think), but I'm still running
> >> into the problem that in the expression 4n-1, n-1 is still being
> >> considered an expression.  I had to change protected to fragment to
> >
> > Sorry, I thought you were using ANTLR 2.7.7; that must have been someone
> > else I was chatting with.  Yes, fragment is correct for ANTLR 3.x.
> >
> >> get the lexer to not try and match 4 as an IDENTNUMFRAGMENT and the
> >> IDENT rule to match the language.  But I don't think that should cause
> >> this not to work, should it?  I must be missing something.  Any ideas?
> >
> > In your expr rule you specify S* as possible whitespace separators.
> > Also, if you need to match n-1 as an IDENT, then it's possible that you
> > need another fragment to catch the 'n' and what follows as an IDENT.
> >
> 
> Sorry, in this case I don't want n-1 to be an IDENT.  It should be in
> most cases, but in this case, when inside a :nth-child() function it
> shouldn't be considered an IDENT.  In CSS it is perfectly valid to
> have something like
>    #n-1
> where n-1 is the id of the element we want to find.
> 
> The reason I include whitespace explicitly in some places rather than
> ignoring it is because it is important in one context in CSS.  In the
> selector
>    #a .b
> the space between the #a and .b is significant because it indicates
> that we are looking for an element with a class of "b" that is a
> descendant of an element with an id of "a".  I couldn't figure out a
> way to make the spaces everywhere else be ignored but still have this
> one be recognized properly.  If the space isn't recognized properly,
> "#a .b" is treated the same as "#a.b" which has a completely different
> meaning.
> 
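To make the point about significant whitespace concrete, here is a minimal Python sketch. It is not taken from CssSelectors.g; the token names HASH, CLASS and S and the function selector_kinds are hypothetical, and only ASCII id and class selectors are handled. It keeps whitespace as a token so that "#a .b" and "#a.b" yield different token sequences. Leading or trailing whitespace is not treated specially here.

```python
import re

# Hypothetical token patterns for a tiny subset of CSS selectors
# (an illustration only, not the grammar from the project).
TOKEN_RE = re.compile(r"""
    (?P<HASH>\#[A-Za-z_][A-Za-z0-9_-]*)    # id selector, e.g. #a
  | (?P<CLASS>\.[A-Za-z_][A-Za-z0-9_-]*)   # class selector, e.g. .b
  | (?P<S>[ \t]+)                          # whitespace kept as a token
""", re.VERBOSE)


def selector_kinds(selector):
    """Return token kinds, reporting whitespace as a DESCENDANT combinator."""
    kinds, pos = [], 0
    while pos < len(selector):
        m = TOKEN_RE.match(selector, pos)
        if m is None:
            raise ValueError("unexpected character at offset %d" % pos)
        # Whitespace between two simple selectors is the descendant combinator.
        kinds.append("DESCENDANT" if m.lastgroup == "S" else m.lastgroup)
        pos = m.end()
    return kinds


print(selector_kinds("#a .b"))
print(selector_kinds("#a.b"))
```

If the S token were dropped in the lexer instead, both inputs would produce the same two tokens and the descendant relationship would be lost.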
> > By default, ANTLR does greedy matching of tokens.  In other words, it
> > tries to match as much as possible based on your rules.  It also
> > tokenizes before it parses.  So, if you don't want 4n-1 to be NUMBER
> > IDENT, then you need to have a lexer rule to catch something different.
> > Does it help if you try a lexer rule that catches NUMBER 'n' as a TOKEN,
> > and then use *that* in your expr rule?
> >
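The greedy behaviour described above can be modelled with a small longest-match tokenizer in Python. This is only a sketch under assumptions: the regular expressions are simplified ASCII stand-ins for the NUMBER, IDENT and DASH rules in this thread, PLUS is a hypothetical extra rule, and a real ANTLR 3 lexer chooses rules by lookahead prediction rather than by this exact scan. It does show why 4n-1 comes out as NUMBER followed by IDENT.

```python
import re

# Simplified stand-ins for the lexer rules discussed in this thread
# (assumptions: ASCII only, NUMBER is an unsigned integer).
RULES = [
    ("NUMBER", re.compile(r"[0-9]+")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")),
    ("DASH", re.compile(r"-")),
    ("PLUS", re.compile(r"\+")),
]


def tokenize(text):
    """Split text into (rule, lexeme) pairs, always taking the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Keep the longest match; on a tie the earlier rule wins.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError("no rule matches at offset %d" % pos)
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


print(tokenize("4n-1"))
print(tokenize("4n+1"))
```

Because IDENT may contain dashes and digits after its first character, the lexer never gets a chance to see the '-' in 4n-1 as a separate token.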
> 
> I'm not sure exactly what you mean here.  I've looked at a bunch of
> examples and can't figure it out.  I tried adding a
> 
> tokens {
>    MAGN;
> }
> 
> but then I'm not sure where to put the lexer rule.  I tried
> 
> ATERM : ( NUMBER? 'n' ) -> MAGN
> 
> but ANTLR claims MAGN is an unexpected token so obviously I'm doing
> something wrong.
> 
> > Also, when I code expression parsers that don't care about whitespace, I
> > just set whitespace to be ignored in the lexer.  ANTLR will still stop
> > lexing tokens when it finds a whitespace.  So, in general, I never
> > reference whitespace in the parser.  You need to fix your token stream so
> > that the parser does the right thing with what it finds.
> >
> > Make a lexer rule for:  DASH? NUMBER? 'n'  Or maybe just for NUMBER 'n'
> >
> 
> I tried a rule called ATERM that looked like
> 
> ATERM : DASH? NUMBER? 'n' ;
> 
> and tried using that in nth_child_expr as
> 
> nth_child_expr : ATERM S* ('+' | DASH) S* NUMBER
> 
> and that didn't help either.
> 
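As a rough way to reason about an ATERM-style rule, the Python sketch below adds one to a plain longest-match tokenizer. Everything here is an assumption made for illustration: the regular expressions are ASCII simplifications of the rules in this thread, DASH inside ATERM is reduced to a bare '-', and ANTLR 3 does not select lexer rules by this exact procedure, so the sketch does not explain why the attempt above failed. In this simplified model 4n-1 splits as intended, while a bare n-1 is still swallowed by IDENT because IDENT matches more characters.

```python
import re

# Simplified stand-ins for the lexer rules in this thread, plus an
# ATERM rule modelled on "DASH? NUMBER? 'n'" (assumption: ASCII only).
RULES = [
    ("ATERM", re.compile(r"-?[0-9]*n")),
    ("NUMBER", re.compile(r"[0-9]+")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z0-9_-]*")),
    ("DASH", re.compile(r"-")),
    ("PLUS", re.compile(r"\+")),
]


def tokenize(text):
    """Split text into (rule, lexeme) pairs, always taking the longest match."""
    tokens, pos = [], 0
    while pos < len(text):
        best = None
        for name, pattern in RULES:
            m = pattern.match(text, pos)
            # Keep the longest match; on a tie the earlier rule wins.
            if m and (best is None or m.end() > best[1].end()):
                best = (name, m)
        if best is None:
            raise ValueError("no rule matches at offset %d" % pos)
        tokens.append((best[0], best[1].group()))
        pos = best[1].end()
    return tokens


print(tokenize("4n-1"))
print(tokenize("n-1"))
```

The second case suggests that a lexer-only fix may not cover every form of the :nth-child() argument, since the same characters must be an IDENT in #n-1.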
> > Sorry for being vague, but I hope it's helpful.
> >
> 
> Hopefully, now that my full grammar is out there you can take a better
> look at it and see what's going on.  I appreciate all the help, it's
> been really valuable and I'm learning a lot (mostly how much I have to
> learn about antlr ;)).
> 
> >> Rich
> >
> > --
> > Kevin J. Cummings
> > kjchome at rcn.com
> > cummings at kjchome.homeip.net
> > cummings at kjc386.framingham.ma.us
> > Registered Linux User #1232 (http://counter.li.org)
> >
> 
