[antlr-interest] Re: Newbie needing parser help
craigbarker1
craigbarker1 at yahoo.com
Mon Apr 26 11:28:34 PDT 2004
Is there an easy way to make the parser think that it's been sent a
quoted string by inserting the " token into the token stream if it's
not the next one? I suppose this also raises the problem of where to
position the closing ". Effectively nothing between the commas is
significant, but if I try something along the lines of (~(COMMA|NL))*
I get lots of non-determinism.
Thanks for your help.
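
For illustration, here is a minimal, language-neutral sketch in Python of the "nothing between the commas is significant" idea, written as a filter over a token stream rather than as a grammar rule. It is not ANTLR code and not anyone's actual implementation. It assumes tokens arrive as (type, text) pairs, that the lexer passes whitespace through as WS tokens instead of skipping it, and that only lines starting with a text-taking command (here just PAGES) should be folded. The names merge_comma_text, text_commands, WS and STATUS are hypothetical; COMMA, NL, ID, NUM_INT and COMMATEXT follow the names used in this thread.

```python
from typing import Iterable, Iterator, List, Tuple

Token = Tuple[str, str]  # (token type, matched text); a stand-in for a real token object


def merge_comma_text(
    tokens: Iterable[Token],
    text_commands: frozenset = frozenset({"PAGES"}),  # hypothetical: commands taking unquoted text
) -> Iterator[Token]:
    """Fold every token between COMMA and the next COMMA/NL into one COMMATEXT token.

    Folding only happens on lines whose first token type is in text_commands,
    so a line such as SIZE,10,20 keeps its NUM_INT tokens.
    """
    text_mode = False      # current line starts with a text-taking command
    at_line_start = True   # next token is the first one on its line
    collecting = False     # a COMMA was seen on a text-mode line
    pending: List[str] = []

    for ttype, text in tokens:
        if at_line_start:
            text_mode = ttype in text_commands
            at_line_start = False

        if ttype in ("COMMA", "NL"):
            if pending:
                # Emit the gathered text regardless of the original token types.
                yield ("COMMATEXT", "".join(pending))
                pending = []
            collecting = text_mode and ttype == "COMMA"
            if ttype == "NL":
                at_line_start = True
            yield (ttype, text)
        elif collecting:
            pending.append(text)
        else:
            yield (ttype, text)

    if pending:  # input ended without a trailing NL
        yield ("COMMATEXT", "".join(pending))


if __name__ == "__main__":
    # Tokens for: PAGES,Sale detail,Status changes,Sale costs
    line = [
        ("PAGES", "PAGES"), ("COMMA", ","),
        ("ID", "Sale"), ("WS", " "), ("ID", "detail"), ("COMMA", ","),
        ("STATUS", "Status"), ("WS", " "), ("ID", "changes"), ("COMMA", ","),
        ("ID", "Sale"), ("WS", " "), ("ID", "costs"), ("NL", "\n"),
    ]
    for token in merge_comma_text(line):
        print(token)
```

With a filter like this in front of the parser, the parameter rule would only need to match a single COMMATEXT token, so a keyword such as Status inside a parameter no longer matters. The cost is that the filter has to know which commands take free text.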
--- In antlr-interest at yahoogroups.com, "lgcraymer" <lgc at m...> wrote:
> Ugly problem. What might make sense for this one is to make state
> changes in the lexer and recognize strings in your COMMA rule.
>
> That is:
>
> ID :
> <character tokens>
> { hash table lookup; set commaText var if appropriate }
> ;
>
>
> COMMA :
> { commaText }? ','! (~(',' | '\n'))+
> { _ttype = COMMATEXT; }
> | ','
> ;
>
> You can probably also do something with a token filter.
>
> --Loring
>
> --- In antlr-interest at yahoogroups.com, "craigbarker1"
> <craigbarker1 at y...> wrote:
> > Hi All,
> >
> > I'm relatively new to all this language recognition stuff and
> > have a question that I could really use a hand with. It's
> > probably not that hard; it's more likely that I'm just missing
> > something obvious.
> >
> > The issue is that I'm trying to parse a language that allows
> > unquoted strings to be passed as parameters to functions. There
> > are no rules on what can go inside these unquoted strings: they
> > can be the names of literals, functions or any random sequence of
> > characters.
> >
> > I've tried recognising a set of ID tokens (defined as per the
> > Java grammar specification) but this is no good as I've got
> > testLiterals=true; so anything that is a literal comes through
> > from the lexer as a specific token type and therefore doesn't
> > match against ID.
> >
> > Here is an example of the type of thing I'm trying to match:
> >
> > PAGES,Sale detail,Status changes,Sale costs
> >
> > The issue lies with the fact that each of the parameters is
> > REALLY a string, but in this bizarre language they don't have to
> > be double quoted. The issue is further compounded by the fact
> > that the word Status is really a function name and hence has a
> > specific token type.
> >
> > Here is a snippet of the grammar I've done so far to deal with
> > this:
> >
> > designerCommand
> > //Commands to the designer
> > : "SIZE" COMMA NUM_INT COMMA NUM_INT
> > | "PAGES" COMMA textParameter (COMMA textParameter)*
> > ;
> >
> > textParameter
> > : (ID)*
> > | STRING_LITERAL
> > ;
> >
> > Please let me know if you can provide any advice at all or even
> > point me to a relevant article somewhere.
> >
> > Many thanks in advance,
> >
> > Craig