[antlr-interest] White spaces within token definition

Thu May 8 01:26:24 PDT 2008

Hi,
comments follow.

>Anyway, he seems to be on track now :-)

>>Maybe.  But it was a different name and address responding afterwards.  So
>>who knows if it was the same person or not :)

>>Anyway, fairly unimportant.  I'm just in one of those "quibbling moods" ;)
Given, you made me laugh :)
I replied to him first, but I did not use "reply all" in the third mail of
the thread :)
Look, there are many ways to handle multi-word tokens that contain whitespace.

Maybe we can compare them and pick one approach for everyone.
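
For the comparison, here is a minimal sketch of one such way, under stated
assumptions: it keeps every lexer token to a single word and lets a parser
rule recognize the two-word command. The grammar name CommandSketch and the
rule names command, COMMAND and EXIT are made up for illustration; ID and the
hidden whitespace rule follow Hari's grammar quoted below. It is written in
ANTLR 3 syntax and is not taken from the ON project.

```antlr
grammar CommandSketch;   // hypothetical name, for illustration only

// Parser rule: multi-word commands are assembled here, not in the lexer.
command
    :   COMMAND EXIT     // the one command of interest
    |   COMMAND ID       // any other "COMMAND <name>"
    ;

// Lexer rules: single-word tokens only. Keywords are listed before ID so
// that they win when the input is exactly 'COMMAND' or 'EXIT'.
COMMAND : 'COMMAND' ;
EXIT    : 'EXIT' ;
ID      : ('A'..'Z'|'a'..'z')+ ;

// Whitespace goes to the hidden channel, as in the quoted grammar.
WS      : (' '|'\t')+ {$channel=HIDDEN;} ;
```

The trade-off is that COMMAND and EXIT become reserved words, so they no
longer come through as ID tokens. A parser rule that accepts ID, COMMAND or
EXIT as a name would be one way to relax that.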

Any thoughts?

Regards,
Qinxian

2008/5/7 向雅 <fyaoxy at gmail.com>:

> Hi,
>
> In my ON project <http://on.dev.java.net/>, I use the
> multiple-words-as-one-token style.
> The grammar looks like this:
>
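> // Parser rule: one or more WORDs, or one quoted string, become a STRING node.
> string    :    (WORD+|QSTRING)->^(STRING[$text]);
>
> // A WORD is a run of characters other than whitespace, separators and quotes.
> WORD        :    (~(' ' | '\t' | ',' | ':' | ';' | '{' | '}' | '\r' | '\n'
> | '"' | '\''))+;
>
> // A quoted string; the action strips the surrounding quotes from the text.
> QSTRING
> @after{
>     setText(getText().substring(1, getText().length()-1));
> }
>     :    ('"' (~('"'))* '"')    |    ('\'' (~('\''))* '\'');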
>
>
> Regards,
> Qinxian
>
> 2008/5/7 Haralambi Haralambiev <hharalambiev at gmail.com>:
>
> > Hello,
> >
> > Is this question too basic, or is there no one who can answer it?
> >
> > Could someone please give me some insight into the problem? I want to
> > understand the cause, not just work around the issue.
> >
> > Thanks,
> > Hari
> >
> > On 4/25/08, Haralambi Haralambiev <hharalambiev at gmail.com> wrote:
> > >
> > > Hello,
> > >
> > > I have stumbled upon a problem that, although it has some workarounds,
> > > puzzles me: I do not understand why it happens.
> > > (I searched for a similar question but was unable to find one. I am
> > > sorry if this has been answered somewhere else. If so, please point me
> > > to the link.)
> > >
> > > Consider the following lexer grammar:
> > > ---------------------------------------------------
> > > lexer grammar test;
> > >
> > > CMD_EXIT : 'COMMAND EXIT';
> > > ID : ('A'..'Z'|'a'..'z')+;
> > > WhiteSpaces : (' '|'\t')+ {$channel=HIDDEN;};
> > > ---------------------------------------------------
> > >
> > > The language being recognized has many commands with the syntax
> > > "COMMAND <name of the command>", but I am interested only in the exit
> > > command, so I treat "COMMAND EXIT" as a single token. However, I would
> > > like "COMMAND <something else>" to be matched as a sequence of two ID
> > > tokens.
> > >
> > > With the grammar above, "COMMAND EXIT" is successfully matched as a
> > > CMD_EXIT token. However, "COMMAND XYZ" produces the error "line 1:8
> > > mismatched character 'X' expecting 'E'", and what is left (only the
> > > character Z) is matched as ID.
> > >
> > > In the generated lexer class, in the mTokens() method, I noticed that the
> > > lexer treats everything that starts with "COMMAND " as the CMD_EXIT token.
> > > During recognition it simply does not consider the characters of the token
> > > definition that come after the white space (i.e. 'E', 'X', 'I' and 'T').
> > >
> > > So, if you could enlighten me on why this is happening, I would be very
> > > grateful!
> > >
> > > Best Regards,
> > > Hari
> > >
> >
> >
>
>
> --
> Regards,
> 向雅

-- 
Regards,
向雅
