[antlr-interest] Novice Question - Token for all characters from a given point to End of Line

Gavin Lambert antlr at mirality.co.nz
Wed Aug 6 04:13:08 PDT 2008


At 09:24 6/08/2008, Brisard, Fred D wrote:
 >I am currently collecting each "word" (separated by WS) for the
 >length of the line and identifying them separately.  I really
 >just need to get all the words as a single token - at least,
 >that's what I think I want to do.

I still don't see why you would want that.  That would just make 
the job of figuring out what it all means much harder.

 >I should describe more of what I'm doing.  I'm creating a parser
 >that parses a "language" and then provides the ability to display
 >the information in a form-based view for editing.  I will then
 >let the
[...]
 >In addition, the command name and keyword values have implied
 >abbreviations.  So if you have 2 keywords - before and after,
 >then b and a are sufficient to discriminate between them.

All of this is best handled in the parser: have the lexer produce
simple tokens (e.g. WORD, NUMBER, QUOTED_STRING, OPEN_BRACKET, etc.)
and work out what they actually mean at the parser level.
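
As a rough illustration of resolving the implied abbreviations once the
lexer has handed over a plain WORD token, here is a minimal Python sketch
(not ANTLR grammar). The function name resolve_keyword and the keyword
list are hypothetical; exact, case-sensitive prefix matching is an
assumption, since the original question does not say how case is treated.

```python
def resolve_keyword(word, keywords):
    """Map a WORD token's text to the single keyword it abbreviates.

    Hypothetical helper: `keywords` is whatever set of names is valid
    at this point in the statement (an assumption for illustration).
    """
    # A full keyword always wins, even if it is a prefix of another one.
    if word in keywords:
        return word
    # Otherwise accept any prefix that picks out exactly one keyword.
    matches = [k for k in keywords if k.startswith(word)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown keyword: {word!r}")
    raise ValueError(f"ambiguous keyword {word!r}: {sorted(matches)}")


if __name__ == "__main__":
    print(resolve_keyword("b", ["before", "after"]))
    print(resolve_keyword("a", ["before", "after"]))
```

With the two keywords from the question, "b" resolves to "before" and "a"
to "after"; a prefix shared by several keywords is reported as ambiguous
rather than guessed at.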

 >Finally there is the concept of continuation - a statement can be
 >continued by the last character on a line being a + or -.  The -
 >is used when whitespace at the beginning of the subsequent line
 >is significant; + just ignores any whitespace at the beginning of
 >the subsequent line.

This one you should handle in the lexer; that way you can swallow up
the intervening EOL and whitespace to hide them from the parser, so
it just sees a single continuous statement.
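
To make the continuation rule concrete, here is a small Python sketch of
the joining step (an illustration of the idea only, not ANTLR lexer
rules). The name join_continuations is hypothetical. It assumes the + or
- is literally the last character on the line, and that the marker
itself is dropped from the joined statement; the question does not spell
out either point.

```python
def join_continuations(lines):
    """Merge physical lines into logical statements.

    A trailing '+' continues the statement and discards the next line's
    leading whitespace; a trailing '-' continues it and keeps that
    whitespace. Dropping the marker character is an assumption.
    """
    statements = []
    pending = None          # text of a statement still being continued
    strip_leading = False   # True when the previous line ended with '+'
    for raw in lines:
        line = raw.rstrip("\r\n")
        if strip_leading:
            line = line.lstrip(" \t")
        text = line if pending is None else pending + line
        if line.endswith("+") or line.endswith("-"):
            strip_leading = line.endswith("+")
            pending = text[:-1]      # drop the continuation marker
        else:
            strip_leading = False
            statements.append(text)
            pending = None
    if pending is not None:
        # Input ended in the middle of a continued statement.
        statements.append(pending)
    return statements


if __name__ == "__main__":
    for statement in join_continuations(["copy a +", "   b", "end"]):
        print(statement)
```

The parser then only ever sees whole statements, which is the effect you
would want from swallowing the EOL in the lexer.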


