[antlr-interest] Novice Question - Token for all characters from a given point to End of Line

Brisard, Fred D Fred.Brisard at ca.com
Tue Aug 5 14:24:55 PDT 2008


Hi Gavin,

Thanks for your suggestion.

I am currently collecting each "word" (separated by WS) along the line
and identifying each one separately.  I really just need to get all the
words as a single token - at least, that's what I think I want to do.

I should describe more of what I'm doing.  I'm creating a parser that
parses a "language" and then provides the ability to display the
information in a form-based view for editing.  I will then let the user
modify any of the values as well as create or remove statements for the
language or any of the operands.  I then plan to rewrite the modified
program source while maintaining as much of the original formatting as
possible.  ANTLR's features meet many of these requirements, but it also
needs some coaxing (at least I think it does) for some of my needs.

I have some issues working with ANTLR (probably mostly due to my lack of
skills).

The language is keyword oriented, with no reserved words.
The language is case-insensitive.
The language was originally based on the IBM TSO command processor
syntax where the individual commands have the following syntax --

Command <positional parameters> <keyword parameters>

The positional parameters are specified for a given command.  If the
command has 4 positional parameters then the user can specify 0-4
parameters.  

The keyword parameters are keyword( keyvalue ).  The key values can be
multiple parameters.  Any number of keyword parameters can be specified.
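As a rough illustration of the statement shape described above (not ANTLR
code, and not a complete solution), here is a minimal Python sketch.  The
function name parse_statement, the max_positional argument, and the sample
command in the usage note are all hypothetical.  It assumes key values
never contain a closing parenthesis and that quoted strings are not
handled.

```python
import re

# Matches keyword( value value ... ); assumes values never contain ")".
KEYWORD_RE = re.compile(r"(\w+)\(\s*([^)]*?)\s*\)")


def parse_statement(line, max_positional):
    """Split a statement into (command, positional list, keyword dict)."""
    keywords = {}

    def grab(match):
        # Keyword names are case-insensitive, so store them lowercased.
        keywords[match.group(1).lower()] = match.group(2).split()
        return " "  # blank out the keyword so only positionals remain

    remainder = KEYWORD_RE.sub(grab, line)
    words = remainder.split()
    if not words:
        raise ValueError("empty statement")
    command, positional = words[0].lower(), words[1:]
    if len(positional) > max_positional:
        raise ValueError(
            f"{command}: at most {max_positional} positional parameters allowed"
        )
    return command, positional, keywords
```

For example, parse_statement("ALLOC ds1 ds2 SPACE( 10 5 ) Unit(disk)", 4)
separates the two positional parameters from the SPACE and UNIT keywords.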

In addition, the command name and keyword values have implied
abbreviations.  So if you have 2 keywords - before and after, then b and
a are sufficient to discriminate between them.
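One way to picture the abbreviation rule is as a lookup that accepts any
prefix matching exactly one known name.  The sketch below is plain Python
rather than ANTLR, and it assumes "implied abbreviation" means "any
unambiguous prefix"; the function name resolve_abbreviation is
hypothetical.

```python
def resolve_abbreviation(word, candidates):
    """Return the single candidate that word abbreviates, ignoring case."""
    needle = word.lower()
    # An exact match wins even if it is also a prefix of a longer name.
    for name in candidates:
        if name.lower() == needle:
            return name
    matches = [name for name in candidates if name.lower().startswith(needle)]
    if len(matches) == 1:
        return matches[0]
    if not matches:
        raise ValueError(f"unknown keyword: {word!r}")
    raise ValueError(f"ambiguous abbreviation {word!r}: {sorted(matches)}")
```

With the keywords before and after, both "b" and "A" resolve to a single
name, while a set such as before and begin would reject "be" as ambiguous.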

Finally, there is the concept of continuation: a statement is continued
onto the next line when the last character on a line is a + or -.  The -
is used when whitespace at the beginning of the subsequent line is
significant; the + discards any whitespace at the beginning of the
subsequent line.
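The continuation rule could be applied as a preprocessing pass before the
lexer ever sees the text.  The following Python sketch shows the idea; the
function name join_continuations is hypothetical, and it assumes the
marker must be the very last character of the line (no trailing blanks)
and that the marker itself is dropped from the joined statement.

```python
def join_continuations(lines):
    """Merge physical lines into logical statements using + and - markers."""
    statements = []
    pending = ""           # text collected so far for the current statement
    continuing = False     # True while the previous line ended in + or -
    strip_leading = False  # True when the previous line ended in +
    for raw in lines:
        line = raw.rstrip("\r\n")
        if continuing and strip_leading:
            # After a +, leading whitespace on this line is not significant.
            line = line.lstrip()
        if line.endswith(("+", "-")):
            strip_leading = line.endswith("+")
            pending += line[:-1]  # drop the marker, keep the rest
            continuing = True
        else:
            statements.append(pending + line)
            pending, continuing = "", False
    if continuing:
        # Input ended in the middle of a continuation; keep what we have.
        statements.append(pending)
    return statements
```

A line ending in + is joined to the next line with that line's leading
whitespace removed, while a line ending in - keeps it.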

I've received several suggestions for handling some of the above issues.
I still haven't handled the abbreviation problem nor this concept of
treating the text remaining on a line as a single token.
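To make the intended token boundary concrete, here is a small Python
sketch (outside ANTLR) that treats the first word as the command and
everything after it, up to end of line, as one string.  The function name
split_command is hypothetical, and the sketch assumes leading and trailing
whitespace on the line is not significant.

```python
def split_command(line):
    """Return (command, rest_of_line), or None for a blank line."""
    stripped = line.strip()
    if not stripped:
        return None
    # Split only once, so interior whitespace in the remainder is preserved.
    parts = stripped.split(None, 1)
    command = parts[0].lower()  # the language is case-insensitive
    rest = parts[1] if len(parts) > 1 else ""
    return command, rest
```

The remainder keeps its interior spacing, which matters if the original
formatting is to be reproduced when the source is rewritten.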

I just have this uneasy feeling that I'm not really using the
capabilities of ANTLR to solve my problem in an elegant way.  I think
I'm just "hacking" a solution.


Thanks again, Fred


-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Gavin Lambert
Sent: Tuesday, August 05, 2008 5:00 PM
To: Brisard, Fred D; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Novice Question - Token for all characters
from a given point to End of Line

At 06:03 6/08/2008, Brisard, Fred D wrote:

>I have a keyword style grammar and need all the 
>characters up to the end of the line to be 
>accepted as a single token.
>
>For example, I have a statement that is of the following type
>
>Command multiple arguments (EOL)
>
>Where Command can be a command name and the 
>multiple arguments are one or more 
>arguments.  There can be from 1 to many 
>arguments - each argument does not have a fixed 
>content - it may be an integer, a string, a 
>quoted string.  The characters in the string can be most anything.
>
>I was looking for something similar to the 
>multiple line comment technique using the 
>greedy=false option.  Collect all the characters 
>following the Command into a single token.

Do you really *really* need everything to be one 
token?  I would have thought it'd be more useful 
to lex each argument as a separate token (making 
the best guess as to content type as is 
possible), and then sort it out into command vs. 
arguments at the parser level.  (Of course, to do 
that you'll need to make sure the EOL is not 
hidden from the parser.)



