[antlr-interest] Tokenizing question
Amal Khailtash
akhailtash at gmail.com
Sun Feb 10 14:33:06 PST 2008
Each word is separated by whitespace. Again, this is from a Verilog VCD
grammar that seems to have many ambiguities. I simplified it to make it easier
to explain. Part of the original grammar looks like:
value_change_dump_definition
: declaration_command* enddefinitions simulation_command*
;
declaration_command
: <other_rules_here>
| timescale
;
timescale
: '$timescale' NUMBER time_unit '$end'
;
time_unit
: 's'
| 'ms'
| 'us'
| 'ns'
| 'ps'
| 'fs'
;
simulation_command
: <other_rules_here>
| value_change
;
value_change
: scalar_value_change
;
scalar_value_change
: VALUE IDENTIFIER
;
VALUE
: ('0' | '1' | 'x' | 'X' | 'z' | 'Z')
;
IDENTIFIER
: ('!'..'~')+
;
fragment
DIGIT
: '0'..'9'
;
NUMBER
: DIGIT+
;
The problem is the scalar_value_change rule: VALUE and IDENTIFIER can appear
with no whitespace between them.
Sample scalar_value_change lines are:
1aae
0aae
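There are many ambiguities in this grammar, even at the lexer level, and they
are giving me a hard time. For example, the single character '1' matches
VALUE, NUMBER, and IDENTIFIER alike, and a whole word such as 1aae also
matches IDENTIFIER.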
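One way around the lexer ambiguity, sketched here in Python rather than ANTLR, is to let the lexer produce only whitespace-delimited words and split each scalar value change afterwards: the first character is the value and the rest is the identifier. The helper names are hypothetical, and the sketch assumes, as the samples above suggest, that a scalar value change is always one word with the value and identifier joined.

```python
# Characters accepted by the VALUE rule in the grammar above.
VALUE_CHARS = frozenset("01xXzZ")


def split_scalar_value_change(word):
    """Split one word such as '1aae' into (VALUE, IDENTIFIER).

    Hypothetical helper: assumes value and identifier are always joined.
    """
    if len(word) < 2 or word[0] not in VALUE_CHARS:
        raise ValueError(f"not a scalar value change: {word!r}")
    value, identifier = word[0], word[1:]
    # IDENTIFIER is ('!'..'~')+, i.e. printable ASCII excluding space.
    if not all("!" <= ch <= "~" for ch in identifier):
        raise ValueError(f"bad identifier in {word!r}")
    return value, identifier


def scalar_value_changes(text):
    """Treat every whitespace-separated word as one scalar value change."""
    return [split_scalar_value_change(word) for word in text.split()]


print(scalar_value_changes("1aae\n0aae"))
```

Because the split happens after the word boundary is known, VALUE, NUMBER, and IDENTIFIER never compete for the same characters. In an ANTLR grammar the same idea would be a single token for the whole word with the split done in an action, though that is an untested suggestion here.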
-- Amal
On Feb 10, 2008 4:44 PM, Mark Volkmann <r.mark.volkmann at gmail.com> wrote:
> On Feb 10, 2008 9:17 AM, Amal Khailtash <akhailtash at gmail.com> wrote:
> > In a language where whitespace is ignored, how can one tokenize and parse
> > constructs like this:
> >
> > word : number identifier ;
> >
> > where 'word' could look like:
> >
> > 10 abc or 10abc
> >
> > In this case, number and identifier may or may not be separated by
> > whitespace.
>
> How can you tell where one "word" ends and the next begins?
> Is each "word" on its own line?
>
> --
> R. Mark Volkmann
> Object Computing, Inc.
>
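The original `10 abc` / `10abc` case can be illustrated with a small Python sketch (not ANTLR): a lexer that skips optional whitespace and takes the longest NUMBER or IDENTIFIER at each position yields the same two tokens for both inputs, provided identifiers cannot begin with a digit. The token definitions are assumptions for illustration; the VCD IDENTIFIER rule above, ('!'..'~')+, also accepts digits, so this simple split does not carry over to it directly.

```python
import re

# Hypothetical token definitions: unlike the VCD IDENTIFIER rule above,
# identifiers here may not start with a digit, so the boundary is unambiguous.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUMBER>[0-9]+)|(?P<IDENTIFIER>[A-Za-z_][A-Za-z0-9_]*))"
)


def tokenize(text):
    """Return (kind, text) pairs; whitespace between tokens is optional."""
    tokens = []
    text = text.rstrip()  # drop trailing whitespace so the loop terminates
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        # lastgroup names whichever alternative matched.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


print(tokenize("10 abc"))
print(tokenize("10abc"))
```

The greedy `[0-9]+` stops at the first letter, so the whitespace only matters when both neighbouring tokens could contain the same characters, which is exactly the situation in the VCD grammar.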