[antlr-interest] Tokenizing question
akhailtash at gmail.com
Sun Feb 10 14:33:06 PST 2008
Each word is separated by whitespace. Again, this is from a Verilog VCD
grammar that seems to have many ambiguities. I rewrote it to make it simpler
to explain. Part of the original grammar looks like:
: declaration_command* enddefinitions simulation_command*
: '$timescale' NUMBER time_unit '$end'
: VALUE IDENTIFIER
: ('0' | '1' | 'x' | 'X' | 'z' | 'Z')
The problem is the scalar_value_change rule. VALUE and IDENTIFIER can be
A sample scalar_value_change is:
There are many ambiguities in this grammar, even at the lexer level, and they
are giving me a hard time.
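
To make the two cases concrete, here is a minimal sketch in Python rather than ANTLR. It is only an illustration under stated assumptions, not the grammar from this thread. The first function uses longest-match rules for the simplified "number identifier" example, where the two token classes share no starting characters, so "10abc" and "10 abc" tokenize the same way. The second assumes, as stated above, that each word is whitespace-separated and that VALUE is exactly one of the six characters listed in the grammar; it splits the word by position instead of by character class, which sidesteps the overlap between VALUE and IDENTIFIER. The names tokenize and split_scalar_value_change and the sample words are hypothetical.

```python
import re

# Hypothetical token patterns for the simplified "word : number identifier" case.
TOKEN_RE = re.compile(r"""
    (?P<NUMBER>\d+)
  | (?P<IDENTIFIER>[A-Za-z_][A-Za-z_0-9]*)
  | (?P<WS>\s+)
""", re.VERBOSE)


def tokenize(text):
    """Return (kind, text) pairs, skipping whitespace."""
    pos = 0
    tokens = []
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"unexpected character {text[pos]!r} at offset {pos}")
        if m.lastgroup != "WS":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens


# Assumption: VALUE is a single character from the set given in the grammar.
VALUE_CHARS = "01xXzZ"


def split_scalar_value_change(word):
    """Split one whitespace-delimited word into (value, identifier) by position."""
    if len(word) < 2 or word[0] not in VALUE_CHARS:
        raise ValueError(f"not a scalar value change: {word!r}")
    # First character is the value; everything after it is the identifier,
    # even if it also consists of value-like characters.
    return word[0], word[1:]
```

With this sketch, tokenize("10abc") and tokenize("10 abc") give the same two tokens, and split_scalar_value_change("x10") returns the value "x" with identifier "10" even though every character of the identifier could also be read as a value or a digit. In an ANTLR lexer, the positional split would need something comparable, such as a predicate or a parser-level split of a single WORD token; that part is a design suggestion, not something verified against this grammar.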
On Feb 10, 2008 4:44 PM, Mark Volkmann <r.mark.volkmann at gmail.com> wrote:
> On Feb 10, 2008 9:17 AM, Amal Khailtash <akhailtash at gmail.com> wrote:
> > In a language where whitespace is ignored, how can one tokenize and parse
> > constructs like this:
> > word : number identifier ;
> > where 'word' could look like:
> > 10 abc or 10abc
> > In this case number and identifier could have no whitespace between them or
> > have some.
> How can you tell where one "word" ends and the next begins?
> Is each "word" on its own line?
> R. Mark Volkmann
> Object Computing, Inc.