[antlr-interest] ANTLR 2.7 Lexer problem

Wed Jun 13 06:32:35 PDT 2007

David,

The lexer is a part of one of the examples distributed with ANTLR v2.7.6 and
v2.7.7:
    examples/csharp/csharp_v1/CSharpLexer.g

I have attached a copy in any case.

The other examples uses a mixture of tokens {...} declarations and embedded
string literals (in the parser) to resolve the inherent non-determinism that
exists between keywords and ID/IDENT/IDENTIFIER rules. That actually amounts
to the same thing since string literals in the parser (but *not* the lexer)
are effectively [treated as] tokens {...} declarations under the hood.

The usage in CSharpLexer.g is simply more explicit. One could argue that it
is "better" since it supports the use of alternative/multiple parsers with
the lexer since it avoids some potential token vocabulary mismatch issue
that the implicit declaration idiom is susceptible to in this scenario.

With regards to your attempts, either "asm" only or, _all_ three character
sequences can be recognised as either AsmStatement or IDENT tokens
(depending on whether your IDENT rule accepts the '_' character). IOW, there
is a real non-determinism/ambiguity here that shutting off the warning
cannot resolve.

Just to be clear, if you really don't care about multiple parsers using the
lexer, you could just declare AsmStatement in your parser (rather than your
lexer) as shown below:

    asmStatement
         :    "__asm" 
        |    "_asm"
        |    "asm"
        ;

This will essentially direct the ANTLR tool to create the tokens {...}
declarations automagically under the hood.

Micheal

-----------------------
The best way to contact me is via the list/forum. My time is very limited. 

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Wigg, J D
Sent: 13 June 2007 14:01
To: antlr-interest at antlr.org
Subject: [antlr-interest] ANTLR 2.7 Lexer problem

Thanks, Michael for taking the time to respond to my problem.

However, I cannot find the lexer you are referring to on the antlr2 website.
There are five C# references, four of them are described as html files and
it turns out they are ordinary text files (not your fault, but seems pretty
silly) some don't even have a lexer and I can't find a  CSharpLexer.g
anywhere.

I'm wondering whether you discounted your first suggestions in your previous
message because you thought they might not work or whether you thought the
example in the CSharpLexer.g would be clearer. Can I take it that the
suggestions you made first are still worth pursuing?

What was wrong with either of these two attempts of mine to stop the
nondetermism warning with ID?

AsmStatements
 :
 (options{generateAmbigWarnings = false;}:
 ("__asm"|"_asm"|"asm") // etc.
 )
 ;

AsmStatements
 options{generateAmbigWarnings = false;}
 :
 ("__asm"|"_asm"|"asm") // etc.
 ;

Thanks,

David Wigg

Message: 6 (from Digest Vol 31 Issue 49)
Date: Tue, 12 Jun 2007 22:26:54 +0100
From: "Micheal J" <open.zone at virgin.net>
Subject: Re: [antlr-interest] ANTLR 2.7 Lexer problem
To: <antlr-interest at antlr.org>
Message-ID: <002b01c7ad38$671483c0$c704a8c0 at hercules>
Content-Type: text/plain; charset="us-ascii"

Ooops!. Scratch my comments about the 2.x examples using that idiom. I'm
apprently remembering something else. The csharp_v1 example (only available
for C# in v2.x) has a CSharpLexer.g file that you could look at instead.

Micheal

-----------------------
The best way to contact me is via the list/forum. My time is very limited.

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[ <mailto:antlr-interest-bounces at antlr.org>
mailto:antlr-interest-bounces at antlr.org] On Behalf Of Micheal J
Sent: 12 June 2007 22:01
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] ANTLR 2.7 Lexer problem

David,

Have you tried declaring your literals (i.e. "__asm", "_asm"...etc) using
the tokens {...} construct in ANTLR 2.x. You would then need to set the
testLiterals option to 'true' for your ID rule. This ensures that all match
ID strings are checked against your literals.

Many of the 2.x examples (e.g. the Java parser which is also available in
C++) can serve as an example of this idiom. They declare keywords in the
tokens {..} section and set the testLiterals option for their
ID/IDENT/IDENTIFIER rule.

Micheal

PS    Any plans to port your C++ grammar to v3?. The C target is maturing
and is seems very performant.

-----------------------
The best way to contact me is via the list/forum. My time is very limited.

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[ <mailto:antlr-interest-bounces at antlr.org>
mailto:antlr-interest-bounces at antlr.org] On Behalf Of Wigg, J D
Sent: 12 June 2007 15:16
To: antlr-interest at antlr.org
Subject: [antlr-interest] ANTLR 2.7 Lexer problem

Thanks to those who replied to my lexer problem about coping with whitespace
in the lexer described in Digest vol 31 issue 22 of 6 June. As suggested
parsing specifically for white space solved this problem.

However, I am still getting an ambiguity warning between ( __asm | _asm |
asm ) and ID. I don't want to accept this because it looks as though
something could be wrong (although it does in fact work).

I have tried the following options but they don't work (Cut down to minimum
for demonstration).

AsmStatements
 :
 (options{generateAmbigWarnings = false;}:
  ("__asm"|"_asm"|"asm") //etc.
 )
 ;

AsmStatements
 options{generateAmbigWarnings = false;}
 :
  ("__asm"|"_asm"|"asm") // etc.
 ;

Antlr accepts these statements but I still get lexical nondeterminsim
between rules AsmStatements and ID.

I can't find any similar example in the ANTLR2 documentation and I would be
grateful if someone could let me know what I doing wrong.

Thanks.

David Wigg

--
Copyright in this email and in any attachments belongs to London South Bank

--
Copyright in this email and in any attachments belongs to London South Bank
University. This email, and its attachments if any, may be confidential or
legally privileged and is intended to be seen only by the person to whom it
is addressed. If you are not the intended recipient, please note the
following: (1) You should take immediate action to notify the sender and
delete the original email and all copies from your computer systems; (2) You
should not read copy or use the contents of the email nor disclose it or its
existence to anyone else.

The views expressed herein are those of the author(s) and should not be
taken as those of London South Bank University, unless this is specifically
stated.

London South Bank University is a company limited by guarantee registered in
England and Wales. The following details apply to London South Bank
University: Company number - 00986761; Registered office and trading address
- 103 Borough Road London SE1 0AA; VAT number - 778 1116 17; Email address -
lsbuinfo at lsbu.ac.uk

-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.antlr.org/pipermail/antlr-interest/attachments/20070613/508c1167/attachment-0001.html 
-------------- next part --------------
A non-text attachment was scrubbed...
Name: CSharpLexer.g
Type: application/octet-stream
Size: 18560 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20070613/508c1167/attachment-0001.obj