[antlr-interest] Lexer rule for single quote

Tue Jul 24 10:40:09 PDT 2007

In ANTLR v3, single quotes are used for character literals.  I have my
entire lexer working correctly except for the TIC.  I have the rule:

TIC    : '\'' ;

which never gets matched.  What I'm trying to recognize is a single TIC
(apostrophe).

Is my lexer rule written correctly, or do I need a different escape
character?

Thank you,
-Sam

________________________________

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Barnes, Jeff
Sent: Tuesday, July 24, 2007 1:37 PM
To: Igor Murashkin; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Troubles lexing a decimal,(from an antlr
beginner)

Sorry I didn't look at the last line of your post...

The problem is with the DECIMAL rule, most likely. Does this help?

DECIMAL    :    Digit+ DOT Digit+;

In ANTLR 2.7 you would have to add a semantic predicate to disambiguate.
I don't know about the new ANTLR.

Regards,

Jeff

________________________________

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Barnes, Jeff
Sent: Tuesday, July 24, 2007 1:29 PM
To: Igor Murashkin; antlr-interest at antlr.org
Subject: Re: [antlr-interest] Troubles lexing a decimal,(from an antlr
beginner)

Does changing your IDENT rule to something like this fix it?

IDENT    :     ('a'..'z'|'A'..'Z'|'_')
('a'..'z'|'A'..'Z'|'0'..'9'|'_'|WS)*; 

________________________________

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Igor Murashkin
Sent: Tuesday, July 24, 2007 12:45 PM
To: antlr-interest at antlr.org
Subject: [antlr-interest] Troubles lexing a decimal, (from an antlr
beginner)

Hello,

Well, let me just say, it's my first time using ANTLR. I needed a C#
parser generator, so using flex/bison as I have done before was simply
out of the question, and I figured learning an LL(k) parser should be a
nice change from just using LR(k).

Unfortunately, before I can even get to the parsing, I need to fix my
lexing. Right now it doesn't match decimals properly. Here are the
lexing rules in question:

===============

DOT        :    '.' ;
INTEGER    :    Digit+ ;
DECIMAL    :    Digit+ '.' Digit+ ;
fragment Digit
           :    '0'..'9' ;
IDENT      :    ('a'..'z'|'A'..'Z'|'_')
                ('a'..'z'|'A'..'Z'|'0'..'9'|'_')* ;

NL         :    ( '\r\n'    // DOS/Windows
                | '\r'      // Macintosh
                | '\n'      // Unix
                )
                { $channel=HIDDEN; } ;

WS         :    ( ' '
                | '\t'
                | '\f'
                )
                { $channel=HIDDEN; } ;

===============

Unfortunately, with simple input such as this, the lexer crashes with an
EarlyExitException:

===============
console.flushBuffer
general.holdMsec 1000
object 1.doSomeAction withThis
=============== 
The third line should produce "IDENT INTEGER DOT IDENT IDENT" but
instead it tries to match "1." as a DECIMAL and then once it sees the
"d" it fails and throws an EarlyExitException. 

I am completely unsure what is going on. I tried setting k=2 in the
options, figuring that if it looked at the period AND the next character
it would get ('.', 'd'), which clearly does not match the DECIMAL rule.
But then I just got a bunch of warnings in my lexer grammar, so I removed
the k=2 line altogether. Looking at the generated code, though, it's
always calling LA(1); maybe there should be a way to get it to call LA(2)?
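
To make the lookahead question concrete, here is a small hand-written
Python sketch. It is not ANTLR output and does not claim to mirror the
generated lexer; the names scan_number, is_digit and the lookahead
parameter are invented for illustration. It only models the behaviour
described above: with one character of lookahead the scanner commits to
DECIMAL as soon as it sees '.', while with two characters it also checks
that a digit follows the dot.

```python
def is_digit(ch):
    """True for the ASCII digits '0'..'9', like the Digit fragment."""
    return "0" <= ch <= "9"


def scan_number(text, pos, lookahead):
    """Scan an INTEGER or DECIMAL starting at text[pos].

    lookahead=1: commit to DECIMAL on seeing '.' alone (the failing case).
    lookahead=2: commit to DECIMAL only if '.' is followed by a digit.
    Returns (token_type, lexeme, next_pos).
    """
    start = pos
    while pos < len(text) and is_digit(text[pos]):
        pos += 1
    if pos == start:
        raise ValueError("expected a digit at position %d" % start)

    sees_dot = pos < len(text) and text[pos] == "."
    digit_after_dot = pos + 1 < len(text) and is_digit(text[pos + 1])

    if sees_dot and (lookahead == 1 or digit_after_dot):
        pos += 1  # consume '.'
        frac_start = pos
        while pos < len(text) and is_digit(text[pos]):
            pos += 1
        if pos == frac_start:
            # The Digit+ after the dot matched nothing; this stands in
            # for the EarlyExitException mentioned above.
            raise ValueError("no digit after '.' at position %d" % frac_start)
        return ("DECIMAL", text[start:pos], pos)

    # Leave any '.' unconsumed so a separate DOT rule can pick it up.
    return ("INTEGER", text[start:pos], pos)
```

With this sketch, scan_number("1.doSomeAction", 0, 2) returns
("INTEGER", "1", 1) and leaves the '.' for DOT, whereas the same call
with lookahead=1 raises ValueError once it reaches the 'd'.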

I am probably also misunderstanding how the whole lexing process works.
Looking at the generated code, it is generating some DFAs, which would
imply some kind of regular language is at work here? Or does it still
use LL(k) parsing even for lexing?

I'm going to try to get the book ASAP too; it probably explains some of
this...
