[antlr-interest] Lexer question (Update on Input from Dominik)

Thu Jul 27 05:53:08 PDT 2006

I was looking at the MS SQL grammar by Tomasz Jastrzebski (on antlr.org)
and this is what I found:

protected
Integer :;

protected
Real :; 

Number
    :
      ( (Digit)+ ('.' | 'e') ) =>
        (Digit)+ ( '.' (Digit)* (Exponent)? | Exponent ) { _ttype = Real; }
    | '.' { _ttype = DOT; } ( (Digit)+ (Exponent)? { _ttype = Real; } )?
    | (Digit)+ { _ttype = Integer; }
    | "0x" ('a'..'f' | Digit)* { _ttype = HexLiteral; } // "0x" is valid hex literal
    ;

k is set to 2.

I haven't tested it, so I don't know whether it works.
Furthermore, since I'm a newbie, I am not even sure what the two rules
Integer and Real are supposed to do.  What do they do?  Are they really
empty?
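
For what it's worth, here is how I read the Number rule, written as a
rough Java sketch rather than anything run through ANTLR. It takes one
complete lexeme and reports which token type the rule's actions would
appear to assign. The names NumberRuleSketch, Kind and classify are made
up for this sketch, and since the Exponent rule is not shown above I am
assuming it is 'e' ('+'|'-')? (Digit)+ , which may not match the real
grammar.

```java
// Hypothetical illustration of the Number rule's alternatives; not generated by ANTLR.
final class NumberRuleSketch {
    // Integer and Real appear in the grammar only as type names assigned via _ttype.
    enum Kind { INTEGER, REAL, DOT, HEX_LITERAL, NO_MATCH }

    static Kind classify(String s) {
        // Alternative 4: "0x" followed by any number of hex digits (lowercase a-f only).
        if (s.startsWith("0x")) {
            for (int i = 2; i < s.length(); i++) {
                char c = s.charAt(i);
                if (!(isDigit(c) || (c >= 'a' && c <= 'f'))) return Kind.NO_MATCH;
            }
            return Kind.HEX_LITERAL;
        }
        int i = digits(s, 0);
        if (i == 0) {
            // Alternative 2: a lone '.' is DOT; '.' plus digits (and optional exponent) is Real.
            if (s.isEmpty() || s.charAt(0) != '.') return Kind.NO_MATCH;
            if (s.length() == 1) return Kind.DOT;
            int j = digits(s, 1);
            if (j == 1) return Kind.NO_MATCH;
            return exponent(s, j) == s.length() ? Kind.REAL : Kind.NO_MATCH;
        }
        // Alternative 3: digits only.
        if (i == s.length()) return Kind.INTEGER;
        // Alternative 1: digits followed by '.' or by an exponent.
        if (s.charAt(i) == '.') {
            int j = exponent(s, digits(s, i + 1));
            return j == s.length() ? Kind.REAL : Kind.NO_MATCH;
        }
        int j = exponent(s, i);
        return (j > i && j == s.length()) ? Kind.REAL : Kind.NO_MATCH;
    }

    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Index just past the run of digits starting at 'from'.
    static int digits(String s, int from) {
        int i = from;
        while (i < s.length() && isDigit(s.charAt(i))) i++;
        return i;
    }

    // Assumed Exponent shape: 'e' ('+'|'-')? digit+ . Returns 'from' if no exponent is present.
    static int exponent(String s, int from) {
        if (from >= s.length() || s.charAt(from) != 'e') return from;
        int i = from + 1;
        if (i < s.length() && (s.charAt(i) == '+' || s.charAt(i) == '-')) i++;
        int j = digits(s, i);
        return j > i ? j : from;
    }

    public static void main(String[] args) {
        for (String s : new String[] { "10", "0.50", ".", ".5", "1e5", "0x1f" }) {
            System.out.println(s + " -> " + classify(s));
        }
    }
}
```

Unlike the real lexer, this looks at a whole string at once, so it says
nothing about how k=2 lookahead picks an alternative in the middle of an
input stream.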

Jiho

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Dominik
Holenstein
Sent: Thursday, July 27, 2006 8:41 AM
To: dotnet fr; antlr-interest at antlr.org
Subject: Fwd: [antlr-interest] Lexer question (Update on Input from
Dominik)

Tomy,

When I use the code from the previous e-mail, ANTLR shows this
warning:
lexical nondeterminism between rules DOUBLE and INTEGER upon
k==1:'0'..'9'

I have raised k up to 12 and this did not help. But interestingly, the
output in the console is correct.

Regards,
Dominik

---------- Forwarded message ----------
From: Dominik Holenstein <dholenstein at gmail.com>
Date: Jul 27, 2006 2:13 PM
Subject: Re: [antlr-interest] Lexer question
To: dotnet fr <dotnetfr at gmail.com>, antlr-interest at antlr.org

Tomy,
I have played around with your Lexer and Parser code and have found this
solution:

-------------------------------------------------------
ANTLR Grammar (file n.g):

class NumParser extends Parser;

startRule : (line)* ;

line      : (
                 d:DOUBLE
                 {System.out.println("Double: "+d.getText());}
                 |
                  i:INTEGER
                 {System.out.println("Integer: "+i.getText());}
                  )
                  ;

class NumLexer extends Lexer;

DOUBLE          : (('-')? ('0'..'9')+ '.' ('0'..'9')* )=>
                  ('-')? ('0'..'9')+ '.' ('0'..'9')*
                | ('0'..'9')+ {$setType(INTEGER);}
                ;

INTEGER         : ('0'..'9')+ ;

SEMICOLON    : ';' { $setType(Token.SKIP); } ;

NEWLINE        : (('\r''\n')=> '\r''\n'
              | '\r'
              | '\n'
              ) { $setType(Token.SKIP); }
                       ;
WS                  : (' '|'\t') { $setType(Token.SKIP); } ;

---------------------------------------------------

The Java test code (Main.java):

import java.io.DataInputStream;
import java.io.FileInputStream;
import java.io.FileNotFoundException;

public class Main {
    public static void main(String[] args) {
        try {
            // Make sure you change the path for your input file
            DataInputStream input = new DataInputStream(
                    new FileInputStream("E:\\ANTLR\\Examples\\Numbers\\input.txt"));
            NumLexer lexer = new NumLexer(input);
            NumParser parser = new NumParser(lexer);
            try {
                parser.startRule();
            } catch (Exception e) {
                // Report lexer/parser errors instead of silently swallowing them
                System.err.println("Parse error: " + e);
            }
        } catch (FileNotFoundException e) {
            System.out.println("Error: Cannot open file for reading");
        }
    }
}

--------------------------------------------------------------
Data in the input file (input.txt):
10;
1500;
0.50;
35;
7.25;
3000;

---------------------------------------------------------------

I have added all files as attachments to this e-mail.

You can set k=1 because of the syntactic predicate, which makes the
parser a bit faster.
The System.out.println messages are for testing purposes, so that I can
see the parser's output in the console. I am working with Eclipse 3.2
and ANTLR Studio. I am not sure whether this is 'good' programming style
but it works ;-) . Input, feedback, and better solutions are welcome.
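
To show what I mean by the predicate, here is the same idea written by
hand in plain Java, without ANTLR: first try the DOUBLE shape
speculatively, and only if that fails fall back to plain digits and call
it an INTEGER. This is only a sketch of the idea, not what ANTLR
generates; PredicateSketch, scan and matchDouble are names I made up for
it.

```java
// Hypothetical hand-written imitation of the predicated DOUBLE rule in n.g.
final class PredicateSketch {
    // Scans the input and returns one line per number, in the style of the println calls in n.g.
    static String scan(String in) {
        StringBuilder out = new StringBuilder();
        int pos = 0;
        while (pos < in.length()) {
            char c = in.charAt(pos);
            // SEMICOLON, NEWLINE and WS are skipped, as in the lexer.
            if (c == ';' || c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pos++;
                continue;
            }
            int end = matchDouble(in, pos); // speculative attempt, like the ( ... )=> part
            if (end >= 0) {
                out.append("Double: ").append(in, pos, end).append('\n');
            } else {
                end = digits(in, pos);      // second alternative: digits only
                if (end == pos) {
                    throw new IllegalArgumentException("unexpected char '" + c + "' at " + pos);
                }
                out.append("Integer: ").append(in, pos, end).append('\n');
            }
            pos = end;
        }
        return out.toString();
    }

    // End index of ('-')? digit+ '.' digit* starting at pos, or -1 if the shape does not fit.
    static int matchDouble(String in, int pos) {
        int i = pos;
        if (i < in.length() && in.charAt(i) == '-') i++;
        int j = digits(in, i);
        if (j == i || j >= in.length() || in.charAt(j) != '.') return -1;
        return digits(in, j + 1);
    }

    // Index just past the run of digits starting at 'from'.
    static int digits(String s, int from) {
        int i = from;
        while (i < s.length() && s.charAt(i) >= '0' && s.charAt(i) <= '9') i++;
        return i;
    }

    public static void main(String[] args) {
        System.out.print(scan("10;\n1500;\n0.50;\n"));
    }
}
```

Like the grammar, the fallback alternative has no optional '-', so a
negative integer such as -3 is rejected by this sketch.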

I hope it helps!

Regards,
Dominik

On 7/27/06, dotnet fr <dotnetfr at gmail.com> wrote:
> Hi Dominik,
>
> I'm happy to meet a person like me!
> I'm a beginner with antlr and codeworker too ;) Every minute I'm 
> learning something new. Antlr seems very powerful, yeah.
> My project is to create first a class generator, then a structure 
> generator, and finally a structure (or class) loader. It means I use 
> parsing and code generation.
> What do you do with antlr, what is your interest in informatics ?
>
> Cheers
> Tomy
>
> 2006/7/27, Dominik Holenstein <dholenstein at gmail.com>:
> > Hi Tomy,
> > I don't know codeworker but will have a look at it.
> > ANTLR is very powerful and with v3 coming in the fall it will get
> > much better.
> > I am a beginner with Java and ANTLR so everything is 'difficult' at 
> > the moment. But I am progressing and learning every day!
> > I will look at your issue this afternoon.
> >
> > Regards,
> > Dominik
> >
> >
> >
> > On 7/27/06, dotnet fr <dotnetfr at gmail.com> wrote:
> > > Hi Dominik,
> > >
> > > I have seen the Predicated LL(k) Lexing section in the ANTLR 
> > > documentation, which deals with this kind of problem. It works, 
> > > but it's not the best solution, I think ;) I thought that the 
> > > antlr lexer tries the first token and, if it doesn't match, goes 
> > > on to the second, etc.
> > >
> > > My parser grammar :
> > >
> > > startRule
> > >        :
> > >                nbp_debug
> > >        ;
> > >
> > > protected
> > > debug    :
> > >        (
> > >                DATE
> > >        |       DOUBLE
> > >        |       INTEGER
> > >        |       SEMICOLON
> > >        )*
> > >        ;
> > >
> > > What do you think about Antlr ? I have to do the same project 
> > > with codeworker and antlr. Antlr seems more difficult to manipulate.
> > >
> > > Cheers,
> > >
> > > Tomy
> > >
> > > 2006/7/27, Dominik Holenstein <dholenstein at gmail.com>:
> > > > Tomy,
> > > >
> > > > What is your grammar in the parser?
> > > > Thanks.
> > > >
> > > > Dominik
> > > >
> > > >
> > > > On 7/27/06, dotnet fr <dotnetfr at gmail.com> wrote:
> > > > > Hi Everyone,
> > > > >
> > > > > I have a problem with the antlr lexer.
> > > > >
> > > > > In input I have :
> > > > > 10;
> > > > > 1500;
> > > > > 0.50;
> > > > >
> > > > > In my lexer I have :
> > > > > DOUBLE          : ('-')? ('0'..'9')+ '.' ('0'..'9')* ;
> > > > > INTEGER         : ('0'..'9')+ ;
> > > > > SEMICOLON       : ';' ;
> > > > >
> > > > > In my parser and lexer I have k=5.
> > > > >
> > > > > But I've got an error: the lexer seems to try its TOKENS in 
> > > > > order. It takes the 10 as a double (the first in the list) and 
> > > > > throws an exception
> > > > > (exception: expecting ''.'', found '';'')
> > > > >
> > > > > I want the lexer to skip and try the next TOKEN and throw an 
> > > > > exception only if there isn't any solution.
> > > > >
> > > > > Has anyone else had this problem ?
> > > > >
> > > > > Cheers,
> > > > >
> > > > > Tomy
> > > > >
> > > >
> > >
> > >
> > > --
> > > dotnet
> > >
> >
>
>
> --
> dotnet
>