[antlr-interest] Lexer question

Thu Jul 27 07:57:25 PDT 2006

Tomy,

I would try something like this:

In the Lexer:

INTEGER: ('0'..'9')+;

In the parser:

integer: INTEGER;
real: INTEGER '.'  (INTEGER)?;   // named "real" because "double" is a Java keyword

Or to be a bit fancier, try this:

integer	:  ('-')? INTEGER;
real	:  ('-')? INTEGER '.'  (INTEGER)?   // "double" is a Java keyword, so the rule needs another name
		|  ('-')? '.' INTEGER
		;

The advantage of this approach is that the lexer only requires 1 level of
lookahead (k=1) and no predicates, so it can do its work
in a single pass over the input characters.
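
To illustrate why one character of lookahead is enough here, below is a rough hand-written sketch in Java. It is not ANTLR-generated code; the class name DigitLexer and the way tokens are spelled as strings are made up for this example. Each token kind is chosen by looking only at the current character:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written lexer, for illustration only (not ANTLR output).
final class DigitLexer {
    // Splits text into tokens; each token kind is decided from one character.
    static List<String> tokenize(String text) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c >= '0' && c <= '9') {
                // INTEGER: ('0'..'9')+
                int start = i;
                while (i < text.length()
                        && text.charAt(i) >= '0' && text.charAt(i) <= '9') {
                    i++;
                }
                tokens.add("INTEGER(" + text.substring(start, i) + ")");
            } else if (c == '.' || c == '-' || c == ';') {
                // Single-character tokens
                tokens.add(String.valueOf(c));
                i++;
            } else if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                i++; // skip whitespace
            } else {
                throw new IllegalArgumentException(
                        "unexpected char '" + c + "' at " + i);
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("10;\n0.50;"));
    }
}
```

For the input "10;" followed by "0.50;" this yields the tokens INTEGER(10), ';', INTEGER(0), '.', INTEGER(50), ';', and it is then up to the parser to combine them.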

More generally, if multiple rules begin with a non-trivial
overlapping definition, you are better off creating a separate rule
that recognizes just the overlapping part, and rewriting the original
rules to begin with that rule.  In this case, the overlapping
part was the INTEGER definition.
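
The same idea can be sketched in plain Java. This is only an illustration under my own assumptions: the class NumberRule, the single merged "number" rule in the comment, and the representation of tokens as strings are not part of the grammar above. The shared INTEGER prefix is matched once, and the next token alone decides between the two results:

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical recognizer mirroring a left-factored rule; illustration only.
final class NumberRule {
    enum Kind { INTEGER, DOUBLE }

    // True if the token at index i exists and consists only of digits.
    private static boolean isInteger(List<String> t, int i) {
        if (i >= t.size() || t.get(i).isEmpty()) return false;
        for (char c : t.get(i).toCharArray()) {
            if (c < '0' || c > '9') return false;
        }
        return true;
    }

    private static boolean isDot(List<String> t, int i) {
        return i < t.size() && t.get(i).equals(".");
    }

    // number : ('-')? ( INTEGER ( '.' (INTEGER)? )? | '.' INTEGER ) ;
    static Kind classify(List<String> tokens) {
        int i = 0;
        if (i < tokens.size() && tokens.get(i).equals("-")) i++;
        Kind kind;
        if (isInteger(tokens, i)) {
            i++;                        // shared INTEGER prefix, matched once
            if (isDot(tokens, i)) {     // one token of lookahead decides
                i++;
                if (isInteger(tokens, i)) i++;
                kind = Kind.DOUBLE;
            } else {
                kind = Kind.INTEGER;
            }
        } else if (isDot(tokens, i)) {
            i++;
            if (!isInteger(tokens, i)) {
                throw new IllegalArgumentException("digits expected: " + tokens);
            }
            i++;
            kind = Kind.DOUBLE;
        } else {
            throw new IllegalArgumentException("not a number: " + tokens);
        }
        if (i != tokens.size()) {
            throw new IllegalArgumentException("trailing tokens: " + tokens);
        }
        return kind;
    }

    public static void main(String[] args) {
        System.out.println(classify(Arrays.asList("10")));
        System.out.println(classify(Arrays.asList("-", "0", ".", "50")));
    }
}
```

Because the decision is postponed until after the common prefix, no alternative ever has to be tried and abandoned.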

Greg

On Jul 27, 2006, at 6:05 AM, dotnet fr wrote:

> Dominik,
> Thank you for your solution; it works very well.
> I have another one that looks much the same ;)
>
> In the lexer -------------------
> INTORDOUBLE
> 	: (INTEGER '.') => DOUBLE	{ $setType(DOUBLE); }
> 	| INTEGER			{ $setType(INTEGER); }
> 	;
>
> protected
> DOUBLE		: ('-')? ('0'..'9')+ '.' ('0'..'9')* ;
> protected
> INTEGER		: ('0'..'9')+ ;
>
> and in the parser ---------------------------
>
> startRule : (line)* ;
>
> line : DOUBLE | INTEGER;
>
> Regards,
> Tomy
>
> 2006/7/27, Dominik Holenstein <dholenstein at gmail.com>:
>> Tomy,
>> I have played around with your Lexer and Parser code and have found
>> this solution:
>>
>>
>> -------------------------------------------------------
>> ANTLR Grammar (file n.g):
>>
>> class NumParser extends Parser;
>>
>> startRule : (line)* ;
>>
>> line      : (
>>                  d:DOUBLE
>>                  {System.out.println("Double: "+d.getText());}
>>                  |
>>                   i:INTEGER
>>                  {System.out.println("Integer: "+i.getText());}
>>                   )
>>                   ;
>>
>>
>> class NumLexer extends Lexer;
>>
>> DOUBLE          : (('-')? ('0'..'9')+ '.' ('0'..'9')* )=>
>>                   ('-')? ('0'..'9')+ '.' ('0'..'9')*
>>                 | ('0'..'9')+ {$setType(INTEGER);}
>>                 ;
>>
>> INTEGER         : ('0'..'9')+ ;
>>
>> SEMICOLON    : ';' { $setType(Token.SKIP); } ;
>>
>> NEWLINE        : (('\r''\n')=> '\r''\n'
>>               | '\r'
>>               | '\n'
>>               ) { $setType(Token.SKIP); }
>>                        ;
>> WS                  : (' '|'\t') { $setType(Token.SKIP); } ;
>>
>> ---------------------------------------------------
>>
>> The Java test code (Main.java):
>>
>> import java.io.DataInputStream;
>> import java.io.FileInputStream;
>> import java.io.FileNotFoundException;
>>
>> public class Main {
>>        public static void main (String[] args) {
>>                try {
>>                        // Make sure you change the path for your input file
>>                        DataInputStream input = new DataInputStream(
>>                                new FileInputStream("E:\\ANTLR\\Examples\\Numbers\\input.txt"));
>>                        NumLexer lexer = new NumLexer(input);
>>                        NumParser parser = new NumParser(lexer);
>>                        try {
>>                                parser.startRule();
>>                        } catch (Exception e) {
>>                                // Report lexer/parser errors instead of silently dropping them
>>                                System.err.println("Parse error: " + e);
>>                        }
>>                } catch (FileNotFoundException e) {
>>                        System.out.println("Error: Cannot open file for reading");
>>                }
>>        }
>> }
>>
>> --------------------------------------------------------------
>> Data in the input file (input.txt):
>> 10;
>> 1500;
>> 0.50;
>> 35;
>> 7.25;
>> 3000;
>>
>> ---------------------------------------------------------------
>>
>> I have added all files as attachments to this e-mail.
>>
>> You can set k=1 because of the syntactic predicate, which makes the
>> parser a bit faster.
>> The System.out... messages are for testing purposes, so that I can see
>> the output of the parser in the console. I am working with Eclipse 3.2
>> and ANTLR Studio. I am not sure whether this is 'good' programming
>> style, but it works ;-) . Input, feedback, and better solutions are
>> welcome.
>>
>> I hope it helps!
>>
>> Regards,
>> Dominik
>>
>>
>>
>>
>>
>> On 7/27/06, dotnet fr <dotnetfr at gmail.com> wrote:
>> > Hi Dominik,
>> >
>> > I'm happy to meet a person like me!
>> > I'm a beginner with antlr and codeworker too ;)
>> > Every minute I'm learning something new. ANTLR seems very powerful, yeah.
>> > My project is to create first a class generator, then a structure generator,
>> > and finally a structure (or class) loader. That means I use parsing and
>> > code generation.
>> > What do you do with ANTLR, and what is your interest in informatics?
>> >
>> > Cheers
>> > Tomy
>> >
>> > 2006/7/27, Dominik Holenstein <dholenstein at gmail.com>:
>> > > Hi Tomy,
>> > > I don't know codeworker but will have a look at it.
>> > > ANTLR is very powerful, and with v3 coming in the fall it will get much better.
>> > > I am a beginner with Java and ANTLR, so everything is 'difficult' at
>> > > the moment. But I am progressing and learning every day!
>> > > I will look at your issue this afternoon.
>> > >
>> > > Regards,
>> > > Dominik
>> > >
>> > >
>> > >
>> > > On 7/27/06, dotnet fr <dotnetfr at gmail.com> wrote:
>> > > > Hi Dominik,
>> > > >
>> > > > I have seen the "Predicated LL(k) Lexing" section in the ANTLR documentation,
>> > > > which deals with this kind of problem. It works, but I don't think it's
>> > > > the best solution ;)
>> > > > I thought that the ANTLR lexer tries the first token and, if it doesn't
>> > > > match, goes on to the second, etc.
>> > > >
>> > > > My parser grammar :
>> > > >
>> > > > startRule
>> > > >        :
>> > > >                nbp_debug
>> > > >        ;
>> > > >
>> > > > protected
>> > > > debug    :
>> > > >        (
>> > > >                DATE
>> > > >        |       DOUBLE
>> > > >        |       INTEGER
>> > > >        |       SEMICOLON
>> > > >        )*
>> > > >        ;
>> > > >
>> > > > What do you think about ANTLR? I have to do the same project with
>> > > > codeworker and ANTLR. ANTLR seems more difficult to work with.
>> > > >
>> > > > Cheers,
>> > > >
>> > > > Tomy
>> > > >
>> > > > 2006/7/27, Dominik Holenstein <dholenstein at gmail.com>:
>> > > > > Tomy,
>> > > > >
>> > > > > What is your grammar in the parser?
>> > > > > Thanks.
>> > > > >
>> > > > > Dominik
>> > > > >
>> > > > >
>> > > > > On 7/27/06, dotnet fr <dotnetfr at gmail.com> wrote:
>> > > > > > Hi Everyone,
>> > > > > >
>> > > > > > I have a problem with the ANTLR lexer.
>> > > > > >
>> > > > > > My input is:
>> > > > > > 10;
>> > > > > > 1500;
>> > > > > > 0.50;
>> > > > > >
>> > > > > > In my lexer I have :
>> > > > > > DOUBLE          : ('-')? ('0'..'9')+ '.' ('0'..'9')* ;
>> > > > > > INTEGER         : ('0'..'9')+ ;
>> > > > > > SEMICOLON       : ';' ;
>> > > > > >
>> > > > > > In my parser and lexer I have k=5.
>> > > > > >
>> > > > > > But I get an error; the lexer seems to try its tokens in order.
>> > > > > > It treats the 10 as a DOUBLE (the first in the list) and throws an
>> > > > > > exception
>> > > > > > (exception: expecting ''.'', found '';'')
>> > > > > >
>> > > > > > I want the lexer to skip to the next token rule and throw an exception
>> > > > > > only if none of them matches.
>> > > > > >
>> > > > > > Has anyone else run into this problem?
>> > > > > >
>> > > > > > Cheers,
>> > > > > >
>> > > > > > Tomy
>> > > > > >
>> > > > >
>> > > >
>> > > >
>> > > > --
>> > > > dotnet