[antlr-interest] Re: Reading contents of file using Antlr

Thu Jan 23 03:03:12 PST 2003

Hi Sharon,

I'll touch on a few areas where ANTLR can make your life easier. 
Completing the grammar should then be a much easier (and more 
rewarding) experience.

[Caveat: I haven't run your example with ANTLR. Just looked at it on 
the site. I may have missed some issues.]

1. Keywords - See changes ###[1] below.

ANTLR provides the tokens {...} section to let you specify the 
keywords in your language. The assumption is that keywords can't also 
be identifiers.
NOTE: You can override that assumption in your Parser grammar for 
individual keywords as shown in:
http://groups.yahoo.com/group/antlr-interest/message/6503
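
To make the idea concrete, here is a rough plain-Java sketch of what a 
literals-table lookup amounts to: the lexer matches an identifier-like 
token, then checks its text against the keyword table and changes the 
token type on a hit. This is only an illustration, not the code ANTLR 
generates. The class name, the token type numbers and the two sample 
keywords are assumptions made up for the example.

```java
import java.util.HashMap;
import java.util.Map;

public class KeywordLookup {
    // Hypothetical token type numbers; ANTLR assigns its own constants.
    static final int ID = 1;
    static final int ANGLE = 2;
    static final int FACTOR = 3;

    // Stand-in for the literals table built from the tokens { ... } section.
    static final Map<String, Integer> LITERALS = new HashMap<>();
    static {
        LITERALS.put("angle", ANGLE);
        LITERALS.put("factor", FACTOR);
    }

    // Text that matched the identifier rule is reclassified as a keyword
    // when it appears in the table; otherwise it stays an ID.
    static int tokenType(String text) {
        Integer keyword = LITERALS.get(text);
        return keyword != null ? keyword : ID;
    }

    public static void main(String[] args) {
        System.out.println("angle  -> " + tokenType("angle"));
        System.out.println("angles -> " + tokenType("angles"));
    }
}
```

The lookup is an exact match on the whole token text, which is why 
"angles" stays an ordinary identifier while "angle" becomes a keyword.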

2. NUMERIC rule - 

Are you sure this is what NUMERIC values look like in your system? 
This rule will accept the following: 
-...,-
,,9-0,..
,,,
... etc.
Studying a few samples of what a NUMERIC looks like (and mustn't look 
like) would help make it clearer. I did try to clean it up, though be 
aware that RECORD and NUMERIC properly belong in the Parser, not the 
Lexer, where they would be renamed to start with a lower-case 
letter of course.

// Is this what you really meant?
protected LETTER  : ( 'a'..'z' | 'A'..'Z' ) ; 
protected DIGIT   : ( '0'..'9' ) ; 
protected NUMBER  : ('-')? ((DIGIT)* '.')? (DIGIT)+ ;
ID      : LETTER ( DIGIT | LETTER )* ;
NUMERIC : NUMBER ( ',' NUMBER )* ; 
RECORD  : (ID | NUMERIC)(!~('\r'|'\n'|':'))+ ; 

// Your original
RECORD  : ('a'..'z' | 'A'..'Z'| NUMERIC)(!~('\r'|'\n'|':'))+ ; 
NUMERIC : ('0'..'9'|','|'.'|'-')+; 
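
A quick way to see the difference between the two NUMERIC versions is 
to try them as regular expressions. The sketch below is plain Java, 
not ANTLR, and the regexes are my own approximations of the rules 
above (the class and method names are made up for the example). It 
only checks whole-string matches, so it says nothing about how the 
ANTLR lexer resolves lookahead between rules.

```java
import java.util.regex.Pattern;

public class NumericCheck {
    // Approximates the original rule: ('0'..'9'|','|'.'|'-')+
    static final Pattern ORIGINAL = Pattern.compile("[0-9,.\\-]+");

    // Approximates NUMBER: ('-')? ((DIGIT)* '.')? (DIGIT)+
    static final String NUMBER = "-?(?:[0-9]*\\.)?[0-9]+";

    // Approximates the cleaned-up rule: NUMBER ( ',' NUMBER )*
    static final Pattern TIGHTENED =
        Pattern.compile(NUMBER + "(?:," + NUMBER + ")*");

    static boolean original(String s) {
        return ORIGINAL.matcher(s).matches();
    }

    static boolean tightened(String s) {
        return TIGHTENED.matcher(s).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "-...,-", ",,9-0,..", ",,,", "0.0005", "12", "-1.5,2,.25"
        };
        for (String s : samples) {
            System.out.println(s + "  original=" + original(s)
                + "  tightened=" + tightened(s));
        }
    }
}
```

The first three samples are the junk inputs listed above: the 
original pattern accepts them and the tightened one does not, while 
both accept values such as 0.0005 and 12.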

3. RECORD rule

What does the trailing pattern '(!~('\r'|'\n'|':'))+' do? What else 
is part of a record that you are trying to match? Whitespace and 
SEMICOLON are already taken care of.

Anyways, good luck and be sure to give Ter's Getting started guide a 
good workover ;-)

Micheal

> 
> Hi,
> 
> Thanks for the help. Below is the code for my grammar file.
> /***********************************************************************/
> class CSVParser extends Parser;
> options { k=4; }
> {
>  LsystemsString ls = new LsystemsString();
>  
>  public LsystemsString getLsystemsString(){
>   return ls;
>  }
> }
> file   : ( line (NEWLINE line)*(NEWLINE)? EOF)
>        {System.out.println("file matched");}
>        ;
> line   : ((record)+ )
>        ;
> record : ((r:KEYWORD) (sc: SEMICOLON)? (n:RECORD)*  (COMMENT)?)
>        {
>         System.out.println("attribute = " + r.getText());
>         System.out.println("value = "+n.getText());
>         ls.addNext(r.getText(),n.getText());
>         System.out.println("LS size: "+ls.getArrayListSize());
>        }
>        ;
>     
> class CSVLexer extends Lexer;
> options { 
>  charVocabulary='\3'..'\377'; 
>  k = 4;
> }

// change ###[1] - use tokens for keywords
tokens
{
 ANGLE      = "angle";
 FACTOR     = "factor";
 INITIAL    = "initial";
 .........
 .........
 .........
 Z          = "z";
 ELASTICITY = "elasticity";
 INCREMENT  = "increment";
 RENDER     = "render";
 MODE       = "mode";

}

> RECORD  : ('a'..'z' | 'A'..'Z'| NUMERIC)(!~('\r'|'\n'|':'))+ ;
> NUMERIC : ('0'..'9'|','|'.'|'-')+;
> SEMICOLON : ':';
> BRACKET : ('(' | ')');
> COMMENT : "/*" (options {greedy=false;} :.)* "*/" ;
> NEWLINE : ('\r''\n')=> '\r''\n' //DOS
>         | '\r'                  //MAC
>         | '\n'                  //UNIX
>         { newline(); }
>         ;
>     
> WS      : (' '|'\t') { $setType(Token.SKIP); } ;
> /************************************************************************************/
> mzukowski at y... wrote:
> Lexical nondeterminism means you have two lexical rules that are in
> conflict, meaning they have the same prefix. Post a small but complete
> example which has the error message and we'll be able to help you. 
> 
> Monty
> 
> 
> -----Original Message-----
> From: Sharon Li [mailto:hushlee83 at y...]
> Sent: Tuesday, January 21, 2003 11:43 PM
> To: Antlr Interest Group
> Subject: [antlr-interest] Reading contents of file using Antlr
> 
> 
> Hi, 
> I'm a Java programmer and relatively new to Antlr. I need to write Antlr
> code to read in a text file and extract only the necessary information. How
> can I go about doing that? An example of the contents of the file might look
> like this:
> angle focus : 0.0005
> color : blue
> line width : 12
> I often get the error msg:
> warning : lexical nondeterminism upon ...
> Also when do we use the TreeParser and what is the difference between a
> Parser and a TreeParser? When do we define tokens and what are they for? Pls
> help! Thanks very much.

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/