[antlr-interest] Perl -> Antlr

Simon Taylor set at nortelnetworks.com
Tue Dec 2 12:04:09 PST 2003


I have a Perl script that "parses" a file into a memory structure.
I'm learning Java and want to convert the functionality to Java.
ANTLR seems to be the tool for the job, but I'm not sure I understand the
best way to use it.

The input file looks like this:

DES  MAIL1 
TN   001 0 02 00 
TYPE SL1 
CDEN DD
CUST 0 
KLS  1 
FDN  
TGAR 0 
LDN  NO
NCOS 4 
SGRP 0 
RNPG 0 
SCI  0 
SSU  
XLST 
SCPW 
SFLT NO
CAC_MFC 0
CLS  CTD FBD WTA LPR MTD FND HTD NDD 
     MWA RMMD SMWD AAD IMA XHD IRD NID OLD VCE 
     POD DSX VMA CMSD CCSD SWD LND CNDD
     CFTD SFD MRD DDV CNID MSID BFED RCBD 
     ICDD CDMD MCTD CLBD AUTU
     GPUD DPUD DNDD CFXD ARHD CNTD CLTD ASCD 
     ABDD CFHD FICD NAID 
     UDI RCC HBTD AHD IPND  DDGA NAMA MIND PRSD NRWD NRCD NROD 
     EXR0 
     USRD ULAD RTDD RBDD RBHD PGND OCBD FLXD FTTC DNDY DNO3 
RCO  0 
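
As a point of reference, the kind of memory structure such a record maps to
could be sketched in plain Java, without ANTLR, roughly as follows. This is
only a sketch under assumptions read off the sample above: the first word of
a line is the keyword, the remaining words are its values, and a line that
starts with whitespace continues the previous keyword (as with CLS). The
names TnbRecordSketch and parse are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TnbRecordSketch {

    /** Parses the lines of one record into keyword -> list of values. */
    static Map<String, List<String>> parse(List<String> lines) {
        // LinkedHashMap keeps the keywords in file order.
        Map<String, List<String>> record = new LinkedHashMap<>();
        List<String> current = null; // values of the most recent keyword
        for (String line : lines) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] words = line.trim().split("\\s+");
            // Assumption: leading whitespace marks a continuation line.
            boolean continuation = Character.isWhitespace(line.charAt(0));
            int firstValue = 0;
            if (!continuation) {
                // The first word is the keyword; the rest are its values.
                current = new ArrayList<>();
                record.put(words[0], current);
                firstValue = 1;
            }
            if (current == null) {
                continue; // continuation before any keyword: ignore it
            }
            for (int i = firstValue; i < words.length; i++) {
                current.add(words[i]);
            }
        }
        return record;
    }

    public static void main(String[] args) {
        // A shortened copy of the sample record.
        List<String> sample = Arrays.asList(
            "DES  MAIL1 ",
            "TN   001 0 02 00 ",
            "FDN  ",
            "CLS  CTD FBD WTA",
            "     MWA RMMD",
            "RCO  0 ");
        for (Map.Entry<String, List<String>> e : parse(sample).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

A keyword with no values (FDN, SSU) ends up with an empty list, and a
repeated keyword overwrites the earlier entry, so a file holding several
records would need one map per record.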

What is the best way to configure the parser and lexer for input like this?

I have tried the following grammar on a simple file and it seems to work.
Most of the recognising is based on the literal word at the beginning of
each line. Is this the best way to approach the problem?

class TNBParser extends Parser;
options {
    k = 4;
    defaultErrorHandler = true;
}

// A tnbfile consists of one or more records
tnbfile
  :
   (record)+
   EOF
  ;

record
  :
   (des)*
   (tn)?
   date
  ;
  
des : (d:DES) {System.out.println("DES: " + d.getText());};
tn : (t:TN){System.out.println("TN: " + t.getText());};
date :(da:DATE){System.out.println("DATE: " + da.getText());};	
//anything:(az:ALPHA|SP)*{System.out.println("ANYTHING: " + az.getText());};


class TNBLexer extends Lexer;
options {
    k = 3;
    defaultErrorHandler = true;
    //charVocabulary = '\3'..'\377';
    //testLiterals=true;
    //caseSensitive = false;
    //caseSensitiveLiterals = false;
    charVocabulary='\u0000'..'\uFFFE';
    filter=IGNORE;
}
// TNB is mostly uppercase but we need lowercase in here because of the CPND.
// How do we define it so we can break it up into key/value pairs in the parser?


TN:"TN"(SP)+(INT)+(SP)*(INT)*(SP)*(INT)*(SP)*(INT)*;
DES:"DES"(SP)+(ALPHA|INT|PUNCTUATION|SP)*;
DATE:"DATE"(SP)+(INT)(INT)'/'(INT)(INT)'/'(INT)(INT)(INT)(INT);

//protected
//Letter
//    : 'A'..'Z' | '_' | '#' | '@' | '\u0080'..'\ufffe'
//    ;
//
//protected
//Digit
//    : '0'..'9'
//    ;

protected IGNORE
  : ( "\r\n" | '\r' | '\n' )
    {newline(); System.out.println("");}
  | c:. {}
  //System.out.print(c);
  ;

protected ALPHA : ('a'..'z'|'A'..'Z');
protected INT :('0'..'9');
protected PUNCTUATION
:('_'|'-'|'+'|'/'|';'|'#'|'*'|'\\'|':'|','|'\''|'.'|'?');
//NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r')) { newline(); });

SP: ' ';
WS: (
     '\t'
    |'\r' '\n' { newline(); }
    |'\n' { newline(); }
    |'\r'
    )
    {$setType(Token.SKIP);};
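
Because the TN and DATE lexer rules each return the whole line as a single
token, the parser actions only see one string per line. One possible way to
split that string into fields afterwards is sketched below, using regular
expressions that mirror the shape of the two rules. This is an illustration
only: the class and method names are hypothetical, and the DATE input in
main is invented, since the sample record contains no DATE line.

```java
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TnbTokenSketch {

    // Mirrors TN: "TN", one or more spaces, then up to four digit groups.
    private static final Pattern TN =
        Pattern.compile("TN +(\\d+) *(\\d*) *(\\d*) *(\\d*)");

    // Mirrors DATE: "DATE", spaces, two digits, '/', two digits, '/', four digits.
    private static final Pattern DATE =
        Pattern.compile("DATE +(\\d\\d)/(\\d\\d)/(\\d{4})");

    /** Returns the captured groups, or null if the text does not match. */
    private static String[] fields(Pattern pattern, String text) {
        // Trailing spaces are trimmed so that matches() can cover the whole text.
        Matcher m = pattern.matcher(text.trim());
        if (!m.matches()) {
            return null;
        }
        String[] out = new String[m.groupCount()];
        for (int i = 0; i < out.length; i++) {
            out[i] = m.group(i + 1);
        }
        return out;
    }

    /** Splits the text of a TN token into its four number groups. */
    static String[] tnFields(String tokenText) {
        return fields(TN, tokenText);
    }

    /** Splits the text of a DATE token into its three number groups. */
    static String[] dateFields(String tokenText) {
        return fields(DATE, tokenText);
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(tnFields("TN   001 0 02 00 ")));
        System.out.println(Arrays.toString(dateFields("DATE 02/12/2003"))); // made-up input
    }
}
```

Inside the grammar, a helper like tnFields could be called from the tn rule's
action with t.getText(). The alternative is to have the lexer emit the
keyword and each number as separate tokens and let the parser rules do the
grouping.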




Simon Taylor
Managed Services Technology Consultant
Nortel Networks
p -  01279 404289 (ESN 742 4289)
m - 07740 533743 (ESN 748 3743)
e -  set at nortelnetworks.com

"I code therefore I am"
