[antlr-interest] Perl -> Antlr
Simon Taylor
set at nortelnetworks.com
Tue Dec 2 12:04:09 PST 2003
I have a Perl script that "parses" a file into an in-memory structure.
I'm learning Java and want to port that functionality to Java.
ANTLR seems to be the tool for the job, but I'm not sure I understand the best
way to use it.
If the file looks like:-
DES MAIL1
TN 001 0 02 00
TYPE SL1
CDEN DD
CUST 0
KLS 1
FDN
TGAR 0
LDN NO
NCOS 4
SGRP 0
RNPG 0
SCI 0
SSU
XLST
SCPW
SFLT NO
CAC_MFC 0
CLS CTD FBD WTA LPR MTD FND HTD NDD
MWA RMMD SMWD AAD IMA XHD IRD NID OLD VCE
POD DSX VMA CMSD CCSD SWD LND CNDD
CFTD SFD MRD DDV CNID MSID BFED RCBD
ICDD CDMD MCTD CLBD AUTU
GPUD DPUD DNDD CFXD ARHD CNTD CLTD ASCD
ABDD CFHD FICD NAID
UDI RCC HBTD AHD IPND DDGA NAMA MIND PRSD NRWD NRCD NROD
EXR0
USRD ULAD RTDD RBDD RBHD PGND OCBD FLXD FTTC DNDY DNO3
RCO 0
What is the best way to configure the parser and lexer?
I have tried the following grammar on a simple file and it seems to work.
Most of the recognising is done based on the literal word at the beginning
of the line. Is this the best way to approach this problem?:-
class TNBParser extends Parser;
options { k = 4;
defaultErrorHandler = true;
}
//A tnbfile consists of one or more tnbrecords
tnbfile
:
(record)+
EOF
;
record
:
(des)*
(tn)?
date
;
des : (d:DES) {System.out.println("DES: " + d.getText());};
tn : (t:TN){System.out.println("TN: " + t.getText());};
date :(da:DATE){System.out.println("DATE: " + da.getText());};
//anything:(az:ALPHA|SP)*{System.out.println("ANYTHING: " + az.getText());};
class TNBLexer extends Lexer;
options { k = 3;
defaultErrorHandler = true;
//charVocabulary = '\3'..'\377';
//testLiterals=true;
//caseSensitive = false;
//caseSensitiveLiterals = false;
charVocabulary='\u0000'..'\uFFFE';
filter=IGNORE;
}
// TNB is mostly uppercase but we need lowercase in here because of the CPND
// How do we define it so we can break it up in key value pairs in the
// parser?
TN:"TN"(SP)+(INT)+(SP)*(INT)*(SP)*(INT)*(SP)*(INT)*;
DES:"DES"(SP)+(ALPHA|INT|PUNCTUATION|SP)*;
DATE:"DATE"(SP)+(INT)(INT)'/'(INT)(INT)'/'(INT)(INT)(INT)(INT);
//protected
//Letter
// : 'A'..'Z' | '_' | '#' | '@' | '\u0080'..'\ufffe'
// ;
//
//protected
//Digit
// : '0'..'9'
// ;
protected IGNORE
: ( "\r\n" | '\r' | '\n' )
{newline(); System.out.println("");}
| c:. {}
//System.out.print(c);
;
protected ALPHA : ('a'..'z'|'A'..'Z');
protected INT :('0'..'9');
protected PUNCTUATION
:('_'|'-'|'+'|'/'|';'|'#'|'*'|'\\'|':'|','|'\''|'.'|'?');
//NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r')) { newline(); });
SP: ' ';
WS: (
'\t'
|'\r' '\n' { newline(); }
|'\n' { newline(); }
|'\r'
)
{$setType(Token.SKIP);};
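
For comparison, the "literal word at the beginning of the line" idea can also be sketched in plain Java, without ANTLR. This is only a minimal sketch under stated assumptions, not a replacement for the grammar above: the class name TnbRecordSketch, the method parseRecord and the knownKeys set are all hypothetical, and it assumes that a line whose first word is not a known keyword continues the values of the previous keyword (as the lines following CLS in the sample appear to do).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class TnbRecordSketch {

    /**
     * Turns the lines of one record into keyword -> values, kept in file order.
     * knownKeys is an assumption: the caller supplies the keywords that may
     * start a line. A repeated keyword overwrites the earlier entry.
     */
    public static Map<String, List<String>> parseRecord(List<String> lines,
                                                        Set<String> knownKeys) {
        Map<String, List<String>> record = new LinkedHashMap<>();
        List<String> current = null; // values of the most recent keyword
        for (String line : lines) {
            String trimmed = line.trim();
            if (trimmed.isEmpty()) {
                continue; // skip blank lines
            }
            String[] words = trimmed.split("\\s+");
            int start = 0;
            // A known keyword opens a new entry. The very first line is also
            // treated as a keyword line, since there is nothing to continue.
            if (knownKeys.contains(words[0]) || current == null) {
                current = new ArrayList<>();
                record.put(words[0], current);
                start = 1;
            }
            // Remaining words are values; on a continuation line every word
            // is appended to the previous keyword's values.
            for (int i = start; i < words.length; i++) {
                current.add(words[i]);
            }
        }
        return record;
    }

    public static void main(String[] args) {
        // Only a few keywords from the sample file, for illustration.
        Set<String> keys = new HashSet<>(Arrays.asList("DES", "TN", "FDN", "CLS"));
        List<String> lines = Arrays.asList(
                "DES MAIL1",
                "TN 001 0 02 00",
                "FDN",
                "CLS CTD FBD WTA",
                "MWA RMMD SMWD");
        System.out.println(parseRecord(lines, keys));
    }
}
```

A keyword with no values (such as FDN in the sample) ends up mapped to an empty list, so the caller can still tell that the keyword was present. Whether a lookup table like this or a set of lexer literals is the better fit depends on how regular the real files are.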
Simon Taylor
Managed Services Technology Consultant
Nortel Networks
p - 01279 404289 (ESN 742 4289)
m - 07740 533743 (ESN 748 3743)
e - set at nortelnetworks.com
"I code therefore I am"