[antlr-interest] Re: Perl -> Antlr
lgcraymer
lgc at mail1.jpl.nasa.gov
Tue Dec 2 13:47:21 PST 2003
Simon--
Consider doing more of the work in the parser: have the lexer tokenize the arguments on each line, using whitespace as the separator, and let the parser recognize the line structure. Something like
des
    :   "DES" (NUMBER | TEXT)* NEWLINE
    ;
Then the arguments are already sorted out into tokens by the time the parser sees them, and you can avoid re-lexing the line contents. That will give you more processing flexibility.
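
As a rough illustration of what that buys you, here is a plain-Java sketch (not ANTLR) of the shape of data the parser gets to work with for each line: a leading keyword plus arguments already classified as NUMBER or TEXT. The names LineTokens, Kind and Arg are made up for this sketch. It assumes arguments are separated by spaces or tabs and that a NUMBER is a run of digits, which is only a guess at what the lexer rules would say.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: mimics what a whitespace-separating lexer would hand
// to a rule like  des : "DES" (NUMBER | TEXT)* NEWLINE ;
class LineTokens {
    // Token kinds mirroring the NUMBER and TEXT alternatives in the rule.
    enum Kind { NUMBER, TEXT }

    static final class Arg {
        final Kind kind;
        final String text;

        Arg(Kind kind, String text) {
            this.kind = kind;
            this.text = text;
        }
    }

    final String keyword;   // first word on the line, e.g. "DES" or "TN"
    final List<Arg> args;   // remaining words, already classified

    private LineTokens(String keyword, List<Arg> args) {
        this.keyword = keyword;
        this.args = args;
    }

    // Split one line on runs of whitespace; a blank line yields an empty keyword.
    static LineTokens parse(String line) {
        String[] parts = line.trim().split("\\s+");
        List<Arg> args = new ArrayList<>();
        for (int i = 1; i < parts.length; i++) {
            // Assumption: an argument made only of digits counts as a NUMBER.
            Kind kind = parts[i].matches("[0-9]+") ? Kind.NUMBER : Kind.TEXT;
            args.add(new Arg(kind, parts[i]));
        }
        return new LineTokens(parts[0], args);
    }

    public static void main(String[] unused) {
        LineTokens tn = parse("TN 001 0 02 00");
        System.out.println(tn.keyword + " has " + tn.args.size() + " arguments");
    }
}
```

With the arguments delivered one token at a time like this, each parser rule can pick out individual values (the four numbers after TN, say) without having to take apart one big token's text.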
--Loring
--- In antlr-interest at yahoogroups.com, "Simon Taylor" <set at n...> wrote:
> I have a Perl script that "parses" a file into a memory structure.
> I'm learning Java and wanted to convert the functionality into Java.
> Antlr seems to be the tool for the job. I'm not sure I understand the best
> way to use it though.
>
> If the file looks like:-
>
> DES MAIL1
> TN 001 0 02 00
> TYPE SL1
> CDEN DD
> CUST 0
> KLS 1
> FDN
> TGAR 0
> LDN NO
> NCOS 4
> SGRP 0
> RNPG 0
> SCI 0
> SSU
> XLST
> SCPW
> SFLT NO
> CAC_MFC 0
> CLS CTD FBD WTA LPR MTD FND HTD NDD
> MWA RMMD SMWD AAD IMA XHD IRD NID OLD VCE
> POD DSX VMA CMSD CCSD SWD LND CNDD
> CFTD SFD MRD DDV CNID MSID BFED RCBD
> ICDD CDMD MCTD CLBD AUTU
> GPUD DPUD DNDD CFXD ARHD CNTD CLTD ASCD
> ABDD CFHD FICD NAID
> UDI RCC HBTD AHD IPND DDGA NAMA MIND PRSD NRWD NRCD NROD
> EXR0
> USRD ULAD RTDD RBDD RBHD PGND OCBD FLXD FTTC DNDY DNO3
> RCO 0
>
> What is the best way to configure the parser and lexer?
>
> I have tried the following grammar on a simple file and it seems to work.
> Most of the recognising is done based on the literal word at the beginning
> of the line. Is this the best way to approach this problem?:-
>
> class TNBParser extends Parser;
> options { k = 4;
> defaultErrorHandler = true;
> }
> //A tnbfile consists of one or more tnbrecords
> tnbfile
> :
> (record)+
> EOF
> ;
>
> record
> :
> (des)*
> (tn)?
> date
> ;
>
> des : (d:DES) {System.out.println("DES: " + d.getText());};
> tn : (t:TN){System.out.println("TN: " + t.getText());};
> date :(da:DATE){System.out.println("DATE: " + da.getText());};
> //anything:(az:ALPHA|SP)*{System.out.println("ANYTHING: " + az.getText());};
>
>
> class TNBLexer extends Lexer;
> options { k = 3;
> defaultErrorHandler = true;
> //charVocabulary = '\3'..'\377';
> //testLiterals=true;
> //caseSensitive = false;
> //caseSensitiveLiterals = false;
> charVocabulary='\u0000'..'\uFFFE';
> filter=IGNORE;
> }
> // TNB is mostly uppercase but we need lowercase in here because of the CPND
> // How do we define it so we can break it up in key value pairs in the parser
>
>
> TN:"TN"(SP)+(INT)+(SP)*(INT)*(SP)*(INT)*(SP)*(INT)*;
> DES:"DES"(SP)+(ALPHA|INT|PUNCTUATION|SP)*;
> DATE:"DATE"(SP)+(INT)(INT)'/'(INT)(INT)'/'(INT)(INT)(INT)(INT);
>
> //protected
> //Letter
> // : 'A'..'Z' | '_' | '#' | '@' | '\u0080'..'\ufffe'
> // ;
> //
> //protected
> //Digit
> // : '0'..'9'
> // ;
>
> protected IGNORE
> : ( "\r\n" | '\r' | '\n' )
> {newline(); System.out.println("");}
> | c:. {}
> //System.out.print(c);
> ;
>
> protected ALPHA : ('a'..'z'|'A'..'Z');
> protected INT :('0'..'9');
> protected PUNCTUATION
> :('_'|'-'|'+'|'/'|';'|'#'|'*'|'\\'|':'|','|'\''|'.'|'?');
> //NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r')) { newline(); });
>
> SP: ' ';
> WS: (
> '\t'
> |'\r' '\n' { newline(); }
> |'\n' { newline(); }
> |'\r'
> )
> {$setType(Token.SKIP);};
>
>
>
>
> Simon Taylor
> Managed Services Technology Consultant
> Nortel Networks
> p - 01279 404289 (ESN 742 4289)
> m - 07740 533743 (ESN 748 3743)
> e - set at n...
>
> "I code therefore I am"