[antlr-interest] Re: Perl -> Antlr

lgcraymer lgc at mail1.jpl.nasa.gov
Tue Dec 2 13:47:21 PST 2003


Simon--

Consider doing more of the work in the parser: have the lexer tokenize the arguments on each line, using whitespace as the separator, rather than matching a whole line as a single token.  Something
like

des
    :
    "DES" (NUMBER | TEXT)* NEWLINE
    ;

Then you've already sorted out the arguments and can avoid re-lexing the line contents.  That will give you more processing flexibility.
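
To make the idea concrete outside of ANTLR, here is a rough plain-Java sketch of the same split: break each line on whitespace, treat the first token as the keyword, and keep the rest as already-separated arguments. The class and method names are made up for illustration, and two things are assumptions about the file format rather than anything stated in this thread: a line that starts with whitespace continues the previous keyword (as the CLS block in the sample appears to do), and a repeated keyword simply replaces the earlier entry.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of ANTLR: shows "keyword + whitespace-separated
// arguments" as a data structure, which is what the parser rule above would see.
class LineTokenizerSketch {

    // Returns keyword -> argument list, in file order.
    static Map<String, List<String>> parse(String text) {
        Map<String, List<String>> record = new LinkedHashMap<>();
        List<String> current = null;
        for (String line : text.split("\\r?\\n")) {
            if (line.trim().isEmpty()) {
                continue; // skip blank lines
            }
            String[] tokens = line.trim().split("\\s+");
            // Assumption: leading whitespace marks a continuation line.
            boolean continuation = Character.isWhitespace(line.charAt(0));
            int first = 0;
            if (!continuation || current == null) {
                // New keyword: start a fresh argument list (replaces any earlier one).
                current = new ArrayList<>();
                record.put(tokens[0], current);
                first = 1;
            }
            for (int i = first; i < tokens.length; i++) {
                current.add(tokens[i]);
            }
        }
        return record;
    }

    public static void main(String[] args) {
        String sample = "DES  MAIL1 \nTN   001 0 02 00 \nFDN  \nCLS  CTD FBD \n     MWA RMMD \n";
        System.out.println(parse(sample));
    }
}
```

In the grammar, the equivalent would be one parser rule per keyword, each consuming NUMBER and TEXT tokens up to NEWLINE, so the per-argument handling lives in the parser actions instead of inside one large lexer rule.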

--Loring


--- In antlr-interest at yahoogroups.com, "Simon Taylor" <set at n...> wrote:
> I have a Perl script that "parses" a file into a memory structure.
> I'm learning Java and wanted to convert the functionality into Java.
> Antlr seems to be the tool for the job. I'm not sure I understand the best
> way to use it though.
> 
> If the file looks like:-
> 
> DES  MAIL1 
> TN   001 0 02 00 
> TYPE SL1 
> CDEN DD
> CUST 0 
> KLS  1 
> FDN  
> TGAR 0 
> LDN  NO
> NCOS 4 
> SGRP 0 
> RNPG 0 
> SCI  0 
> SSU  
> XLST 
> SCPW 
> SFLT NO
> CAC_MFC 0
> CLS  CTD FBD WTA LPR MTD FND HTD NDD 
>      MWA RMMD SMWD AAD IMA XHD IRD NID OLD VCE 
>      POD DSX VMA CMSD CCSD SWD LND CNDD
>      CFTD SFD MRD DDV CNID MSID BFED RCBD 
>      ICDD CDMD MCTD CLBD AUTU
>      GPUD DPUD DNDD CFXD ARHD CNTD CLTD ASCD 
>      ABDD CFHD FICD NAID 
>      UDI RCC HBTD AHD IPND  DDGA NAMA MIND PRSD NRWD NRCD NROD 
>      EXR0 
>      USRD ULAD RTDD RBDD RBHD PGND OCBD FLXD FTTC DNDY DNO3 
> RCO  0 
> 
> What is the best way to configure the parser and lexer?
> 
> I have tried the following grammar on a simple file and it seems to work.
> Most of the recognising is done based on the literal word at the beginning
> of the line. Is this the best way to approach this problem?:-
> 
> class TNBParser extends Parser;
> options { k = 4;
> 		  defaultErrorHandler = true;
> 	    }
> 	    //A tnbfile consists of one or more tnbrecords
> tnbfile
>   :
>    (record)+
>    EOF
>   ;
> 
> record
>   :
>    (des)*
>    (tn)?
>    date
>   ;
>   
> des : (d:DES) {System.out.println("DES: " + d.getText());};
> tn : (t:TN){System.out.println("TN: " + t.getText());};
> date :(da:DATE){System.out.println("DATE: " + da.getText());};	
> //anything:(az:ALPHA|SP)*{System.out.println("ANYTHING: " + az.getText());};
> 
> 
> class TNBLexer extends Lexer;
> options { k = 3;
> 		  defaultErrorHandler = true;
> 		  //charVocabulary = '\3'..'\377';
> 		  //testLiterals=true;
> 		  //caseSensitive = false;
> 	      //caseSensitiveLiterals = false;
>     	  charVocabulary='\u0000'..'\uFFFE';
>     	  filter=IGNORE;
> 	    }
> // TNB is mostly uppercase but we need lowercase in here because of the CPND
> // How do we define it so we can break it up in key value pairs in the
> // parser
> 
> 
> TN:"TN"(SP)+(INT)+(SP)*(INT)*(SP)*(INT)*(SP)*(INT)*;
> DES:"DES"(SP)+(ALPHA|INT|PUNCTUATION|SP)*;
> DATE:"DATE"(SP)+(INT)(INT)'/'(INT)(INT)'/'(INT)(INT)(INT)(INT);
> 
> //protected
> //Letter
> //    : 'A'..'Z' | '_' | '#' | '@' | '\u0080'..'\ufffe'
> //    ;
> //
> //protected
> //Digit
> //    : '0'..'9'
> //    ;
> 
> protected IGNORE
>   : ( "\r\n" | '\r' | '\n' )
>     {newline(); System.out.println("");}
>   | c:. {}
>   //System.out.print(c);
>   ;
> 
> protected ALPHA : ('a'..'z'|'A'..'Z');
> protected INT :('0'..'9');
> protected PUNCTUATION
> :('_'|'-'|'+'|'/'|';'|'#'|'*'|'\\'|':'|','|'\''|'.'|'?');
> //NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r')) { newline(); });
> 
> SP: ' ';
> WS: (
>      '\t'
>     |'\r' '\n' { newline(); }
>     |'\n' { newline(); }
>     |'\r'
>     )
>     {$setType(Token.SKIP);};
> 
> 
> 
> 
> Simon Taylor
> Managed Services Technology Consultant
> Nortel Networks
> p -  01279 404289 (ESN 742 4289)
> m - 07740 533743 (ESN 748 3743)
> e -  set at n...
> 
> "I code therefore I am"


 
