[antlr-interest] Guidance Required

Thu Jul 3 10:42:33 PDT 2003

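I'm no expert, but it looks to me like you have
combined the work of your parser and lexer.  The
lexer should only chop the text into small tokens
(words, numbers, punctuation, newlines); deciding
which kind of line you are looking at, and whether
the lines come in a valid order, is the parser's
job.  I'm just guessing here, but something like
this might be what you want: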

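//parser rules

class TNBParser extends Parser;

file //assuming all are optional
  : (des)? (tn)? (type)? (cden)?
    (cust)? (kls)? (fdn)? (tgar)?
    (ldn)? (ncos)? (sgrp)? (rnpg)?
    (sci)? (ssu)? (xlst)? (scpw)? (sflt)?
//don't know where the date rule goes
    EOF
  ;

//type, cden, cust, ... follow the same pattern as tn and des;
//fields that can be empty (FDN, SSU, XLST, SCPW) need (anything)?
protected tn : "TN" (NUMBER)+ NEWLINE ;
protected des : "DES" anything NEWLINE ;
protected anything
  : ( WORD | NUMBER | PUNCTUATION | SLASH )+ //I'm guessing...
  ;

//NUMBER accepts any number of digits; check for exactly
//two in an action if that matters
protected date
  : "DATE" NUMBER SLASH NUMBER SLASH NUMBER
  ;

//lexer rules

class TNBLexer extends Lexer;
options {
  k = 2;                // to tell "\r\n" from a lone '\r'
  testLiterals = false; // only WORD is checked against the keywords
}

WS : (' ' | '\t')+ { $setType(Token.SKIP); } ;

//a whole word such as DES, MAIL1 or SL1; testLiterals turns the
//parser's keywords ("TN", "DES", ...) into their own token types
WORD
  options { testLiterals = true; }
  : ('a'..'z'|'A'..'Z') ('a'..'z'|'A'..'Z'|'0'..'9')*
  ;
NUMBER : ('0'..'9')+ ;
SLASH : '/' ;
PUNCTUATION
  : '_' | '-' | '+' | ';' | '#'
  | '*' | '\\' | ':' | ',' | '\'' | '.' | '?'
  ;

//one token per line end, so newline() counts lines correctly
NEWLINE
  : ( options { generateAmbigWarnings = false; }
    : "\r\n" | '\r' | '\n'
    )
    { newline(); }
  ;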
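For anyone who wants to see the same split without running ANTLR, here is a small hand-written Java sketch. It is not what ANTLR generates, and the names `RecordCheck`, `lex` and `valid` are made up for illustration. It assumes every field is optional, appears at most once, and must come in the order of Simon's sample record; values are accepted as "anything" except for TN, which must be one or more numbers, and blank lines are rejected. The lexer only produces WORD, NUMBER, PUNCT and NEWLINE tokens; all knowledge of which line is which lives in the parser.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-written illustration of the lexer/parser split (not ANTLR output).
// All names here are hypothetical.
public class RecordCheck {

    enum Kind { WORD, NUMBER, PUNCT, NEWLINE }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    // Field order assumed from Simon's sample record.
    static final String[] FIELDS = {
        "DES", "TN", "TYPE", "CDEN", "CUST", "KLS", "FDN", "TGAR", "LDN",
        "NCOS", "SGRP", "RNPG", "SCI", "SSU", "XLST", "SCPW", "SFLT"
    };

    static boolean isAlpha(char c) { return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z'); }
    static boolean isDigit(char c) { return c >= '0' && c <= '9'; }

    // Lexer: knows nothing about TN or DES, it only produces small tokens.
    static List<Token> lex(String s) {
        List<Token> out = new ArrayList<>();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            int start = i;
            if (c == ' ' || c == '\t') {
                i++;                                    // whitespace is skipped
            } else if (c == '\r' || c == '\n') {
                if (c == '\r' && i + 1 < s.length() && s.charAt(i + 1) == '\n') i++;
                i++;                                    // "\r\n", "\r" or "\n" -> one NEWLINE
                out.add(new Token(Kind.NEWLINE, "\n"));
            } else if (isAlpha(c)) {
                while (i < s.length() && (isAlpha(s.charAt(i)) || isDigit(s.charAt(i)))) i++;
                out.add(new Token(Kind.WORD, s.substring(start, i)));
            } else if (isDigit(c)) {
                while (i < s.length() && isDigit(s.charAt(i))) i++;
                out.add(new Token(Kind.NUMBER, s.substring(start, i)));
            } else {
                i++;                                    // any other single character
                out.add(new Token(Kind.PUNCT, String.valueOf(c)));
            }
        }
        return out;
    }

    // Parser: each line is KEYWORD value* NEWLINE; every field is optional
    // but must come after the previous one, like the "file" rule.
    static boolean valid(List<Token> toks) {
        int pos = 0;
        int next = 0;                                   // earliest field still allowed
        while (pos < toks.size()) {
            Token key = toks.get(pos++);
            if (key.kind != Kind.WORD) return false;    // a line must start with a keyword
            int idx = -1;
            for (int f = next; f < FIELDS.length && idx < 0; f++) {
                if (FIELDS[f].equals(key.text)) idx = f;
            }
            if (idx < 0) return false;                  // unknown, repeated or out of order
            next = idx + 1;
            int valueStart = pos;
            while (pos < toks.size() && toks.get(pos).kind != Kind.NEWLINE) pos++;
            if (key.text.equals("TN")) {                // mirrors tn : "TN" (NUMBER)+ NEWLINE
                if (pos == valueStart) return false;
                for (int v = valueStart; v < pos; v++) {
                    if (toks.get(v).kind != Kind.NUMBER) return false;
                }
            }
            if (pos < toks.size()) pos++;               // consume the NEWLINE
        }
        return true;                                    // empty input is fine: all fields optional
    }

    public static void main(String[] args) {
        String record = "DES  MAIL1\nTN   001 0 02 00\nTYPE SL1\nCDEN DD\nCUST 0\n"
                + "KLS  1\nFDN\nTGAR 0\nLDN  NO\nNCOS 4\nSGRP 0\nRNPG 0\n"
                + "SCI  0\nSSU\nXLST\nSCPW\nSFLT NO\n";
        System.out.println(valid(lex(record)));                // true
        System.out.println(valid(lex("TN 001 0\nDES X\n")));   // false: DES after TN
        System.out.println(valid(lex("TN abc\n")));            // false: TN wants numbers
    }
}
```

Running `main` should print true, then false twice. The point is the same as in the grammar: the lexer stays dumb, and adding a new field or changing the order only touches the parser side.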

-Matt

--- setuk_x <set at nortelnetworks.com> wrote:
> I am new to Java and ANTLR.
> I have written a basic parser in Perl before, but
> it is proving slow and unwieldy, so I am looking to
> ANTLR to fill the gap.
> I need to parse a text file which contains text in
> the following format (simplest form):
> DES  MAIL1 
> TN   001 0 02 00 
> TYPE SL1 
> CDEN DD
> CUST 0 
> KLS  1 
> FDN  
> TGAR 0 
> LDN  NO
> NCOS 4 
> SGRP 0 
> RNPG 0 
> SCI  0 
> SSU  
> XLST 
> SCPW 
> SFLT NO
> 
> I need to be able to classify each line as a
> specific type so I can pass these types to the
> parser and validate that what I have is a valid
> record.
> 
> Is the best way to do this to use lexer tokens,
> such as:
> 
> class TNBLexer extends Lexer;
> options { k = 5;
> 		  defaultErrorHandler = true;
> 	    }
> // TNB is mostly uppercase but we need lowercase in
> // here because of the CPND
> 
> TN  : (("TN")+ (NUMERIC)+ NEWLINE);
> DES : (("DES") (ANYTHING)+);
> DATE: (("DATE")+ (NUMERIC NUMERIC '/' NUMERIC NUMERIC
>        '/' NUMERIC NUMERIC));
> WS:   ((' ')|('\t')){$setType(Token.SKIP);};
> 
> protected ANYTHING : ((ALPHA|NUMERIC|PUNCTUATION));
> protected ALPHA : ('a'..'z'|'A'..'Z');
> protected NUMERIC :('0'..'9');
> protected PUNCTUATION :('_'|'-'|'+'|'/'|';'|'#'|'*'
>     |'\\'|':'|','|'\''|'.'|'?');
> protected NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r'))
> { newline(); });
> 
> Or am I completely on the wrong track?
> I am wading my way through the docs at the moment,
> so any advice would be helpful.
> 
> Thanks Simon