[antlr-interest] Re: Guidance Required

Thu Jul 3 11:51:34 PDT 2003

OK, I have recoded it as follows.
Given a file which starts with "TN 000 00 00 00", it fails with an error
about not recognising a character.
When debugging the parser code, the first comparison against the literal
TN doesn't evaluate to true, even though the first token is a literal
TN.
Any ideas?

//Attempt to classify the TNB File into TNB Records
// Containing explicit tokens

class TNBParser extends Parser;
options { k = 3;
		  defaultErrorHandler = true;
	    }
	    //A tnbfile consists of one or more tnbrecords
tnbfile  :  (record)+  EOF;

record //A tnbrecord consists of this number of explicit values
	    :  ( (tn) (des)? (date))
	    ;

protected tn : "TN" (NUMERIC)+ NEWLINE ;
protected des : "DES" anything NEWLINE ;
protected anything :
      (
		(ALPHA)+
	  | (NUMERIC)+
	  | (PUNCTUATION)+
	  );
protected date
  : "DATE"
    NUMERIC NUMERIC FW_SLASH
    NUMERIC NUMERIC FW_SLASH
    NUMERIC NUMERIC
  ;

class TNBLexer extends Lexer;
options { k = 3;
		  defaultErrorHandler = true;
	    }
// TNB is mostly uppercase but we need lowercase in here because of
// the CPND

WS: '\t' {$setType(Token.SKIP);} ;

protected ALPHA : ('a'..'z'|'A'..'Z');
protected NUMERIC :('0'..'9');
protected PUNCTUATION :('_'|'-'|'+'|FW_SLASH|';'|'#'|'*'|'\\'|':'|','|'\''|'.'|'?');
protected NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r')) { newline(); });
protected SPACE: ' ';
protected FW_SLASH: '/';
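
A possible explanation, sketched under the assumption that this is
ANTLR 2.7.x: a protected lexer rule can only be called from other lexer
rules and is never returned by nextToken(), so with every rule except WS
marked protected the lexer has no rule that can start at 'T'. The parser's
"TN" literal also needs a token whose text the lexer can test against the
literals table, and WS would have to skip ' ' as well as '\t'. The lexer
below is a minimal, untested sketch of that idea. WORD is a hypothetical
rule name that is not in the grammar above, PUNCTUATION is left out for
brevity, and the parser's anything rule would need to refer to WORD
instead of (ALPHA)+.

```antlr
class TNBLexer extends Lexer;
options { k = 2; }

// Skip blanks as well as tabs (the grammar above only skips '\t').
WS : (' ' | '\t') { $setType(Token.SKIP); } ;

// WORD is a hypothetical name. The rule is not protected, so nextToken()
// can return it, and the literals test changes the token type to the
// parser's literal when the text is "TN", "DES" or "DATE".
WORD options { testLiterals = true; } : (ALPHA)+ ;

// One token per digit, so the parser's (NUMERIC)+ and date rules still apply.
NUMERIC : '0'..'9' ;

FW_SLASH : '/' ;

// Not protected, because the parser rules reference NEWLINE directly.
NEWLINE : ( '\r' ('\n')? | '\n' ) { newline(); } ;

// Helper only: called from WORD, never returned to the parser.
protected ALPHA : 'a'..'z' | 'A'..'Z' ;
```

With a lexer along these lines, the input "TN 000 00 00 00" followed by a
line break should reach the parser as the "TN" literal, nine NUMERIC
tokens and a NEWLINE, which is what the tn rule expects.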

--- In antlr-interest at yahoogroups.com, Matt Benson <gudnabrsam at y...> 
wrote:
> I'm no expert, but it looks to me like you have
> combined the work of your parser and lexer.  I'm just
> guessing here, but something like this might be what
> you want:
> 
> //parser rules
> 
> file //assuming all are optional
>   : (des)? (tn)? (type)? (cden)?
>     (cust)? (kls)? (fdn)? (tgar)?
>     (ldn)? (ncos)? (sgrp)? (rnpg)?
>     (sci)? (ssu)? (xlst)? (scpw)? (sflt)?
> //don't know where the date rule goes
>     EOF
>   ;
> 
> protected tn : "TN" (NUMERIC)+ NEWLINE ;
> protected des : "DES" anything NEWLINE ;
> protected anything
>   : (
>      (ALPHA)+
>      | (NUMERIC)+
>      | PUNCTUATION
>     )+ //I'm guessing...
>   ;
> 
> protected date
>   : "DATE"
>     NUMERIC NUMERIC '/'
>     NUMERIC NUMERIC '/'
>     NUMERIC NUMERIC
>   ;
> 
> //lexer rules
> 
> /* there are probably better ways to wrap
>    single-character tokens into "word" tokens...
> */
> WS:   (' ' | '\t') {$setType(Token.SKIP);} ;
> ALPHA : ('a'..'z'|'A'..'Z');
> NUMERIC :('0'..'9');
> PUNCTUATION
>   : '_' | '-' | '+' | '/' | ';' | '#'
>   | '*' | '\\' | ':' | ',' | '\'' | '.' | '?'
>   ;
> 
> NEWLINE
>   : ( ('\r' '\n')+ | ('\n')+ | ('\r')+ )
>     { newline(); }
>   ;
> 
> 
> -Matt
> 
> 
> --- setuk_x <set at n...> wrote:
> > I am new to Java and Antlr.
> > I have written a basic parser in Perl before - but
> > it is proving slow 
> > and unwieldy and so I am looking to Antlr to fill
> > the gap.
> > I need to parse a text file which contains text in
> > the format 
> > (simplest form)
> > DES  MAIL1 
> > TN   001 0 02 00 
> > TYPE SL1 
> > CDEN DD
> > CUST 0 
> > KLS  1 
> > FDN  
> > TGAR 0 
> > LDN  NO
> > NCOS 4 
> > SGRP 0 
> > RNPG 0 
> > SCI  0 
> > SSU  
> > XLST 
> > SCPW 
> > SFLT NO
> > 
> > I need to be able to classify each line as a specific
> > type so I can pass these types to the parser and
> > validate that what I have is a valid record.
> > 
> > Is the best way to do this using Lexer tokens? Such
> > as:-
> > 
> > class TNBLexer extends Lexer;
> > options { k = 5;
> > 		  defaultErrorHandler = true;
> > 	    }
> > // TNB is mostly uppercase but we need lowercase in
> > // here because of the CPND
> > 
> > TN  : (("TN")+ (NUMERIC)+ NEWLINE);
> > DES : (("DES") (ANYTHING)+);
> > DATE: (("DATE")+ (NUMERIC NUMERIC '/' NUMERIC NUMERIC
> >        '/' NUMERIC NUMERIC));
> > WS:   ((' ')|('\t')){$setType(Token.SKIP);};
> > 
> > protected ANYTHING : ((ALPHA|NUMERIC|PUNCTUATION));
> > protected ALPHA : ('a'..'z'|'A'..'Z');
> > protected NUMERIC :('0'..'9');
> > protected PUNCTUATION :('_'|'-'|'+'|'/'|';'|'#'|'*'|'\\'|':'|','|'\''|'.'|'?');
> > protected NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r'))
> > { newline(); });
> > 
> > Or am I completely on the wrong track?
> > I am wading my way through the doc at the moment so
> > any advice would be helpful.
> > 
> > Thanks Simon

Your use of Yahoo! Groups is subject to http://docs.yahoo.com/info/terms/