[antlr-interest] Re: Guidance Required

Thu Jul 3 13:25:59 PDT 2003

"protected" in ANTLR is not the same as "protected" in Java or C++.  
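In the lexer, a "protected" rule cannot be invoked by nextToken(); it 
can only be called from other lexer rules, so it never reaches the 
parser as a token of its own.  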
I'd have to look at the ANTLR source to figure out what "protected" 
means for parser rules--I think that the protected keyword is passed 
through to the code generator in 2.7.2, but I'm not at all sure.

As a first step, none of your lexer rules should be "protected", and 
that is probably true of the parser rules as well.
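
A minimal sketch of that first step, cut down to the TN line only.  
The rule names WORD, NUMBER and DIGIT are made up for illustration 
and are not from the posted grammar.  Unlike the advice above, it 
keeps "protected" on two single-character helper rules, and it 
assumes the lexer's default testLiterals behaviour, so that the text 
"TN" is looked up in the literals table and handed to the parser as 
the literal.  It also skips spaces in WS; the posted lexer has no 
non-protected rule that matches ' ', which may be the character the 
error complains about.

    class TNBParser extends Parser;
    options { k = 2; }

    tnbfile : (tn)+ EOF ;
    tn      : "TN" (NUMBER)+ NEWLINE ;

    class TNBLexer extends Lexer;
    options { k = 2; }

    // Not protected: nextToken() can return these to the parser.
    WORD    : (ALPHA)+ ;
    NUMBER  : (DIGIT)+ ;
    NEWLINE : ( "\r\n" | '\r' | '\n' ) { newline(); } ;
    WS      : ( ' ' | '\t' ) { $setType(Token.SKIP); } ;

    // Protected: helpers that only the rules above call.
    protected ALPHA : 'a'..'z' | 'A'..'Z' ;
    protected DIGIT : '0'..'9' ;

With this, "TN 000 00 00 00" followed by a line end should come out 
as the literal TN, four NUMBER tokens and a NEWLINE.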

--Loring

--- In antlr-interest at yahoogroups.com, "setuk_x" <set at n...> wrote:
> OK, I recoded it as follows.
> Given a file which starts "TN 000 00 00 00",
> it fails with an error about not recognising a character.
> When debugging the parser code, the first comparison against the 
> literal TN doesn't evaluate to true, even though the first token is 
> the literal TN.
> Any ideas?
> 
> //Attempt to classify the TNB File into TNB Records
> // Containing explicit tokens
> 
> class TNBParser extends Parser;
> options { k = 3;
> 		  defaultErrorHandler = true;
> 	    }
> 	    //A tnbfile consists of one or more tnbrecords
> tnbfile  :  (record)+  EOF;
> 
> record //A tnbrecord consists of this number of explicit values
> 	    :  ( (tn) (des)? (date))
> 	    ;
> 
> protected tn : "TN" (NUMERIC)+ NEWLINE ;
> protected des : "DES" anything NEWLINE ;
> protected anything :
>       (
> 		(ALPHA)+
> 	  | (NUMERIC)+
> 	  | (PUNCTUATION)+
> 	  );
> protected date
>   : "DATE"
>     NUMERIC NUMERIC FW_SLASH
>     NUMERIC NUMERIC FW_SLASH
>     NUMERIC NUMERIC
>   ;
> 	
> class TNBLexer extends Lexer;
> options { k = 3;
> 		  defaultErrorHandler = true;
> 	    }
> // TNB is mostly uppercase but we need lowercase in here because of 
> // the CPND
> 
> WS: '\t' {$setType(Token.SKIP);} ;
> 
> protected ALPHA : ('a'..'z'|'A'..'Z');
> protected NUMERIC :('0'..'9');
> protected PUNCTUATION :('_'|'-'|'+'|FW_SLASH|';'|'#'|'*'|'\\'
>                        |':'|','|'\''|'.'|'?');
> protected NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r')) { newline(); });
> protected SPACE: ' ';
> protected FW_SLASH: '/';
> 
> 
> 
> --- In antlr-interest at yahoogroups.com, Matt Benson <gudnabrsam at y...> wrote:
> > I'm no expert, but it looks to me like you have
> > combined the work of your parser and lexer.  I'm just
> > guessing here, but something like this might be what
> > you want:
> > 
> > //parser rules
> > 
> > file //assuming all are optional
> >   : (des)? (tn)? (type)? (cden)?
> >     (cust)? (kls)? (fdn)? (tgar)?
> >     (ldn)? (ncos)? (sgrp)? (rnpg)?
> >     (sci)? (ssu)? (xlst)? (scpw)? (sflt)?
> > //don't know where the date rule goes
> >     EOF
> >   ;
> > 
> > protected tn : "TN" (NUMERIC)+ NEWLINE ;
> > protected des : "DES" anything NEWLINE ;
> > protected anything
> >   : (
> >      (ALPHA)+
> >      | (NUMERIC)+
> >      | PUNCTUATION
> >     )+ //I'm guessing...
> >   ;
> > 
> > protected date
> >   : "DATE"
> >     NUMERIC NUMERIC '/'
> >     NUMERIC NUMERIC '/'
> >     NUMERIC NUMERIC
> >   ;
> > 
> > //lexer rules
> > 
> > /* there are probably better ways to wrap
> >    single-character tokens into "word" tokens...
> > */
> > WS:   (' ' | '\t') {$setType(Token.SKIP);} ;
> > ALPHA : ('a'..'z'|'A'..'Z');
> > NUMERIC :('0'..'9');
> > PUNCTUATION
> >   : '_' | '-' | '+' | '/' | ';' | '#'
> >   | '*' | '\\' | ':' | ',' | '\'' | '.' | '?'
> >   ;
> > 
> > NEWLINE
> >   : ( ('\r' '\n')+ | ('\n')+ | ('\r')+ )
> >     { newline(); }
> >   ;
> > 
> > 
> > -Matt
> > 
> > 
> > --- setuk_x <set at n...> wrote:
> > > I am new to Java and Antlr.
> > > I have written a basic parser in Perl before - but
> > > it is proving slow 
> > > and unwieldy and so I am looking to Antlr to fill
> > > the gap.
> > > I need to parse a text file which contains text in
> > > the format 
> > > (simplest form)
> > > DES  MAIL1 
> > > TN   001 0 02 00 
> > > TYPE SL1 
> > > CDEN DD
> > > CUST 0 
> > > KLS  1 
> > > FDN  
> > > TGAR 0 
> > > LDN  NO
> > > NCOS 4 
> > > SGRP 0 
> > > RNPG 0 
> > > SCI  0 
> > > SSU  
> > > XLST 
> > > SCPW 
> > > SFLT NO
> > > 
> > > I need to be able to classify each line a specific
> > > type so I can pass 
> > > these types to the parser and validate that what I
> > > have is a valid 
> > > record.
> > > 
> > > Is the best way to do this using Lexer tokens? Such
> > > as:-
> > > 
> > > class TNBLexer extends Lexer;
> > > options { k = 5;
> > > 		  defaultErrorHandler = true;
> > > 	    }
> > > // TNB is mostly uppercase but we need lowercase in
> > > // here because of the CPND
> > > 
> > > TN  : (("TN")+ (NUMERIC)+ NEWLINE);
> > > DES : (("DES") (ANYTHING)+);
> > > DATE: (("DATE")+ (NUMERIC NUMERIC '/' NUMERIC NUMERIC
> > >        '/' NUMERIC NUMERIC));
> > > WS:   ((' ')|('\t')){$setType(Token.SKIP);};
> > > 
> > > protected ANYTHING : ((ALPHA|NUMERIC|PUNCTUATION));
> > > protected ALPHA : ('a'..'z'|'A'..'Z');
> > > protected NUMERIC :('0'..'9');
> > > protected PUNCTUATION :('_'|'-'|'+'|'/'|';'|'#'|'*'|'\\'
> > >                        |':'|','|'\''|'.'|'?');
> > > protected NEWLINE: ((('\r' '\n')+ |('\n')+ | ('\r'))
> > > { newline(); });
> > > 
> > > Or am I completely on the wrong track?
> > > I am wading my way through the doc at the moment so
> > > any advice would 
> > > be helpful.
> > > 
> > > Thanks Simon
> > > 
> > > 
> > > 
> > > 
> > >  
> > > 
> > > Your use of Yahoo! Groups is subject to
> > > http://docs.yahoo.com/info/terms/ 
> > > 
> > > 
> > 
> > 
