[antlr-interest] Re: Why No Error?

Bogdan Mitu bogdan_mt at yahoo.com
Thu Aug 15 07:41:04 PDT 2002


>...I am still interested
> in knowing why no error was generated in the original
> post however.

The parser does not necessarily consume all the input. You start a parser by
calling one of its rules; in your case, file(). The parser calls nextToken()
on the lexer until the rule is finished, then stops. In your case, file()
matches as many lines as it can and then stops, even though another
(incorrect) line still follows. This is correct behavior.
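
To make the stopping behavior concrete, here is a minimal hand-written
sketch in Java. It is not the code ANTLR generates; the class
StartRuleDemo, its token enum and the sample() token list are all
hypothetical names made up for illustration. It mimics
file : (line)+ ; with one token of lookahead, and optionally requires EOF.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for a generated parser; not ANTLR's actual output.
class StartRuleDemo {
    enum Tok { RECORD, COMMA, NEWLINE, EOF }

    private final List<Tok> tokens;
    private int pos = 0; // index of the next unconsumed token

    StartRuleDemo(List<Tok> tokens) { this.tokens = tokens; }

    // One token of lookahead.
    private Tok la() { return tokens.get(pos); }

    private void match(Tok expected) {
        if (la() != expected) {
            throw new IllegalStateException(
                "expecting " + expected + ", found " + la() + " at token " + pos);
        }
        pos++;
    }

    // file : (line)+ ;   or, when requireEof is true,   file : (line)+ EOF ;
    void file(boolean requireEof) {
        do { line(); } while (la() == Tok.RECORD); // loop only while a line can start
        if (requireEof) match(Tok.EOF);
    }

    // line : (rec)+ NEWLINE ;
    private void line() {
        do { rec(); } while (la() == Tok.RECORD);
        match(Tok.NEWLINE);
    }

    // rec : RECORD (COMMA)? ;
    private void rec() {
        match(Tok.RECORD);
        if (la() == Tok.COMMA) pos++;
    }

    int consumed() { return pos; }

    // Tokens for the line "a,b" followed by the malformed line ",c".
    static List<Tok> sample() {
        List<Tok> t = new ArrayList<>();
        t.add(Tok.RECORD); t.add(Tok.COMMA); t.add(Tok.RECORD); t.add(Tok.NEWLINE);
        t.add(Tok.COMMA); t.add(Tok.RECORD); t.add(Tok.NEWLINE);
        t.add(Tok.EOF);
        return t;
    }
}
```

With sample(), file(false) returns normally after the first line and
leaves the malformed second line unread, while file(true) throws because
the next token is a COMMA rather than EOF.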

If you want to avoid this, put EOF at the end of the main rule. If you look
at the examples in the ANTLR distribution (java.g, tinyC.g, etc.) you will
see this.

So try:

file   : line (NEWLINE line)* (NEWLINE)? EOF ;
line   : (record)+ ;
record : (r:RECORD) (COMMA)? ;

Also note that, so that blank lines are accepted, it should really be
(NEWLINE)+ instead of NEWLINE, and (NEWLINE)* instead of (NEWLINE)? .
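
As a rough check of what that variant accepts, here is a small
hand-written recognizer in Java for
file : line ((NEWLINE)+ line)* (NEWLINE)* EOF ;
It is only a sketch under assumptions: the class name and the
one-character token encoding ('R' = RECORD, ',' = COMMA, 'N' = NEWLINE,
end of string = EOF) are invented for brevity and are not part of ANTLR.

```java
// Hypothetical hand-written recognizer; the token encoding is an assumption:
// 'R' = RECORD, ',' = COMMA, 'N' = NEWLINE, end of string = EOF.
class BlankLineTolerantFile {
    private final String toks;
    private int pos = 0;

    private BlankLineTolerantFile(String toks) { this.toks = toks; }

    // Returns true only if the whole token string is a valid file.
    static boolean accepts(String toks) {
        return new BlankLineTolerantFile(toks).file();
    }

    // One token of lookahead; '$' stands for EOF.
    private char la() { return pos < toks.length() ? toks.charAt(pos) : '$'; }

    // file : line ((NEWLINE)+ line)* (NEWLINE)* EOF ;
    private boolean file() {
        if (!line()) return false;
        while (la() == 'N') {
            while (la() == 'N') pos++;  // skip a run of one or more NEWLINEs
            if (la() == 'R') line();    // another line follows the run
        }
        return la() == '$';             // nothing but EOF may remain
    }

    // line : (record)+ ;   record : RECORD (COMMA)? ;
    private boolean line() {
        if (la() != 'R') return false;
        while (la() == 'R') {
            pos++;                      // RECORD
            if (la() == ',') pos++;     // optional COMMA
        }
        return true;
    }
}
```

For example, accepts("R,RNNR,R") is true (a blank line in the middle and
no NEWLINE after the last line), while accepts("RN,RN") is false because
the second line starts with a COMMA and EOF is not reached.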

--bogdan


--- genericised <trigonometric at softhome.net> wrote:
> Actually your solution is incorrect:
> 
> file : (line)+ EOF ; 
> 
> would be wrong because a line would still expect a
> NEWLINE token at the end, the correct solution is:
> 
> file   : (line)+ ;
> line   : (record)+ (NEWLINE|EOF) ;
> record : (r:RECORD) (COMMA)? ;
> 
> well at least I think this is the correct solution, it
> looks like it is, and it is hard to think how something
> so simple could be wrong anyway. I am still interested
> in knowing why no error was generated in the original
> post however.



> --- In antlr-interest at y..., "genericised" <trigonometric at s...> wrote:
> > oh didn't realise it was so easy, and I wanted
> > comma to be optional, checkout my latest post however,
> > it is a bit more tricky, hehe ;)
> > 
> > --- In antlr-interest at y..., Bogdan Mitu <bogdan_mt at y...> wrote:
> > > Hi,
> > > 
> > > If you want to be sure that all the input has been parsed, you should
> > > finish the main rule with EOF:
> > > 
> > > file : (line)+ EOF ; 
> > > 
> > > As a side note, the way you defined the grammar, Comma between records
> > > is optional. If you want Comma to be mandatory between records, try:
> > > 
> > > line : rec (COMMA rec)* NEWLINE ;
> > > rec  : r:RECORD { action ... }
> > > 
> > > Cheers,
> > > Bogdan
> > > 
> > > --- genericised <trigonometric at s...> wrote:
> > > > I created the following parser, as an example of how to
> > > > parse comma separated variable (CSV) files:
> > > > 
> > > > class CSVParser extends Parser;
> > > > file : (line)+ ;
> > > > line : (rec)+ NEWLINE ;
> > > > rec  : (r:RECORD) (COMMA)?
> > > >        {System.out.println(r.getText());}
> > > >      ;
> > > > 
> > > > The corresponding Lexer is:
> > > > 
> > > > class CSVLexer extends Lexer;
> > > > options { charVocabulary='\3'..'\377'; }
> > > > RECORD  : (~(','|'\r'|'\n'|' '|'\t'))+ ;
> > > > COMMA   : ',' ;
> > > > NEWLINE : ( ('\r''\n')=> '\r''\n' //DOS
> > > >           | '\r'                  //MAC
> > > >           | '\n'                  //UNIX
> > > >           )
> > > >         { newline(); } // parenthesized so the action runs for every alternative
> > > >         ;
> > > > WS      : (' '|'\t') { $setType(Token.SKIP); } ;
> > > > 
> > > > Pretty straightforward, but, when I run this on a
> > > > CSV it produces no error.
> > > > 
> > > > The last line of a CSV is:
> > > > 
> > > > blah, blah, blah
> > > > 
> > > > so the line does not consist of
> > > > 
> > > > rec+ NEWLINE
> > > > 
> > > > but
> > > > 
> > > > rec+
> > > > 
> > > > When 
> > > > 
> > > > match(NEWLINE)
> > > > 
> > > > is called from the parser, why does it not throw
> > > > a mismatchedTokenException?
> > > > 
> > > > Or does it throw some kind of exception that is
> > > > caught and causes the parsing of the inputstream
> > > > to terminate gracefully?
> > > > 
> > > > The parser is invoked from some main file like this:
> > > > 
> > > > csvParser.file();
> > > > 
> > > > I have spent a couple of hours investigating this,
> > > > looking through the ANTLR source and stuff but I
> > > > have not yet found where this is dealt with?
> > > > 
> > > > I might do a bit of weekend investigation into this
> > > > because of what I will learn in the process of
> > > > determining this but at the moment I am supposed to
> > > > be writing this ANTLR tutorial and then got side
> > > > tracked trying to explain why it is OK that the
> > > > parser does not match the final NEWLINE.
> > > > 
> > > > Well actually, is it ok, or should the rule for file
> > > > be defined something like:
> > > > 
> > > > file : (line)+ EOFCHAR;
> > > > 
> > > > Regards
> > > > 
> > > > A Person
> > > > 
> > > > 
> > > >  
> > > > 
> > > > Your use of Yahoo! Groups is subject to 
> > http://docs.yahoo.com/info/terms/ 
> > > > 
> > > > 
> > > > 
> > > 
> > > 
> > > __________________________________________________
> > > Do You Yahoo!?
> > > HotJobs - Search Thousands of New Jobs
> > > http://www.hotjobs.com



