[antlr-interest] Re: How to set filename in parser

David Wigg wiggjd at lsbu.ac.uk
Fri Jul 15 05:14:05 PDT 2005


Original message.

Date: Wed, 13 Jul 2005 19:26:29 +0300
From: shmuel siegel <antlr at shmuelhome.mine.nu>
Subject: [antlr-interest] How to set filename in parser
To: 'ANTLR Interest' <antlr-interest at antlr.org>
Message-ID: <42D540B5.6000705 at shmuelhome.mine.nu>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

My grammar parses the output of a C preprocessor. It interprets 
the #line directives so that syntax errors can easily be traced 
back to the original file. My lexer sets the filename and 
line number appropriately and then treats the 
directive like a single-line comment.

This works fine for the lexer; when an error occurs, the proper 
error message is printed. But the parser never finds out about 
this resequencing, so syntax errors detected by the parser do 
not reflect the original file information. Short of overriding 
the various match routines, is there any simple way to propagate 
the #line information from the lexer to the parser?
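One general way to get the #line information to the parser is to copy the current file and line into each token when the lexer creates it, so the parser builds its error message from the offending token instead of asking the lexer. Because the parser looks ahead, the lexer may already have passed another #line directive by the time an error is reported, so reading the lexer's state at error time can give the wrong file. The sketch below models this in plain C++; `SourceLocation`, `Token`, `LineMap` and `formatError` are hypothetical names, not part of the ANTLR runtime.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical location record; not an ANTLR runtime type.
struct SourceLocation {
    std::string file;
    int line;
};

// Hypothetical token: the location is copied in when the token is created,
// so later lexer state changes cannot affect it.
struct Token {
    int type;
    std::string text;
    SourceLocation loc;
};

// Lexer-side state. A #line handler calls directive(); the lexer calls
// newline() for each ordinary line end and makeToken() for each token.
class LineMap {
public:
    // The line following "#line n" is line n, so the caller should not
    // also call newline() for the directive's own line end.
    void directive(const std::string& file, int line) {
        assert(line >= 0);
        cur_ = {file, line};
    }
    void newline() { ++cur_.line; }
    Token makeToken(int type, const std::string& text) const {
        return Token{type, text, cur_};
    }
private:
    SourceLocation cur_{"<stdin>", 1};
};

// Parser-side error text, built only from the offending token.
std::string formatError(const Token& t, const std::string& msg) {
    std::ostringstream os;
    os << t.loc.file << ':' << t.loc.line << ": " << msg
       << " near '" << t.text << "'";
    return os.str();
}
```

In a real ANTLR 2 parser the equivalent change would be a custom token class plus an overridden error-reporting hook; check the runtime headers for what `Token` already stores before adding fields.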

Reply.

I agree this is a real problem.

At one time I was hoping to be able to pretty-print the 
preprocessed input file as well, but it turned out to be so 
difficult that I gave up (it would only have been a by-product 
of what we wanted to do).

However, we do need to know when the source file we are 
interested in is being read. So when we parse a line directive, 
we call a function that extracts the data from the line and 
stores it at the top level in main.cpp, where other modules can 
see when the user file of interest is being read.
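A minimal sketch of that top-level state, assuming a single global object that the lexer action updates and other modules query. `PreprocState`, `g_preproc` and `recordLineDirective` are hypothetical names; the post does not show the real code in main.cpp.

```cpp
#include <cassert>
#include <string>

// Hypothetical top-level state, standing in for what is kept in main.cpp
// so that other modules can query it.
struct PreprocState {
    std::string userFile;     // the file the user asked us to analyse
    std::string currentFile;  // file named by the latest line directive
    int currentLine = 0;      // line number from that directive
    bool inUserFile() const { return currentFile == userFile; }
};

// One global instance, visible to lexer actions and other modules.
PreprocState g_preproc;

// Record a line directive's data. Returns true when the directive moves
// us into, or out of, the user file of interest.
bool recordLineDirective(const std::string& file, int line) {
    assert(line >= 0);  // some cpp versions emit line 0 for built-ins
    bool wasInUserFile = g_preproc.inUserFile();
    g_preproc.currentFile = file;
    g_preproc.currentLine = line;
    return wasInUserFile != g_preproc.inUserFile();
}
```

Returning the "switched file" flag lets a caller react only at the boundaries, for example to start or stop collecting output for the user file.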

This is too complex to describe in full here, but to give you a 
starting point, here is a copy of the relevant productions from 
the lexer:

PREPROC_DIRECTIVE
     options{paraphrase = "a line directive";}
     :    '#' LineDirective
         {_ttype = ANTLR_USE_NAMESPACE(antlr)Token::SKIP; newline();}
     ;

protected
LineDirective
     :
         ("line")?  // matches "#line"; cpp's short form "# 42 ..." omits it
         (Space)+
         n:Decimal
         (Space)+
         (sl:StringLiteral)
         ((Space)+ Decimal)*  // to support cpp flags (GNU)
         {
         process_line_directive((sl->getText()).c_str(),
                                (n->getText()).c_str());  // see main()
         }
         EndOfLine
     ;
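The post does not show `process_line_directive` itself. The sketch below is a hypothetical decoder for its two arguments, assuming the StringLiteral text still includes its surrounding double quotes and that the file name uses backslash escapes the way GNU cpp writes them (for example `"C:\\src\\a.c"`). `LineInfo` and `decodeLineDirective` are illustrative names only.

```cpp
#include <cassert>
#include <cstdlib>
#include <string>

// Hypothetical result of decoding one line directive.
struct LineInfo {
    std::string file;
    long line;
};

// Assumes `literal` is the StringLiteral text including its quotes and
// `number` is the Decimal text.
LineInfo decodeLineDirective(const char* literal, const char* number) {
    std::string s(literal);
    // Strip the surrounding double quotes, if present.
    if (s.size() >= 2 && s.front() == '"' && s.back() == '"')
        s = s.substr(1, s.size() - 2);
    // Undo backslash escapes: "\\" becomes "\" and "\"" becomes '"'.
    std::string file;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        if (s[i] == '\\' && i + 1 < s.size()) ++i;  // keep the escaped char
        file += s[i];
    }
    char* end = nullptr;
    long line = std::strtol(number, &end, 10);
    assert(end != number);  // Decimal should always contain digits
    return LineInfo{file, line};
}
```

A lexer action could pass the two `getText()` strings (as C strings) to a helper like this and then hand the decoded file and line to whatever stores the current location.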

Further information can be obtained from our C/C++ parser on the 
ANTLR website.

We were originally interested in including comments in the 
pretty-print, but since they could turn up inside statements more 
or less at random, this was an even greater problem than line 
directives.

I'm wondering now whether, if we are only interested in line 
directives, we could pass them to the parser as one or more 
tokens. Since they cannot split statements, perhaps they could be 
treated as just another statement and processed accordingly?
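To make that idea concrete, here is a toy model in plain C++ (not ANTLR-generated code) in which the lexer emits a `LINE_DIRECTIVE` token carrying the file name and the parser handles it like a statement by updating its current file. Ordinary tokens keep the line number the lexer already remapped. `Tok`, `ToyParser` and the token types are hypothetical. It relies on the assumption stated above that directives never fall inside a statement; if cpp ever emitted one mid-statement, the grammar would have to accept the directive token there too.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

enum TokenType { LINE_DIRECTIVE, IDENT, SEMI };

struct Tok {
    TokenType type;
    std::string text;  // for LINE_DIRECTIVE: the file name
    int line;          // line number as already remapped by the lexer
};

// Toy grammar: program = (LINE_DIRECTIVE | IDENT ';')*
class ToyParser {
public:
    explicit ToyParser(std::vector<Tok> toks) : toks_(std::move(toks)) {}

    // Returns "" on success, or "file:line: message" for the first error.
    std::string parseProgram() {
        while (pos_ < toks_.size()) {
            const Tok& t = toks_[pos_];
            if (t.type == LINE_DIRECTIVE) {  // directive treated as a statement
                file_ = t.text;
                ++pos_;
            } else if (t.type == IDENT) {
                ++pos_;
                if (pos_ >= toks_.size() || toks_[pos_].type != SEMI)
                    return error(t, "expected ';'");
                ++pos_;
            } else {
                return error(t, "unexpected token");
            }
        }
        return "";
    }

private:
    std::string error(const Tok& t, const std::string& msg) const {
        return file_ + ":" + std::to_string(t.line) + ": " + msg;
    }

    std::vector<Tok> toks_;
    std::size_t pos_ = 0;
    std::string file_ = "<unknown>";
};
```

The appeal of this design is that the parser needs no side channel to the lexer: the file name travels in-band, in the same order as the tokens it applies to.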

Any comments?

David Wigg


