[antlr-interest] Re: Problem With Special Chars - Detailed

Premkumar Rathanavelu rprememail at yahoo.com
Mon Jul 25 05:36:35 PDT 2005


Hi,
  Thanks, Martin and David.
 
Let me explain my question in detail.
The following lines appear frequently inside comments
in the source files of the Borland C++ Builder 5.5 free command-line compiler
(www.borland.com/bcppbuilder/freecompiler/).
See the files vector.h, utility.h, and streambu.h in /Borland/BCC55/Include/.
 
/*******************************************************************************
 * U.S. Government Restricted Rights.  This computer software is provided
 * with Restricted Rights.  Use, duplication, or disclosure by the
 * Government is subject to restrictions as set forth in subparagraph (c)
 * (1) (ii) of The Rights in Technical Data and Computer Software clause
 * at DFARS 252.227-7013 or subparagraphs (c) (1) and (2) of the
 * Commercial Computer Software – Restricted Rights at 48 CFR 52.227-19,
 * as applicable.  Manufacturer is Rogue Wave Software, Inc., 5500
 * Flatiron Parkway, Boulder, Colorado 80301 USA.
 *
 **************************************************************************/ 
 
When these lines are parsed with the general multiline COMMENT rule,
parsing stops with this error:

C:/Borland/BCC55/Include/utility.h: expecting '*', found '–' 

After getting this message I added a token for the special character (û) to the LEXER, but it did not help, even though my charVocabulary included that character.
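
One possible reading of the symptom, sketched in Python. It assumes the header was saved in Windows-1252; the thread does not say which encoding the files actually use, and the sample bytes below are made up for illustration. Under that assumption the character is the single byte 0x96, which is an en dash in Windows-1252 and û in DOS code page 437, so a token for û (251) would not match it.

```python
# Assumption: the header bytes are Windows-1252; RAW is an illustrative sample,
# not the actual contents of utility.h.
RAW = b"Software \x96 Restricted Rights"

suspect = RAW[9:10]                      # the one byte between the two words
as_windows = suspect.decode("cp1252")    # what a Windows editor would display
as_dos = suspect.decode("cp437")         # what a DOS editor would display

# Print code points rather than glyphs so the output does not depend on the console.
print("byte value:", suspect[0])                  # 150 (0x96)
print("Windows-1252: U+%04X" % ord(as_windows))   # U+2013, en dash
print("CP437:        U+%04X" % ord(as_dos))       # U+00FB, u with circumflex
print("plain hyphen:", ord("-"))                  # 45
```

If this is what is happening, the value the lexer has to accept is 150 (or U+2013 if the input is decoded before lexing), not 45 and not 251.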

I have already tried hard to get past this error, without success. I want to
analyze the whole Borland package in order to evaluate my
Master's thesis (theme: Obtaining variants in C/C++ through conditional compilation), and
because of the above error I am not able to parse all the files completely.
It is still annoying.
 
I kindly ask the ANTLR experts for some tips to overcome the error.
 
I'm puzzled but not puzz'ling anybody.
 
Thanks,
Prem
 
 
 
 
 
 
 
*******************************************************************
Replied Message from David on 25 July
 
 
Hello,

What interested me about the message from Premkumar of 24 July
was how, in some source code, a hyphen ("-") could be
displayed as a "u" circumflex ("û") in DOS mode, when the ISO
8859-1 value of the first is 45 and that of the second is 251 (a
difference of 206).

What happens when you use the hyphen for the subtraction 
operator in your source code?

What is the significance of it being in a comment?

What coding system is being used in the source code?

Is this a problem with a particular IDE?

Are we talking MS or UNIX?

Yours puzzled,

David.
*************************************************************************************
 
Original Message from Prem on 24 July
 
 
Hi everyone,
   Comments in source code often contain
   special characters such as û.
Consider a comment line:
/*  Computer Software - Restricted Rights */

In the above comment line, the hyphen ('-') between "Software" and
"Restricted"
looks normal, but when we view the file in a DOS editor it shows
the '-' as û.
My comment rule in the lexer:
Comment
  : "/*"
    ( {LA(2) != '/'}? '*'      // a '*' that does not start the closing "*/"
    | EndOfLine                // {newline();}
    | ~('*' | '\r' | '\n')     // any other character in the vocabulary
    )*
    "*/"  {$setType(Token.SKIP);}  // newline();}
  ;

So I added a token for that special character in the parser,
but I still get the error, and the file cannot be parsed any further.
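
A rough Python sketch of why the comment rule itself may be what rejects the character. It assumes that the "any other character" alternative of a lexer rule only covers the characters in the lexer's declared vocabulary. The function skip_block_comment and the two vocabulary sets are hypothetical names made up for this illustration; they are not ANTLR APIs.

```python
def skip_block_comment(text, start, vocabulary):
    """Return the index just past the closing "*/" of the comment opening at start.

    Raises ValueError on a character outside `vocabulary`, mimicking a lexer
    whose catch-all alternative is limited to its declared vocabulary.
    """
    if not text.startswith("/*", start):
        raise ValueError("no comment at offset %d" % start)
    i = start + 2
    while i < len(text):
        if text.startswith("*/", i):
            return i + 2
        if text[i] not in vocabulary:
            raise ValueError("unexpected char %r at offset %d" % (text[i], i))
        i += 1
    raise ValueError("unterminated comment")


# Hypothetical vocabularies: 7-bit only versus the full 8-bit range.
ASCII_ONLY = {chr(c) for c in range(3, 128)}
EIGHT_BIT = {chr(c) for c in range(3, 256)}

# Decode byte-for-byte (latin-1) so the suspect byte 0x96 stays a single character.
comment = b"/* Computer Software \x96 Restricted Rights */".decode("latin-1")

print(skip_block_comment(comment, 0, EIGHT_BIT) == len(comment))  # True
try:
    skip_block_comment(comment, 0, ASCII_ONLY)
except ValueError as err:
    print("rejected:", err)
```

In ANTLR 2 grammars the 8-bit range is commonly written as charVocabulary = '\3'..'\377'; in the lexer options. Whether that alone is enough here also depends on how the input stream turns bytes into characters before the lexer sees them.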

I'm a newbie; please help me overcome the error.

Thanks in advance,
Prem



		