[antlr-interest] Re: Problem With Special Chars - Detailed
Premkumar Rathanavelu
rprememail at yahoo.com
Mon Jul 25 05:36:35 PDT 2005
Hi,
Thanks Martin and David.
Let me explain my question in more detail.
The following lines appear frequently inside comments in the source files of the
free Borland C++ Builder 5.5 command-line compiler
(www.borland.com/bcppbuilder/freecompiler/).
See the files vector.h, utility.h, and streambu.h in /Borland/BCC55/Include/:
/*******************************************************************************
* U.S. Government Restricted Rights. This computer software is provided
* with Restricted Rights. Use, duplication, or disclosure by the
* Government is subject to restrictions as set forth in subparagraph (c)
* (1) (ii) of The Rights in Technical Data and Computer Software clause
* at DFARS 252.227-7013 or subparagraphs (c) (1) and (2) of the
* Commercial Computer Software Restricted Rights at 48 CFR 52.227-19,
* as applicable. Manufacturer is Rogue Wave Software, Inc., 5500
* Flatiron Parkway, Boulder, Colorado 80301 USA.
*
**************************************************************************/
While parsing these lines with the general multi-line COMMENT rule,
parsing stops with the error
C:/Borland/BCC55/Include/utility.h: expecting '*', found ''
After getting this message I added a token for the special character (û) to the lexer, but it did not help, even though my charVocabulary includes that character.
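Before changing the grammar further, it may help to find out exactly which byte the lexer is stopping on. The following is a minimal Python sketch, not part of ANTLR: the helper name find_non_ascii is hypothetical, and the sample line assumes (without confirmation from the actual header) that the odd character is stored as the single byte 0x96.

```python
def find_non_ascii(data: bytes):
    """Return (line, column, byte_value) for every byte above 0x7F (1-based)."""
    hits = []
    line, col = 1, 0
    for b in data:
        if b == 0x0A:          # line feed: the next byte starts a new line
            line += 1
            col = 0
            continue
        col += 1
        if b > 0x7F:           # outside 7-bit ASCII
            hits.append((line, col, b))
    return hits


# Assumed sample: the comment line from the original message, with the
# "hyphen" stored as byte 0x96 instead of ASCII 0x2D.
sample = b"/* Computer Software \x96 Restricted Rights */\n"

for line, col, b in find_non_ascii(sample):
    print(f"line {line}, column {col}: byte 0x{b:02X}")
```

To check a real header, the same function could be called on the bytes read with open(path, "rb").read(), for example on utility.h. Any byte it reports has to be covered by the lexer's charVocabulary and by one of the alternatives in the comment rule.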
I have already tried a lot to get past this error, without success. I want to
analyze the whole Borland package in order to evaluate my
Master's thesis (theme: Obtaining variants in C/C++ through conditional compilation), and
because of the above error I am not able to parse all of the files completely.
It is still annoying.
I kindly ask the ANTLR community for some tips to overcome this error.
I'm puzzled, but I'm not trying to puzzle anybody.
Thanks,
Prem
*******************************************************************
Replied Message from David on 25 July
Hello,
What interested me about the message from Premkumar of 24 July
was how in some source code a hyphen ("-") could become
displayed as a "u" circumflex ("û") in DOS mode when the ISO
8859-1 value of the first is 45 and the second 251 (a
difference of 206).
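One possible explanation, offered as an assumption and not confirmed anywhere in this thread, is that the character in the file is not the ASCII hyphen (45) at all but the byte 0x96 (150). Windows code page 1252 maps that byte to an en dash, which looks like a hyphen in a Windows editor, while DOS code page 437 maps the same byte to û. A short Python sketch that decodes the one byte both ways:

```python
# Assumption: the "hyphen" in the comment is stored as byte 0x96,
# not as the ASCII hyphen-minus 0x2D (decimal 45).
suspect = bytes([0x96])

as_windows = suspect.decode("cp1252")  # rendering under Windows code page 1252
as_dos = suspect.decode("cp437")       # rendering under DOS code page 437

# Print each decoded character together with its Unicode code point.
print(repr(as_windows), hex(ord(as_windows)))
print(repr(as_dos), hex(ord(as_dos)))
```

If this is what happened, the value 251 is only the ISO 8859-1 code of the glyph that DOS displays; the value actually stored in the file, and the one the lexer sees, would be 150.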
What happens when you use the hyphen for the subtraction
operator in your source code?
What is the significance of it being in a comment?
What coding system is being used in the source code?
Is this a problem with a particular IDE?
Are we talking MS or UNIX?
Yours puzzled,
David.
*************************************************************************************
Original Message from Prem on 24 July
Hi everyone,
Comments in source code often contain special characters
such as û.
Consider a Comment Line:
/* Computer Software - Restricted Rights */
In the above comment line, the hyphen ('-') between "Software" and
"Restricted"
looks normal, but when the file is viewed in the DOS editor the
'-' is shown as û.
My comment token rule:
Comment
    :   "/*"
        (   {LA(2) != '/'}? '*'
        |   EndOfLine               //{newline();}
        |   ~('*' | '\r' | '\n')
        )*
        "*/" {$setType(Token.SKIP);}    // newline();}
    ;
So I added a token for that special character to the parser,
but I still get an error and the file cannot be parsed any further.
I'm a newbie; please help me overcome this error.
Thanks in advance,
Prem