[antlr-interest] lexical rule overlapping
Jigang (Robert) Sun
sunjigang1965 at yahoo.com.cn
Fri Jun 2 11:59:32 PDT 2006
Hello Micheal,
I am doing an MSc project on implementing a subset of Clean. I plan to first choose a subset
as the core. Once the core works, I will translate the other parts of Clean into it. As far as I
know, Clean 0.8 was developed as an experimental language and was used to implement other
functional languages. It is also small, so it is easy to start with.
I followed the sample MiniJava project in Appel's book Modern Compiler Implementation in Java until I
got stuck at continuation-based I/O. I find his book hard going: when introducing the functional
extension to MiniJava, he simply assumes readers already know functional I/O, which seems to be
related to (or is) monads. Originally I planned to start the Clean implementation after finishing
the MiniJava functional extension. I had to give up on that and begin with basic functional
features such as pattern matching instead. I am not sure this is viable, but I have to move on.
I want to interpret the core first and eventually turn the interpreter into a compiler.
Jigang (robert)
--- Micheal J <open.zone at virgin.net> wrote:
> > Hi,
> >
> > I am attempting to implement Clean 0.8.4. According to the
> > following grammar (extracted from page 80 of appendix.ps at
> > ftp://ftp.cs.kun.nl/pub/Clean/old/Clean08/doc/ps/), +1234
> > could be considered either a SymbolId or an IntDenot.
> >
> > SymbolId = UpperCaseChar.{RestChar} | Class1Char.{RestChar} ;
> >
> > RestChar = LowerCaseChar | Class1Char | Digit | UpperCaseChar
> > | CharDel | StringDel ;
> >
> > LowerCaseChar = 'a' | 'b' | ... | 'z' ;
> > UpperCaseChar = 'A' | 'B' | ... | 'Z' ;
> >
> > Class1Char = '@' | '#' | '$' | '%' | '^' | '&'| '?' | '*' |
> > '-' | '+' | '/' | '='| '<' | '>' | '_'
> > | '.' | '`' | ''' ;
> >
> > IntDenot = [Sign].[Digit]+ ;
> > Sign = '+' | '-' ;
> > Digit = '0' | '1' | ... | '9' ;
> >
> >
> > I am using ANTLR to do the work. The rules IntDenot and
> > SymbolId overlap, which causes a lexical nondeterminism
> > between the two rules in the lexer.
> >
> > Could anyone give me an idea?
>
> Combine the rules where they conflict. That is, handle all tokens that can
> begin with UpperCaseChar or Class1Char in the same rule, and use $setType() to
> set the final token type. KCSParse (and, I think, the Java grammar too) does
> the same when handling DOT and numeric literals; see the INT_LITERAL rule in
> CSharpLexer.g.
>
> I'm curious. Are you implementing a parser/lexer for Clean or a full Clean
> compiler/interpreter?
>
>
> Micheal
>
> -----------------------
> The best way to contact me is via the list/forum. My time is very limited.
>
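Micheal's suggestion can be sketched outside ANTLR as a small hand-written Java lexer. This is only a sketch under stated assumptions: the names CleanLexerSketch, TokenType, classify and lex are hypothetical, CharDel and StringDel are assumed to be the quote characters, and resolving the +1234 tie in favour of IntDenot is a policy choice, not something the quoted grammar specifies. What it illustrates is the shape of the fix: one routine consumes the prefix the two rules share, and the token type is decided only once the whole lexeme is known, which is the job $setType() does inside a combined ANTLR 2 lexer rule.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch (not ANTLR-generated code): one scanning routine covers
// every lexeme that may start with UpperCaseChar, Class1Char (which includes
// the Sign characters '+' and '-') or Digit, and picks the token type only
// after the whole lexeme has been read.
public class CleanLexerSketch {

    enum TokenType { INT_DENOT, SYMBOL_ID }

    record Token(TokenType type, String text) { }

    // Class1Char as listed in the Clean 0.8 grammar excerpt.
    private static final String CLASS1 = "@#$%^&?*-+/=<>_.`'";

    private static boolean isUpper(char c) { return c >= 'A' && c <= 'Z'; }
    private static boolean isLower(char c) { return c >= 'a' && c <= 'z'; }
    private static boolean isDigit(char c) { return c >= '0' && c <= '9'; }
    private static boolean isClass1(char c) { return CLASS1.indexOf(c) >= 0; }

    // Assumption: CharDel is '\'' and StringDel is '"' (neither is defined in the excerpt).
    private static boolean isRestChar(char c) {
        return isLower(c) || isUpper(c) || isDigit(c) || isClass1(c) || c == '\'' || c == '"';
    }

    // Tie-breaking policy (an assumption): a lexeme that is exactly [Sign]
    // followed by one or more digits is INT_DENOT; anything else is SYMBOL_ID.
    static TokenType classify(String text) {
        int start = (text.charAt(0) == '+' || text.charAt(0) == '-') ? 1 : 0;
        if (start == text.length()) {
            return TokenType.SYMBOL_ID; // a lone "+" or "-"
        }
        for (int k = start; k < text.length(); k++) {
            if (!isDigit(text.charAt(k))) {
                return TokenType.SYMBOL_ID;
            }
        }
        return TokenType.INT_DENOT;
    }

    // Lexes whitespace-separated input using only the rules quoted above;
    // any other character (e.g. a lowercase initial) is rejected.
    public static List<Token> lex(String input) {
        List<Token> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (Character.isWhitespace(c)) {
                i++;
                continue;
            }
            if (!(isUpper(c) || isClass1(c) || isDigit(c))) {
                throw new IllegalArgumentException("unexpected '" + c + "' at " + i);
            }
            int begin = i++;
            if (isDigit(c)) {
                // Unsigned IntDenot: only digits may follow.
                while (i < input.length() && isDigit(input.charAt(i))) i++;
            } else {
                // Shared prefix of both rules: first char followed by {RestChar}.
                while (i < input.length() && isRestChar(input.charAt(i))) i++;
            }
            String text = input.substring(begin, i);
            tokens.add(new Token(classify(text), text));
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : lex("+1234 +12ab -7 ++ Foo 42")) {
            System.out.println(t.type() + " " + t.text());
        }
    }
}
```

Running main classifies +1234, -7 and 42 as INT_DENOT and +12ab, ++ and Foo as SYMBOL_ID. Because the type is chosen after the longest match, the lexer never has to guess at the leading '+' which of the two rules it is in.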