[antlr-interest] lexical rule overlapping

Fri Jun 2 11:59:32 PDT 2006

Hello Micheal, 

I am doing an MSc project on implementing a subset of Clean. My plan is to first choose a subset
to serve as the core and, once the core is working, to translate the other parts of Clean into it.
As far as I know, Clean 0.8 was developed as an experimental language and was used to implement
other functional languages. It is also small, so it is easy to start with.

I followed the MiniJava sample project in Appel's book Modern Compiler Implementation in Java until
I got stuck at continuation-based I/O. I find the book hard going: when it introduces the
functional extension to MiniJava, it simply assumes that readers already know functional I/O, which
seems to be related to (or to be) monads. Originally I planned to start the Clean implementation
after finishing the MiniJava functional extension. I had to give that up and begin with basic
functional features such as pattern matching. I am not sure whether this is viable, but I have to
move on.

I want to interpret the core first and eventually turn the interpreter into a compiler.

Jigang (robert)

--- Micheal J <open.zone at virgin.net> wrote:

> > Hi,
> > 
> > I am attempting to implement Clean 0.8.4. According to the
> > following grammar (extracted from page 80 of appendix.ps on
> > ftp://ftp.cs.kun.nl/pub/Clean/old/Clean08/doc/ps/), +1234
> > could be considered either a SymbolId or an IntDenot.
> > 
> > SymbolId = UpperCaseChar.{RestChar} | Class1Char.{RestChar} ;
> > 
> > RestChar = LowerCaseChar | Class1Char | Digit | UpperCaseChar
> > | CharDel | StringDel ;
> > LowerCaseChar = 'a' | 'b' | ... | 'z' ;
> > UpperCaseChar = 'A' | 'B' | ... | 'Z' ;
> > 
> > Class1Char = '@' | '#' | '$' | '%' | '^' | '&'| '?' | '*' | 
> > '-' | '+' | '/' | '='| '<' | '>' | '_'
> > | '.' | '`' | ''' ;
> > 
> > IntDenot = [Sign].[Digit]+ ;
> > Sign = '+' | '-' ;
> > Digit = '0' | '1' | ... | '9' ;
> > 
> > 
> > I am using ANTLR to do the work. The rules IntDenot and
> > SymbolId overlap, which causes a lexical nondeterminism
> > between the two rules in the lexer.
> > 
> > Could anyone give me an idea?
> 
> Combine the rules where they conflict. That is, handle all tokens that can
> begin with UpperCaseChar or Class1Char in the same rule. Use $setType() to
> fix the eventual token type. KCSParse (and the Java grammar too, I think) does
> the same for handling DOT and numeric literals - see the INT_LITERAL rule in
> CSharpLexer.g.
> 
> I'm curious. Are you implementing a parser/lexer for Clean or a full Clean
> compiler/interpreter?
> 
> 
> Micheal
> 
> -----------------------
> The best way to contact me is via the list/forum. My time is very limited.
> 
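
Micheal's suggestion above (one combined rule, then fix the token type afterwards) might be sketched roughly as follows. This is an illustration in plain Python rather than ANTLR grammar syntax, and several things in it are assumptions, not part of the thread: the function name scan_symbol_or_int is hypothetical, CharDel and StringDel are guessed to be ' and " because the excerpt does not define them, and the policy of preferring IntDenot whenever the whole lexeme is a sign followed only by digits is one possible resolution of the ambiguity. Unsigned integers, which start with a digit, would still need a rule of their own.

```python
import re
import string

# Character classes transcribed from the quoted grammar.
UPPER = set(string.ascii_uppercase)
LOWER = set(string.ascii_lowercase)
DIGIT = set(string.digits)
CLASS1 = set("@#$%^&?*-+/=<>_.`'")
# Assumption: CharDel is ' and StringDel is "; the excerpt does not define them.
REST = UPPER | LOWER | DIGIT | CLASS1 | set("'\"")

# A signed integer: the only IntDenot form that can collide with SymbolId.
SIGNED_INT = re.compile(r"[+-][0-9]+")


def scan_symbol_or_int(text, pos=0):
    """Scan one token starting at pos and return (kind, lexeme, end).

    A single rule consumes the longest run that SymbolId would accept;
    the token type is decided only afterwards, which plays the role of
    the $setType() call mentioned in the reply.
    """
    if pos >= len(text) or text[pos] not in (UPPER | CLASS1):
        raise ValueError("no SymbolId/IntDenot token at position %d" % pos)
    end = pos + 1
    while end < len(text) and text[end] in REST:
        end += 1
    lexeme = text[pos:end]
    # Assumed policy: a sign followed only by digits is an integer.
    kind = "IntDenot" if SIGNED_INT.fullmatch(lexeme) else "SymbolId"
    return kind, lexeme, end
```

With this sketch, scan_symbol_or_int("+1234") yields an IntDenot token, while "+12a" and "Foo" come back as SymbolId. Because the scan is greedy, an input such as "+1234+5" is consumed as one SymbolId; whether that is the desired reading depends on the Clean language report, not on anything shown here.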
