[antlr-interest] unicode, predicates, exceptions

Tom Moog tmoog at polhode.com
Sun Mar 14 21:08:58 PST 2004



> THM: Support for Unicode up to at least 0x10ffff (current xml
> range).

> TJP: I have an example that matches binary stuff :)  The
> UNICODE works up to 0xFFFFE :)

THM: I forgot that ANTLR 2 can handle large Unicode values, but I
had the impression that this was not practical because the bit
sets were too large.  It's not that people use Linear B all that
often; it's that without support for Unicode up to 0x10ffff, the
programmer has no guarantee of compatibility with XML-based
software.
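
To put a rough number on the bit set concern, here is a minimal sketch
of the arithmetic. It assumes a dense set with one bit per code point
stored in 64-bit words; the names dense_bitset_bytes and RangeSet are
illustrative only and are not ANTLR APIs. The range list is shown as
one possible sparse alternative, not as what any ANTLR version does.

```python
# Sketch: memory cost of a dense bit set (one bit per code point).
# Assumption: 64-bit words. All names here are illustrative, not ANTLR APIs.
from bisect import bisect_right

BITS_PER_WORD = 64


def dense_bitset_bytes(max_code_point):
    """Bytes needed for a bit set covering 0..max_code_point inclusive."""
    words = (max_code_point + 1 + BITS_PER_WORD - 1) // BITS_PER_WORD
    return words * (BITS_PER_WORD // 8)


class RangeSet:
    """Sorted, non-overlapping inclusive ranges: a sparse alternative."""

    def __init__(self, ranges):
        self.ranges = sorted(ranges)
        self.starts = [lo for lo, _ in self.ranges]

    def __contains__(self, code_point):
        # Find the last range whose start is <= code_point.
        i = bisect_right(self.starts, code_point) - 1
        return i >= 0 and code_point <= self.ranges[i][1]


print(dense_bitset_bytes(0xFFFF))    # 8192 bytes per set
print(dense_bitset_bytes(0x10FFFF))  # 139264 bytes per set

# ASCII letters plus the Linear B Syllabary block (U+10000..U+1007F).
letters = RangeSet([(0x41, 0x5A), (0x61, 0x7A), (0x10000, 0x1007F)])
print(0x10000 in letters)  # True
```

Under these assumptions every dense set over the full range costs
about 136 KiB, while the three-range set above stores six integers.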

> THM:  Regarding ambiguity for: A ( B | epsilon )* C

> TJP: I'm not sure I can agree...you have provided two ways to
> match the same input: an ambiguity by definition, right?  On
> the other hand, are you saying for action processing this is
> very desirable?  For example,

> (A | B)*
>
> matches the same thing as
>
> (A | B | {foo} )*
>
> but {foo} is executed as it exits or if nothing is matched?

THM: My suggestion wasn't very well thought out.  Let's drop it.

> TJP: How does isExtCmdName test the input symbol?  Is it a
> variable?  If so, where is it set?

THM: "isExtCmdName" is a boolean valued function of the lookahead
token that consults a dictionary.
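
A minimal sketch of that kind of predicate, for illustration only. The
Token type, the token type name "ID", the dictionary entries, and
parse_statement are all hypothetical; only the idea (a boolean
function of the lookahead token that consults a dictionary) comes from
the description above.

```python
# Sketch of a dictionary-backed semantic predicate on the lookahead token.
from collections import namedtuple

# Hypothetical token shape: a type name plus the matched text.
Token = namedtuple("Token", ["type", "text"])

# Hypothetical dictionary of external command names.
EXT_CMD_NAMES = {"frobnicate", "reindex"}


def is_ext_cmd_name(lookahead):
    """True if the lookahead token is an identifier found in the dictionary."""
    return lookahead.type == "ID" and lookahead.text in EXT_CMD_NAMES


def parse_statement(tokens, pos=0):
    """Pick an alternative by evaluating the predicate on LA(1)."""
    la1 = tokens[pos]
    if is_ext_cmd_name(la1):
        return ("ext_cmd", la1.text)
    return ("plain_id", la1.text)


print(parse_statement([Token("ID", "reindex")]))  # ('ext_cmd', 'reindex')
print(parse_statement([Token("ID", "counter")]))  # ('plain_id', 'counter')
```

Because the predicate reads the token rather than a separately set
variable, it needs no state beyond the dictionary itself.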

> TJP: I'm hoping to "hoist > 0 lookahead distance" with the
> combined NFA->DFA conversion + collect predicts.  I'll run this
> past you when I figure it out ;)

THM: Ahhh, right, regular expression lookahead gives you more freedom.

> TJP: Concerning exceptions for code gen.  I agree that
> try/catch is the easiest to use as I want to use exceptions for
> error handling.  What would we do for C?  I'd resist the
> longjmp this time probably ;)

THM: Yes, C is a problem.  Not only are there transfer-of-control
problems, but C has no concept of dtors for cleanup as the stack
is unwound.  You would have to create something that allows
dtor-like actions on stack-local variables.
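
One way to picture such a mechanism is an explicit cleanup stack that
rules push actions onto and that the error handler unwinds to a saved
mark. The sketch below only models the bookkeeping, in Python for
brevity; CleanupStack, ParseError, rule_a, rule_b, and parse are
hypothetical names. A C runtime would have to keep a similar stack by
hand and run it before or after its own non-local transfer of control.

```python
# Model of dtor-like cleanup actions run while unwinding after an error.
# All names are hypothetical; this models the bookkeeping, not a C runtime.


class ParseError(Exception):
    """Stands in for whatever non-local exit the generated parser uses."""


class CleanupStack:
    def __init__(self):
        self._actions = []

    def mark(self):
        """Remember the current depth so a handler can unwind back to it."""
        return len(self._actions)

    def push(self, action):
        """Register a cleanup action for a resource owned by the current rule."""
        self._actions.append(action)

    def unwind_to(self, mark):
        """Run registered actions in reverse order down to the saved depth."""
        while len(self._actions) > mark:
            self._actions.pop()()


def rule_b(cleanups, log):
    cleanups.push(lambda: log.append("release b"))
    raise ParseError("mismatched token")


def rule_a(cleanups, log):
    cleanups.push(lambda: log.append("release a"))
    rule_b(cleanups, log)


def parse(log):
    cleanups = CleanupStack()
    mark = cleanups.mark()
    try:
        rule_a(cleanups, log)
        return True
    except ParseError:
        cleanups.unwind_to(mark)  # innermost resources are released first
        return False


events = []
print(parse(events), events)  # False ['release b', 'release a']
```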

> TJP: Do you mean like
>
> ( x
> | y
> | .
> )

THM: Ahhh, previously, "." matched everything, which resulted
in lots of ambiguity warnings.  This would be ok.
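
One reading of the exchange is that "." in such a block would mean
"any token not claimed by another alternative", so it no longer
overlaps with x and y. The sketch below illustrates only that reading;
WILDCARD and predict are hypothetical names, and this is not a
description of how any ANTLR version resolves the wildcard.

```python
# Sketch: "." treated as "anything the explicit alternatives do not match".
# Hypothetical names; illustrates one reading of the proposal only.

WILDCARD = object()  # stands in for "." in the grammar


def predict(token, alternatives):
    """Return the index of the chosen alternative, or None if nothing fits."""
    fallback = None
    for index, alt in enumerate(alternatives):
        if alt is WILDCARD:
            fallback = index  # used only when no explicit alternative matches
        elif alt == token:
            return index
    return fallback


block = ["x", "y", WILDCARD]
print(predict("x", block))  # 0
print(predict("y", block))  # 1
print(predict("z", block))  # 2
```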



 