[antlr-interest] How to specify ‘any non-control symbol’?

Hendrik Maryns qwizv9b02 at sneakemail.com
Tue Oct 28 06:26:18 PDT 2008


Johannes Luber wrote:
> Hendrik Maryns wrote:
>> Hi,
>>
>> I want to define a LABEL lexer rule which should match almost anything,
>> let’s say any non-control Unicode symbol.  ANTLR wouldn’t accept .* or
>> .+.  I probably don’t want a closing parenthesis in there since it is a
>> Lisp-like grammar, but even a space would be fine (although it probably
>> won’t occur), so I used ~(')')+ but that feels like a hack.  Can I use
>> POSIX regex classes such as p{alphnum} or something like that?
> 
> Currently ANTLR doesn't support Unicode classes. The only workaround
> would be to define all code points manually (where "manually" means
> semi-automatically, using some existing table as a starting point). You
> should be aware that ANTLR doesn't accept code points above \uffff, so
> you'd have to translate UTF-32 into UTF-16 surrogates.

This is what it already seems to do internally; see the attached image
that ANTLRWorks produced.
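
For readers following the thread, here is a minimal sketch of the
semi-automatic approach described above. It assumes Python's standard
unicodedata module as the "existing table". The function names, the rule
name LABEL_CHAR, the choice to exclude only the Cc (control) category plus
')', and the upper limit of 0xFFFE are illustrative assumptions, not
something taken from this thread; adjust them to your grammar and ANTLR
version.

```python
import unicodedata


def non_control_ranges(limit=0xFFFE, excluded=")", excluded_categories=("Cc",)):
    """Return inclusive (start, end) code point ranges up to `limit` that are
    not in an excluded Unicode general category and not in `excluded`."""
    ranges = []
    start = None
    for cp in range(limit + 1):
        ch = chr(cp)
        ok = (unicodedata.category(ch) not in excluded_categories
              and ch not in excluded)
        if ok and start is None:
            start = cp                      # open a new range
        elif not ok and start is not None:
            ranges.append((start, cp - 1))  # close the current range
            start = None
    if start is not None:
        ranges.append((start, limit))
    return ranges


def to_surrogates(cp):
    """Split a code point above U+FFFF into its UTF-16 surrogate pair."""
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("code point is not in a supplementary plane")
    v = cp - 0x10000
    return 0xD800 + (v >> 10), 0xDC00 + (v & 0x3FF)


def antlr_rule(name, ranges):
    """Format the ranges as a fragment rule using '\\uXXXX'..'\\uXXXX' ranges."""
    alts = ["'\\u%04X'..'\\u%04X'" % (lo, hi) for lo, hi in ranges]
    return "fragment %s\n    : %s\n    ;" % (name, "\n    | ".join(alts))


if __name__ == "__main__":
    # Print a rule that can be pasted into the grammar and referenced
    # from LABEL, e.g. LABEL : LABEL_CHAR+ ;
    print(antlr_rule("LABEL_CHAR", non_control_ranges()))
```

Passing a longer tuple as excluded_categories (for example adding "Cn" for
unassigned code points) narrows the rule further; the output then depends
on the Unicode version shipped with the Python interpreter.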

> BTW, while at first it seems to be a good idea to do this kind of
> discrimination in the lexer, you get far better error messages if you
> push the error checking into the parser. Doing so merely requires
> making the lexer discriminate between the potential classes in a minimal
> way. If you like, I can send you a lexer of mine that uses this strategy
> for comparison purposes.

I don’t understand this.  What do you mean by ‘this kind of
discrimination’, and in what way am I putting it in the lexer when I
could push it into the parser?  I am afraid I am too new to this area to
follow you here.

H.
-- 
Hendrik Maryns
http://tcl.sfs.uni-tuebingen.de/~hendrik/
==================
Ask smart questions, get good answers:
http://www.catb.org/~esr/faqs/smart-questions.html
-------------- next part --------------
A non-text attachment was scrubbed...
Name: label.eps
Type: application/postscript
Size: 14010 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20081028/64e6d937/attachment.eps 