[antlr-interest] lexing shell-like strings

Colin Walters walters at verbum.org
Wed Jan 7 15:35:14 PST 2009


I have a project for which I would like to lex strings that have
somewhat "Unix shell-like" quoting semantics.  Unix shell strings are
quite funky, but I'd be happy if I could express the following:

// Some straightforward stuff
somestring => [Token("somestring")]
"somestring" => [Token("somestring")]
two strings => [Token("two"), Token("strings")]
"one string" => [Token("one string")]
"error => parse error

// Here it gets a bit more subtle
one"string" => [Token("onestring")]
one"string"only => [Token("onestringonly")]
one\"string\"only => [Token("one\"string\"only")]
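For comparison, here is a minimal hand-written sketch in plain Java (no ANTLR involved) of the rules above. It assumes a simplified model. Unquoted whitespace separates words. A double quote only toggles quoting mode and never ends a word, so adjacent quoted and unquoted pieces join into one token. A backslash takes the next character literally, both inside and outside quotes. The names ShellSplit and split are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;

public class ShellSplit {
    /** Splits a line into words using a simplified shell-like quoting model. */
    public static List<String> split(String input) {
        List<String> words = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        boolean inWord = false;   // true once anything (even "") has started a word
        boolean inQuotes = false; // true between an opening and a closing double quote
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\\') {
                // Backslash: take the next character literally, quoted or not.
                if (i + 1 >= input.length()) {
                    throw new IllegalArgumentException("dangling backslash at end of input");
                }
                current.append(input.charAt(++i));
                inWord = true;
            } else if (c == '"') {
                // A quote only switches mode; it does not end the current word.
                inQuotes = !inQuotes;
                inWord = true;
            } else if (Character.isWhitespace(c) && !inQuotes) {
                // Unquoted whitespace ends the current word, if there is one.
                if (inWord) {
                    words.add(current.toString());
                    current.setLength(0);
                    inWord = false;
                }
            } else {
                current.append(c);
                inWord = true;
            }
        }
        if (inQuotes) {
            throw new IllegalArgumentException("unterminated double quote");
        }
        if (inWord) {
            words.add(current.toString());
        }
        return words;
    }

    public static void main(String[] args) {
        System.out.println(split("two strings"));           // [two, strings]
        System.out.println(split("one\"string\"only"));     // [onestringonly]
        System.out.println(split("one\\\"string\\\"only")); // [one"string"only]
    }
}
```

Note that with this model an empty pair of quotes ("") still yields one empty token, as it would in a shell.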

etc.  At this point I don't need to replicate the differences between
' and ", though knowing how to would be interesting.
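On the ' vs " difference: in POSIX sh, everything between single quotes is literal, a backslash included, while inside double quotes a backslash only acts as an escape before $, backquote, ", \ and newline. The hypothetical helper below (ShellQuotes.readQuoted is an illustrative name) sketches just that difference for one quoted segment. It deliberately leaves out the backslash-newline line continuation and the $ and backquote expansions a real shell performs inside double quotes.

```java
public class ShellQuotes {
    /**
     * Reads one quoted segment whose opening quote (' or ") is at index start,
     * appends the unquoted contents to out, and returns the index just past
     * the closing quote.
     */
    static int readQuoted(String s, int start, StringBuilder out) {
        char quote = s.charAt(start);
        int i = start + 1;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == quote) {
                return i + 1; // closing quote found
            }
            if (quote == '"' && c == '\\' && i + 1 < s.length()
                    && "$`\"\\".indexOf(s.charAt(i + 1)) >= 0) {
                // Inside "...", a backslash escapes only $ ` " and \ here.
                out.append(s.charAt(i + 1));
                i += 2;
                continue;
            }
            // Everything else is literal, including any backslash inside '...'.
            out.append(c);
            i++;
        }
        throw new IllegalArgumentException("unterminated " + quote + " quote");
    }

    public static void main(String[] args) {
        StringBuilder single = new StringBuilder();
        readQuoted("'a\\\"b'", 0, single);      // input text: 'a\"b'
        System.out.println(single);             // a\"b
        StringBuilder dbl = new StringBuilder();
        readQuoted("\"a\\\"b\\n\"", 0, dbl);    // input text: "a\"b\n"
        System.out.println(dbl);                // a"b\n
    }
}
```

A full tokenizer would call something like this whenever it meets either quote character, and keep treating an unquoted backslash as escaping the next character.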

I'm sort of embarrassed to show you my attempts, but I've attached the
closest one I have.  It doesn't work for the one\"string\" case, though.

One thing I wanted to try, but couldn't find much documentation on, is
writing essentially a totally custom lexer.  I know how to parse these
strings in raw Java, but it wasn't completely clear to me which
methods to override, etc.  Ideally, of course, I could express these
strings in the ANTLR lexer language; hopefully someone can point me
in the right direction there!
-------------- next part --------------
A non-text attachment was scrubbed...
Name: shellLike.g
Type: application/octet-stream
Size: 258 bytes
Desc: not available
Url : http://www.antlr.org/pipermail/antlr-interest/attachments/20090107/ddc15201/attachment.obj 

