[antlr-interest] Problem in JavaScript grammar

Ana-Maria Farcasi farcasia at gmail.com
Fri Jul 13 04:21:16 PDT 2012


Hi,

I am currently working on a fuzz tester for JavaScript, and I need ANTLR
to parse the input JavaScript files.
I am using ANTLR 3.1.3 with the Python target.
I started from Chris Lambrou's grammar for JavaScript
(http://www.antlr.org/grammar/1206736738015/JavaScript.g)
and added a parser rule for regular expression literals to it.
I did not put regular expressions in the lexer because I don't want
the lexer to mistake a '/' that is actually a division operator for
the start of a regular expression.

Now that the regular expression is a parser rule, there is another
problem. If there is an unescaped quote inside the regular expression,
the lexer mistakes that quote for the start of a string literal
and tokenizes everything up to the next quote as a string.
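
To illustrate the problem, here is a minimal sketch in plain Python. It is
not ANTLR code, and the token rules are made up for illustration (they are
not taken from JavaScript.g). It is a lexer that has a string rule but no
regular expression rule, run on an input whose regex contains a quote:

```python
import re

# Hypothetical token rules, for illustration only (not from JavaScript.g).
# There is deliberately no rule for regular expression literals.
TOKEN_RULES = [
    ("STRING", r'"(?:\\.|[^"\\\n])*"'),
    ("IDENT", r"[A-Za-z_$][A-Za-z0-9_$]*"),
    ("WS", r"\s+"),
    ("PUNCT", r"[/=.();]"),
]
MASTER = re.compile("|".join("(?P<%s>%s)" % rule for rule in TOKEN_RULES))


def tokenize(source):
    """Return (type, text) pairs; whitespace is skipped."""
    tokens = []
    pos = 0
    while pos < len(source):
        match = MASTER.match(source, pos)
        if match is None:
            # No rule applies: report the character and move on.
            tokens.append(("ERROR", source[pos]))
            pos += 1
            continue
        if match.lastgroup != "WS":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


# The regex literal /"/ contains an unescaped double quote.
source = 'x = /"/.test(s); y = "ok";'
tokens = tokenize(source)
for token in tokens:
    print(token)
```

The quote inside the regex opens a STRING token that swallows
`/.test(s); y = ` up to the opening quote of "ok", so the tokens the
parser receives after that point no longer match the source structure.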

One way of solving this would have been the approach described in
http://www.antlr.org/wiki/display/ANTLR3/Island+Grammars+Under+Parser+Control
but it does not work in ANTLR 3: tokenization is completed before
parsing starts, so by the time the parser action runs, cs.index()
already points to the end of the character stream. So I could not
switch lexers or call another parser for regular expressions.
For the same reason, I cannot pass any information from the parser
to the lexer.

I am really stuck at this point. If you know of any workaround for this
problem, please
let me know.

Thank you,
Ana.