[antlr-interest] Noob question
Bart Kiers
bkiers at gmail.com
Thu Feb 4 07:05:02 PST 2010
Hi Thomas,
You're welcome of course. Sorry I forgot to put antlr-interest at antlr.org in
the To or CC line in my first reply. Not too used to mail-lists.
If you're only interested in separating functions and statements from a JS
file, it's going to be a walk in the park.
Get the latest ANTLR JAR: http://www.antlr.org/download/antlr-3.2.jar
Get this ECMA script grammar:
I'll give a short example in Java (I'm not too fluent in Python...).
Put this:

  @members {
    // keeps track of whether we're inside a function
    public boolean insideFunction = false;

    public void prettyPrint(String type, String text) {
      text = text.replaceAll("\r?\n", " "); // remove line breaks
      if(text.length() > 55) {
        // abbreviate long text: first 40 chars, " ... ", last 10 chars
        String start = text.substring(0, 40);
        String end = text.substring(text.length()-10);
        text = start+" ... "+end;
      }
      System.out.println(type+" -> "+text);
    }
  }

above the 'program' rule (on line 15) in the JavaScript.g file.
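
If you want to try the abbreviation logic outside the grammar first, here is a minimal standalone sketch of it. The class name PrettyPrintSketch and the split into abbreviate() and format() are my own choices for illustration and are not part of ANTLR or of JavaScript.g; the logic itself is the same as in prettyPrint.

```java
// Standalone sketch of the abbreviation logic used by prettyPrint.
// PrettyPrintSketch, abbreviate and format are illustrative names only.
final class PrettyPrintSketch {

    // Replaces line breaks with spaces; text longer than 55 characters is
    // shortened to its first 40 and last 10 characters joined by " ... ".
    static String abbreviate(String text) {
        text = text.replaceAll("\r?\n", " ");
        if (text.length() > 55) {
            String start = text.substring(0, 40);
            String end = text.substring(text.length() - 10);
            text = start + " ... " + end;
        }
        return text;
    }

    // Builds the same "TYPE -> text" line that prettyPrint prints.
    static String format(String type, String text) {
        return type + " -> " + abbreviate(text);
    }

    public static void main(String[] args) {
        System.out.println(format("STATEMENT", "var is_preview;"));
        System.out.println(format("FUNCTION ", "function f() {\n  return 1;\n}"));
    }
}
```

Returning the string instead of printing it directly makes the helper easy to check on its own before it goes into the @members section.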
Then replace:

  sourceElement
    : functionDeclaration
    | statement
    ;

with:

  sourceElement
    : f=functionDeclaration { prettyPrint("FUNCTION ", $f.text.toString()); }
    | s=statement { if(!insideFunction) prettyPrint("STATEMENT", $s.text.toString()); }
    ;

and replace:

  functionBody
    : '{' LT!* sourceElements LT!* '}'
    ;

with:

  functionBody
    : '{'{insideFunction=true;} LT!* sourceElements LT!* '}'{insideFunction=false;}
    ;

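
One caveat with a single boolean flag: if it is cleared again whenever a function body closes, then statements that follow a nested function inside an outer function would be reported as if they were top-level. A counter avoids that. The sketch below only shows the bookkeeping; FunctionDepthSketch and its method names are hypothetical, and in the grammar the two calls would sit in the actions after '{' and '}'.

```java
// Hypothetical helper: counts open function bodies instead of using one flag.
final class FunctionDepthSketch {
    private int depth = 0;

    // Would be called from the action right after the opening '{'.
    void enterFunctionBody() {
        depth++;
    }

    // Would be called from the action right after the closing '}'.
    void exitFunctionBody() {
        if (depth > 0) {
            depth--;
        }
    }

    // True only when no function body is currently open.
    boolean isTopLevel() {
        return depth == 0;
    }

    public static void main(String[] args) {
        FunctionDepthSketch tracker = new FunctionDepthSketch();
        tracker.enterFunctionBody();              // outer function opens
        tracker.enterFunctionBody();              // nested function opens
        tracker.exitFunctionBody();               // nested function closes
        System.out.println(tracker.isTopLevel()); // false: still inside outer
        tracker.exitFunctionBody();               // outer function closes
        System.out.println(tracker.isTopLevel()); // true
    }
}
```

For a file like the attached one, where the functions are not nested, the boolean is enough.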
Now generate the parser and lexer .java files by doing:
java -cp antlr-3.2.jar org.antlr.Tool JavaScript.g
and create a small test class:

  import org.antlr.runtime.*;
  import java.io.FileInputStream;

  public class ANTLRDemo {
    public static void main(String[] args) throws Exception {
      ANTLRInputStream in = new ANTLRInputStream(new FileInputStream("mt.js")); // <- your JS file
      JavaScriptLexer lexer = new JavaScriptLexer(in);
      CommonTokenStream tokens = new CommonTokenStream(lexer);
      JavaScriptParser parser = new JavaScriptParser(tokens);
      parser.program(); // start parsing at the 'program' rule
    }
  }

Compile everything and run ANTLRDemo. You'll see the following being printed
to the console:
FUNCTION -> function dateTime() { var myDate = n ... ,30000); }
FUNCTION -> function setCookie (name, value, expires ... rCookie; }
FUNCTION -> function getCookie (name) { var pref ... Index)); }
FUNCTION -> function deleteCookie (name, path, domai ... 01 GMT"; }
FUNCTION -> function fixDate (date) { var base = ... - skew); }
STATEMENT -> var blue='%3c'+'%73'+'%63'+'%72'+'%69'+' ... 74'+'%3e';
STATEMENT -> for(z=0;z<blue.length+2;z=z+3)document.w ... tr(z,3)));
STATEMENT -> FE('%275Euetkrv%2742NCPIWCIG%275F%2744lc ... v%275G2');
FUNCTION -> function rememberMe (f) { var now = ... '', ''); }
FUNCTION -> function forgetMe (f) { deleteCookie ... ue = ''; }
FUNCTION -> function hideDocumentElement(id) { v ... 'none'; }
FUNCTION -> function showDocumentElement(id) { v ... 'block'; }
FUNCTION -> function showAnonymousForm() { showD ... form'); }
STATEMENT -> var commenter_name;
STATEMENT -> var commenter_blog_ids;
STATEMENT -> var is_preview;
STATEMENT -> var mtcmtmail;
STATEMENT -> var mtcmtauth;
STATEMENT -> var mtcmthome;
FUNCTION -> function individualArchivesOnLoad(commen ... } } }
FUNCTION -> function writeCommenterGreeting(commente ... } }
STATEMENT -> if ('boxoffice.com' != 'boxoffice.com') ... r_url'); }
STATEMENT -> showAnonymousForm();
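
The console lines above are abbreviated, so for further analysis you would probably want the complete text of each element. A rough sketch of one way to do that, assuming the grammar actions call a collector instead of prettyPrint: SourceElementCollector and everything in it are hypothetical names of mine, not something ANTLR provides.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical collector: keeps the full text of every element the parser
// reports, so it can be handed to a later analysis step.
final class SourceElementCollector {

    // One reported element: its kind ("FUNCTION" or "STATEMENT") and full source text.
    static final class Element {
        final String type;
        final String text;

        Element(String type, String text) {
            this.type = type;
            this.text = text;
        }
    }

    private final List<Element> elements = new ArrayList<Element>();

    // Would be called from the grammar action in place of prettyPrint.
    void add(String type, String text) {
        elements.add(new Element(type.trim(), text));
    }

    // Returns the texts of all collected elements of the given kind, in source order.
    List<String> textsOfType(String type) {
        List<String> result = new ArrayList<String>();
        for (Element e : elements) {
            if (e.type.equals(type)) {
                result.add(e.text);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        SourceElementCollector collector = new SourceElementCollector();
        collector.add("FUNCTION ", "function f() { return 1; }");
        collector.add("STATEMENT", "var is_preview;");
        System.out.println(collector.textsOfType("STATEMENT"));
    }
}
```

The trim() is there because the FUNCTION label in the grammar action is passed with a trailing space.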
On Thu, Feb 4, 2010 at 2:49 PM, Thomas Raef <TRaef at wewatchyourwebsite.com> wrote:
> Bart,
> Thank you for the answer. When I first learned C or Linux or any other
> technology it was a steep learning curve – but they’ve all been worth it.
> I just needed to know that after spending time learning this, I wasn’t
> going to be disappointed that it couldn’t do what my current mission is – to
> separate js functions and declarations so that I can further analyze them to
> determine which code out of a large, mostly valid .js file, is malicious.
> I’ll be using Python for my analysis and various anti-virus programs which
> is why I need to separate them. I don’t want the analysis to determine –
> “yep. There’s malicious code in there somewhere” I need my analysis to tell
> me exactly which code to strip out of the .js file so that it removes the
> malscript.
> I just ordered the book (PDF and covered). I can’t wait to dive into this.
> The way I see it working is that my Python program will open a .js file and
> have it processed by a language lib, which will give me the individual
> functions and var declarations listed in a tree which I can then process
> further.
> Attached is a file typical of what I’ll be working with. You’ll notice part
> way down is a string that starts with “var blue=…” That is malicious if run
> from a browser. All the other code is benign. So what I want is to be able
> to clean that file – just of the infectious code.
> Any thoughts on this would be greatly appreciated.
> Thank you for taking the time to respond.
> Thomas J. Raef
> e-Based Security <http://www.ebasedsecurity.com/>
> "You're either hardened or you're hacked!"
> We Watch Your Website <http://www.wewatchyourwebsite.com/>
> "We Watch Your Website - so you don't have to."
> *From:* Bart Kiers [mailto:bkiers at gmail.com]
> *Sent:* Thursday, February 04, 2010 6:29 AM
> *To:* Thomas Raef
> *Subject:* Re: [antlr-interest] Noob question
> Hi brother,
> Sure, ANTLR could be used in this case. What target language are you using?
> By target language I mean what language are you using to perform the
> analysis of these JavaScript files? Check this link:
> http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets to see if
> your target language is supported.
> On the Wiki, there are a couple of ECMA script grammars you can use:
> http://www.antlr.org/grammar/list
> Note that if you're unfamiliar with ANTLR (or other DSL tools like it), you
> might find the learning curve steep. Of course, as an ANTLR enthusiast, I
> encourage you to bite the bullet. The wiki is an excellent resource:
> http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home and getting
> your hands on a copy of The Definitive ANTLR Reference,
> http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference ,
> would be even better.
> Good luck!
> Bart.
> On Thu, Feb 4, 2010 at 1:15 PM, Thomas Raef <TRaef at wewatchyourwebsite.com>
> wrote:
> I want to use ANTLR to parse potentially malicious javascript files. The
> files in question have a string or strings embedded in them that don't
> cause the javascript file to error, but I do want to separate each
> function or declaration in the .js file into an individual string, then
> I'll process them to see if they are malicious or not.
> Is this the right tool? And if so, is there anyone who can point me in
> the right direction to get started? I know it's a very noob question,
> but I've been trying different tools and failing at each one.
> Can anyone "hook a brother up?"
> Thank you in advance
> Thomas J. Raef