[antlr-interest] Noob question

Bart Kiers bkiers at gmail.com
Thu Feb 4 08:21:14 PST 2010


Hi Thomas,

Good to hear it.
Note that when converting from Java to Python, you only need to replace
the code between the '{' and '}':

@members {
  ... your python code here ...
}
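
As a rough sketch of what could go in that @members block, here is a plain
Python port of the prettyPrint helper from the Java example quoted below. The
name pretty_print and the return value are illustrative only. It is written as
a standalone function so it can be tried on its own; inside @members it would
presumably need a `self` parameter, on the assumption that the Python target
places that code in the generated parser class.

```python
import re


def pretty_print(kind, text):
    """Hypothetical Python port of the Java prettyPrint helper."""
    # collapse line breaks into single spaces
    text = re.sub(r"\r?\n", " ", text)
    # shorten long snippets to their first 40 and last 10 characters
    if len(text) > 55:
        text = text[:40] + " ... " + text[-10:]
    line = kind + " -> " + text
    print(line)
    return line  # returned only so the sketch is easy to check
```
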

sourceElement
    : f=functionDeclaration { ... your python code here ... }
    | s=statement           { ... your python code here ... }
    ;

functionBody
    : '{'{ ... your python code here ... } LT!* sourceElements LT!* '}'{ ... your python code here ... }
    ;
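
One thing to watch when filling in the functionBody actions: if the JS file
has functions nested inside other functions, a single true/false flag is
switched off as soon as the inner body closes, even though the outer body is
still open. A counter is one way around that. The sketch below is plain Python
with hypothetical names (FunctionDepth, enter, leave); the enter and leave
calls stand in for the actions after '{' and '}'.

```python
class FunctionDepth:
    """Hypothetical helper: counts how many function bodies are currently open."""

    def __init__(self):
        self.depth = 0

    def enter(self):
        # would be called from the action right after '{'
        self.depth += 1

    def leave(self):
        # would be called from the action right after '}'
        self.depth -= 1

    def inside_function(self):
        return self.depth > 0


tracker = FunctionDepth()
tracker.enter()  # outer function body opens
tracker.enter()  # nested function body opens
tracker.leave()  # nested function body closes
still_inside = tracker.inside_function()  # outer body is still open here
tracker.leave()  # outer function body closes
```

In the grammar itself this could be as small as an integer member that is
incremented and decremented in the two actions, with the statement alternative
checking that it is zero.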

Since Java is the default target, you also need to tell ANTLR to generate
Python parser and lexer source files. Do that by adding:

language=Python;

inside the options block of the JavaScript grammar file.
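
On combining the STATEMENT lines mentioned in the quoted message below: one
possible approach is to collect the results in a list instead of printing
them, and then merge runs of consecutive statements. This is only a sketch
under that assumption. The `entries` list of (kind, text) pairs and the name
group_statements are hypothetical, not something the grammar produces by
itself.

```python
from itertools import groupby


def group_statements(entries):
    """Merge runs of consecutive STATEMENT entries into one entry each.

    entries: list of (kind, text) tuples in source order.
    """
    grouped = []
    for kind, run in groupby(entries, key=lambda entry: entry[0]):
        texts = [text for _, text in run]
        if kind == "STATEMENT":
            # adjacent top-level statements are joined so they can be tested together
            grouped.append((kind, "\n".join(texts)))
        else:
            # everything else (e.g. functions) is kept as separate entries
            grouped.extend((kind, text) for text in texts)
    return grouped


entries = [
    ("FUNCTION", "function a() {}"),
    ("STATEMENT", "var x;"),
    ("STATEMENT", "x();"),
    ("FUNCTION", "function b() {}"),
    ("STATEMENT", "y();"),
]
merged = group_statements(entries)
```

With this, the three statements that belong together would end up in a single
entry, as long as no function declaration sits between them.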

You'll be at my skill level in no time (which is not that impressive,
really) once you read through *The Definitive ANTLR Reference* you ordered.

Regards,

Bart.




On Thu, Feb 4, 2010 at 4:46 PM, Thomas Raef <TRaef at wewatchyourwebsite.com> wrote:

>  HTH?
>
>
>
> Help? You did exactly what I was looking for. You rock dude.
>
>
>
> I’m going to go to work on converting that to Python code, or learn Java,
> which I’ve been putting off for some time now.
>
>
>
> My only concern is in grouping the STATEMENT lines. In your example these
> three lines work together:
>
>
>
> STATEMENT -> var blue='%3c'+'%73'+'%63'+'%72'+'%69'+' ... 74'+'%3e';
> STATEMENT -> for(z=0;z<blue.length+2;z=z+3)document.w ... tr(z,3)));
> STATEMENT -> FE('%275Euetkrv%2742NCPIWCIG%275F%2744lc ... v%275G2');
>
>
>
> So, I’ll have to figure out a way to combine them, which shouldn’t be too
> difficult. That way I can test the entire malscript.
>
>
>
> You are awesome! Thank you so much.
>
>
>
> How long have you been working with ANTLR?
>
>
>
> Thomas J. Raef
>
> e-Based Security <http://www.ebasedsecurity.com/>
>
> "You're either hardened or you're hacked!"
>
> We Watch Your Website <http://www.wewatchyourwebsite.com/>
>
> "We Watch Your Website - so you don't have to."
>
>
>
> *From:* Bart Kiers [mailto:bkiers at gmail.com]
> *Sent:* Thursday, February 04, 2010 9:05 AM
> *To:* Thomas Raef; antlr-interest at antlr.org interest
>
> *Subject:* Re: [antlr-interest] Noob question
>
>
>
> Hi Thomas,
>
> You're welcome, of course. Sorry I forgot to put antlr-interest at antlr.org in the To or CC line in my first reply. I'm not too used to mailing lists.
>
> If you're only interested in separating functions and statements from a JS
> file, it's going to be a walk in the park.
>
> Get the latest ANTLR JAR: http://www.antlr.org/download/antlr-3.2.jar
>
> Get this ECMA script grammar:
> http://www.antlr.org/grammar/1206736738015/JavaScript.g
>
> I'll give a short example in Java (I'm not too fluent in Python...).
>
> Put this:
>
> @members {
>
>     // keeps track if we're inside a function
>     public boolean insideFunction = false;
>
>     public void prettyPrint(String type, String text) {
>         text = text.replaceAll("\r?\n", " "); // remove line breaks
>         if(text.length() > 55) {
>             String start = text.substring(0, 40);
>             String end = text.substring(text.length()-10);
>             text = start+" ... "+end;
>         }
>         System.out.println(type+" -> "+text);
>     }
> }
>
>
> above the 'program' rule (on line 15) in the JavaScript.g file.
> Replace:
>
> sourceElement
>     : functionDeclaration
>     | statement
>     ;
>
>
> with:
>
> sourceElement
>     : f=functionDeclaration { prettyPrint("FUNCTION ", $f.text.toString()); }
>     | s=statement           { if(!insideFunction) prettyPrint("STATEMENT", $s.text.toString()); }
>     ;
>
> and replace:
>
> functionBody
>     : '{' LT!* sourceElements LT!* '}'
>     ;
>
>
> with:
>
> functionBody
>     : '{'{insideFunction=true;} LT!* sourceElements LT!* '}'{insideFunction=false;}
>     ;
>
>
> Now generate the parser and lexer .java files by doing:
>
> java -cp antlr-3.2.jar org.antlr.Tool JavaScript.g
>
>
> and create a small test class:
>
> import org.antlr.runtime.*;
> import java.io.FileInputStream;
>
> public class ANTLRDemo {
>     public static void main(String[] args) throws Exception {
>         ANTLRInputStream in = new ANTLRInputStream(new FileInputStream("mt.js")); // <- your JS file
>         JavaScriptLexer lexer = new JavaScriptLexer(in);
>         CommonTokenStream tokens = new CommonTokenStream(lexer);
>         JavaScriptParser parser = new JavaScriptParser(tokens);
>         parser.program();
>     }
> }
>
>
> Compile everything and run ANTLRDemo. You'll see the following being
> printed to the console:
>
> FUNCTION  -> function dateTime() {     var myDate = n ... ,30000); }
> FUNCTION  -> function setCookie (name, value, expires ... rCookie; }
> FUNCTION  -> function getCookie (name) {     var pref ... Index)); }
> FUNCTION  -> function deleteCookie (name, path, domai ... 01 GMT"; }
> FUNCTION  -> function fixDate (date) {     var base = ... - skew); }
> STATEMENT -> var blue='%3c'+'%73'+'%63'+'%72'+'%69'+' ... 74'+'%3e';
> STATEMENT -> for(z=0;z<blue.length+2;z=z+3)document.w ... tr(z,3)));
> STATEMENT -> FE('%275Euetkrv%2742NCPIWCIG%275F%2744lc ... v%275G2');
> FUNCTION  -> function rememberMe (f) {     var now =  ... '', ''); }
> FUNCTION  -> function forgetMe (f) {     deleteCookie ... ue = ''; }
> FUNCTION  -> function hideDocumentElement(id) {     v ...  'none'; }
> FUNCTION  -> function showDocumentElement(id) {     v ... 'block'; }
> FUNCTION  -> function showAnonymousForm() {     showD ... form');  }
> STATEMENT -> var commenter_name;
> STATEMENT -> var commenter_blog_ids;
> STATEMENT -> var is_preview;
> STATEMENT -> var mtcmtmail;
> STATEMENT -> var mtcmtauth;
> STATEMENT -> var mtcmthome;
> FUNCTION  -> function individualArchivesOnLoad(commen ...  }     } }
> FUNCTION  -> function writeCommenterGreeting(commente ...       }  }
> STATEMENT -> if ('boxoffice.com' != 'boxoffice.com')  ... r_url'); }
> STATEMENT -> showAnonymousForm();
>
>
> HTH,
>
> Bart.
>
>  On Thu, Feb 4, 2010 at 2:49 PM, Thomas Raef <TRaef at wewatchyourwebsite.com>
> wrote:
>
> Bart,
>
>
>
> Thank you for the answer. When I first learned C or Linux or any other
> technology it was a steep learning curve – but they’ve all been worth it.
>
>
>
> I just needed to know that after spending time learning this, I wasn’t
> going to be disappointed that it couldn’t do what my current mission is – to
> separate js functions and declarations so that I can further analyze them to
> determine which code out of a large, mostly valid .js file, is malicious.
>
>
>
> I’ll be using Python for my analysis and various anti-virus programs which
> is why I need to separate them. I don’t want the analysis to determine –
> “yep. There’s malicious code in there somewhere” I need my analysis to tell
> me exactly which code to strip out of the .js file so that it removes the
> malscript.
>
>
>
> I just ordered the book (PDF and covered). I can’t wait to dive into this.
>
>
>
> The way I see it working is that my Python program will open a .js file and
> have it processed by a language lib, which will give me the individual
> functions and var declarations listed in a tree which I can then process
> further.
>
>
>
> Attached is a file typical of what I’ll be working with. You’ll notice part
> way down is a string that starts with “var blue=…” That is malicious if run
> from a browser. All the other code is benign. So what I want is to be able
> to clean that file – just of the infectious code.
>
>
>
> Any thoughts on this would be greatly appreciated.
>
>
>
> Thank you for taking the time to respond.
>
>
>
> Thomas J. Raef
>
>
>
> *From:* Bart Kiers [mailto:bkiers at gmail.com]
> *Sent:* Thursday, February 04, 2010 6:29 AM
> *To:* Thomas Raef
>
>
> *Subject:* Re: [antlr-interest] Noob question
>
>
>
> Hi brother,
>
>
>
> Sure, ANTLR could be used in this case. What target language are you using?
> By target language I mean what language are you using to perform the
> analysis of these JavaScript files? Check this link:
> http://www.antlr.org/wiki/display/ANTLR3/Code+Generation+Targets to see if
> your target language is supported.
>
> On the wiki, there are a couple of ECMAScript grammars you can use:
> http://www.antlr.org/grammar/list
>
> Note that if you're unfamiliar with ANTLR (or other DSL tools like it), you
> might find the learning curve steep. Of course, as an ANTLR enthusiast, I
> encourage you to bite the bullet. The wiki is an excellent resource:
> http://www.antlr.org/wiki/display/ANTLR3/ANTLR+3+Wiki+Home and getting
> your hands on a copy of The Definitive ANTLR Reference,
> http://www.pragprog.com/titles/tpantlr/the-definitive-antlr-reference ,
> would be even better.
>
> Good luck!
>
> Bart.
>
> On Thu, Feb 4, 2010 at 1:15 PM, Thomas Raef <TRaef at wewatchyourwebsite.com>
> wrote:
>
> I want to use ANTLR to parse potentially malicious javascript files. The
> files in question have a string or strings embedded in them that don't
> cause the javascript file to error, but I do want to separate each
> function or declaration in the .js file into an individual string, then
> I'll process them to see if they are malicious or not.
>
>
>
> Is this the right tool? And if so, is there anyone who can point me in
> the right direction to get started? I know it's a very noob question,
> but I've been trying different tools and failing at each one.
>
>
>
> Can anyone "hook a brother up?"
>
>
>
> Thank you in advance
>
>
>
> Thomas J. Raef
>
>
>
>
>
>
>
>
>

