[antlr-interest] Fixed field

Brian Lavender brian at brie.com
Fri Jul 30 19:07:12 PDT 2010


Meant to send this to the list, in case there is additional info others
have to add.

Well, my grammar does work on GNU/Linux and gets both "bills". Maybe I
am using an old version of the runtime on Windows? I put a bunch of JARs
in my classpath including a reference to an old ANTLR one for use with
Jasper Reports, but I believe the newest ANTLR jar is first.
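
One other thing that might explain the platform difference, offered as a
guess rather than a diagnosis: the NL rule in the quoted grammar below
matches a single '\n' or '\r'. If the input on Windows has CRLF line
endings, each line would end in two NL tokens while the bill rule consumes
only one, so bill+ could stop after the first bill. The sketch below does not
use ANTLR at all; it only counts how many single-character NL matches each
kind of line ending would produce. The class and method names are made up
for illustration.

```java
public class NlTokenCount {
    // Counts the characters that a rule like NL : ('\n'|'\r') would match
    // one at a time, i.e. how many NL tokens such a lexer rule would emit.
    public static int countNlTokens(String input) {
        int count = 0;
        for (int i = 0; i < input.length(); i++) {
            char c = input.charAt(i);
            if (c == '\n' || c == '\r') {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String unixLine = "eazememnt bill billy bob\n";
        String windowsLine = "eazememnt bill billy bob\r\n";
        System.out.println("LF:   " + countNlTokens(unixLine));
        System.out.println("CRLF: " + countNlTokens(windowsLine));
    }
}
```

If that turns out to be the cause, a rule along the lines of
NL : '\r'? '\n' ; would be one way to treat CRLF as a single token, but I
have not tried it against this grammar.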

I guess this would explain the silence. The guys doing analysis for network
IDS use formal grammars written with binpac, and their input includes a lot
of fixed field data. I haven't looked specifically at how they constructed
their grammars for bro (http://www.bro-ids.org), but using awk or a buffered
reader suffers from numeric constants that fly right past the compiler
(for Java anyway). Currently, my code only checks that the number of lines
in the result that comes back is a multiple of 8, a very weak check. It
seems like I am going in the right direction, still digging though.
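
For comparison, here is roughly what the buffered-reader version looks like,
as a sketch rather than my actual code. The widths 15 and 9 are taken from
the title and author rules in the quoted grammar below; the class and method
names are invented for the example. Naming the widths once at least keeps
the magic numbers in one place, and a line of the wrong length is rejected
instead of being silently mis-sliced.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

public class FixedFieldReader {
    // Widths assumed from the quoted grammar: 15 LTR for title, 9 for author.
    static final int TITLE_WIDTH = 15;
    static final int AUTHOR_WIDTH = 9;
    static final int LINE_WIDTH = TITLE_WIDTH + AUTHOR_WIDTH;

    // Splits one line into {title, author}, trimming surrounding whitespace.
    // Throws if the line is not exactly LINE_WIDTH columns wide.
    public static String[] parseLine(String line) {
        if (line.length() != LINE_WIDTH) {
            throw new IllegalArgumentException(
                "expected " + LINE_WIDTH + " columns, got " + line.length());
        }
        String title = line.substring(0, TITLE_WIDTH).trim();
        String author = line.substring(TITLE_WIDTH, LINE_WIDTH).trim();
        return new String[] { title, author };
    }

    // Parses every line; readLine() strips "\n", "\r", or "\r\n" terminators.
    public static List<String[]> parseAll(BufferedReader in) throws IOException {
        List<String[]> bills = new ArrayList<String[]>();
        String line;
        while ((line = in.readLine()) != null) {
            bills.add(parseLine(line));
        }
        return bills;
    }

    public static void main(String[] args) throws IOException {
        String sample = "eazememnt bill billy bob\n";
        BufferedReader in = new BufferedReader(new StringReader(sample));
        for (String[] bill : parseAll(in)) {
            System.out.println("Title [" + bill[0] + "] Author [" + bill[1] + "]");
        }
    }
}
```

This still does nothing about the content of each field, which is the part
I was hoping a grammar would help with.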

Could you please point to where my grammar is ambiguous? 

I guess I am a little baffled by the objection that if a field does not
match the exact width, it will all fall over. It's supposed to fall over,
because that means some moron dinked with the query on the mainframe,
produced the wrong fixed field data, and foobared it.

I am trying to instantiate some type safety. In some areas, rather than
being just characters, the fields should be 'Y' or 'N' flags. Others should
actually reference a value that was previously set.
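
To make the flag idea concrete, something like the following is what I have
in mind. It is only a sketch: the YesNo enum and its fromField method are
names I made up, not part of any grammar or library.

```java
public class FlagDemo {
    // A field that may only hold 'Y' or 'N', instead of an arbitrary character.
    enum YesNo {
        YES, NO;

        // Converts a one-character field, rejecting anything but 'Y' or 'N'.
        static YesNo fromField(char c) {
            switch (c) {
                case 'Y': return YES;
                case 'N': return NO;
                default:
                    throw new IllegalArgumentException(
                        "expected 'Y' or 'N', got '" + c + "'");
            }
        }
    }

    public static void main(String[] args) {
        System.out.println(YesNo.fromField('Y'));
        System.out.println(YesNo.fromField('N'));
    }
}
```

Anything else in that column blows up right at the point of conversion,
which is the behavior I want.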

brian

On Fri, Jul 30, 2010 at 10:26:08AM -0700, Jim Idle wrote:
> I think that you are barking up the wrong tree here. All your rules are
> completely ambiguous and if any of the fields do not exactly correspond to
> the number of letters, this will all fall over. ANTLR is not really meant
> for parsing fixed width fields where each field is just some arbitrary text.
> You should just use something like awk to do this, or even a very simple
> java class that just reads a buffered input stream line by line and picks
> out the fields.
> 
> Jim
> 
> > -----Original Message-----
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Brian Lavender
> > Sent: Friday, July 30, 2010 10:04 AM
> > To: antlr-interest at antlr.org
> > Subject: Re: [antlr-interest] Fixed field
> > 
> > Hello empty antlr echo chambers. Is there anybody out there?
> > 
> > I figured out that if I create a lexer rule to match the newline, it
> > will match, rather than putting it in the parser rule. Now, it only seems
> > to parse the first bill that it finds. When I test it in ANTLRWorks, it
> > parses both entries from the input. Any ideas?
> > 
> > 
> > grammar Agenda;
> > 
> > agenda	:	bill+;
> > bill	:	title author NL
> > 		{ System.out.println("All " + $bill.text + "\n"); }
> > 	;
> > title 	:	LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR
> > 		{ System.out.println("Title " + $title.text + "\n"); }
> > 	;
> > author	:	LTR LTR LTR LTR LTR LTR LTR LTR LTR
> > 		{ System.out.println("Author " + $author.text + "\n"); }
> > 	;
> > 
> > 
> > LTR  :   ('a'..'z'|'A'..'Z' | ' ' | '1'..'9') ;
> > NL	:	('\n'|'\r');
> > 
> > 
> > import org.antlr.runtime.*;
> > 
> > public class Test {
> >     public static void main(String[] args) throws Exception {
> >         ANTLRInputStream input = new ANTLRInputStream(System.in);
> >         AgendaLexer lexer = new AgendaLexer(input);
> >         CommonTokenStream tokens = new CommonTokenStream(lexer);
> >         AgendaParser parser = new AgendaParser(tokens);
> >         parser.agenda();
> >     }
> > }
> > 
> > Sample input
> > 
> > 
> > construct bill frank burn
> > eazememnt bill billy bob
> > 
> > 
> > 
> > 
> > 
> > On Thu, Jul 29, 2010 at 06:07:40PM -0700, Brian Lavender wrote:
> > > Well, it looks like my attempt isn't so feeble, but I can't seem to
> > > get the input to match on the newline. Do I need to do something
> > > different?
> > >
> > >
> > > grammar Agenda;
> > >
> > > agenda	:	bill+;
> > > bill	:	title author '\n'
> > > 		{ System.out.println($bill.text); }
> > > 	;
> > > author	:	LTR LTR LTR LTR LTR LTR LTR LTR LTR
> > > 		{ System.out.println($author.text); }
> > > 	;
> > > title 	:	LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR LTR
> > > 		{ System.out.println($title.text); }
> > > 	;
> > >
> > > LTR  :   ('a'..'z'|'A'..'Z' | ' ' | '1'..'9') ;
> > >
> > > Input is the following.
> > > Bill to allow eBill Joy
> > > Bill to preventFrank Dist
> > >
> > >
> > >
> > >
> > > On Mon, Jul 26, 2010 at 08:27:41PM -0700, Brian Lavender wrote:
> > > > What's the best way to get the words out of a fixed field file? Say
> > > > the title is in the first 20 columns, and then the author is in the
> > > > next 20?
> > > >
> > > > Below is a feeble attempt that will get four letters, but I would
> > > > like to ignore any whitespace that occurs after the last letter
> > > > before the end column.
> > > >
> > > > brian
> > > >
> > > >
> > > > grammar Foo;
> > > >
> > > > title	:	LTR LTR LTR LTR '\n' { System.out.println($title.text); };
> > > >
> > > > LTR 	:	('a'..'z'|'A'..'Z');
> > > >
> > > > --
> > > > Brian Lavender
> > > > http://www.brie.com/brian/
> > > >
> > > > "There are two ways of constructing a software design. One way is to
> > > > make it so simple that there are obviously no deficiencies. And the
> > > > other way is to make it so complicated that there are no obvious
> > > > deficiencies."
> > > >
> > > > Professor C. A. R. Hoare
> > > > The 1980 Turing award lecture
> > > >
> > > > List: http://www.antlr.org/mailman/listinfo/antlr-interest
> > > > Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
> > >
> > 

-- 
Brian Lavender
http://www.brie.com/brian/

"There are two ways of constructing a software design. One way is to
make it so simple that there are obviously no deficiencies. And the other
way is to make it so complicated that there are no obvious deficiencies."

Professor C. A. R. Hoare
The 1980 Turing award lecture

