[antlr-interest] Parsing with a character that represents null

Priolo, Scott spriolo at walkerinfo.com
Mon Jan 26 06:58:43 PST 2009


Hello,

I'm writing a grammar based on Apache Tomcat's access log format.  I'm
able to parse these lines into host, timestamp, command, protocol,
response, bytes, etc.

A problem occurs when the log file uses a '-' for "no data".  I'd like
to capture this as a "null" field when I walk the tree, so it's important
not to treat '-' as white space.  The issue I'm having is that '-' can
match any of the patterns, such as host, username, timestamp, or bytes.
For example, when I try to use these rules:

field : HOST L_USER A_USER TIMESTAMP '"' COMMAND PATH PROTOCOL '"'
RESPONSE_CODE BYTES;

 

L_USER : '-' | ('a'..'z'|'A'..'Z')+;

BYTES : '-' | ('0'..'9')+;

 

The parser can't tell whether a '-' is L_USER or BYTES. (In the sample
data below, the second line has a '-' as its last field because there
was no data.)

Actual data lines:

66.249.71.45 - - [25/Jan/2009:00:00:56 -0500] "GET /abc/slides/180001.1.jsp HTTP/1.1" 200 11785

67.68.5.63 - - [10/Nov/2008:16:26:12 -0500] "HEAD /abc/res/prev1.gif HTTP/1.0" 200 -
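
To make the ambiguity concrete, here is a minimal Python sketch of the same situation. It is not ANTLR code: TOKEN_RULES and classify are hypothetical names, and the sketch assumes that the lexer sees only the text of a token (not its position in the line) and that a tie goes to the rule listed first.

```python
import re

# Hypothetical table mirroring the L_USER and BYTES rules from the grammar.
# Assumption: when two rules match the same text, the first one listed wins.
TOKEN_RULES = [
    ("L_USER", re.compile(r"-|[A-Za-z]+")),
    ("BYTES", re.compile(r"-|[0-9]+")),
]


def classify(lexeme):
    """Return the name of the first rule that matches the whole lexeme."""
    for name, pattern in TOKEN_RULES:
        if pattern.fullmatch(lexeme):
            return name
    return None


# The last field of each sample line: a byte count, then the "no data" dash.
print(classify("11785"))  # BYTES
print(classify("-"))      # L_USER, even though this '-' is in the bytes position
```

Because both rules accept '-', the text alone cannot say which field it belongs to; only its position in the line can.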

 

How do I manage these '-' without changing the logging pattern?

 

Thanks!
