[antlr-interest] Parsing with a character that represents null
Priolo, Scott
spriolo at walkerinfo.com
Mon Jan 26 06:58:43 PST 2009
Hello,
I'm writing a grammar based on Apache Tomcat's Access Log format. I'm
able to parse these lines into host, timestamp, command, protocol,
response, bytes, etc.
A problem occurs when the log file uses a '-' for "no data". I'd like
to capture this as a "null" field when I walk the tree, so it's important
not to treat '-' as white space. The issue I'm having is that '-' can match
any of the patterns, such as host, username, timestamp, bytes, etc.
When I try to use
field : HOST L_USER A_USER TIMESTAMP '"' COMMAND PATH PROTOCOL '"'
RESPONSE_CODE BYTES;
L_USER : '-' | ('a'..'z'|'A'..'Z')+;
BYTES : '-' | ('0'..'9')+;
the parser can't distinguish whether the '-' is L_USER or BYTES (see below:
the second test line has a '-' for the last number because there
was no data).
Actual data lines:
66.249.71.45 - - [25/Jan/2009:00:00:56 -0500] "GET /abc/slides/180001.1.jsp HTTP/1.1" 200 11785
67.68.5.63 - - [10/Nov/2008:16:26:12 -0500] "HEAD /abc/res/prev1.gif HTTP/1.0" 200 -
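To make the behaviour I'm after concrete, here is a rough Python sketch (not ANTLR, and not my actual grammar). It assumes every line has exactly seven whitespace-separated fields, with the timestamp in brackets and the request in quotes. The field names and the helpers split_fields and parse_line are made up for illustration. Read by position, each '-' is unambiguous, and that is what I'd like the grammar to reproduce.

```python
import re

# Hypothetical field names, loosely following the parser rule above.
FIELDS = ["host", "l_user", "a_user", "timestamp",
          "request", "response_code", "bytes"]

# A field is a [bracketed] timestamp, a "quoted" request, or a bare word.
TOKEN = re.compile(r'\[([^\]]*)\]|"([^"]*)"|(\S+)')


def split_fields(line):
    """Return the raw field strings of one access-log line, in order."""
    # Exactly one capture group matches per token; lastindex is that group.
    return [m.group(m.lastindex) for m in TOKEN.finditer(line)]


def parse_line(line):
    """Map fields to names by position; a lone '-' becomes None (no data)."""
    values = split_fields(line)
    if len(values) != len(FIELDS):
        raise ValueError("expected %d fields, got %d" % (len(FIELDS), len(values)))
    return {name: (None if value == "-" else value)
            for name, value in zip(FIELDS, values)}


if __name__ == "__main__":
    sample = ('67.68.5.63 - - [10/Nov/2008:16:26:12 -0500] '
              '"HEAD /abc/res/prev1.gif HTTP/1.0" 200 -')
    print(parse_line(sample))
```

Here the '-' is a single kind of token, and only its position in the line decides whether it stands for a missing user or a missing byte count.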
How do I manage these '-' without changing the logging pattern?
Thanks!