[antlr-interest] Parsing with a character that represents null
Priolo, Scott
spriolo at walkerinfo.com
Tue Jan 27 05:59:42 PST 2009
I've tinkered with the idea Jim states below, but I'm afraid I don't
fully understand what "Don't do this at the lexer level, do it at the
parser" means. (I think there is a learning curve I'm missing here.)
Can you clarify?
Thanks,
Scott
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: Monday, January 26, 2009 12:25 PM
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Parsing with a character that represents
null
Priolo, Scott wrote:
Hello,
I'm writing a grammar based on Apache Tomcat's Access Log format. I'm
able to parse these lines into host, timestamp, command, protocol,
response, bytes, etc...
A problem occurs when the log file uses a '-' for "no data". I'd like
to capture this as a "null" field when I walk the tree so it's important
not to treat '-' as white space. The issue I'm having is '-' can match
any of the patterns such as host, username, timestamp, bytes, etc....
When I try to use
field : HOST L_USER A_USER TIMESTAMP '"' COMMAND PATH PROTOCOL '"'
RESPONSE_CODE BYTES;
L_USER : '-' | ('a'..'z'|'A'..'Z')+;
BYTES : '-' | ('0'..'9')+;
The parser can't tell whether the '-' is L_USER or BYTES (in the second
data line below, the last field is '-' because there was no data).
Actual data lines:
66.249.71.45 - - [25/Jan/2009:00:00:56 -0500] "GET /abc/slides/180001.1.jsp HTTP/1.1" 200 11785
67.68.5.63 - - [10/Nov/2008:16:26:12 -0500] "HEAD /abc/res/prev1.gif HTTP/1.0" 200 -
How do I manage these '-' without changing the logging pattern?
Thanks!
Don't do this at the lexer level, do it at the parser:
field: HOST l_user ....
l_user : L_USER | DEFAULT;
L_USER : ('a'..'z'|'A'..'Z')+;
DEFAULT: '-';
Jim
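The lexer/parser split Jim describes may be easier to see outside ANTLR. Below is a minimal Python sketch, not ANTLR output and not Jim's code. It assumes a cut-down line with only three fields (l_user, response_code, bytes); the token names NUMBER and WORD and the helpers tokenize, field and parse_line are hypothetical. The lexer gives '-' exactly one token type (DEFAULT) and never tries to guess which field it stands for. The parser knows the position of each field, so it accepts either the expected token or DEFAULT there and records DEFAULT as null.

```python
import re

# Lexer side: '-' has exactly one token type, DEFAULT.
# NUMBER and WORD are hypothetical names used only in this sketch.
TOKEN_RE = re.compile(r"(?P<DEFAULT>-)|(?P<NUMBER>[0-9]+)|(?P<WORD>[A-Za-z]+)")


def tokenize(text):
    """Split whitespace-separated text into (token_type, value) pairs."""
    tokens = []
    for chunk in text.split():
        match = TOKEN_RE.fullmatch(chunk)
        if match is None:
            raise ValueError(f"unexpected input: {chunk!r}")
        tokens.append((match.lastgroup, match.group()))
    return tokens


def field(tokens, pos, expected):
    """Parser side: accept the expected token type, or DEFAULT meaning null."""
    kind, value = tokens[pos]
    if kind == "DEFAULT":
        return None  # '-' in this position means "no data"
    if kind == expected:
        return value
    raise ValueError(f"expected {expected} or DEFAULT at position {pos}, got {kind}")


def parse_line(text):
    """Parse a simplified 'l_user response_code bytes' line."""
    tokens = tokenize(text)
    if len(tokens) != 3:
        raise ValueError(f"expected 3 fields, got {len(tokens)}")
    return {
        "l_user": field(tokens, 0, "WORD"),
        "response_code": field(tokens, 1, "NUMBER"),
        "bytes": field(tokens, 2, "NUMBER"),
    }


if __name__ == "__main__":
    print(parse_line("scott 200 11785"))
    print(parse_line("- 200 -"))
```

Each field in the sketch corresponds to a small parser rule in the grammar, in the same shape as l_user : L_USER | DEFAULT; above. The same '-' token is reused by every such rule, and only its position in the field rule decides which value is null.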