[antlr-interest] Parsing with a character that represents null

Priolo, Scott spriolo at walkerinfo.com
Tue Jan 27 05:59:42 PST 2009


I've tinkered with the idea Jim states below, but I'm afraid I don't
fully understand what "Don't do this at the lexer level, do it at the
parser" means in practice.  (I think there is a learning curve I'm
missing here.)

 

Can you clarify?

 

Thanks,

Scott

 

From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of Jim Idle
Sent: Monday, January 26, 2009 12:25 PM
Cc: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Parsing with a character that represents
null

 

Priolo, Scott wrote: 

Hello,

 

I'm writing a grammar based on Apache Tomcat's Access Log format.  I'm
able to parse these lines into host, timestamp, command, protocol,
response, bytes, etc...

 

A problem occurs when the log file uses a '-' for "no data".  I'd like
to capture this as a "null" field when I walk the tree so it's important
not to treat '-' as white space.  The issue I'm having is '-' can match
any of the patterns such as host, username, timestamp, bytes, etc....
When I try to use

 

field : HOST L_USER A_USER TIMESTAMP '"' COMMAND PATH PROTOCOL '"'
RESPONSE_CODE BYTES;

 

L_USER : '-' | ('a'..'z'|'A'..'Z')+;

BYTES : '-' | ('0'..'9')+;

 

The parser can't distinguish whether the '-' is L_USER or BYTES (see
below: the second test line has a '-' for the last number because there
was no data).

 

Actual data lines:

66.249.71.45 - - [25/Jan/2009:00:00:56 -0500] "GET /abc/slides/180001.1.jsp HTTP/1.1" 200 11785

67.68.5.63 - - [10/Nov/2008:16:26:12 -0500] "HEAD /abc/res/prev1.gif HTTP/1.0" 200 -

 

How do I manage these '-' without changing the logging pattern?

 

Thanks!

Don't do this at the lexer level, do it at the parser:

field: HOST l_user ....

l_user : L_USER | DEFAULT;

L_USER : ('a'..'z'|'A'..'Z')+;
DEFAULT: '-';

Jim
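
To illustrate the idea outside ANTLR, here is a minimal Python sketch of
the same split.  It is not ANTLR code and not necessarily how Jim would
write the full grammar; the names TOKEN_SPEC, tokenize, parse_line and
FIELDS are hypothetical, and the token patterns are simplified
assumptions about the log format.  The "lexer" gives a lone '-' exactly
one token type (DASH, playing the role of DEFAULT above), and the
"parser" decides from the field position whether that DASH means a
missing user or missing bytes.

```python
import re

# Hypothetical, simplified token set. A lone '-' has exactly ONE token
# type, so the tokenizer never has to guess which field it belongs to.
TOKEN_SPEC = [
    ("TIMESTAMP", r"\[[^\]]*\]"),   # e.g. [25/Jan/2009:00:00:56 -0500]
    ("REQUEST",   r'"[^"]*"'),      # e.g. "GET /path HTTP/1.1"
    ("DASH",      r"-(?=\s|$)"),    # a '-' standing alone means "no data"
    ("WORD",      r"\S+"),          # anything else (host, user, numbers)
    ("WS",        r"\s+"),
]
MASTER = re.compile("|".join(f"(?P<{name}>{pat})" for name, pat in TOKEN_SPEC))

# Field order of the log line; the position gives each token its meaning.
FIELDS = ["host", "l_user", "a_user", "timestamp",
          "request", "response_code", "bytes"]


def tokenize(line):
    """Split a log line into (token_type, text) pairs, skipping whitespace."""
    return [(m.lastgroup, m.group())
            for m in MASTER.finditer(line)
            if m.lastgroup != "WS"]


def parse_line(line):
    """Map tokens to fields by position; a DASH in any field becomes None."""
    tokens = tokenize(line)
    if len(tokens) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(tokens)}")
    record = {name: (None if kind == "DASH" else text)
              for name, (kind, text) in zip(FIELDS, tokens)}
    if record["bytes"] is not None:
        record["bytes"] = int(record["bytes"])
    return record


if __name__ == "__main__":
    print(parse_line('67.68.5.63 - - [10/Nov/2008:16:26:12 -0500] '
                     '"HEAD /abc/res/prev1.gif HTTP/1.0" 200 -'))
```

The point of the design is that the tokenizer stays context-free: it
only reports "this is a dash".  Whether that dash stands for a null
user or null bytes is decided where the context is known, which is
what the parser rule l_user : L_USER | DEFAULT; does in the grammar.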
