[antlr-interest] each keyword allowed as Identifier

Wed Jan 6 02:36:01 PST 2010

Hi,

I am trying to parse a log file that was probably never intended to be
parsed. It is a log file from a poker client. My problem is that there
are almost no constraints on player names.

A player name can be a sequence of any characters from the full Unicode
range. The only constraints are:

min length = 4
max length = 12
no leading or trailing white space
white space in between is allowed, but never more than one in a row
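
To make these four rules concrete, here is a minimal sketch in Python
(not ANTLR) of a check for a candidate name. The function name is
hypothetical, and two details are assumptions that the log format does
not confirm: length is counted in Unicode code points, and "white space"
means whatever Python's str.isspace() accepts.

```python
def is_valid_player_name(name: str) -> bool:
    """Check a candidate name against the four constraints listed above."""
    # Assumption: length is measured in code points, not bytes.
    if not 4 <= len(name) <= 12:
        return False
    # No leading or trailing white space.
    if name[0].isspace() or name[-1].isspace():
        return False
    # White space inside is fine, but never two in a row.
    return not any(a.isspace() and b.isspace() for a, b in zip(name, name[1:]))
```

Both names from the examples in this post, "The Player (" and
"posts small:", are exactly 12 characters long and pass this check.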

Here are some examples:

INPUT:
Seat 9: The Player ( ($76 in chips)

Here the player name is "The Player ("

INPUT:

posts small:: posts small blind $2

Here the player name is "posts small:"
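
The two examples can be restated as a small Python sketch (again not
ANTLR) in which the name is delimited only by the fixed text around it
and by the 4 to 12 length limit. The two line layouts, the pattern names
and the function name are assumptions inferred from just these two
sample lines; a real log surely has more line types and perhaps other
amount formats.

```python
import re

# Hypothetical layouts, inferred only from the two sample lines above.
# The name is "any 4 to 12 characters"; the surrounding text anchors it.
SEAT_LINE = re.compile(r"Seat (\d+): (.{4,12}) \(\$(\d+) in chips\)")
BLIND_LINE = re.compile(r"(.{4,12}): posts small blind \$(\d+)")


def player_name(line: str):
    """Return the player name if the line fits a known layout, else None."""
    m = SEAT_LINE.fullmatch(line)
    if m:
        return m.group(2)
    m = BLIND_LINE.fullmatch(line)
    if m:
        return m.group(1)
    return None
```

For the first input this returns "The Player (" and for the second
"posts small:", because the whole line has to fit the layout and not
just a prefix of it.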

I have no clue how to solve this problem. I have already tried a few
things I found in the FAQs, such as:

- syncing to the follow set (article "Custom Syntax Error Recovery"),
which doesn't work if a token of the follow set is also part of the name
- non-greedy matching ( .+ to match the name)
- a list of all tokens in the rule playername, which doesn't work because
the player name can consist not just of one token but of a sequence of
tokens

In general it must be possible, because there are several commercial
tools out there that are able to parse these log files. So I hope one of
you has an idea.

Thanks and regards,
Christian