[antlr-interest] Lexer consumes input but doesn't emit all tokens
Андрей Асеев
andron-eiu at mail.ru
Wed Aug 8 03:03:03 PDT 2012
Hello, Glenn.
After a closer look at your problem, I found the cause.
ANTLR's algorithm matches loops greedily.
The ':' character makes the lexer enter the inner loop of NAME_LITERAL,
which must then end with an ALPHA_NUM, and the lexer does not backtrack
when it hits an unexpected EOF instead.
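For contrast, here is a small sketch of what a backtracking matcher does with the same rule. It is only an illustration, not ANTLR's algorithm: the rule is transcribed by hand into a Python regular expression, and `first_token` is a made-up helper name. Python's `re` engine backtracks, so it gives up the trailing ':' and returns the token you were hoping for.

```python
import re

# Hand transcription of the NAME_LITERAL rule from the quoted grammar:
#   '\\'? ALPHA_NUM ( ( ':' | '_' | '-' | ALPHA_NUM )* ALPHA_NUM )?
NAME_LITERAL = re.compile(r"\\?[A-Za-z0-9](?:[:_\-A-Za-z0-9]*[A-Za-z0-9])?")


def first_token(text):
    """Return the NAME_LITERAL prefix of text, or None if there is none."""
    match = NAME_LITERAL.match(text)
    return match.group(0) if match else None


# The greedy inner loop first swallows the final ':', then the engine
# backtracks until the match ends on an alphanumeric character.
print(first_token("test:ack:"))  # test:ack
```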
I racked my brain for an hour trying to help...
The only approach I came up with is this:
Lex NAME_LITERAL without the ':' character:
NAME_LITERAL
: '\\'? ALPHA_NUM ( ( '_' | '-' | ALPHA_NUM )* ALPHA_NUM )? ;
Then post-process the token stream in your target language,
changing the token sequence
<NAME_LITERAL>a, <COLON>col, <NAME_LITERAL>b
to
<NAME_LITERAL>(a+col+b)
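
A minimal sketch of that merge step, written in Python for brevity. The `Token` type, its `kind`/`text`/`start` fields, and the function name `merge_names` are all assumptions made for the example, not part of the ANTLR runtime. The pass only merges tokens that touch each other in the input, so 'a : b' is left alone.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    kind: str   # token type name, e.g. "NAME_LITERAL" or "COLON"
    text: str   # matched characters
    start: int  # offset of the first character in the input


def _end(tok: Token) -> int:
    """Offset just past the last character of tok."""
    return tok.start + len(tok.text)


def merge_names(tokens: List[Token]) -> List[Token]:
    """Fold NAME_LITERAL COLON NAME_LITERAL runs into one NAME_LITERAL."""
    out = []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        # Keep absorbing directly adjacent COLON + NAME_LITERAL pairs.
        while (tok.kind == "NAME_LITERAL"
               and i + 2 < len(tokens)
               and tokens[i + 1].kind == "COLON"
               and tokens[i + 2].kind == "NAME_LITERAL"
               and tokens[i + 1].start == _end(tok)
               and tokens[i + 2].start == _end(tokens[i + 1])):
            merged = tok.text + tokens[i + 1].text + tokens[i + 2].text
            tok = Token("NAME_LITERAL", merged, tok.start)
            i += 2
        out.append(tok)
        i += 1
    return out


# Tokens a colon-free NAME_LITERAL rule would produce for "test:ack:".
stream = [Token("NAME_LITERAL", "test", 0), Token("COLON", ":", 4),
          Token("NAME_LITERAL", "ack", 5), Token("COLON", ":", 8)]
print([(t.kind, t.text) for t in merge_names(stream)])
# [('NAME_LITERAL', 'test:ack'), ('COLON', ':')]
```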
Or, the other way around, allow ':' at the end of the token:
NAME_LITERAL
: '\\'? ALPHA_NUM ( ':' | '_' | '-' | ALPHA_NUM )*;
Then post-process the token stream and manually split the trailing
special characters into separate tokens:
<NAME_LITERAL>"abc:"
to
<NAME_LITERAL>"abc", <COLON>":"
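
A rough sketch of that split, again in Python. Tokens are modelled as plain `(kind, text)` pairs and `split_trailing_colons` is a made-up name; neither comes from ANTLR. The sketch handles only trailing ':' characters. A trailing '_' or '-' would need the same treatment if the original rule's "must end with ALPHA_NUM" constraint matters to you.

```python
def split_trailing_colons(tokens):
    """Peel trailing ':' characters off NAME_LITERAL tokens as COLON tokens."""
    out = []
    for kind, text in tokens:
        if kind == "NAME_LITERAL":
            name = text.rstrip(":")
            out.append((kind, name))
            # Emit one COLON token per stripped character.
            out.extend(("COLON", ":") for _ in range(len(text) - len(name)))
        else:
            out.append((kind, text))
    return out


print(split_trailing_colons([("NAME_LITERAL", "abc:")]))
# [('NAME_LITERAL', 'abc'), ('COLON', ':')]
```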
On 08.08.2012 3:39, Glenn McGregor wrote:
> On 8/7/2012 3:05 PM, Андрей Асеев wrote:
>> That could happen, for example, if you chose the wrong rule in the
>> ANTLRWorks interpreter rule box. :)
>>
>> Or show us the grammar rule you use to parse the test input.
> If I change the input to 'test:ack :', it parses just fine, and returns
> appropriate tokens.
>
>
> The grammar is about 500 lines long, and I tried to show just the
> hopefully relevant entries.
>
> But the rule I start from in ANTLRWorks is my
>
> start_program
> : program EOF! ;
>
>
> I can post my grammar somewhere if it becomes necessary to pursue this.
>
>
> The output of the interpreter shows (in bad ASCII art)
>
> <grammar Tal>
> start_program
> program
> <epsilon>
> t
>
> With the altered input, I get
>
> <grammar Tal>
> start_program
> program
> statement
> label_statement
> string_literal
> test:ack
> :
> <EOF>
>
> Thanks
>
> Glenn
>
>>> Given the partial grammar from a much larger...
>>>
>>>
>>> tokens { COLON = ':' }
>>>
>>> fragment
>>> ALPHA_NUM
>>> : 'A'..'Z' | 'a'..'z' | '0'..'9';
>>>
>>> NAME_LITERAL
>>> : '\\'? ALPHA_NUM ( ( ':' | '_' | '-' | ALPHA_NUM )* ALPHA_NUM )? ;
>>>
>>> ANY : . ;
>>>
>>>
>>>
>>> I would like the input
>>>
>>> test:ack:
>>>
>>> to arrive as two tokens, a NAME_LITERAL of 'test:ack', and a COLON.
>>>
>>> Instead, this input disappears entirely, but parses successfully.
>>>
>>> Any suggestions?
>>>
>>> Glenn McGregor