[antlr-interest] Whitespace matching
Jason Jones
jmjones5 at gmail.com
Fri Apr 13 04:06:09 PDT 2012
Stranger... Okay, well, I've done a manual test using this class:
import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        prologLexer lexer = new prologLexer(new ANTLRStringStream("\r\nL\r\n"));
        prologParser parser = new prologParser(new CommonTokenStream(lexer));
        parser.start();
    }
}
After running it like so:
$ java -cp .:/usr/local/antlr-3.4/lib/antlr-3.4-complete.jar Main
line 1:0 mismatched input '\r\n' expecting WHITESPACE
I still seem to be getting the same issue (the error above). Here's the
current grammar that I used to generate the parser and lexer:
start : program EOF;
program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
line : 'L';
query : 'Q';
WHITESPACE : (' ' | '\t' | '\r' | '\n')+ ;
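As a sanity check that needs no ANTLR runtime, here is a hand-written sketch in plain Java that mimics the lexer and parser rules above. The class `WhitespaceGrammarSketch` and its methods are hypothetical; this is not ANTLR-generated code. Assuming WHITESPACE, 'L' and 'Q' are the only token rules, "\r\nL\r\n" lexes to WHITESPACE 'L' WHITESPACE EOF, and the parser accepts it. One plausible reading of "mismatched input '\r\n' expecting WHITESPACE" is therefore that the real generated lexer gave '\r\n' some other token type. That is an assumption, not something confirmed in this thread.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written mimic of the posted grammar (NOT ANTLR output):
//   start      : program EOF;
//   program    : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
//   line       : 'L';
//   query      : 'Q';
//   WHITESPACE : (' ' | '\t' | '\r' | '\n')+ ;
public class WhitespaceGrammarSketch {

    enum Type { WHITESPACE, L, Q, EOF }

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Lexer: each maximal run of whitespace becomes ONE WHITESPACE token.
    static List<Type> tokenize(String input) {
        List<Type> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isWs(c)) {
                while (i < input.length() && isWs(input.charAt(i))) i++;
                tokens.add(Type.WHITESPACE);
            } else if (c == 'L') {
                tokens.add(Type.L);
                i++;
            } else if (c == 'Q') {
                tokens.add(Type.Q);
                i++;
            } else {
                throw new IllegalArgumentException("no token rule matches '" + c + "'");
            }
        }
        tokens.add(Type.EOF);
        return tokens;
    }

    // Parser for start : program EOF; the list always ends with EOF,
    // so every t.get(p) below stays in range.
    static boolean parses(List<Type> t) {
        int p = 0;
        if (t.get(p) != Type.WHITESPACE) return false;
        p++;
        if (t.get(p) != Type.L) return false;          // line+ needs at least one 'L'
        while (t.get(p) == Type.L) p++;
        if (t.get(p) != Type.WHITESPACE) return false;
        p++;
        while (t.get(p) == Type.Q) {                   // (query WHITESPACE)*
            p++;
            if (t.get(p) != Type.WHITESPACE) return false;
            p++;
        }
        return t.get(p) == Type.EOF;
    }

    public static void main(String[] args) {
        List<Type> tokens = tokenize("\r\nL\r\n");
        System.out.println(tokens);          // [WHITESPACE, L, WHITESPACE, EOF]
        System.out.println(parses(tokens));  // true
    }
}
```

The same sketch rejects "L\r\n", because the grammar requires a leading WHITESPACE token.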
Jason.
On 13 April 2012 07:12, Bart Kiers <bkiers at gmail.com> wrote:
> Both the interpreter and the debugger from ANTLRWorks (1.4.3) parse the
> input just fine.
>
> I'm assuming you're not entering "\r" and "\n" as literals, but are
> actually entering line breaks in the text areas of ANTLRWorks'
> interpreter... Perhaps you've set ANTLRWorks to start parsing with a
> different rule than the `start` rule? Anyway, forget about ANTLRWorks for a
> moment and whip up a manual test:
>
> import org.antlr.runtime.*;
>
> public class Main {
>     public static void main(String[] args) throws Exception {
>         TLexer lexer = new TLexer(new ANTLRStringStream("\r\nL\r\n"));
>         TParser parser = new TParser(new CommonTokenStream(lexer));
>         parser.start();
>     }
> }
>
>
> Bart.
>
>
> On Fri, Apr 13, 2012 at 12:09 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>
>> Hi Bart,
>>
>> I think we're using different versions of ANTLR (or something along those
>> lines), since with your grammar I get a MismatchedTokenException on the
>> input you used, "\r\nL\r\n". I'm currently using ANTLRWorks version
>> 1.4.3; could that be why it works on your end and not on mine?
>>
>> Jason.
>>
>>
>> On 12 April 2012 22:06, Bart Kiers <bkiers at gmail.com> wrote:
>>
>>> Hi Jason,
>>>
>>> Then there's something other than what you've posted going wrong, since
>>> the parser generated from:
>>>
>>> start : program EOF;
>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>> line : 'L';
>>> query : 'Q';
>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+;
>>>
>>> parses the input "\r\nL\r\n" just fine.
>>>
>>> Regards,
>>>
>>> Bart.
>>>
>>>
>>> On Thu, Apr 12, 2012 at 10:48 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>
>>>> Hi Bart,
>>>>
>>>> Thanks for the suggestion, although it doesn't work either... The skip
>>>> option does work, but since I'll be doing something with the whitespace
>>>> later, I don't want to go that route. Is there something else we're
>>>> missing?
>>>>
>>>> Jason.
>>>>
>>>>
>>>> On 12 April 2012 19:10, Bart Kiers <bkiers at gmail.com> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> On Thu, Apr 12, 2012 at 6:43 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> start : program ;
>>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>>>
>>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')* ; //currently only used in string
>>>>>>
>>>>>>
>>>>> A lexer rule must always match something: if it can match zero chars,
>>>>> it can (and will) go into an infinite loop.
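To illustrate the point about zero-length matches, here is a minimal plain-Java sketch. The names `ZeroLengthRuleSketch`, `matchWsStar` and `matchWsPlus` are hypothetical. It is a deliberately simplified first-match lexer and does not reproduce how ANTLR 3's generated lexer chooses between rules. With a `(...)*` rule, WHITESPACE "succeeds" at the 'L' without consuming anything, so the input position never advances. With `(...)+`, the rule fails there and the 'L' rule gets its turn. The `maxTokens` cap exists only so the demo terminates.

```java
public class ZeroLengthRuleSketch {

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Like WHITESPACE : (...)* ;  may "succeed" having consumed 0 chars.
    static int matchWsStar(String s, int pos) {
        int i = pos;
        while (i < s.length() && isWs(s.charAt(i))) i++;
        return i - pos;
    }

    // Like WHITESPACE : (...)+ ;  fails (-1) unless it consumes at least 1 char.
    static int matchWsPlus(String s, int pos) {
        int n = matchWsStar(s, pos);
        return n == 0 ? -1 : n;
    }

    // Simplified first-match lexer: try WHITESPACE, then the literal 'L'.
    // Returns the number of tokens emitted; throws if it stops making progress.
    static int lex(String s, boolean zeroLengthAllowed, int maxTokens) {
        int pos = 0;
        int tokens = 0;
        while (pos < s.length()) {
            if (tokens == maxTokens) {
                throw new IllegalStateException(
                    "stuck at index " + pos + " after " + tokens + " tokens");
            }
            int n = zeroLengthAllowed ? matchWsStar(s, pos) : matchWsPlus(s, pos);
            if (n < 0) {                     // WHITESPACE failed; try 'L'
                if (s.charAt(pos) != 'L') {
                    throw new IllegalArgumentException("no rule matches '" + s.charAt(pos) + "'");
                }
                n = 1;
            }
            pos += n;                        // n == 0 means an empty token: no progress
            tokens++;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "\r\nL\r\n";
        System.out.println("(...)+ rule: " + lex(input, false, 100) + " tokens");
        try {
            lex(input, true, 100);
        } catch (IllegalStateException e) {
            System.out.println("(...)* rule: " + e.getMessage());
        }
    }
}
```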
>>>>>
>>>>> Do something like this:
>>>>>
>>>>> start : program ;
>>>>> program : WHITESPACE? line+ WHITESPACE? (query WHITESPACE?)*;
>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+ ;
>>>>>
>>>>> or simply skip spaces like this:
>>>>>
>>>>> start : program ;
>>>>> program : line+ query*;
>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+ {skip();} ;
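The two suggestions can be compared side by side in a small plain-Java sketch. The class `OptionalWhitespaceSketch` and its methods are hypothetical. Tokens are encoded as single letters, with 'W' standing for WHITESPACE. Because these parser rules are not recursive, a regular expression over the token letters is enough to recognize them. The `skipWhitespace` flag stands in for `{skip();}`. `parseStrict` mirrors the original `WHITESPACE line+ WHITESPACE (query WHITESPACE)*` rule for contrast.

```java
public class OptionalWhitespaceSketch {

    static boolean isWs(char c) {
        return c == ' ' || c == '\t' || c == '\r' || c == '\n';
    }

    // Lexer for WHITESPACE : (...)+ ; plus 'L' and 'Q'.
    // Each whitespace run becomes one 'W', or is dropped when skipWhitespace is true.
    static String lex(String input, boolean skipWhitespace) {
        StringBuilder tokens = new StringBuilder();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            if (isWs(c)) {
                while (i < input.length() && isWs(input.charAt(i))) i++;
                if (!skipWhitespace) tokens.append('W');
            } else if (c == 'L' || c == 'Q') {
                tokens.append(c);
                i++;
            } else {
                throw new IllegalArgumentException("unexpected character: " + c);
            }
        }
        return tokens.toString();
    }

    // program : WHITESPACE line+ WHITESPACE (query WHITESPACE)* ;  (original)
    static boolean parseStrict(String t) {
        return t.matches("WL+W(QW)*");
    }

    // program : WHITESPACE? line+ WHITESPACE? (query WHITESPACE?)* ;  (tokens kept)
    static boolean parseKept(String t) {
        return t.matches("W?L+W?(QW?)*");
    }

    // program : line+ query* ;  (WHITESPACE skipped by the lexer)
    static boolean parseSkipped(String t) {
        return t.matches("L+Q*");
    }

    public static void main(String[] args) {
        String[] inputs = { "\r\nL\r\n", "L\r\nQ", "LLQ" };
        for (String in : inputs) {
            String kept = lex(in, false);
            String skipped = lex(in, true);
            System.out.println(kept + ": strict=" + parseStrict(kept)
                + " optional=" + parseKept(kept)
                + " | " + skipped + ": skipped=" + parseSkipped(skipped));
        }
    }
}
```

Making WHITESPACE optional keeps the tokens available for later use, while accepting inputs that the strict rule rejects (for example "L\r\nQ", which has no leading whitespace).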
>>>>>
>>>>> Regards,
>>>>>
>>>>> Bart.
>>>>>
>>>>
>>>>
>>>
>>
>