[antlr-interest] Whitespace matching

Jason Jones jmjones5 at gmail.com
Fri Apr 13 04:06:09 PDT 2012


Stranger... Okay, well, I've done a manual test using this class:

import org.antlr.runtime.*;

public class Main {
    public static void main(String[] args) throws Exception {
        prologLexer lexer = new prologLexer(new ANTLRStringStream("\r\nL\r\n"));
        prologParser parser = new prologParser(new CommonTokenStream(lexer));
        parser.start();
    }
}

After running it like so:

$ java -cp .:/usr/local/antlr-3.4/lib/antlr-3.4-complete.jar Main
line 1:0 mismatched input '\r\n' expecting WHITESPACE

I still seem to be getting the same issue ^. Here's the current grammar
that I used to create the parser and lexer:

start : program EOF;
program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
line    :       'L';
query   :       'Q';

WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
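
The `+` in WHITESPACE above follows Bart's earlier advice (quoted further
down) that a lexer rule must never match zero characters. As a rough
illustration of why, here is a toy tokenizer loop in plain Java. It is only a
sketch and not ANTLR's actual algorithm: the class name, the helper names and
the maxTokens cap are all made up for this example.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: this is NOT how ANTLR's generated lexer works internally.
class EmptyMatchDemo {

    // Length of the whitespace run starting at pos (may be 0).
    static int whitespaceRun(String input, int pos) {
        int n = 0;
        while (pos + n < input.length()
                && " \t\r\n".indexOf(input.charAt(pos + n)) >= 0) {
            n++;
        }
        return n;
    }

    // Tries the WHITESPACE rule first, then falls back to a one-char literal.
    // allowEmpty=true models (...)*, allowEmpty=false models (...)+.
    // maxTokens is a hypothetical safety cap so the demo always terminates.
    static List<String> tokenize(String input, boolean allowEmpty, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            int ws = whitespaceRun(input, pos);
            if (ws > 0 || allowEmpty) {
                // With allowEmpty, a zero-length match "succeeds" and pos never moves.
                tokens.add("WHITESPACE(" + ws + ")");
                pos += ws;
            } else {
                tokens.add("'" + input.charAt(pos) + "'");
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\r\nL\r\n", false, 10));
        System.out.println(tokenize("\r\nL\r\n", true, 10));
    }
}
```

With allowEmpty=false the first call prints [WHITESPACE(2), 'L', WHITESPACE(2)].
With allowEmpty=true the loop gets stuck in front of the 'L', emitting
WHITESPACE(0) over and over until the cap of 10 tokens stops it.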

Jason.

On 13 April 2012 07:12, Bart Kiers <bkiers at gmail.com> wrote:

> Both the interpreter and the debugger from ANTLRWorks (1.4.3) parse the
> input just fine.
>
> I'm assuming you're not entering "\r" and "\n" as literals, but are
> actually entering line breaks in the text areas of ANTLRWorks'
> interpreter... Perhaps you've selected ANTLRWorks to start parsing with a
> different rule than the `start` rule? Anyway, forget about ANTLRWorks for a
> moment and whip up a manual test:
>
> public class Main {
>   public static void main(String[] args) throws Exception {
>     TLexer lexer = new TLexer(new ANTLRStringStream("\r\nL\r\n"));
>     TParser parser = new TParser(new CommonTokenStream(lexer));
>     parser.start();
>   }
> }
>
>
> Bart.
>
>
> On Fri, Apr 13, 2012 at 12:09 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>
>> Hi Bart,
>>
>> I think we're using different versions of ANTLR (or something along those
>> lines) as using your grammar I get a MismatchedTokenException using the
>> input you've used "\r\nL\r\n". I'm currently using ANTLRWorks version
>> 1.4.3, could this be the reason why your end seems to be working and mine
>> not?
>>
>> Jason.
>>
>>
>> On 12 April 2012 22:06, Bart Kiers <bkiers at gmail.com> wrote:
>>
>>> Hi Jason,
>>>
>>> Then there's something other than what you've posted going wrong, since
>>> the parser generated from:
>>>
>>> start      : program EOF;
>>> program    : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>> line       : 'L';
>>> query      : 'Q';
>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+;
>>>
>>> parses the input "\r\nL\r\n" just fine.
>>>
>>> Regards,
>>>
>>> Bart.
>>>
>>>
>>> On Thu, Apr 12, 2012 at 10:48 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>
>>>> Hi Bart,
>>>>
>>>> Thanks for the suggestion, although it doesn't work either... The skip
>>>> option does work but since I'll be doing something with the whitespace
>>>> later I don't want to take this option. Is there something else we're
>>>> missing?
>>>>
>>>> Jason.
>>>>
>>>>
>>>> On 12 April 2012 19:10, Bart Kiers <bkiers at gmail.com> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> On Thu, Apr 12, 2012 at 6:43 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>
>>>>>> ...
>>>>>>
>>>>>>
>>>>>> start : program ;
>>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>>>
>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')* ; //currently only used in string
>>>>>>
>>>>>>
>>>>> A lexer rule must always match something: if it can match zero chars,
>>>>> it can/will go in an infinite loop.
>>>>>
>>>>> Do something like this:
>>>>>
>>>>> start : program ;
>>>>> program : WHITESPACE? line+ WHITESPACE? (query WHITESPACE?)*;
>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
>>>>>
>>>>> or simply skip spaces like this:
>>>>>
>>>>> start : program ;
>>>>> program : line+ query*;
>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();} ;
>>>>>
>>>>> Regards,
>>>>>
>>>>> Bart.
>>>>>
>>>>
>>>>
>>>
>>
>

