[antlr-interest] Whitespace matching

Jason Jones jmjones5 at gmail.com
Fri Apr 13 06:03:15 PDT 2012


Ah, I see. I think I get what's been happening (whether I understand it is
a different matter): there must be something else in my prolog grammar
that's changing the behaviour of the lexer/parser. I assumed that if I
just added the rules you have, it would work the same as yours, but
apparently not. Here's the full grammar that I've been playing with:

//TODO: Add grammar for operators
//TODO: Add grammar for lists - DONE
//TODO: Add grammar for comments - DONE
//TODO: Add grammar for whitespace

grammar prolog;

//options {
//output=template;
//rewrite=true;
//}

start : program EOF;
program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
line    :    'L';
query    :    'Q';
//line : clause | comment ;
comment : '% ' string '\r\n' | '/*' string '*/' ; //Doesn't allow commas, parentheses, square brackets, etc. in comments. Consider fixing!
//Another issue: how is the end of a single-line comment determined? By the newline character?
clause : predicate ('.' | ':-' predicate_list '.') ;
predicate : atom | atom '(' term_list ')' ;
predicate_list : predicate (',' predicate)* ;
list : '[' term_list ('|' term)? ']' ;

structure : atom '(' term_list ')' ;
term_list : term (',' term)* ;

//query : '?-' predicate_list '.' ;

term : numeral | atom | variable | structure | list ;
atom : small_atom | '\'' string '\'';
small_atom : LOWERCASE_LETTER character*;
variable : UPPERCASE_LETTER character* ;
numeral : DIGIT+ ;
character : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL ;
string : character+ (WHITESPACE+ character+)* ;

WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ; //currently only used in string
//NEWLINE : '\r\n' | '\n' ;
LOWERCASE_LETTER : 'a' .. 'z' ;
UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
DIGIT : '0' .. '9' ;
SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.' | '?' | '@'
| '#' | '$' | '&' ;
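
One way to see why the same rules can behave differently inside this full grammar is to model how a lexer chooses between overlapping rules. The sketch below is a hypothetical, hand-rolled Java model, not ANTLR code, and it rests on two assumptions about ANTLR 3 combined grammars: quoted literals used in parser rules (such as `'L'` and the `'\r\n'` in `comment`) become implicit lexer tokens listed before the explicit lexer rules, and when two rules match input of the same length, the rule listed first wins. The names `TieBreakLexer`, `Rule`, `literal`, `whitespace` and `tokenize` are invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.ToIntBiFunction;

// Hypothetical model of lexer rule selection; this is NOT ANTLR's generated lexer.
// Assumptions: the longest match wins, and on equal length the rule listed first wins.
final class TieBreakLexer {

    // A named rule reporting how many chars it matches at a position (0 = no match).
    static final class Rule {
        final String name;
        final ToIntBiFunction<String, Integer> matcher;

        Rule(String name, ToIntBiFunction<String, Integer> matcher) {
            this.name = name;
            this.matcher = matcher;
        }
    }

    // A fixed string, like the implicit token assumed for a literal in a parser rule.
    static Rule literal(String name, String text) {
        return new Rule(name, (in, pos) -> in.startsWith(text, pos) ? text.length() : 0);
    }

    // WHITESPACE : (' ' | '\t' | '\r' | '\n')+ ;
    static Rule whitespace() {
        return new Rule("WHITESPACE", (in, pos) -> {
            int end = pos;
            while (end < in.length() && " \t\r\n".indexOf(in.charAt(end)) >= 0) {
                end++;
            }
            return end - pos;
        });
    }

    // Returns the token names produced for the input, trying rules in list order.
    static List<String> tokenize(String input, List<Rule> rules) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            Rule best = null;
            int bestLen = 0;
            for (Rule rule : rules) {
                int len = rule.matcher.applyAsInt(input, pos);
                if (len > bestLen) { // strictly greater, so ties keep the earlier rule
                    best = rule;
                    bestLen = len;
                }
            }
            if (best == null) {
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            tokens.add(best.name);
            pos += bestLen;
        }
        return tokens;
    }

    public static void main(String[] args) {
        String input = "\r\nL\r\n";
        // Only the rules from the small test grammar.
        List<Rule> small = List.of(literal("'L'", "L"), whitespace());
        // Adds the '\r\n' literal from the `comment` rule, listed before WHITESPACE.
        List<Rule> full = List.of(literal("'L'", "L"), literal("'\\r\\n'", "\r\n"), whitespace());
        System.out.println(tokenize(input, small));
        System.out.println(tokenize(input, full));
    }
}
```

Under those assumptions, `main` prints `[WHITESPACE, 'L', WHITESPACE]` for the small rule set and `['\r\n', 'L', '\r\n']` once the `'\r\n'` literal is present, so the line breaks would no longer reach the parser as WHITESPACE tokens. A longer run such as `"\r\n\r\n"` would still lex as WHITESPACE, because the longest match beats rule order.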

So when I create a grammar containing just the rules you suggested, it
works fine. Why do the same rules not work when I use them in this
grammar?

Jason.
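
The error from the manual test in this thread, `line 1:0 mismatched input '\r\n' expecting WHITESPACE`, says the first token's text is a line break but its type is not WHITESPACE. The hypothetical recognizer below (plain Java with no ANTLR runtime; `ProgramRecognizer`, `check` and the string token names are invented) walks a list of token types through `start : program EOF;` and `program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;`. It is a sketch of the idea that a parser rule only looks at token types, so a stream whose line breaks arrive as some other type fails at the very first token.

```java
import java.util.List;

// Hypothetical recursive-descent recognizer (NOT ANTLR output) for:
//   start   : program EOF ;
//   program : WHITESPACE line+ WHITESPACE (query WHITESPACE)* ;
//   line    : 'L' ;
//   query   : 'Q' ;
// Input is a list of token TYPES; the text of each token is irrelevant here.
final class ProgramRecognizer {
    private final List<String> tokens;
    private int pos = 0;

    ProgramRecognizer(List<String> tokens) {
        this.tokens = tokens;
    }

    // Type of the current token, or "EOF" once the list is exhausted.
    private String peek() {
        return pos < tokens.size() ? tokens.get(pos) : "EOF";
    }

    // Consume one token of the expected type or fail like a mismatched-token error.
    private void match(String expected) {
        if (!peek().equals(expected)) {
            throw new IllegalStateException(
                    "token " + pos + ": mismatched input " + peek() + " expecting " + expected);
        }
        pos++;
    }

    void start() {
        program();
        match("EOF");
    }

    private void program() {
        match("WHITESPACE");
        do {
            match("'L'"); // line+
        } while (peek().equals("'L'"));
        match("WHITESPACE");
        while (peek().equals("'Q'")) { // (query WHITESPACE)*
            match("'Q'");
            match("WHITESPACE");
        }
    }

    // Returns null if the stream is accepted, otherwise the error message.
    static String check(List<String> tokens) {
        try {
            new ProgramRecognizer(tokens).start();
            return null;
        } catch (IllegalStateException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        String ok = check(List.of("WHITESPACE", "'L'", "WHITESPACE"));
        String bad = check(List.of("'\\r\\n'", "'L'", "'\\r\\n'"));
        System.out.println(ok == null ? "accepted" : ok);
        System.out.println(bad == null ? "accepted" : bad);
    }
}
```

`main` prints `accepted` for the first stream and `token 0: mismatched input '\r\n' expecting WHITESPACE` for the second, mirroring the shape (not the exact wording) of the ANTLR message.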

On 13 April 2012 12:39, Bart Kiers <bkiers at gmail.com> wrote:

> You must be doing something wrong/different. Perhaps you're running an old
> .class file?
> I copied your prolog.g grammar and Main.java file and did this:
>
> wget http://www.antlr.org/download/antlr-3.4-complete.jar
> java -cp antlr-3.4-complete.jar org.antlr.Tool prolog.g
> javac -cp antlr-3.4-complete.jar *.java
> java -cp .:antlr-3.4-complete.jar Main
>
> which didn't produce any error or warning.
>
> Regards,
>
> Bart.
>
>
>
> On Fri, Apr 13, 2012 at 1:06 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>
>> Strange... Okay, well, I've done a manual test using this class:
>>
>> import org.antlr.runtime.*;
>>
>> public class Main {
>>     public static void main(String[] args) throws Exception {
>>         prologLexer lexer = new prologLexer(new ANTLRStringStream("\r\nL\r\n"));
>>         prologParser parser = new prologParser(new CommonTokenStream(lexer));
>>         parser.start();
>>     }
>> }
>>
>> After running it like so:
>>
>> $ java -cp .:/usr/local/antlr-3.4/lib/antlr-3.4-complete.jar Main
>> line 1:0 mismatched input '\r\n' expecting WHITESPACE
>>
>> I still seem to be getting the same issue ^. Here's the current grammar
>> that I used to create the parser and lexer:
>>
>>
>> start : program EOF;
>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>> line    :       'L';
>> query   :       'Q';
>>
>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
>>
>> Jason.
>>
>>
>> On 13 April 2012 07:12, Bart Kiers <bkiers at gmail.com> wrote:
>>
>>> Both the interpreter and the debugger from ANTLRWorks (1.4.3) parse the
>>> input just fine.
>>>
>>> I'm assuming you're not entering "\r" and "\n" as literals, but are
>>> actually entering line breaks in the text areas of ANTLRWorks'
>>> interpreter... Perhaps you've told ANTLRWorks to start parsing with a rule
>>> other than `start`? Anyway, forget about ANTLRWorks for a
>>> moment and whip up a manual test:
>>>
>>> import org.antlr.runtime.*;
>>>
>>> public class Main {
>>>   public static void main(String[] args) throws Exception {
>>>     TLexer lexer = new TLexer(new ANTLRStringStream("\r\nL\r\n"));
>>>     TParser parser = new TParser(new CommonTokenStream(lexer));
>>>     parser.start();
>>>   }
>>> }
>>>
>>>
>>> Bart.
>>>
>>>
>>> On Fri, Apr 13, 2012 at 12:09 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>
>>>> Hi Bart,
>>>>
>>>> I think we're using different versions of ANTLR (or something along
>>>> those lines), since with your grammar I get a MismatchedTokenException on
>>>> the input you used, "\r\nL\r\n". I'm currently using ANTLRWorks version
>>>> 1.4.3; could this be why it works on your end and not on mine?
>>>>
>>>> Jason.
>>>>
>>>>
>>>> On 12 April 2012 22:06, Bart Kiers <bkiers at gmail.com> wrote:
>>>>
>>>>> Hi Jason,
>>>>>
>>>>> Then there's something other than what you've posted going wrong,
>>>>> since the parser generated from:
>>>>>
>>>>> start      : program EOF;
>>>>> program    : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>> line       : 'L';
>>>>> query      : 'Q';
>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+;
>>>>>
>>>>> parses the input "\r\nL\r\n" just fine.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Bart.
>>>>>
>>>>>
>>>>> On Thu, Apr 12, 2012 at 10:48 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>
>>>>>> Hi Bart,
>>>>>>
>>>>>> Thanks for the suggestion, although it doesn't work either... The
>>>>>> skip option does work, but since I'll be doing something with the whitespace
>>>>>> later, I don't want to take that option. Is there something else we're
>>>>>> missing?
>>>>>>
>>>>>> Jason.
>>>>>>
>>>>>>
>>>>>> On 12 April 2012 19:10, Bart Kiers <bkiers at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Jason,
>>>>>>>
>>>>>>> On Thu, Apr 12, 2012 at 6:43 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>
>>>>>>>> start : program ;
>>>>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>>>>>
>>>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')* ; //currently only used in string
>>>>>>>>
>>>>>>>>
>>>>>>> A lexer rule must always match something: if it can match zero
>>>>>>> chars, it can/will go in an infinite loop.
>>>>>>>
>>>>>>> Do something like this:
>>>>>>>
>>>>>>> start : program ;
>>>>>>> program : WHITESPACE? line+ WHITESPACE? (query WHITESPACE?)*;
>>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
>>>>>>>
>>>>>>> or simply skip spaces like this:
>>>>>>>
>>>>>>> start : program ;
>>>>>>> program : line+ query*;
>>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();} ;
>>>>>>>
>>>>>>> Regards,
>>>>>>>
>>>>>>> Bart.
>>>>>>>
>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>
>
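
Bart's first reply notes that a lexer rule able to match zero characters, such as `WHITESPACE : (' ' | '\t' | '\r' | '\n')* ;`, can send the lexer into an infinite loop. The hypothetical Java scanner below (not ANTLR code; `ZeroLengthGuard`, `whitespace`, `literalL` and `tokenize` are invented names) sketches the mechanism under a simple longest-match assumption: a `*` rule "succeeds" with length 0 on a character nothing else accepts, so the position never advances. Here the loop throws instead of spinning, which makes the difference between `*` and `+` visible.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical scanner, NOT ANTLR code, with two rules:
//   WHITESPACE : (' ' | '\t' | '\r' | '\n')*   when star == true, or the + form otherwise
//   'L'
final class ZeroLengthGuard {

    // Length matched by WHITESPACE at pos, or -1 for "no match".
    // The '*' form never fails: it can return 0.
    static int whitespace(String in, int pos, boolean star) {
        int end = pos;
        while (end < in.length() && " \t\r\n".indexOf(in.charAt(end)) >= 0) {
            end++;
        }
        int len = end - pos;
        return (len == 0 && !star) ? -1 : len;
    }

    // Length matched by the 'L' literal at pos, or -1.
    static int literalL(String in, int pos) {
        return (pos < in.length() && in.charAt(pos) == 'L') ? 1 : -1;
    }

    static List<String> tokenize(String in, boolean star) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < in.length()) {
            int ws = whitespace(in, pos, star);
            int l = literalL(in, pos);
            int len = Math.max(ws, l);
            if (len < 0) {
                // '+' form: a clean error on an unexpected character.
                throw new IllegalStateException("no rule matches at offset " + pos);
            }
            if (len == 0) {
                // '*' form: WHITESPACE "matched" nothing; emitting it would leave pos
                // unchanged, so an unguarded loop would repeat this forever.
                throw new IllegalStateException("zero-length token at offset " + pos);
            }
            tokens.add(l > ws ? "'L'" : "WHITESPACE");
            pos += len;
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(tokenize("\r\nL\r\n", false));
        for (boolean star : new boolean[] {false, true}) {
            try {
                tokenize("L?", star);
            } catch (IllegalStateException e) {
                System.out.println((star ? "*: " : "+: ") + e.getMessage());
            }
        }
    }
}
```

Running `main` prints `[WHITESPACE, 'L', WHITESPACE]`, then `+: no rule matches at offset 1` and `*: zero-length token at offset 1`: with `+` the stray `?` is reported, while with `*` the whitespace rule claims it without consuming anything.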

