[antlr-interest] Whitespace matching

Fri Apr 13 17:10:58 PDT 2012

Did you read my reply?

On Apr 13, 2012, at 3:55 PM, Jason Jones <jmjones5 at gmail.com> wrote:

> Yeah thanks, looks a bit better and definitely makes more sense, but still
> having the weird whitespace mismatch issue... :S
> 
> On 13 April 2012 14:34, Charles Daniels <cjdaniels4 at gmail.com> wrote:
> 
>> Try the following changes (note that some of your parser rules become
>> lexer rules):
>> 
>> atom : SMALL_ATOM | STRING;
>> 
>> COMMENT : '% ' ~('\n'|'\r')* '\r'? '\n' | '/*' ( options {greedy=false;} :
>> . )* '*/' ;
>> SMALL_ATOM : LOWERCASE_LETTER CHARACTER* ;
>> VARIABLE : UPPERCASE_LETTER CHARACTER* ;
>> NUMERAL : DIGIT+ ;
>> STRING : '"' (CHARACTER | WHITESPACE)* '"' ;
>> 
>> fragment CHARACTER : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL
>> ;
>> fragment LOWERCASE_LETTER : 'a' .. 'z' ;
>> fragment UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
>> fragment DIGIT : '0' .. '9' ;
>> fragment SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.' |
>> '?' | '@' | '#' | '$' | '&' ;
>> 
>> 
>> I haven't tested this, but it should get you closer to what you need, if
>> it doesn't completely address the issue.
>> 
>> Regards,
>> Chuck
>> 
>> On Fri, Apr 13, 2012 at 9:03 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>> 
>>> Ah, I see. I think I get what's been happening (whether I understand it is
>>> a different matter): there must be something else in my Prolog grammar
>>> that's changing the behaviour of the lexer/parser. I assumed that if I
>>> just added the rules you have, it would work the same as yours, but
>>> apparently not. Here's the full grammar that I've been playing with:
>>> 
>>> //TODO: Add grammar for operators
>>> //TODO: Add grammar for lists - DONE
>>> //TODO: Add grammar for comments - DONE
>>> //TODO: Add grammar for whitespace
>>> 
>>> grammar prolog;
>>> 
>>> //options {
>>> //output=template;
>>> //rewrite=true;
>>> //}
>>> 
>>> start : program EOF;
>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>> line    :    'L';
>>> query    :    'Q';
>>> //line : clause | comment ;
>>> comment : '% ' string '\r\n' | '/*' string '*/' ; // Doesn't allow commas,
>>> // parentheses, square brackets, etc. in comments. Consider fixing!
>>> // Another issue is how the single-line comment is ended: is it determined
>>> // by the newline character?
>>> clause : predicate ('.' | ':-' predicate_list '.') ;
>>> predicate : atom | atom '(' term_list ')' ;
>>> predicate_list : predicate (',' predicate)* ;
>>> list : '[' term_list ('|' term)? ']' ;
>>> 
>>> structure : atom '(' term_list ')' ;
>>> term_list : term (',' term)* ;
>>> 
>>> //query : '?-' predicate_list '.' ;
>>> 
>>> term : numeral | atom | variable | structure | list ;
>>> atom : small_atom | '\'' string '\'';
>>> small_atom : LOWERCASE_LETTER character*;
>>> variable : UPPERCASE_LETTER character* ;
>>> numeral : DIGIT+ ;
>>> character : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL ;
>>> string : character+ (WHITESPACE+ character+)* ;
>>> 
>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ; //currently only used in string
>>> //NEWLINE : '\r\n' | '\n' ;
>>> LOWERCASE_LETTER : 'a' .. 'z' ;
>>> UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
>>> DIGIT : '0' .. '9' ;
>>> SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.' | '?' | '@'
>>> | '#' | '$' | '&' ;
>>> 
>>> So when I create a grammar containing just the rules you've suggested, it
>>> works fine. Why doesn't it work when I use the same rules in this grammar?
>>> 
>>> Jason.
>>> 
>>> On 13 April 2012 12:39, Bart Kiers <bkiers at gmail.com> wrote:
>>> 
>>>> You must be doing something wrong/different. Perhaps you're running an old
>>>> .class file?
>>>> I copied your prolog.g grammar and Main.java file and did this:
>>>> 
>>>> wget http://www.antlr.org/download/antlr-3.4-complete.jar
>>>> java -cp antlr-3.4-complete.jar org.antlr.Tool prolog.g
>>>> javac -cp antlr-3.4-complete.jar *.java
>>>> java -cp .:antlr-3.4-complete.jar Main
>>>> 
>>>> which didn't produce any error or warning.
>>>> 
>>>> Regards,
>>>> 
>>>> Bart.
>>>> 
>>>> 
>>>> 
>>>> On Fri, Apr 13, 2012 at 1:06 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>> 
>>>>> Stranger... Okay, well, I've done a manual test using this class:
>>>>> 
>>>>> import org.antlr.runtime.*;
>>>>> 
>>>>> 
>>>>> public class Main {
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         prologLexer lexer = new prologLexer(new ANTLRStringStream("\r\nL\r\n"));
>>>>>         prologParser parser = new prologParser(new CommonTokenStream(lexer));
>>>>>         parser.start();
>>>>>     }
>>>>> }
>>>>> 
>>>>> After running it like so:
>>>>> 
>>>>> $ java -cp .:/usr/local/antlr-3.4/lib/antlr-3.4-complete.jar Main
>>>>> line 1:0 mismatched input '\r\n' expecting WHITESPACE
>>>>> 
>>>>> I still seem to be getting the same issue ^. Here's the current grammar
>>>>> that I used to create the parser and lexer:
>>>>> 
>>>>> 
>>>>> start : program EOF;
>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>> line    :       'L';
>>>>> query   :       'Q';
>>>>> 
>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
>>>>> 
>>>>> Jason.
>>>>> 
>>>>> 
>>>>> On 13 April 2012 07:12, Bart Kiers <bkiers at gmail.com> wrote:
>>>>> 
>>>>>> Both the interpreter and the debugger from ANTLRWorks (1.4.3) parse the
>>>>>> input just fine.
>>>>>> 
>>>>>> I'm assuming you're not entering "\r" and "\n" as literals, but are
>>>>>> actually entering line breaks in the text areas of ANTLRWorks'
>>>>>> interpreter... Perhaps you've selected ANTLRWorks to start parsing with a
>>>>>> different rule than the `start` rule? Anyway, forget about ANTLRWorks for a
>>>>>> moment and whip up a manual test:
>>>>>> 
>>>>>> import org.antlr.runtime.*;
>>>>>> 
>>>>>> public class Main {
>>>>>>  public static void main(String[] args) throws Exception {
>>>>>>    TLexer lexer = new TLexer(new ANTLRStringStream("\r\nL\r\n"));
>>>>>>    TParser parser = new TParser(new CommonTokenStream(lexer));
>>>>>>    parser.start();
>>>>>>  }
>>>>>> }
>>>>>> 
>>>>>> 
>>>>>> Bart.
>>>>>> 
>>>>>> 
>>>>>> On Fri, Apr 13, 2012 at 12:09 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>> 
>>>>>>> Hi Bart,
>>>>>>> 
>>>>>>> I think we're using different versions of ANTLR (or something along
>>>>>>> those lines), as using your grammar I get a MismatchedTokenException with
>>>>>>> the input you've used, "\r\nL\r\n". I'm currently using ANTLRWorks version
>>>>>>> 1.4.3; could this be the reason why your end seems to be working and mine
>>>>>>> not?
>>>>>>> 
>>>>>>> Jason.
>>>>>>> 
>>>>>>> 
>>>>>>> On 12 April 2012 22:06, Bart Kiers <bkiers at gmail.com> wrote:
>>>>>>> 
>>>>>>>> Hi Jason,
>>>>>>>> 
>>>>>>>> Then there's something other than what you've posted going wrong,
>>>>>>>> since the parser generated from:
>>>>>>>> 
>>>>>>>> start      : program EOF;
>>>>>>>> program    : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>>>>> line       : 'L';
>>>>>>>> query      : 'Q';
>>>>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+;
>>>>>>>> 
>>>>>>>> parses the input "\r\nL\r\n" just fine.
>>>>>>>> 
>>>>>>>> Regards,
>>>>>>>> 
>>>>>>>> Bart.
>>>>>>>> 
>>>>>>>> 
>>>>>>>> On Thu, Apr 12, 2012 at 10:48 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>>>> 
>>>>>>>>> Hi Bart,
>>>>>>>>> 
>>>>>>>>> Thanks for the suggestion, although it doesn't work either... The
>>>>>>>>> skip option does work, but since I'll be doing something with the
>>>>>>>>> whitespace later I don't want to take this option. Is there something
>>>>>>>>> else we're missing?
>>>>>>>>> 
>>>>>>>>> Jason.
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> On 12 April 2012 19:10, Bart Kiers <bkiers at gmail.com> wrote:
>>>>>>>>> 
>>>>>>>>>> Hi Jason,
>>>>>>>>>> 
>>>>>>>>>> On Thu, Apr 12, 2012 at 6:43 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>>>>>> 
>>>>>>>>>>> ...
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>>> start : program ;
>>>>>>>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>>>>>>>> 
>>>>>>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')* ; //currently only used in string
>>>>>>>>>>> 
>>>>>>>>>>> 
>>>>>>>>>> A lexer rule must always match something: if it can match zero
>>>>>>>>>> chars, it can/will go in an infinite loop.
>>>>>>>>>> 
>>>>>>>>>> Do something like this:
>>>>>>>>>> 
>>>>>>>>>> start : program ;
>>>>>>>>>> program : WHITESPACE? line+ WHITESPACE? (query WHITESPACE?)*;
>>>>>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
>>>>>>>>>> 
>>>>>>>>>> or simply skip spaces like this:
>>>>>>>>>> 
>>>>>>>>>> start : program ;
>>>>>>>>>> program : line+ query*;
>>>>>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();} ;
>>>>>>>>>> 
>>>>>>>>>> Regards,
>>>>>>>>>> 
>>>>>>>>>> Bart.
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> 
>>>>>> 
>>>>> 
>>>> 
>>> 
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe:
>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>> 
>> 
>> 
> 
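Bart's point at the bottom of the thread, that a lexer rule able to match zero characters can loop forever, can be illustrated with a toy tokenizer loop. This is only a sketch under stated assumptions: it is not how ANTLR's generated lexer is implemented, and the names ZeroWidthDemo, matchWhitespace, tokenize and maxSteps are hypothetical, made up for this illustration. The maxSteps cap exists only so the demo terminates.

```java
import java.util.ArrayList;
import java.util.List;

public class ZeroWidthDemo {
    // Length of the run of whitespace characters starting at pos (may be 0).
    static int matchWhitespace(String in, int pos) {
        int i = pos;
        while (i < in.length() && " \t\r\n".indexOf(in.charAt(i)) >= 0) {
            i++;
        }
        return i - pos;
    }

    // Toy tokenizer loop (hypothetical, not ANTLR's algorithm).
    // allowEmpty = true models a rule written with (...)*,
    // allowEmpty = false models the same rule written with (...)+.
    // maxSteps is a safety cap so the demo stops even when the loop stalls.
    static List<String> tokenize(String in, boolean allowEmpty, int maxSteps) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        for (int step = 0; pos < in.length() && step < maxSteps; step++) {
            int len = matchWhitespace(in, pos);
            if (len > 0 || allowEmpty) {
                tokens.add("WHITESPACE(" + len + ")");
                pos += len; // when len == 0 the position never advances
            } else {
                tokens.add("CHAR(" + in.charAt(pos) + ")");
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // (...)+ : every token consumes at least one character.
        System.out.println(tokenize("\r\nL\r\n", false, 10));
        // (...)* : an empty WHITESPACE "matches" at 'L' over and over.
        System.out.println(tokenize("\r\nL\r\n", true, 10));
    }
}
```

With the `+` form the loop consumes the whole input in three tokens. With the `*` form it stalls at `L`, emitting zero-length WHITESPACE tokens until the cap is reached, which is the behaviour the `(...)+` rewrite avoids.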