[antlr-interest] Whitespace matching
Jim Idle
jimi at temporal-wave.com
Fri Apr 13 17:10:58 PDT 2012
Did you read my reply?
On Apr 13, 2012, at 3:55 PM, Jason Jones <jmjones5 at gmail.com> wrote:
> Yeah thanks, looks a bit better and definitely makes more sense, but still
> having the weird whitespace mismatch issue... :S
>
> On 13 April 2012 14:34, Charles Daniels <cjdaniels4 at gmail.com> wrote:
>
>> Try the following changes (note that some of your parser rules become
>> lexer rules):
>>
>> atom : SMALL_ATOM | STRING;
>>
>> COMMENT : '% ' ~('\n'|'\r')* '\r'? '\n' | '/*' ( options {greedy=false;} :
>> . )* '*/' ;
>> SMALL_ATOM : LOWERCASE_LETTER CHARACTER* ;
>> VARIABLE : UPPERCASE_LETTER CHARACTER* ;
>> NUMERAL : DIGIT+ ;
>> STRING : '"' (CHARACTER | WHITESPACE)* '"' ;
>>
>> fragment CHARACTER : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL
>> ;
>> fragment LOWERCASE_LETTER : 'a' .. 'z' ;
>> fragment UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
>> fragment DIGIT : '0' .. '9' ;
>> fragment SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.' |
>> '?' | '@' | '#' | '$' | '&' ;
>>
>>
>> I haven't tested this, but it should get you closer to what you need, if
>> it doesn't completely address the issue.
>>
>> Regards,
>> Chuck
>>
>> On Fri, Apr 13, 2012 at 9:03 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>>
>>> Ah, I see. I think I get what's been happening (whether I understand it is
>>> a different matter): there must be something else in my Prolog grammar
>>> that's changing the behaviour of the lexer/parser. I assumed that if I
>>> just added the rules you have, it would work the same as yours, but
>>> apparently not. Here's the full grammar that I've been playing with:
>>>
>>> //TODO: Add grammar for operators
>>> //TODO: Add grammar for lists - DONE
>>> //TODO: Add grammar for comments - DONE
>>> //TODO: Add grammar for whitespace
>>>
>>> grammar prolog;
>>>
>>> //options {
>>> //output=template;
>>> //rewrite=true;
>>> //}
>>>
>>> start : program EOF;
>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>> line : 'L';
>>> query : 'Q';
>>> //line : clause | comment ;
>>> comment : '% ' string '\r\n' | '/*' string '*/' ; //Doesn't allow commas,
>>> //parentheses, square brackets, etc. in comments. Consider fixing!
>>> //Another issue is how the single-line comment is ended: is it determined
>>> //by the newline character?
>>> clause : predicate ('.' | ':-' predicate_list '.') ;
>>> predicate : atom | atom '(' term_list ')' ;
>>> predicate_list : predicate (',' predicate)* ;
>>> list : '[' term_list ('|' term)? ']' ;
>>>
>>> structure : atom '(' term_list ')' ;
>>> term_list : term (',' term)* ;
>>>
>>> //query : '?-' predicate_list '.' ;
>>>
>>> term : numeral | atom | variable | structure | list ;
>>> atom : small_atom | '\'' string '\'';
>>> small_atom : LOWERCASE_LETTER character*;
>>> variable : UPPERCASE_LETTER character* ;
>>> numeral : DIGIT+ ;
>>> character : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL ;
>>> string : character+ (WHITESPACE+ character+)* ;
>>>
>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+ ; //currently only used in string
>>> //NEWLINE : '\r\n' | '\n' ;
>>> LOWERCASE_LETTER : 'a' .. 'z' ;
>>> UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
>>> DIGIT : '0' .. '9' ;
>>> SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.' | '?' | '@'
>>> | '#' | '$' | '&' ;
>>>
>>> So when I create a grammar containing just the rules you've suggested, it
>>> works fine. Why does it not work when I use the same rules in this
>>> grammar?
>>>
>>> Jason.
>>>
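One possible explanation for the difference, which this thread does not confirm: in an ANTLR 3 combined grammar, quoted literals used in parser rules become implicit lexer tokens, and the full grammar's `comment` rule contains the literal '\r\n'. If that implicit token is preferred over WHITESPACE for the same characters, the parser would see a '\r\n' token where it expects WHITESPACE, which is consistent with the reported "mismatched input '\r\n' expecting WHITESPACE". The sketch below is a hand-written model in which earlier rules simply win. That is a simplification, not how ANTLR's generated lexer actually works, and the class and method names (`RuleOrderDemo`, `tokenTypes`) are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified, hand-written model of rule priority; NOT ANTLR-generated code.
class RuleOrderDemo {
    // Length of the literal if it occurs at pos, otherwise 0.
    static int matchLiteral(String input, int pos, String literal) {
        return input.startsWith(literal, pos) ? literal.length() : 0;
    }

    // Length of the run of whitespace characters starting at pos (may be 0).
    static int matchWhitespace(String input, int pos) {
        int end = pos;
        while (end < input.length() && " \t\r\n".indexOf(input.charAt(end)) >= 0) {
            end++;
        }
        return end - pos;
    }

    // Returns the token types produced for the input. Rules are tried in
    // order and the first one that matches wins (an assumption of this model).
    static List<String> tokenTypes(String input, boolean withCrLfLiteral) {
        List<String> types = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            // Implicit token for the '\r\n' literal, present only in the full grammar.
            int len = withCrLfLiteral ? matchLiteral(input, pos, "\r\n") : 0;
            String type = "CRLF_LITERAL";
            if (len == 0) { len = matchLiteral(input, pos, "L"); type = "L_LITERAL"; }
            if (len == 0) { len = matchWhitespace(input, pos); type = "WHITESPACE"; }
            if (len == 0) { len = 1; type = "UNKNOWN"; }
            types.add(type);
            pos += len;
        }
        return types;
    }

    public static void main(String[] args) {
        System.out.println(tokenTypes("\r\nL\r\n", false)); // small grammar
        System.out.println(tokenTypes("\r\nL\r\n", true));  // full grammar with '\r\n' literal
    }
}
```

In this model the same input yields WHITESPACE tokens only when the '\r\n' literal rule is absent. Dumping the token types produced by the real generated lexer would be the way to check whether this is what happens in practice.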
>>> On 13 April 2012 12:39, Bart Kiers <bkiers at gmail.com> wrote:
>>>
>>>> You must be doing something wrong/different. Perhaps you're running an old
>>>> .class file?
>>>> I copied your prolog.g grammar and Main.java file and did this:
>>>>
>>>> wget http://www.antlr.org/download/antlr-3.4-complete.jar
>>>> java -cp antlr-3.4-complete.jar org.antlr.Tool prolog.g
>>>> javac -cp antlr-3.4-complete.jar *.java
>>>> java -cp .:antlr-3.4-complete.jar Main
>>>>
>>>> which didn't produce any error or warning.
>>>>
>>>> Regards,
>>>>
>>>> Bart.
>>>>
>>>>
>>>>
>>>> On Fri, Apr 13, 2012 at 1:06 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>
>>>>> Stranger... Okay, well, I've done a manual test using this class:
>>>>>
>>>>> import org.antlr.runtime.*;
>>>>>
>>>>> public class Main {
>>>>>     public static void main(String[] args) throws Exception {
>>>>>         prologLexer lexer = new prologLexer(new ANTLRStringStream("\r\nL\r\n"));
>>>>>         prologParser parser = new prologParser(new CommonTokenStream(lexer));
>>>>>         parser.start();
>>>>>     }
>>>>> }
>>>>>
>>>>> After running it like so:
>>>>>
>>>>> $ java -cp .:/usr/local/antlr-3.4/lib/antlr-3.4-complete.jar Main
>>>>> line 1:0 mismatched input '\r\n' expecting WHITESPACE
>>>>>
>>>>> I still seem to be getting the same issue ^. Here's the current grammar
>>>>> that I used to create the parser and lexer:
>>>>>
>>>>>
>>>>> start : program EOF;
>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>> line : 'L';
>>>>> query : 'Q';
>>>>>
>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+ ;
>>>>>
>>>>> Jason.
>>>>>
>>>>>
>>>>> On 13 April 2012 07:12, Bart Kiers <bkiers at gmail.com> wrote:
>>>>>
>>>>>> Both the interpreter and the debugger from ANTLRWorks (1.4.3) parse the
>>>>>> input just fine.
>>>>>>
>>>>>> I'm assuming you're not entering "\r" and "\n" as literals, but are
>>>>>> actually entering line breaks in the text areas of ANTLRWorks'
>>>>>> interpreter... Perhaps you've set ANTLRWorks to start parsing with a
>>>>>> different rule than the `start` rule? Anyway, forget about ANTLRWorks for
>>>>>> a moment and whip up a manual test:
>>>>>>
>>>>>> import org.antlr.runtime.*;
>>>>>>
>>>>>> public class Main {
>>>>>>     public static void main(String[] args) throws Exception {
>>>>>>         // TLexer and TParser are generated from a grammar named T
>>>>>>         TLexer lexer = new TLexer(new ANTLRStringStream("\r\nL\r\n"));
>>>>>>         TParser parser = new TParser(new CommonTokenStream(lexer));
>>>>>>         parser.start();
>>>>>>     }
>>>>>> }
>>>>>>
>>>>>>
>>>>>> Bart.
>>>>>>
>>>>>>
>>>>>> On Fri, Apr 13, 2012 at 12:09 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>>
>>>>>>> Hi Bart,
>>>>>>>
>>>>>>> I think we're using different versions of ANTLR (or something along
>>>>>>> those lines), as using your grammar I get a MismatchedTokenException
>>>>>>> with the input you've used, "\r\nL\r\n". I'm currently using ANTLRWorks
>>>>>>> version 1.4.3; could this be the reason why it works on your end and
>>>>>>> not on mine?
>>>>>>>
>>>>>>> Jason.
>>>>>>>
>>>>>>>
>>>>>>> On 12 April 2012 22:06, Bart Kiers <bkiers at gmail.com> wrote:
>>>>>>>
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> Then there's something other than what you've posted going wrong,
>>>>>>>> since the parser generated from:
>>>>>>>>
>>>>>>>> start : program EOF;
>>>>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>>>>> line : 'L';
>>>>>>>> query : 'Q';
>>>>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+;
>>>>>>>>
>>>>>>>> parses the input "\r\nL\r\n" just fine.
>>>>>>>>
>>>>>>>> Regards,
>>>>>>>>
>>>>>>>> Bart.
>>>>>>>>
>>>>>>>>
>>>>>>>> On Thu, Apr 12, 2012 at 10:48 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>>>>
>>>>>>>>> Hi Bart,
>>>>>>>>>
>>>>>>>>> Thanks for the suggestion, although it doesn't work either... The
>>>>>>>>> skip option does work, but since I'll be doing something with the
>>>>>>>>> whitespace later I don't want to take this option. Is there something
>>>>>>>>> else we're missing?
>>>>>>>>>
>>>>>>>>> Jason.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> On 12 April 2012 19:10, Bart Kiers <bkiers at gmail.com> wrote:
>>>>>>>>>
>>>>>>>>>> Hi Jason,
>>>>>>>>>>
>>>>>>>>>> On Thu, Apr 12, 2012 at 6:43 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>>>>>>>>>>
>>>>>>>>>>> ...
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> start : program ;
>>>>>>>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>>>>>>>>>>>
>>>>>>>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')* ; //currently only used in string
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> A lexer rule must always match something: if it can match zero
>>>>>>>>>> chars, it can/will go in an infinite loop.
>>>>>>>>>>
>>>>>>>>>> Do something like this:
>>>>>>>>>>
>>>>>>>>>> start : program ;
>>>>>>>>>> program : WHITESPACE? line+ WHITESPACE? (query WHITESPACE?)*;
>>>>>>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+ ;
>>>>>>>>>>
>>>>>>>>>> or simply skip spaces like this:
>>>>>>>>>>
>>>>>>>>>> start : program ;
>>>>>>>>>> program : line+ query*;
>>>>>>>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+ {skip();} ;
>>>>>>>>>>
>>>>>>>>>> Regards,
>>>>>>>>>>
>>>>>>>>>> Bart.
>>>>>>>>>>
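The point about a lexer rule that can match zero characters can be illustrated with a small hand-written tokenizer loop. This is only a sketch of the general idea, not ANTLR's generated code: the class `ZeroWidthDemo`, its methods, and the `maxTokens` safety cap are hypothetical names introduced for illustration. With a `*`-style rule the loop can emit an empty token without consuming any input, so it never advances. With a `+`-style rule every token consumes at least one character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical hand-written tokenizer; NOT ANTLR-generated code.
class ZeroWidthDemo {
    // Length of the run of whitespace characters starting at pos (may be 0).
    static int matchWhitespace(String input, int pos) {
        int end = pos;
        while (end < input.length() && " \t\r\n".indexOf(input.charAt(end)) >= 0) {
            end++;
        }
        return end - pos;
    }

    // allowEmpty = true models a '*' rule (a zero-length match counts as a token);
    // allowEmpty = false models a '+' rule (at least one character is required).
    // maxTokens caps the loop so the demo terminates even when it makes no progress.
    static List<String> tokenize(String input, boolean allowEmpty, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            int len = matchWhitespace(input, pos);
            if (len > 0 || allowEmpty) {
                tokens.add("WHITESPACE(" + len + ")");
                pos += len; // with len == 0 the position never moves
            } else {
                tokens.add("CHAR(" + input.charAt(pos) + ")");
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // '+' rule: the loop terminates on its own.
        System.out.println(tokenize("\r\nL\r\n", false, 100));
        // '*' rule: stuck at 'L', stopped only by the cap.
        System.out.println(tokenize("L", true, 100).size());
    }
}
```

Without the cap, the `allowEmpty` case would loop forever on the first non-whitespace character, which is the behaviour the advice above warns about.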
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>>
>>>>
>>>
>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>> Unsubscribe:
>>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>
>>
>>
>