[antlr-interest] Whitespace matching

Jason Jones jmjones5 at gmail.com
Fri Apr 13 15:55:00 PDT 2012


Thanks, that looks a bit better and definitely makes more sense, but I'm still
getting the weird whitespace mismatch issue... :S

On 13 April 2012 14:34, Charles Daniels <cjdaniels4 at gmail.com> wrote:

> Try the following changes (note that some of your parser rules become
> lexer rules):
>
> atom : SMALL_ATOM | STRING;
>
> COMMENT : '% ' ~('\n'|'\r')* '\r'? '\n'
>         | '/*' ( options {greedy=false;} : . )* '*/'
>         ;
> SMALL_ATOM : LOWERCASE_LETTER CHARACTER* ;
> VARIABLE : UPPERCASE_LETTER CHARACTER* ;
> NUMERAL : DIGIT+ ;
> STRING : '"' (CHARACTER | WHITESPACE)* '"' ;
>
> fragment CHARACTER : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL ;
> fragment LOWERCASE_LETTER : 'a' .. 'z' ;
> fragment UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
> fragment DIGIT : '0' .. '9' ;
> fragment SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.'
>                  | '?' | '@' | '#' | '$' | '&' ;
>
>
> I haven't tested this, but it should get you closer to what you need, if
> it doesn't completely address the issue.
>
> Regards,
> Chuck
>
> On Fri, Apr 13, 2012 at 9:03 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>
>> Ah, I see. I think I get what's been happening (whether I understand it is
>> a different matter): there must be something else in my Prolog grammar
>> that's changing the behaviour of the lexer/parser. I assumed that if I
>> just added the rules you have, it would work the same as yours, but
>> apparently not. Here's the full grammar that I've been playing with:
>>
>> //TODO: Add grammar for operators
>> //TODO: Add grammar for lists - DONE
>> //TODO: Add grammar for comments - DONE
>> //TODO: Add grammar for whitespace
>>
>> grammar prolog;
>>
>> //options {
>> //output=template;
>> //rewrite=true;
>> //}
>>
>> start : program EOF;
>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>> line    :    'L';
>> query    :    'Q';
>> //line : clause | comment ;
>> comment : '% ' string '\r\n' | '/*' string '*/' ;
>> // Doesn't allow commas, parentheses, square brackets, etc. in comments.
>> // Consider fixing!
>> // Another issue: how is the single-line comment ended? Is it determined
>> // by the newline character?
>> clause : predicate ('.' | ':-' predicate_list '.') ;
>> predicate : atom | atom '(' term_list ')' ;
>> predicate_list : predicate (',' predicate)* ;
>> list : '[' term_list ('|' term)? ']' ;
>>
>> structure : atom '(' term_list ')' ;
>> term_list : term (',' term)* ;
>>
>> //query : '?-' predicate_list '.' ;
>>
>> term : numeral | atom | variable | structure | list ;
>> atom : small_atom | '\'' string '\'';
>> small_atom : LOWERCASE_LETTER character*;
>> variable : UPPERCASE_LETTER character* ;
>> numeral : DIGIT+ ;
>> character : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL ;
>> string : character+ (WHITESPACE+ character+)* ;
>>
>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ; //currently only used in string
>> //NEWLINE : '\r\n' | '\n' ;
>> LOWERCASE_LETTER : 'a' .. 'z' ;
>> UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
>> DIGIT : '0' .. '9' ;
>> SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.' | '?' | '@'
>>         | '#' | '$' | '&' ;
>>
>> When I create a grammar that includes only the rules you've suggested, it
>> works fine. Why doesn't it work when I use the same rules in this grammar?
>>
>> Jason.
>>
>> On 13 April 2012 12:39, Bart Kiers <bkiers at gmail.com> wrote:
>>
>> > You must be doing something wrong/different. Perhaps you're running an
>> > old .class file?
>> > I copied your prolog.g grammar and Main.java file and did this:
>> >
>> > wget http://www.antlr.org/download/antlr-3.4-complete.jar
>> > java -cp antlr-3.4-complete.jar org.antlr.Tool prolog.g
>> > javac -cp antlr-3.4-complete.jar *.java
>> > java -cp .:antlr-3.4-complete.jar Main
>> >
>> > which didn't produce any error or warning.
>> >
>> > Regards,
>> >
>> > Bart.
>> >
>> >
>> >
>> > On Fri, Apr 13, 2012 at 1:06 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>> >
>> >> Stranger... Okay, well, I've done a manual test using this class:
>> >>
>> >> import org.antlr.runtime.*;
>> >>
>> >>
>> >> public class Main {
>> >>     public static void main(String[] args) throws Exception {
>> >>         prologLexer lexer = new prologLexer(new ANTLRStringStream("\r\nL\r\n"));
>> >>         prologParser parser = new prologParser(new CommonTokenStream(lexer));
>> >>         parser.start();
>> >>     }
>> >> }
>> >>
>> >> After running it like so:
>> >>
>> >> $ java -cp .:/usr/local/antlr-3.4/lib/antlr-3.4-complete.jar Main
>> >> line 1:0 mismatched input '\r\n' expecting WHITESPACE
>> >>
>> >> I still seem to be getting the same issue ^. Here's the current grammar
>> >> that I used to create the parser and lexer:
>> >>
>> >>
>> >> start : program EOF;
>> >> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>> >> line    :       'L';
>> >> query   :       'Q';
>> >>
>> >> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
>> >>
>> >> Jason.
>> >>
>> >>
>> >> On 13 April 2012 07:12, Bart Kiers <bkiers at gmail.com> wrote:
>> >>
>> >>> Both the interpreter and the debugger from ANTLRWorks (1.4.3) parse the
>> >>> input just fine.
>> >>>
>> >>> I'm assuming you're not entering "\r" and "\n" as literals, but are
>> >>> actually entering line breaks in the text areas of ANTLRWorks'
>> >>> interpreter... Perhaps you've selected ANTLRWorks to start parsing with a
>> >>> different rule than the `start` rule? Anyway, forget about ANTLRWorks for
>> >>> a moment and whip up a manual test:
>> >>>
>> >>> import org.antlr.runtime.*;
>> >>>
>> >>> public class Main {
>> >>>   public static void main(String[] args) throws Exception {
>> >>>     TLexer lexer = new TLexer(new ANTLRStringStream("\r\nL\r\n"));
>> >>>     TParser parser = new TParser(new CommonTokenStream(lexer));
>> >>>     parser.start();
>> >>>   }
>> >>> }
>> >>>
>> >>>
>> >>> Bart.
>> >>>
>> >>>
>> >>> On Fri, Apr 13, 2012 at 12:09 AM, Jason Jones <jmjones5 at gmail.com> wrote:
>> >>>
>> >>>> Hi Bart,
>> >>>>
>> >>>> I think we're using different versions of ANTLR (or something along
>> >>>> those lines), as using your grammar I get a MismatchedTokenException
>> >>>> with the input you used, "\r\nL\r\n". I'm currently using ANTLRWorks
>> >>>> version 1.4.3. Could this be the reason why your end seems to be working
>> >>>> and mine not?
>> >>>>
>> >>>> Jason.
>> >>>>
>> >>>>
>> >>>> On 12 April 2012 22:06, Bart Kiers <bkiers at gmail.com> wrote:
>> >>>>
>> >>>>> Hi Jason,
>> >>>>>
>> >>>>> Then there's something other than what you've posted going wrong,
>> >>>>> since the parser generated from:
>> >>>>>
>> >>>>> start      : program EOF;
>> >>>>> program    : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>> >>>>> line       : 'L';
>> >>>>> query      : 'Q';
>> >>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+;
>> >>>>>
>> >>>>> parses the input "\r\nL\r\n" just fine.
>> >>>>>
>> >>>>> Regards,
>> >>>>>
>> >>>>> Bart.
>> >>>>>
>> >>>>>
>> >>>>> On Thu, Apr 12, 2012 at 10:48 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>> >>>>>
>> >>>>>> Hi Bart,
>> >>>>>>
>> >>>>>> Thanks for the suggestion, although it doesn't work either... The
>> >>>>>> skip option does work, but since I'll be doing something with the
>> >>>>>> whitespace later I don't want to take this option. Is there something
>> >>>>>> else we're missing?
>> >>>>>>
>> >>>>>> Jason.
>> >>>>>>
>> >>>>>>
>> >>>>>> On 12 April 2012 19:10, Bart Kiers <bkiers at gmail.com> wrote:
>> >>>>>>
>> >>>>>>> Hi Jason,
>> >>>>>>>
>> >>>>>>> On Thu, Apr 12, 2012 at 6:43 PM, Jason Jones <jmjones5 at gmail.com> wrote:
>> >>>>>>>
>> >>>>>>>> ...
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>>> start : program ;
>> >>>>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
>> >>>>>>>>
>> >>>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')* ; //currently only used in string
>> >>>>>>>>
>> >>>>>>>>
>> >>>>>>> A lexer rule must always match something: if it can match zero
>> >>>>>>> chars, it can/will go in an infinite loop.
>> >>>>>>>
>> >>>>>>> Do something like this:
>> >>>>>>>
>> >>>>>>> start : program ;
>> >>>>>>> program : WHITESPACE? line+ WHITESPACE? (query WHITESPACE?)*;
>> >>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
>> >>>>>>>
>> >>>>>>> or simply skip spaces like this:
>> >>>>>>>
>> >>>>>>> start : program ;
>> >>>>>>> program : line+ query*;
>> >>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();} ;
>> >>>>>>>
>> >>>>>>> Regards,
>> >>>>>>>
>> >>>>>>> Bart.
>> >>>>>>>
>> >>>>>>
>> >>>>>>
>> >>>>>
>> >>>>
>> >>>
>> >>
>> >
>>
>>
>
>
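
Bart's point at the bottom of the thread, that a lexer rule must always match at least one character, can be illustrated with a small hand-rolled tokenizer loop. This is only a sketch and not how ANTLR's generated lexer is implemented: the class name LexerLoopSketch, the helper matchWhitespace, the CHAR pseudo-token and the maxTokens safety cap are all made up for the illustration. It shows the general failure mode: if the whitespace rule is allowed to succeed on zero characters, the input position never advances.

```java
import java.util.ArrayList;
import java.util.List;

// Hand-rolled illustration only; NOT ANTLR's actual generated lexer code.
final class LexerLoopSketch {

    // Returns the length of the whitespace run starting at pos (may be 0).
    static int matchWhitespace(String input, int pos) {
        int end = pos;
        while (end < input.length() && " \t\r\n".indexOf(input.charAt(end)) >= 0) {
            end++;
        }
        return end - pos;
    }

    // allowEmptyMatch = true imitates  WHITESPACE : (' '|'\t'|'\r'|'\n')* ;
    // allowEmptyMatch = false imitates WHITESPACE : (' '|'\t'|'\r'|'\n')+ ;
    // maxTokens is a hypothetical safety cap so the demonstration terminates.
    static List<String> tokenize(String input, boolean allowEmptyMatch, int maxTokens) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length() && tokens.size() < maxTokens) {
            int len = matchWhitespace(input, pos);
            if (len > 0 || allowEmptyMatch) {
                tokens.add("WHITESPACE(" + len + ")");
                pos += len; // when len == 0 the position does not advance
            } else {
                // Any other single character becomes its own pseudo-token.
                tokens.add("CHAR(" + input.charAt(pos) + ")");
                pos++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        // The '+' variant consumes the whole input in a few tokens.
        System.out.println(tokenize("\r\nL\r\n", false, 10));
        // The '*' variant keeps emitting empty tokens at the 'L' until the cap.
        System.out.println(tokenize("\r\nL\r\n", true, 10));
    }
}
```

With the '+' form every token consumes at least one character, so the loop is guaranteed to reach the end of the input. With the '*' form the sketch only stops because of the cap, which is why the rule in the original grammar was changed from '*' to '+' and the optionality moved into the parser rule (WHITESPACE?).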

