[antlr-interest] Whitespace matching

Fri Apr 13 06:34:29 PDT 2012

Try the following changes (note that some of your parser rules become lexer
rules):

atom : SMALL_ATOM | STRING;

COMMENT : '% ' ~('\n'|'\r')* '\r'? '\n'
        | '/*' ( options {greedy=false;} : . )* '*/'
        ;
SMALL_ATOM : LOWERCASE_LETTER CHARACTER* ;
VARIABLE : UPPERCASE_LETTER CHARACTER* ;
NUMERAL : DIGIT+ ;
STRING : '"' (CHARACTER | WHITESPACE)* '"' ;

fragment CHARACTER : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL ;
fragment LOWERCASE_LETTER : 'a' .. 'z' ;
fragment UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
fragment DIGIT : '0' .. '9' ;
fragment SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.'
                 | '?' | '@' | '#' | '$' | '&' ;

I haven't tested this, but it should get you closer to what you need, if it
doesn't completely address the issue.
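
If you want a rough picture of the token stream these lexer rules should
produce, a small stand-alone Java sketch can approximate them with
java.util.regex. This is not ANTLR-generated code: the class name
ToyPrologLexer, the LITERAL catch-all standing in for parser literals such
as '(' and ',', and the regex translations are all assumptions made for
illustration, and WHITESPACE is assumed to be the rule from your grammar.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ToyPrologLexer {
    // Regex stand-ins for the fragment rules (an approximation, not ANTLR output).
    private static final String CHARACTER = "[a-zA-Z_0-9+\\-*/\\\\^~:.?@#$&]";
    private static final String WS = "[ \\t\\r\\n]";

    // Token rules, tried in insertion order at the current position.
    private static final Map<String, Pattern> RULES = new LinkedHashMap<>();
    static {
        RULES.put("COMMENT", Pattern.compile("% [^\\n\\r]*\\r?\\n|/\\*(?s:.*?)\\*/"));
        RULES.put("SMALL_ATOM", Pattern.compile("[a-z]" + CHARACTER + "*"));
        RULES.put("VARIABLE", Pattern.compile("[A-Z_]" + CHARACTER + "*"));
        RULES.put("NUMERAL", Pattern.compile("[0-9]+"));
        RULES.put("STRING", Pattern.compile("\"(?:" + CHARACTER + "|" + WS + ")*\""));
        RULES.put("WHITESPACE", Pattern.compile(WS + "+"));
    }

    /** Returns tokens as "TYPE:text"; characters no rule matches become LITERAL. */
    public static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String type = "LITERAL"; // hypothetical stand-in for '(' , ',' , ')' ...
            int end = pos + 1;
            for (Map.Entry<String, Pattern> rule : RULES.entrySet()) {
                Matcher m = rule.getValue().matcher(input).region(pos, input.length());
                if (m.lookingAt()) { // match must start exactly at pos
                    type = rule.getKey();
                    end = m.end();
                    break;
                }
            }
            tokens.add(type + ":" + input.substring(pos, end));
            pos = end; // every rule consumes at least one character
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (String token : tokenize("foo(X, \"a b\")")) {
            System.out.println(token);
        }
    }
}
```

For the input foo(X, "a b") the sketch yields SMALL_ATOM, LITERAL, VARIABLE,
LITERAL, WHITESPACE, STRING, LITERAL: the space inside the quotes stays
inside the STRING token, while the space after the comma reaches the parser
as its own WHITESPACE token. Because '.' is part of CHARACTER, the sketch
also turns foo. into a single SMALL_ATOM, so it may be worth checking how
the generated lexer treats the clause terminator.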

Regards,
Chuck

On Fri, Apr 13, 2012 at 9:03 AM, Jason Jones <jmjones5 at gmail.com> wrote:

> Ah, I see. I think I get what's been happening (whether I understand it is
> a different matter): there must be something else in my Prolog grammar
> that's changing the behaviour of the lexer/parser. I assumed that if I
> just added the rules you have, it would work the same as yours, but
> apparently not. Here's the full grammar that I've been playing with:
>
> //TODO: Add grammar for operators
> //TODO: Add grammar for lists - DONE
> //TODO: Add grammar for comments - DONE
> //TODO: Add grammar for whitespace
>
> grammar prolog;
>
> //options {
> //output=template;
> //rewrite=true;
> //}
>
> start : program EOF;
> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
> line    :    'L';
> query    :    'Q';
> //line : clause | comment ;
> comment : '% ' string '\r\n' | '/*' string '*/' ;
> //Doesn't allow commas, parentheses, square brackets, etc. in comments.
> //Consider fixing!
> //Another issue is how the single-line comment is ended: is it determined
> //by the newline character?
> clause : predicate ('.' | ':-' predicate_list '.') ;
> predicate : atom | atom '(' term_list ')' ;
> predicate_list : predicate (',' predicate)* ;
> list : '[' term_list ('|' term)? ']' ;
>
> structure : atom '(' term_list ')' ;
> term_list : term (',' term)* ;
>
> //query : '?-' predicate_list '.' ;
>
> term : numeral | atom | variable | structure | list ;
> atom : small_atom | '\'' string '\'';
> small_atom : LOWERCASE_LETTER character*;
> variable : UPPERCASE_LETTER character* ;
> numeral : DIGIT+ ;
> character : LOWERCASE_LETTER | UPPERCASE_LETTER | DIGIT | SPECIAL ;
> string : character+ (WHITESPACE+ character+)* ;
>
> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ; //currently only used in string
> //NEWLINE : '\r\n' | '\n' ;
> LOWERCASE_LETTER : 'a' .. 'z' ;
> UPPERCASE_LETTER : 'A' .. 'Z' | '_' ;
> DIGIT : '0' .. '9' ;
> SPECIAL : '+' | '-' | '*' | '/' | '\\' | '^' | '~' | ':' | '.' | '?' | '@'
> | '#' | '$' | '&' ;
>
> So when I create a grammar that includes just the rules you've suggested,
> it works fine. Why doesn't it work when I use the same rules in this
> grammar?
>
> Jason.
>
> On 13 April 2012 12:39, Bart Kiers <bkiers at gmail.com> wrote:
>
> > You must be doing something wrong/different. Perhaps you're running an
> > old .class file?
> > I copied your prolog.g grammar and Main.java file and did this:
> >
> > wget http://www.antlr.org/download/antlr-3.4-complete.jar
> > java -cp antlr-3.4-complete.jar org.antlr.Tool prolog.g
> > javac -cp antlr-3.4-complete.jar *.java
> > java -cp .:antlr-3.4-complete.jar Main
> >
> > which didn't produce any error or warning.
> >
> > Regards,
> >
> > Bart.
> >
> >
> >
> > On Fri, Apr 13, 2012 at 1:06 PM, Jason Jones <jmjones5 at gmail.com> wrote:
> >
> >> Stranger... Okay, well, I've done a manual test using this class:
> >>
> >> import org.antlr.runtime.*;
> >>
> >>
> >> public class Main {
> >>     public static void main(String[] args) throws Exception {
> >>         prologLexer lexer = new prologLexer(
> >>                 new ANTLRStringStream("\r\nL\r\n"));
> >>         prologParser parser = new prologParser(
> >>                 new CommonTokenStream(lexer));
> >>         parser.start();
> >>     }
> >> }
> >>
> >> After running it like so:
> >>
> >> $ java -cp .:/usr/local/antlr-3.4/lib/antlr-3.4-complete.jar Main
> >> line 1:0 mismatched input '\r\n' expecting WHITESPACE
> >>
> >> I still seem to be getting the same issue ^. Here's the current grammar
> >> that I used to create the parser and lexer:
> >>
> >>
> >> start : program EOF;
> >> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
> >> line    :       'L';
> >> query   :       'Q';
> >>
> >> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
> >>
> >> Jason.
> >>
> >>
> >> On 13 April 2012 07:12, Bart Kiers <bkiers at gmail.com> wrote:
> >>
> >>> Both the interpreter and the debugger from ANTLRWorks (1.4.3) parse the
> >>> input just fine.
> >>>
> >>> I'm assuming you're not entering "\r" and "\n" as literals, but are
> >>> actually entering line breaks in the text areas of ANTLRWorks'
> >>> interpreter... Perhaps you've selected ANTLRWorks to start parsing
> >>> with a different rule than the `start` rule? Anyway, forget about
> >>> ANTLRWorks for a moment and whip up a manual test:
> >>>
> >>> import org.antlr.runtime.*;
> >>>
> >>> public class Main {
> >>>   public static void main(String[] args) throws Exception {
> >>>     TLexer lexer = new TLexer(new ANTLRStringStream("\r\nL\r\n"));
> >>>     TParser parser = new TParser(new CommonTokenStream(lexer));
> >>>     parser.start();
> >>>   }
> >>> }
> >>>
> >>>
> >>> Bart.
> >>>
> >>>
> >>> On Fri, Apr 13, 2012 at 12:09 AM, Jason Jones <jmjones5 at gmail.com> wrote:
> >>>
> >>>> Hi Bart,
> >>>>
> >>>> I think we're using different versions of ANTLR (or something along
> >>>> those lines), as with your grammar I get a MismatchedTokenException
> >>>> for the input you used, "\r\nL\r\n". I'm currently using ANTLRWorks
> >>>> version 1.4.3; could this be the reason why it works on your end and
> >>>> not on mine?
> >>>>
> >>>> Jason.
> >>>>
> >>>>
> >>>> On 12 April 2012 22:06, Bart Kiers <bkiers at gmail.com> wrote:
> >>>>
> >>>>> Hi Jason,
> >>>>>
> >>>>> Then there's something other than what you've posted going wrong,
> >>>>> since the parser generated from:
> >>>>>
> >>>>> start      : program EOF;
> >>>>> program    : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
> >>>>> line       : 'L';
> >>>>> query      : 'Q';
> >>>>> WHITESPACE : (' ' | '\t' | '\r' | '\n')+;
> >>>>>
> >>>>> parses the input "\r\nL\r\n" just fine.
> >>>>>
> >>>>> Regards,
> >>>>>
> >>>>> Bart.
> >>>>>
> >>>>>
> >>>>> On Thu, Apr 12, 2012 at 10:48 PM, Jason Jones <jmjones5 at gmail.com> wrote:
> >>>>>
> >>>>>> Hi Bart,
> >>>>>>
> >>>>>> Thanks for the suggestion, although it doesn't work either... The
> >>>>>> skip option does work but since I'll be doing something with the
> >>>>>> whitespace later I don't want to take this option. Is there something
> >>>>>> else we're missing?
> >>>>>>
> >>>>>> Jason.
> >>>>>>
> >>>>>>
> >>>>>> On 12 April 2012 19:10, Bart Kiers <bkiers at gmail.com> wrote:
> >>>>>>
> >>>>>>> Hi Jason,
> >>>>>>>
> >>>>>>> On Thu, Apr 12, 2012 at 6:43 PM, Jason Jones <jmjones5 at gmail.com> wrote:
> >>>>>>>
> >>>>>>>> ...
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> start : program ;
> >>>>>>>> program : WHITESPACE line+ WHITESPACE (query WHITESPACE)*;
> >>>>>>>>
> >>>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')* ; //currently only used in string
> >>>>>>>>
> >>>>>>>>
> >>>>>>> A lexer rule must always match something: if it can match zero
> >>>>>>> chars, it can/will go in an infinite loop.
> >>>>>>>
> >>>>>>> Do something like this:
> >>>>>>>
> >>>>>>> start : program ;
> >>>>>>> program : WHITESPACE? line+ WHITESPACE? (query WHITESPACE?)*;
> >>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ ;
> >>>>>>>
> >>>>>>> or simply skip spaces like this:
> >>>>>>>
> >>>>>>> start : program ;
> >>>>>>> program : line+ query*;
> >>>>>>> WHITESPACE  : (' ' | '\t' | '\r' | '\n')+ {skip();} ;
> >>>>>>>
> >>>>>>> Regards,
> >>>>>>>
> >>>>>>> Bart.
> >>>>>>>
> >>>>>>
> >>>>>>
> >>>>>
> >>>>
> >>>
> >>
> >
>
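
Bart's point in the quoted thread, that a lexer rule which can match zero
characters can loop forever, can be seen in a toy scanner. The following is
a minimal stand-alone Java sketch, not how ANTLR's generated lexer is
implemented: the class ZeroLengthRuleDemo, its single whitespace rule and
the iteration cap are all assumptions made for illustration.

```java
public class ZeroLengthRuleDemo {
    /** Length of the whitespace run starting at pos; 0 if there is none (the '*' behaviour). */
    static int matchWhitespaceStar(String input, int pos) {
        int end = pos;
        while (end < input.length() && " \t\r\n".indexOf(input.charAt(end)) >= 0) {
            end++;
        }
        return end - pos;
    }

    /** Counts the tokens emitted before the input is consumed or the cap is hit. */
    static int countTokens(String input, int cap) {
        int pos = 0;
        int emitted = 0;
        while (pos < input.length() && emitted < cap) {
            pos += matchWhitespaceStar(input, pos); // an empty match adds 0: no progress
            emitted++;
        }
        return emitted;
    }

    public static void main(String[] args) {
        // Whitespace only: one token, then the input is exhausted.
        System.out.println(countTokens(" \t\n", 1000));      // prints 1
        // Stuck at 'L': empty matches repeat until the cap stops the loop.
        System.out.println(countTokens("\r\nL\r\n", 1000));  // prints 1000
    }
}
```

With the input "\r\nL\r\n" the first token consumes "\r\n"; at 'L' the rule
matches the empty string, the position never advances, and only the cap ends
the loop. Requiring at least one character, as the + version of WHITESPACE
does, removes the empty match and with it the stall.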