[antlr-interest] Noob Question
Nik Molnar
nikmd23 at gmail.com
Tue Jan 12 18:32:50 PST 2010
JOHN!
THANK YOU! You don't know how long I've been struggling with this - and now
that you explain it, it makes perfect sense!
I will heed your warning about * and ? - I see how they match empty strings
now.
Thanks,
Nik
On Tue, Jan 12, 2010 at 9:21 PM, John B. Brodie <jbb at acm.org> wrote:
> Greetings!
>
> Your WS lexer rule can recognize the empty string; this is VERY bad.
>
> Because WS can recognize the empty string, your lexer will enter an
> infinite loop when it encounters a character it cannot deal with, like
> the '_' in your example: you have no lexer rule that can handle a '_'.
>
> More below...
>
> On Tue, 2010-01-12 at 20:52 -0500, Nik Molnar wrote:
> > Hello all,
> >
> > I am rather new to ANTLR and seem to be running into a small issue I
> > can't figure out.
> >
> > I'm writing a very simple grammar based on many tutorials online, the
> > calculator.
> >
> > This grammar generates C# code that compiles perfectly, and works for the
> > most part in ANTLRWorks Interpreter, Debugger and in a sample app I made
> > in .NET to call the generated Parser/Lexer.
> >
> > The problem I run into is when I put in invalid syntax, expecting an
> > error. Output like so:
> >
> > Valid Syntax: "3+3" => Works in interpreter, debugger and compiled .net
> > code.
> > Invalid Syntax: "3+/3" => Gives error in interpreter, debugger and
> > compiled .net code, as expected.
> > Invalid Syntax: "3_3" => The interpreter shows nothing, the debugger
> > cannot connect and the .net code hangs for a while then throws an out of
> > memory exception.
>
> Your lexer will correctly identify the first '3' as an INT. Next your
> lexer will see the '_' which it is unable to deal with. BUT since your
> WS rule says that the empty string - the non-stuff between the first '3'
> and the '_' - is legal, your lexer accepts that empty string as a WS
> token and deposits it into the HIDDEN channel. Now the lexer is still
> looking at the '_' which it is unable to deal with. BUT since your WS
> rule says that the empty string - the non-stuff between the first '3'
> and the '_' - is legal, your lexer accepts that empty string as a WS
> token and deposits it into the HIDDEN channel. Now the lexer is still
> looking at the '_' which it is unable to deal with. BUT since your WS
> rule says that the empty string - the non-stuff between the first '3'
> and the '_' - is legal, your lexer accepts that empty string as a WS
> token and deposits it into the HIDDEN channel. Now the lexer is still
> looking at the '_' .... and so nothing good results.
>
> Your .NET app runs out of memory because the infinite sequence of empty
> WS tokens appended onto the HIDDEN channel just gobbles up all memory.
>
> The debugger cannot connect because the connection happens after the
> lexer has finished tokenizing the input text. Your lexer never finishes
> so the debugger won't connect. I bet if you waited long enough you would
> eventually run out of memory in this case too.
>
> Same drill for the interpreter....
>
> >
> > I'm sure I'm doing something wrong in my grammar but don't know what.
> >
> > I've included it below. Please help me!
> >
> > Thanks,
> >
> > grammar Test;
> >
> > /*options
> > {
> > language = 'CSharp2';
> > }*/
> >
> > expression
> > : amExpression;
> >
> > amExpression
> > :mdExpression ((PLUS|DASH) mdExpression)*
> > ;
> >
> > mdExpression
> > :INT ((STAR|SLASH) INT)*
> > ;
> >
> > DASH
> > :'-'
> > ;
> >
> > SLASH
> > :'/'
> > ;
> >
> > WS
> > : (' '
> > | '\t'
> > | '\n'
> > | '\r')*
> > { $channel = HIDDEN; }
> > ;
>
> the * above should really be a +
>
> Be VERY careful with rules that can recognize the empty string, e.g.
> rules whose entire body sits under a * or ? operator.
>
> I have NEVER found an instance where a lexer rule that accepts nothing
> (the empty string) does anything that helps.
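
For reference, a minimal sketch of the WS rule with the suggested fix applied. It assumes the only change needed is swapping the closure operator from * to +; the alternatives and the HIDDEN channel action are copied unchanged from the grammar above.

```antlr
// WS must now consume one or more whitespace characters, so it can
// never match the empty string and the lexer always advances.
WS
    : ( ' '
      | '\t'
      | '\n'
      | '\r'
      )+
      { $channel = HIDDEN; }
    ;
```

With this change, an input such as "3_3" should make the lexer report an error at the '_' rather than loop, since no rule can match at that position without consuming a character.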
>
> On RARE occasions, a parser rule that accepts the empty string can be
> appropriate, but needs to be examined VERY closely.
>
> >
> > STAR
> > : '*'
> > ;
> >
> > PLUS
> > : '+'
> > ;
> >
> > fragment DIGIT
> > : '0'..'9'
> > ;
> >
> > INT
> > : (DIGIT)+
> > ;
>
> Hope this helps...
> -jbb
>
>
>