[antlr-interest] accepting nested code blocks

Gerald Rosenberg gerald at certiv.net
Tue Oct 13 13:06:33 PDT 2009


The book and Indhu are both correct. How to
proceed depends on what you know about the
content between the delimiter pair, how complex
it is, and what you want out of it. The options
include modal lexers, island grammars, and
handling the content with parser rules.

The recursive lexer rule definition does not
handle quoted strings. A brace inside a string
literal, as in "}", throws off the nesting count,
and in the case of JavaScript that will prove fatal.
I believe there is a way to avoid this problem
using only lexer rules, but it will be complex,
and the margin is too small to contain the proof.
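To make the failure concrete, here is a minimal sketch in plain Java (no ANTLR runtime; the class NaiveBlock and method block are hypothetical names). It is a hand-written recursive-descent matcher that mirrors the book's rule Block : '{' ( Block | ~('{'|'}') )* '}' ; so recursion handles nesting fine, but a '}' inside a string literal closes the block early.

```java
public class NaiveBlock {
    // Mirrors: Block : '{' ( Block | ~('{'|'}') )* '}' ;
    // Returns the index just past the matching '}', or -1 if the input
    // ends before the block is closed. It knows nothing about strings.
    public static int block(String s, int i) {
        if (i >= s.length() || s.charAt(i) != '{') return -1;
        i++; // consume '{'
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '{') {
                i = block(s, i);   // recurse into the nested block
                if (i < 0) return -1;
            } else if (c == '}') {
                return i + 1;      // consume '}' and finish this block
            } else {
                i++;               // any other character
            }
        }
        return -1;                 // ran out of input
    }

    public static void main(String[] args) {
        String nested = "{ if (a) { b(); } }";
        String quoted = "{ s = \"}\"; }";
        System.out.println(block(nested, 0)); // prints 19: the whole input
        System.out.println(block(quoted, 0)); // prints 8, not 12: the '}' in the string closed it
    }
}
```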

However, if all you want is a single token 
containing the clob between a balanced set of 
delimiters, there is this simple approach:

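@lexer::members {
  // Consume the rest of a balanced {...} block (the opening '{' has
  // already been matched) by delegating to the attached Helper class;
  // 'limit' caps how many characters it may scan.
  public boolean pairMatch(int limit) {
    return Helper.pairMatch(input, limit);
  }
}

// The predicate consumes the block body as a side effect, so the
// token's text runs from '{' through the matching '}'.
BRACE_BLOCK
  :  '{' { pairMatch(500) }?
  ;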

Then include the attached Helper class in your
build. This version recognizes nested delimiters
while ignoring any that appear inside // line
comments or inside single- or double-quoted strings.
It does not handle /* */ block comments.
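For experimenting without the ANTLR runtime, here is a minimal sketch of the same scanning rules over a plain String. The names BraceScanner and endOfBlock are hypothetical and not part of the attached helper. Like the helper, it assumes the opening '{' has already been consumed, skips backslash escapes, // line comments and single- or double-quoted strings, and does not handle /* */ block comments.

```java
public class BraceScanner {
    /**
     * Scans s from index start, assuming the opening '{' was already consumed.
     * Returns the index just past the matching '}', or -1 if the input ends
     * first or the block does not close within limit characters.
     */
    public static int endOfBlock(String s, int start, int limit) {
        // Scanning budget: never look past start + limit (long math avoids overflow).
        int end = (int) Math.min((long) s.length(), (long) start + limit);
        int depth = 1;
        int i = start;
        while (i < end) {
            char c = s.charAt(i);
            if (c == '\\') {
                i += 2;                                   // skip the escaped character
            } else if (c == '/' && i + 1 < end && s.charAt(i + 1) == '/') {
                // Line comment: skip up to (not including) the line break.
                while (i < end && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
            } else if (c == '\'' || c == '"') {
                i++;                                      // opening quote
                while (i < end && s.charAt(i) != c) {
                    i += (s.charAt(i) == '\\') ? 2 : 1;   // honour escapes in strings
                }
                if (i >= end) return -1;                  // unterminated string
                i++;                                      // closing quote
            } else if (c == '{') {
                depth++;
                i++;
            } else if (c == '}') {
                depth--;
                i++;
                if (depth == 0) return i;                 // balanced: done
            } else {
                i++;
            }
        }
        return -1; // input or budget exhausted before the block closed
    }

    public static void main(String[] args) {
        String src = "{ var s = \"}\"; // closing } in a comment\n }";
        int end = endOfBlock(src, 1, 500);       // index 0 is the already-matched '{'
        System.out.println(end == src.length()); // prints true
    }
}
```

Returning an end index rather than a boolean makes it easy to take the block text with s.substring(...); the attached helper instead advances the CharStream so the lexer's token covers the consumed characters.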



At 10:41 AM 10/13/2009, Espák Miklós wrote:
>Hi,
>
>I understand your point of view, but the book states explicitly the following:
>
>"ANTLR generates recursive-descent recognizers
>for lexers just as it does for parsers and tree parsers. Consequently,
>ANTLR supports recursive lexer rules, unlike other tools such as lex."
>
>Using recursion it should be possible to create such a lexer rule. If
>not, what can it be used for?
>
>My original problem is that the input files contain a JavaScript
>function definition. The other parts of the input are covered by the
>grammar. However, I do not need to check the validity of the JS
>function; I just want to extract it as is and pass it to the JS engine
>later. So if it is not necessary, I do not want to parse it.
>
>Is it possible somehow? Or should I denote the beginning and the end
>of the JS function by some special token to allow catching it by a
>lexer rule?
>
>Cheers,
>Miklos
>
>2009/10/13 Indhu Bharathi <indhu.b at s7software.com>:
> > Balanced parentheses cannot be expressed using a regular expression, which
> > means you cannot recognize them using a lexer. You need a pushdown automaton,
> > which means you need a parser to recognize them. Try doing it using parser
> > rules.
> >
> >
> > Cheers, Indhu
> >
> > From: antlr-interest-bounces at antlr.org
> > [mailto:antlr-interest-bounces at antlr.org] On Behalf Of Espák Miklós
> > Sent: Tuesday, October 13, 2009 10:04 PM
> > To: antlr-interest at antlr.org
> > Subject: [antlr-interest] accepting nested code blocks
> >
> >
> >
> > Hi,
> >
> > I want to create a lexer rule accepting nested code blocks.
> >
> > I tried out the example of the Definitive ANTLR Reference (Section 4.3), but
> > it does not work.
> > It accepts only inputs that contain no characters other than curly
> > braces. Moreover, a single closing brace is enough.
> >
> > The error is the following:
> > MismatchedTokenException: line 1:1 mismatched input UNKNOW expecting 125
> >
> > The original code of the book:
> >
> > fragment
> > CODE[boolean stripCurlies]:
> >   '{' ( CODE[stripCurlies] | ~('{' |'}' ) )* '}'
> >   {
> >     if ( stripCurlies ) {
> >       setText(getText().substring(1, getText().length()-1));
> >     }
> >   }
> >   ;
> >
> > The simplified version of the rule gives the same result:
> > fragment
> > Block: '{' ( Block | ~('{'|'}') )* '}';
> >
> > I use ANTLR 3.2.
> >
> > Does anybody have an idea, how to get around this?
> >
> > Thanks,
> >
> > Miklos
>
-------------- next part --------------

import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CharStream;

public class Helper {

	private static boolean debug = false;

	public static boolean pairMatch(CharStream input, int limit) {
		return pairMatch(input, limit, '{', '}');
	}

	/**
	 * Consumes input up to and including the close char that balances an
	 * open char the caller has already consumed. Backslash escapes, // line
	 * comments and single- or double-quoted strings are skipped. Returns false
	 * at EOF or if the block does not close within limit characters.
	 */
	public static boolean pairMatch(CharStream input, int limit, char open, char close) {
		// Java passes ints by value, so decrementing a plain int inside
		// consume() never reached the loop tests. Keep the remaining budget
		// in a one-element array shared by all the helpers instead.
		int[] budget = { limit };
		int nest = 1; // already matched & consumed open char
		boolean done = false;
		while (!done && budget[0] > 0) {
			int la_1 = input.LA(1);
			if (la_1 == -1) return false;
			if (la_1 == '\\') { // escape: skip it and the escaped char
				int la_2 = input.LA(2);
				if (la_2 == -1) return false;
				consume(input, budget);
				consume(input, budget);
			} else if (la_1 == '/') {
				int la_2 = input.LA(2);
				if (la_2 == -1) return false;
				if (la_2 == '/') { // consume '//' to eol
					consume(input, budget);
					do {
						consume(input, budget);
						la_1 = input.LA(1);
						if (la_1 == -1) return false;
					} while (!(la_1 == '\r' || la_1 == '\n') && budget[0] > 0);
				} else {
					consume(input, budget);
				}
			} else if (la_1 == '\'' || la_1 == '"') {
				boolean goodString = matchString(input, budget, (char) la_1);
				if (!goodString) return false;
			} else if (la_1 == open) {
				nest++;
				consume(input, budget);
			} else if (la_1 == close) {
				nest--;
				consume(input, budget);
				if (nest == 0) done = true;
			} else {
				consume(input, budget);
			}
		}
		return done; // false if the budget ran out before nest reached 0
	}

	private static boolean matchString(CharStream input, int[] budget, char c) {
		consume(input, budget); // already matched open quote
		boolean done = false;
		while (!done && budget[0] > 0) {
			int la_1 = input.LA(1);
			if (la_1 == -1) return false;
			if (la_1 == '\\') { // escaped char inside the string
				int la_2 = input.LA(2);
				if (la_2 == -1) return false;
				consume(input, budget);
				consume(input, budget);
			} else if (la_1 == c) { // closing quote
				consume(input, budget);
				done = true;
			} else {
				consume(input, budget);
			}
		}
		return done; // false if the budget ran out inside the string
	}

	private static void consume(CharStream input, int[] budget) {
		if (debug) System.out.print((char) input.LA(1));
		input.consume();
		budget[0]--;
	}

	// //////////////////////////////////////////////////////////////////////////

	public static void main(String[] args) {
		debug = true;
		ANTLRStringStream input = new ANTLRStringStream(t2);
		boolean result = pairMatch(input, 1000);
		System.out.println("Result: " + result);
	}

	public static final String t1 = "hel'lo}and";
	public static final String t2 = "h{ell}o}and";
	public static final String t3 = "run(\"A{}\"); }";
}

