[antlr-interest] Lexer too quick to grab a token?

Bart Kiers bkiers at gmail.com
Mon May 2 05:50:11 PDT 2011


On Mon, May 2, 2011 at 1:19 AM, Todd O'Bryan <toddobryan at gmail.com> wrote:

> ...
>
>
> Does this make any sense? Is there some way to deal with it?
>  ...


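You could let the R_TAG rule match '/]]' itself and, in that alternative, emit
two tokens ('/' as ANY and ']]' as R_BRACKET) instead of one, following the
recipe for emitting more than one token per lexer rule described here: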
http://www.antlr.org/wiki/pages/viewpage.action?pageId=3604497

A demo:

lexer grammar TLexer;

@members {

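  // Queue of tokens emitted by the rule that was just matched.
  List<Token> tokens = new ArrayList<Token>();

  // Helper for actions: emit a token with the given text and type.
  // (Such a token carries no meaningful line/position information.)
  private void emit(String text, int type) {
    emit(new CommonToken(type, text));
  }

  // Record every emitted token instead of keeping only the last one.
  @Override
  public void emit(Token token) {
    state.token = token;
    tokens.add(token);
  }

  // Lex one more rule (it may emit one or more tokens), then hand out
  // the oldest queued token; an empty queue here means end of input.
  @Override
  public Token nextToken() {
    super.nextToken();
    if(tokens.isEmpty()) {
      return Token.EOF_TOKEN;
    }
    return tokens.remove(0);
  }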
}

L_TAG
  :  '[/'
  ;

R_TAG
  :  '/]]' {emit("/", ANY); emit("]]", R_BRACKET);}
  |  '/]'
  ;

L_BRACKET
  :  '[['
  ;

R_BRACKET
  :  ']]'
  ;

SPACE
  :  (' ' | '\t' | '\r' | '\n') {skip();}
  ;

ANY
  :  .
  ;

which can be tested with the class:

import org.antlr.runtime.*;

public class Main {
  public static void main(String[] args) throws Exception {
    String source = "[/ foo /] [[/ bar /]]";
    ANTLRStringStream in = new ANTLRStringStream(source);
    TLexer lexer = new TLexer(in);
    // Pull tokens straight from the lexer: from ANTLR 3.3 on, CommonTokenStream
    // is lazy and getTokens() stays empty until the stream has been filled.
    for(Token t = lexer.nextToken(); t.getType() != Token.EOF; t = lexer.nextToken()) {
      System.out.println("text=" + t.getText() + ", type=" + t.getType());
    }
  }
}
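If you want to play with the queue idea without generating a lexer, here is a
plain-Java sketch of it. It is not ANTLR code: the class QueueLexerSketch, its
Tok type and the hand-written matchRule() are hypothetical stand-ins for the
generated lexer, and the "rules" are tried in a fixed order ('/]]' before '/]')
instead of using ANTLR's prediction.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Hypothetical, ANTLR-free sketch: a "rule" may push several tokens onto a
// queue, and nextToken() hands them out one at a time.
public class QueueLexerSketch {

  // Hypothetical token type: rule name plus matched text.
  static final class Tok {
    final String type;
    final String text;
    Tok(String type, String text) { this.type = type; this.text = text; }
    @Override public String toString() { return type + ":" + text; }
  }

  private final String input;
  private int pos = 0;
  private final Deque<Tok> pending = new ArrayDeque<>();

  QueueLexerSketch(String input) { this.input = input; }

  private void emit(String type, String text) { pending.addLast(new Tok(type, text)); }

  // Match one "rule" at the current position; it emits zero or more tokens.
  private void matchRule() {
    if (input.startsWith("[/", pos)) { emit("L_TAG", "[/"); pos += 2; }
    else if (input.startsWith("/]]", pos)) {
      // The interesting case: one match, two tokens.
      emit("ANY", "/");
      emit("R_BRACKET", "]]");
      pos += 3;
    }
    else if (input.startsWith("/]", pos)) { emit("R_TAG", "/]"); pos += 2; }
    else if (input.startsWith("[[", pos)) { emit("L_BRACKET", "[["); pos += 2; }
    else if (input.startsWith("]]", pos)) { emit("R_BRACKET", "]]"); pos += 2; }
    else if (Character.isWhitespace(input.charAt(pos))) { pos++; } // skipped
    else { emit("ANY", String.valueOf(input.charAt(pos))); pos++; }
  }

  // Drain queued tokens first; match another rule only when the queue is
  // empty. Returns null once both queue and input are exhausted.
  Tok nextToken() {
    while (pending.isEmpty() && pos < input.length()) {
      matchRule();
    }
    return pending.pollFirst();
  }

  public static List<String> lex(String source) {
    QueueLexerSketch lexer = new QueueLexerSketch(source);
    List<String> out = new ArrayList<>();
    for (Tok t = lexer.nextToken(); t != null; t = lexer.nextToken()) {
      out.add(t.toString());
    }
    return out;
  }

  public static void main(String[] args) {
    for (String s : lex("[/ foo /] [[/ bar /]]")) {
      System.out.println(s);
    }
  }
}
```

Unlike the nextToken() override in TLexer, which lexes one rule ahead before
returning a queued token, this version drains the queue first; both orders
yield the same token sequence.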


Regards,

Bart.

