[antlr-interest] Antlr first time user, help requested

Mon Jul 5 21:18:50 PDT 2010

Sorry to say that ANTLR is driving me nuts, starting to really hate
the tool, so I'd really appreciate some help on it before I give up on
it.

I am trying to parse a simple bit of text that looks something like this:

PageMetaData:
name: This is a test name
categories: category1, category2,
  category3
notes: These are notes
  that the newlines are important, but not the leading whitespace

So the idea is that the script always starts with "PageMetaData:\n".
The name section should ignore leading whitespace after the colon and
take in any text to the end of the line, including whitespace.
The categories section is a comma-separated set of camel-cased words
that can be on one or more lines. Subsequent lines should lead with
one or more spaces.
The notes section should allow multiple lines as long as they all
start with leading whitespace.
This is going to get a bit more complex, but you get the idea.
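
To pin down what I expect the parse to produce, here is a minimal sketch in
plain Python rather than ANTLR. The function name parse_page_metadata and the
dict layout are made up for illustration. It assumes every label starts in
column 0 and every continuation line starts with a space or tab.

```python
HEADER = "PageMetaData:"

def parse_page_metadata(text):
    """Parse the sample format into a dict (hypothetical helper, not ANTLR output)."""
    lines = text.splitlines()
    # Skip blank lines before the header.
    while lines and not lines[0].strip():
        lines.pop(0)
    if not lines or lines[0].strip() != HEADER:
        raise ValueError("input must start with 'PageMetaData:'")

    sections = {}   # label -> list of lines with leading whitespace removed
    current = None
    for line in lines[1:]:
        if not line.strip():
            continue  # ignore blank lines
        if line[0] in " \t":
            # Continuation line: belongs to the most recent label.
            if current is None:
                raise ValueError("continuation line before any label")
            sections[current].append(line.lstrip())
        else:
            label, sep, rest = line.partition(":")
            if not sep:
                raise ValueError("expected 'label: value', got %r" % line)
            current = label
            sections[current] = [rest.lstrip()]

    result = {}
    for label, parts in sections.items():
        if label == "categories":
            # Comma-separated words, possibly spread over several lines.
            joined = " ".join(parts)
            result[label] = [w.strip() for w in joined.split(",") if w.strip()]
        else:
            # Free text: keep the newlines, drop the leading whitespace.
            result[label] = "\n".join(parts)
    return result
```

For the sample text above, this gives name as a single string, categories as
a list of three words, and notes as two lines joined by a newline.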

My grammar file is at the bottom of this email (not sure if this ML
supports attachments). It fails miserably: I keep running into
mismatched token exceptions when matching the testName rule. Here is my
input text:
PageMetaData:
name: This is a test name
categories: category1, category2,
  category3
notes: These are notes
  that the newlines are important, but not the leading whitespace

So after trying many different variations I tried a very simple
grammar to step back to basics (or so I thought). Grammar:
grammar Test;

prog
: 'name:' NONBREAK NEWLINE? EOF!;

NONBREAK
: (~('\n'|'\r'))+ ;

NEWLINE:'\r'? '\n' ;

Input (quotes included to show that there is a new line):
"name: test
"

In the ANTLRWorks 1.4 Interpreter I get a
MismatchedTokenException(4!=6) with this setup. What the heck? This is
pretty basic.

I am also seeing problems with the Windows EOL matching but not the
Unix matching in ANTLRWorks when I add a newline (using the NEWLINE
token above), but I am on Ubuntu Linux, so I am not sure what is going on
there.
Would really appreciate some hints here.
Thank you

Full grammar file mentioned above:
grammar PageMetaData;

options {
  output = AST;
}

tokens {
  HEADER_TEXT = 'PageMetaData:' ;
  NAME_LABEL = 'name:' ;
  CATEGORIES_LABEL = 'categories:' ;
  TAGS_LABEL = 'tags:' ;
  NOTE_LABEL = 'note:' ;
  AUTOMATED_TESTS_LABEL = 'automated-tests:' ;
  AUTOMATED_TEST_LABEL = 'automated-test:' ;
  COMMENT;
  CAMELCASE;
  FILE;
  COMMA;
  TEXT;
}

COMMENT
  :	'//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
  ;

NEWLINE	: ('\r' '\n' | '\n' | '\r' );

CAMELCASE
	:	('A'..'Z'|'a'..'z'|'0'..'9')+;

FILE
	:	('A'..'Z'|'a'..'z'|'0'..'9'| '_' | '-' | '.' | '/')+;

COMMA
	:	',' (' '+ | NEWLINE ' '+)?;

TEXT : (~('\r'|'\n')+);

definition
	: (NEWLINE | ' ')* header NEWLINE
  testName NEWLINE
  categories NEWLINE
  (tags NEWLINE)?
  (note NEWLINE)?
  (automatedTests NEWLINE)?
  (automatedTest NEWLINE)?
  EOF! ;

header
	: HEADER_TEXT
	;

testName : NAME_LABEL TEXT;

categories
	: CATEGORIES_LABEL CAMELCASE (COMMA CAMELCASE)* ;

tags
	:	'tags:' (' '*)! CAMELCASE (COMMA CAMELCASE)* ;

note
	: 	'note:' (' '*)! TEXT (NEWLINE+ TEXT)* ;

automatedTests
	:	'tests:' (' '*)! FILE (COMMA FILE)* ;

automatedTest
	:	'test:' (' '*)! TEXT ;