[antlr-interest] Breaking out of a parser loop based on the current item
Richard Clark
rd_clark at sbcglobal.net
Mon Jul 19 20:11:42 PDT 2004
On Jul 19, 2004, at 17:09, Richard Clark wrote:
> 1) Define a word to exclude any ending punctuation (which can cause
> problems with Foo.bar ...), or
> 2) Set up a loop...
Actually, I wound up implementing option 1:
/* in the lexer */
// Four cases:
// non-spaces, non-period, followed by whitespace or the end of the file
// a sentence-ending word followed by a space (or the end of the file,
// or a tag)
// non-spaces, dot, non-spaces
// non-spaces, open curly bracket w/o a following tag
WORD :
      WORD_PART
    | ( WORD_PART PERIOD WORD ) => WORD_PART PERIOD WORD
    | ( WORD_PART RPAREN ) => WORD_PART RPAREN (WORD)?
    | ( WORD_PART LCURLY ~('@')) => WORD_PART LCURLY
    ;
// any run of characters ending in whitespace, newline, period, or
// end-of-file
// (also the right parenthesis, which is fixed by the WORD rule above)
protected
WORD_PART
    : ( { LA(1) != EOF_CHAR }?
        ~(' ' | '\t' | '\f' | '\r' | '\n' | '.' | ')' | '{' | '@')
      )+
    ;
/* in the parser, including some trickery so the individual tokens are
   melded into one string */
sentence
    { StringBuffer buf = new StringBuffer(); }
    // make sure there's a word to start the sentence
    : (WORD) => sentenceFragment[buf]
      { #sentence = #[TEXT, buf.toString()]; }
    | /* nothing */
    ;
protected
sentenceFragment[StringBuffer buf]
    : w:WORD { buf.append(w.getText()); }
      (
        whitespace[buf] sentenceFragment[buf]
      | sentenceEnd[buf]
      )?
    ;
protected
sentenceEnd[StringBuffer buf]
    : PERIOD { buf.append('.'); } ( lp:RPAREN { buf.append(')'); } )?
    ;
/* that's all... */
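In case the recursion in the parser rules is hard to follow, here is a hand-written Java sketch of the same melding logic over a plain token list. It is an illustration under assumptions, not what ANTLR generates: the SentenceMeldSketch class, its Tok and Type types, and the tok/meld helpers are invented for the example. The whitespace rule is not shown in the grammar above, so the sketch assumes it matches a single whitespace token and appends its text. Where the grammar builds no TEXT node, the sketch simply returns an empty string.

```java
import java.util.Arrays;
import java.util.List;

// Sketch only: a hand-written imitation of the sentence rules.
// All names here are hypothetical; none of this is ANTLR-generated code.
final class SentenceMeldSketch {
    enum Type { WORD, WS, PERIOD, RPAREN }

    static final class Tok {
        final Type type;
        final String text;
        Tok(Type type, String text) { this.type = type; this.text = text; }
    }

    private final List<Tok> toks;
    private int pos = 0;

    SentenceMeldSketch(List<Tok> toks) { this.toks = toks; }

    static Tok tok(Type type, String text) { return new Tok(type, text); }

    /** Convenience wrapper: meld the leading sentence of a token sequence. */
    static String meld(Tok... toks) {
        return new SentenceMeldSketch(Arrays.asList(toks)).sentence();
    }

    private boolean at(Type t) {
        return pos < toks.size() && toks.get(pos).type == t;
    }

    private Tok match(Type t) {
        if (!at(t)) {
            throw new IllegalStateException("expected " + t + " at token " + pos);
        }
        return toks.get(pos++);
    }

    /** sentence: the melded text, or "" when the input does not start with a WORD. */
    String sentence() {
        StringBuilder buf = new StringBuilder();
        if (at(Type.WORD)) {
            sentenceFragment(buf);
        }
        return buf.toString();
    }

    /** sentenceFragment: WORD ( whitespace sentenceFragment | sentenceEnd )? */
    private void sentenceFragment(StringBuilder buf) {
        buf.append(match(Type.WORD).text);
        if (at(Type.WS)) {
            buf.append(match(Type.WS).text); // assumed behavior of whitespace[buf]
            sentenceFragment(buf);
        } else if (at(Type.PERIOD)) {
            sentenceEnd(buf);
        }
    }

    /** sentenceEnd: PERIOD (RPAREN)? */
    private void sentenceEnd(StringBuilder buf) {
        match(Type.PERIOD);
        buf.append('.');
        if (at(Type.RPAREN)) {
            match(Type.RPAREN);
            buf.append(')');
        }
    }

    public static void main(String[] args) {
        String s = meld(tok(Type.WORD, "Hello"), tok(Type.WS, " "),
                        tok(Type.WORD, "world"), tok(Type.PERIOD, "."),
                        tok(Type.WORD, "Next"));
        System.out.println(s); // Hello world.
    }
}
```

The recursion stops at the first period, so the trailing "Next" token is left for whatever rule comes after the sentence.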
...Richard