[antlr-interest] Help needed with baby LaTeX parser
Pavel Grinfeld
pg at freeboundaries.com
Tue Jun 22 13:47:52 PDT 2010
Hi,
I'm doing pretty well recognizing LaTeX commands, but now I'm at the
stage where I want to capture the "text". I'm having trouble defining
"everything else".
Basically, I currently define a LaTeX document as a sequence of
commands (as I define them), possibly separated by WS; everything
that's not a command is "text". The problem I keep running into is that
when I define "text" generously, it starts grabbing tokens that belong
to commands. Any help would be greatly appreciated!
Thanks in advance,
Pavel
I'm including what I have so far, and the document I'm hoping to parse.
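grammar PGTeX;

// Parser rules
doc     : (command WS?)+ EOF;
// A command: \name, one or more {..} groups, then optional [..] and {..} groups
command : escWord cWord+ (sWord+ cWord*)?;
sWord   : '[' word ']';   // square-bracket argument
cWord   : '{' word '}';   // curly-brace argument
escWord : '\\' word;      // backslash-prefixed command name
word    : WORD;

// Lexer rules
WORD    : ('-'|'a'..'z'|'A'..'Z'|'0'..'9'|'\*')+;
WS      : (' '|'\t'|'\r'|'\n')+;
COMMENT : '%' (~('\n'|'\r'))* {$channel = HIDDEN;};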
And here's the document:
\documentclass{book}%
\usepackage{amsfonts}
\usepackage{amsmath}%
\newtheorem{summary}[theorem]{Summary}
\begin{document}
\chapter*{Intro}
Book starts here $x^{2}+y^{2}=1$. Here's an interesting faction:
\begin{equation}
\int_{0}^{1}\sin xdx=4
\end{equation}
\end{document}
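
To make the intended split concrete, here is a rough sketch in Python (not ANTLR, and not a fix for the grammar). It only illustrates the desired result: anything matching the command shape is a command, and whatever lies between commands is "text". The names WORD, C_WORD, S_WORD, COMMAND and split_commands_and_text are hypothetical, and the regular expression is an assumed approximation of the command rule in the grammar.

```python
import re

# Assumed approximations of the grammar's rules (hypothetical names):
WORD = r"[-A-Za-z0-9*]+"           # mirrors the WORD lexer rule
C_WORD = r"\{" + WORD + r"\}"      # mirrors cWord: '{' word '}'
S_WORD = r"\[" + WORD + r"\]"      # mirrors sWord: '[' word ']'

# command : escWord cWord+ (sWord+ cWord*)?
COMMAND = re.compile(
    r"\\" + WORD
    + "(?:" + C_WORD + ")+"
    + "(?:(?:" + S_WORD + ")+(?:" + C_WORD + ")*)?"
)

# Simplification: every '%' starts a comment (an escaped \% is not handled).
COMMENT = re.compile(r"%[^\r\n]*")


def split_commands_and_text(source):
    """Return (kind, value) pairs where kind is 'command' or 'text'."""
    source = COMMENT.sub("", source)
    pieces = []
    pos = 0
    for match in COMMAND.finditer(source):
        # Whatever sits between two commands is treated as text.
        text = source[pos:match.start()].strip()
        if text:
            pieces.append(("text", text))
        pieces.append(("command", match.group()))
        pos = match.end()
    tail = source[pos:].strip()
    if tail:
        pieces.append(("text", tail))
    return pieces
```

Under these assumptions, \int and \sin in the equation body are not followed by a {..} group, so they stay inside the surrounding text rather than being picked up as commands.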