[antlr-interest] Reposting an email about a baby latex parser
Pavel Grinfeld
pg at freeboundaries.com
Thu Jun 24 11:27:28 PDT 2010
Hi, I hope it's OK to resend an email that was overlooked previously...
My problem is separating text from commands in LaTeX. I'm doing pretty
well recognizing LaTeX commands, but now I'm at the stage where I want
to capture the "text". I'm having trouble defining "everything else".
Basically, I currently treat a LaTeX document as a sequence of commands
(as I define them), possibly separated by WS, and everything that's not
a command is "text". The problem I keep running into is that when I
define "text" generously, it starts grabbing tokens that belong to
commands. Any help would be greatly appreciated!
Thanks in advance,
Pavel
I'm including what I have so far, and the document I'm hoping to parse.
grammar PGTeX;
doc : (command WS?)+ EOF;
command : escWord cWord+ ( sWord+ cWord*)?;
sWord : '[' word ']';
cWord : '{' word '}';
escWord : '\\' word;
word : WORD;
WORD: ('-'|'a'..'z'|'A'..'Z'|'0'..'9'|'*')+;
WS : ( ' ' | '\t'| '\r' | '\n' )+;
COMMENT
: '%' (~('\n'|'\r'))* {$channel = HIDDEN;};
And here's the document:
\documentclass{book}%
\usepackage{amsfonts}
\usepackage{amsmath}%
\newtheorem{summary}[theorem]{Summary}
\begin{document}
\chapter*{Intro}
Book starts here $x^{2}+y^{2}=1$. Here's an intersting faction:
\begin{equation}
\int_{0}^{1}\sin xdx=4
\end{equation}
\end{document}
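To make the intended split concrete, here is a rough sketch in Python rather than ANTLR. It is only an illustration of the output I'm after, not a grammar fix. It assumes a command has the same shape as the `command` rule above (a backslash word, one or more `{word}` groups, then optionally `[word]` groups and more `{word}` groups), that `%` always starts a comment, and that anything between commands counts as "text". The names `split_commands_and_text`, `COMMAND`, and `COMMENT` are made up for this example.

```python
import re

# Same character set as the WORD token in the grammar above.
WORD = r"[-a-zA-Z0-9*]+"

# Hypothetical regex mirroring the `command` rule:
#   escWord cWord+ ( sWord+ cWord* )?
COMMAND = re.compile(
    r"\\" + WORD
    + r"(?:\{" + WORD + r"\})+"
    + r"(?:(?:\[" + WORD + r"\])+(?:\{" + WORD + r"\})*)?"
)

# Assumption: every '%' starts a comment that runs to the end of the line.
COMMENT = re.compile(r"%[^\r\n]*")


def split_commands_and_text(source):
    """Return ("command" | "text", string) pairs in document order."""
    source = COMMENT.sub("", source)  # comments are dropped, like the HIDDEN channel
    pieces = []
    pos = 0
    for match in COMMAND.finditer(source):
        # Whatever sits between two commands is "text" unless it is only whitespace.
        text = source[pos:match.start()].strip()
        if text:
            pieces.append(("text", text))
        pieces.append(("command", match.group(0)))
        pos = match.end()
    tail = source[pos:].strip()
    if tail:
        pieces.append(("text", tail))
    return pieces


if __name__ == "__main__":
    sample = "\\chapter*{Intro}\nBook starts here $x^{2}+y^{2}=1$.\n\\begin{equation}\n"
    for kind, value in split_commands_and_text(sample):
        print(kind, repr(value))
```

Under these assumptions, a line such as `\int_{0}^{1}\sin xdx=4` comes out as "text", because neither `\int` nor `\sin` is immediately followed by a `{word}` group. That is the behaviour I would like the grammar to reproduce.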