[antlr-interest] Reposting an email about a baby latex parser

Andrew Bradnan andrew.bradnan at gmail.com
Thu Jun 24 14:40:10 PDT 2010


What happens in the example you give, and what do you want it to do?
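
To make the question concrete, here is a rough sketch of one behaviour you might be after. It is plain Python, not ANTLR, and it is only my guess at the intent: the names split_latex, COMMAND and COMMENT are made up for illustration, and the command shape simply mirrors your `command` rule (escWord, one or more cWord, then optional sWord/cWord groups). Anything that does not match a command or a comment is collected as "text", so things like `\int_{0}^{1}` count as text here because they do not fit your command rule.

```python
import re

# Same character set as the WORD rule in the grammar.
WORD = r"[-a-zA-Z0-9*]+"
C_WORD = r"\{" + WORD + r"\}"   # cWord: '{' word '}'
S_WORD = r"\[" + WORD + r"\]"   # sWord: '[' word ']'

# command : escWord cWord+ ( sWord+ cWord* )? ;
COMMAND = re.compile(
    r"\\" + WORD + "(?:" + C_WORD + ")+"
    + "(?:(?:" + S_WORD + ")+(?:" + C_WORD + ")*)?"
)
# COMMENT: '%' up to the end of the line (dropped, like the HIDDEN channel).
COMMENT = re.compile(r"%[^\r\n]*")


def split_latex(source):
    """Return a list of ("command", s) and ("text", s) pairs.

    Assumption: "text" is whatever lies between commands and comments,
    with surrounding whitespace stripped and empty runs discarded.
    """
    pieces = []
    text_start = 0
    pos = 0

    def flush_text(end):
        text = source[text_start:end].strip()
        if text:
            pieces.append(("text", text))

    while pos < len(source):
        match = COMMAND.match(source, pos) or COMMENT.match(source, pos)
        if match is None:
            pos += 1  # not the start of a command or comment: part of text
            continue
        flush_text(pos)
        if match.re is COMMAND:
            pieces.append(("command", match.group()))
        pos = text_start = match.end()

    flush_text(len(source))
    return pieces


if __name__ == "__main__":
    sample = "\\chapter*{Intro}\n\nBook starts here $x^{2}+y^{2}=1$.\n"
    for kind, value in split_latex(sample):
        print(kind, repr(value))
```

If that is the split you want, the next step would be working out how to express "any character not claimed by another rule" in the lexer. I have not tried that against your grammar, so treat the above as a statement of the goal rather than a fix.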


On Thu, Jun 24, 2010 at 11:27 AM, Pavel Grinfeld <pg at freeboundaries.com> wrote:

> Hi, I hope it's OK to resend an email that was overlooked previously...
>
> My problem is separating text from commands in LaTeX. I'm doing pretty
> well recognizing LaTeX commands, but now I'm at the stage where I want
> to capture the "text". I'm having trouble defining "everything else".
>
> Basically, I currently define a LaTeX document as a sequence of commands
> (as I define them), possibly separated by WS; everything that's not a
> command is "text". The problem I keep running into is that when I define
> "text" generously, it starts grabbing tokens that belong to commands.
> Any help would be greatly appreciated!
>
> Thanks in advance,
>
> Pavel
>
> I'm including what I have so far, and the document I'm hoping to parse.
>
> grammar PGTeX;
>
> doc : (command WS?)+ EOF;
>
> command : escWord  cWord+ ( sWord+ cWord*)?;
>
> sWord    : '[' word ']';
> cWord    : '{' word '}';
> escWord : '\\' word;
>
> word : WORD;
>
> WORD:    ('-'|'a'..'z'|'A'..'Z'|'0'..'9'|'\*')+;
>
> WS  :   ( ' ' | '\t'| '\r' | '\n' )+;
>
> COMMENT
>     :    '%' (~('\n'|'\r'))*  {$channel = HIDDEN;};
>
>
> And here's the document:
>
> \documentclass{book}%
> \usepackage{amsfonts}
> \usepackage{amsmath}%
> \newtheorem{summary}[theorem]{Summary}
> \begin{document}
>
>
> \chapter*{Intro}
>
> Book starts here $x^{2}+y^{2}=1$. Here's an intersting faction:
> \begin{equation}
> \int_{0}^{1}\sin xdx=4
> \end{equation}
>
> \end{document}



-- 
/Andrew

