[antlr-interest] Can I restart lexing from definite position in document?

Sam Harwell sharwell at pixelminegames.com
Fri Apr 17 07:30:17 PDT 2009


In any editor that supports syntax highlighting, the document should be
efficiently accessible one complete line at a time. Store one integer per
line in an array that holds the "colorizer state at the start of that
line". If you are allocating the buffer, the integers can be whatever size
you need; Visual Studio allocates an array of 32-bit integers for the
colorizer to use. Then, when the state at the start of line N changes, you
should retokenize line N and set line N+1's start state to the resulting
state at the end of line N.
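
To make the bookkeeping concrete, here is a minimal Java sketch of the
idea, not the code from my blog or from Visual Studio. Everything in it is
hypothetical: the class name LineStateCache, the two states NORMAL and
IN_COMMENT, and the toy scanLine method, which stands in for a real
(e.g. ANTLR-generated) lexer and only tracks /* ... */ comments. It keeps
one int per line and, after an edit, rescans forward only until a line's
start state comes out unchanged.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of per-line colorizer state caching; all names are hypothetical. */
final class LineStateCache {
    // Hypothetical lexer states: normal code, or inside a block comment.
    public static final int NORMAL = 0;
    public static final int IN_COMMENT = 1;

    private final List<String> lines;
    // startState[i] is the state at the start of line i; the extra last
    // entry holds the state after the final line.
    private final int[] startState;

    LineStateCache(List<String> initialLines) {
        this.lines = new ArrayList<>(initialLines);
        this.startState = new int[lines.size() + 1];
        startState[0] = NORMAL;
        for (int i = 0; i < lines.size(); i++) {
            startState[i + 1] = scanLine(lines.get(i), startState[i]);
        }
    }

    /** Toy stand-in for the real lexer: scans one line, returns its end state. */
    public static int scanLine(String line, int state) {
        int i = 0;
        while (i < line.length()) {
            if (state == NORMAL && line.startsWith("/*", i)) {
                state = IN_COMMENT;
                i += 2;
            } else if (state == IN_COMMENT && line.startsWith("*/", i)) {
                state = NORMAL;
                i += 2;
            } else {
                i++;
            }
        }
        return state;
    }

    /**
     * Replaces line n and retokenizes forward until the state at the start
     * of the next line stops changing. Returns the number of lines rescanned.
     */
    public int replaceLine(int n, String text) {
        lines.set(n, text);
        int rescanned = 0;
        for (int i = n; i < lines.size(); i++) {
            int end = scanLine(lines.get(i), startState[i]);
            rescanned++;
            if (end == startState[i + 1]) {
                break; // later lines are unaffected
            }
            startState[i + 1] = end;
        }
        return rescanned;
    }

    public int startStateOf(int line) {
        return startState[line];
    }
}
```

With this layout an ordinary edit costs one line of lexing; only an edit
that opens or closes a multi-line construct ripples further down. A real
colorizer would also have to handle inserted and deleted lines, which this
sketch leaves out.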

-----Original Message-----
From: antlr-interest-bounces at antlr.org
[mailto:antlr-interest-bounces at antlr.org] On Behalf Of P.N.
Sent: Friday, April 17, 2009 9:12 AM
To: antlr-interest at antlr.org
Subject: Re: [antlr-interest] Can I restart lexing from
definite position in document?


Thank you, Sam! This seems really helpful. Tokenizing on a per-line
basis should not cause any problems. I'll just build a line buffer, but
I don't want to buffer the whole file, of course (my lexer will be
called externally, so I don't even have full access to the document).

Kind regards

Peter



Sam Harwell wrote:
> You should tokenize on a per-line basis. Never allow a token to span
> multiple lines, and never allow lookahead/back to cross a newline
> boundary. I've documented this process on my blog, followed by an email
> on this list earlier this week with an improvement. Here's the original
> article:
> http://blog.280z28.org/archives/2008/10/21/
>
> Sam
>
> -----Original Message-----
> From: antlr-interest-bounces at antlr.org
> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of P.N.
> Sent: Friday, April 17, 2009 8:58 AM
> To: antlr-interest at antlr.org
> Subject: Re: [antlr-interest] Can I restart lexing from definite
> position in document?
>
> Sam Harwell wrote:
>> Is this part of a syntax highlighter?
>>
> Yes.
>
>> Either way, you can always take an arbitrary token from your original
>> token stream that falls before the location in the document where a
>> change occurred, and substring your document text at that location.
>> Then create a token stream based on the original tokens up to the break
>> followed by the tokens you just got back from the re-lexing of the
>> document.
>>
>
> Sounds strange, and not useful. My point is, the document content may be
> changed in a JEditorPane, and I don't want to do lexing/tokenizing
> again.
>
> Peter
>
>> Sam
>>
>> -----Original Message-----
>> From: antlr-interest-bounces at antlr.org
>> [mailto:antlr-interest-bounces at antlr.org] On Behalf Of P.N.
>> Sent: Friday, April 17, 2009 4:51 AM
>> To: antlr-interest at antlr.org
>> Subject: [antlr-interest] Can I restart lexing from definite position
>> in document?
>>
>>
>> Hello!
>>
>> I just want to know whether it's possible to restart lexing in a big
>> file. Say the file is 20 MB or more (okay, that's not good programming
>> style, but that's not the question here ;-) ), and I change something
>> at the end of the file: would I need to lex from the start again, or is
>> there a way to do it only for the last characters? Probably using
>> org.antlr.runtime.RecognizerSharedState? How should I use it?
>>
>> Kind regards
>>
>> Peter
>>
>>
>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>> Unsubscribe:
>> http://www.antlr.org/mailman/options/antlr-interest/your-email-address


