[antlr-interest] Context-Sensitive Scanning proposal

Mon Aug 16 11:43:00 PDT 2010

I like separating the lexer from the parser because it lets us deal with
multiple streams and hidden tokens (mostly for the hidden tokens).

I like the analogy of "characters into words, words into sentences"
when describing parsing as well.

The other point about the separation and my proposal is that it would
make it easy to combine common lexing patterns (IDENT, NUM, STRING,
C-style comments, etc.) with existing parsers, and to combine separate
parsers with each other. (Then again, the parsers could operate
directly on the same character stream instead of lexers acting on it,
and common patterns could simply be "included" into a parser.)
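To make "reusable lexing patterns" concrete, here is a minimal sketch, assuming regular expressions are an acceptable stand-in for lexer rules. The `COMMON` table and the `make_scanner` helper are hypothetical names for illustration; they are not part of ANTLR or of the proposal. A grammar picks the shared rules it wants and adds its own punctuation:

```python
import re

# Hypothetical library of common lexing patterns (illustrative only).
COMMON = {
    "WS": r"[ \t\r\n]+",
    "COMMENT": r"/\*.*?\*/|//[^\n]*",
    "IDENT": r"[A-Za-z_][A-Za-z0-9_]*",
    "NUM": r"[0-9]+",
    "STRING": r'"(?:[^"\\]|\\.)*"',
}

def make_scanner(rule_names, extra=None):
    """Build a scanner from a chosen subset of COMMON plus extra regex rules."""
    rules = [(name, COMMON[name]) for name in rule_names]
    rules += list((extra or {}).items())
    # Alternation is ordered: the first rule that matches at a position wins.
    pattern = re.compile("|".join(f"(?P<{n}>{p})" for n, p in rules), re.S)

    def scan(text):
        pos, out = 0, []
        while pos < len(text):
            m = pattern.match(text, pos)
            if m is None:
                raise SyntaxError(f"unexpected character at {pos}: {text[pos]!r}")
            # Whitespace and comments are matched but not handed to the parser.
            if m.lastgroup not in ("WS", "COMMENT"):
                out.append((m.lastgroup, m.group()))
            pos = m.end()
        return out

    return scan

scan = make_scanner(["WS", "COMMENT", "IDENT", "NUM", "STRING"],
                    {"ASSIGN": "=", "SEMI": ";"})
print(scan('x = "hi"; /* c */ y = 42;'))
```

Python's `re` takes the first alternative that matches rather than the longest one, so rule order matters here; most lexer generators use longest match instead.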

Part of me is still "schooled" into accepting the separation, but it
really isn't necessary. However, so many people are schooled into it
as well that jumping to a lexer-less parser might be a bigger mental
leap... Hard to say...
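For comparison, a lexer-less (scannerless) parser can be sketched as plain recursive descent in which the "word" rules are just lower-level parse functions over characters. The toy grammar (`name = number;` statements) and the `Parser` class below are hypothetical, for illustration only:

```python
class Parser:
    """Scannerless recursive descent for statements like `x = 1;`.

    The "lexical" rules (ws, ident, number) are ordinary parse functions
    over characters; nothing separates them from the "syntactic" rules.
    """

    def __init__(self, text):
        self.text, self.pos = text, 0

    def peek(self):
        # Current character, or "" at end of input.
        return self.text[self.pos] if self.pos < len(self.text) else ""

    # ---- character-level ("word") rules ----
    def ws(self):
        while self.peek().isspace():
            self.pos += 1

    def expect(self, s):
        self.ws()
        if not self.text.startswith(s, self.pos):
            raise SyntaxError(f"expected {s!r} at {self.pos}")
        self.pos += len(s)

    def ident(self):
        self.ws()
        start = self.pos
        if not (self.peek().isalpha() or self.peek() == "_"):
            raise SyntaxError(f"expected identifier at {self.pos}")
        while self.peek().isalnum() or self.peek() == "_":
            self.pos += 1
        return self.text[start:self.pos]

    def number(self):
        self.ws()
        start = self.pos
        while "0" <= self.peek() <= "9":  # ASCII digits; "" stops the loop
            self.pos += 1
        if start == self.pos:
            raise SyntaxError(f"expected number at {self.pos}")
        return int(self.text[start:self.pos])

    # ---- "sentence" rules ----
    def statement(self):
        name = self.ident()
        self.expect("=")
        value = self.number()
        self.expect(";")
        return (name, value)

    def program(self):
        stmts = []
        self.ws()
        while self.pos < len(self.text):
            stmts.append(self.statement())
            self.ws()
        return stmts

print(Parser("x = 1; y_2 = 42;").program())
```

Here `statement()` calling `ident()` plays the role of pulling a token from a lexer. The cost is that whitespace (and comments, if the language had them) must be skipped explicitly wherever it may appear, which a separate lexer with hidden tokens otherwise handles.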

Of course, the question comes up: without the separation, how can we
use DFAs if we want to? (Another question: are computers fast enough
now that parser speed is acceptable without DFAs?)

WRT my proposal, if we wanted to use DFAs, we'd have to generate a DFA
for each context the parser needs. That could be a lot of DFAs,
though each would likely be smaller than the one in a traditional
"push" lexer.
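A rough sketch of the per-context idea, assuming the parser names its current context each time it pulls a token. Each context gets its own compiled matcher; Python's `re` is a backtracking engine, not a DFA, so it only stands in for the small per-context DFAs described above. The context names, token sets, and `PullScanner` class are hypothetical. The example uses the familiar `>>` ambiguity in generic type arguments:

```python
import re

# Hypothetical contexts: each lists only the token rules valid there.
CONTEXTS = {
    # Inside generic type arguments, '>' always closes one argument list.
    "type_args": [("GT", r">"), ("COMMA", r","),
                  ("IDENT", r"[A-Za-z_]\w*"), ("LT", r"<")],
    # In expressions, '>>' is a shift operator, tried before '>'.
    "expr": [("SHR", r">>"), ("GT", r">"),
             ("NUM", r"\d+"), ("IDENT", r"[A-Za-z_]\w*")],
}
# One compiled matcher per context (standing in for one small DFA each).
COMPILED = {
    ctx: re.compile("|".join(f"(?P<{n}>{p})" for n, p in rules))
    for ctx, rules in CONTEXTS.items()
}

class PullScanner:
    """Scanner the parser pulls from, passing its current context."""

    def __init__(self, text):
        self.text, self.pos = text, 0

    def next_token(self, context):
        # Whitespace is skipped in every context.
        while self.pos < len(self.text) and self.text[self.pos].isspace():
            self.pos += 1
        if self.pos == len(self.text):
            return ("EOF", "")
        m = COMPILED[context].match(self.text, self.pos)
        if m is None:
            raise SyntaxError(f"no {context} token at {self.pos}")
        self.pos = m.end()
        return (m.lastgroup, m.group())

s = PullScanner("List<List<T>> a >> 2")
print([s.next_token("type_args") for _ in range(7)])  # '>>' arrives as two GTs
print([s.next_token("expr") for _ in range(3)])       # here '>>' is one SHR
```

Because the parser drives the context, the same characters `>>` come back as two tokens in one place and one token in another, with no lookahead hacks in the scanner itself.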
-- Scott

----------------------------------------
Scott Stanchfield
http://javadude.com

On Mon, Aug 16, 2010 at 2:31 PM, Graham Wideman
<gwlist at grahamwideman.com> wrote:
> Hi Scott, Terr and all,
>
> This sort of discussion has me asking:
>
> What are the rationales for any particular distribution of responsibilities between lexer(s) and parser(s)?
>
> -- Which rationales are about "fundamental structure of languages" (which might be about fundamental capacities or limitations of humans)?
>
> -- Which rationales are about implementation: best data model, best performance, etc?
>
> -- Which rationales are about clarity of description of the language recognizer?
>
> Or, to ask another way -- if one is drawn toward a lexer which is sensitive to parser context, then why not have just a parser which operates down to a finer level of detail?  If the parser were designed (per Scott S's idea) to "pull" tokens from one or more lexers, how is this different from the parser simply calling a lower level of parsing functions?
>
> -- Graham
>
> At 8/11/2010 12:09 PM, Scott Stanchfield wrote:
>>Cool - let me know what you think. I'm sure there are a lot of things
>>I didn't consider, but I wanted to pop the idea out there.
>>-- Scott
>>
>>On Wed, Aug 11, 2010 at 3:05 PM, Terence Parr <parrt at cs.usfca.edu> wrote:
>>> Added to my todoList on the v4 plans to look at:
>>>
>>> http://www.antlr.org/wiki/display/~admin/ANTLR+v4+lexers
>>>
>>> T
>>> On Aug 10, 2010, at 8:41 PM, Scott Stanchfield wrote:
>>>
>>>> Hey all - I've written up a little proposal for a context-sensitive
>>>> scanning idea. Hopefully my brain is on a good track with this; I
>>>> think it could be a major win for ANTLR.
>>>>
>>>> Please check out
>>>>   http://javadude.com/articles/antlr-context-sensitive-scanner.html
>>>>
>>>> and let me know what you think.
>>>>
>>>> Looks like something similar has been done before (rats! thought I had
>>>> an original thought!).
>>>>
>>>> Thoughts?
>>>> -- Scott
>>>>
>>>> List: http://www.antlr.org/mailman/listinfo/antlr-interest
>>>> Unsubscribe: http://www.antlr.org/mailman/options/antlr-interest/your-email-address
>>>
>>>
>>