00001
00002
00003
00004
00005
00006
00007
00008
00009
00010
00011
00012
00013
00014
00015
00016
00017
00018
00019
00020
00021
00022
00023
00024
00025
00026
00027
00028
00029
00030
00031
00032
00033
00034
00035
00036
00037
00038
00039
00040
00041
00042
00043
00044
00045
00046
00047
00048
00049
00050
00051
00052
00053
00054
00055
00056
00057
00058
00059
00060
00061
00062
00063
00064
00065
00066
00067
00068
00069
00070
00071
00072
00073
00074
00075
00076
00077
00078
00079
00080
00081
00082
00083
00084
00085
00086
00087
00088
00089
00090
00091
00092
00093
00094
00095
00096
00097
00098
00099
00100
00101
00102
00103
00104
00105
00106
00107
00108
00109
00110
00111
00112
00113
00114
00115
00116
00117
00118
00119
00120
00121
00122
00123
00124
00125
00126
00127
00128
00129
00130
00131
00132
00133
00134
00135
00136
00137
00138
00139
00140
00141
__version__ = '3.1.3'  # version string of this ANTLR3 runtime release
00143
def version_str_to_tuple(version_str):
    """Parse a version string into a comparable 4-tuple.

    Accepts 'major.minor', 'major.minor.patch' and an optional 'bN'
    beta suffix (e.g. '3.1', '3.1.3', '3.1b2').  The special string
    'HEAD' compares greater than any release version.  A missing beta
    suffix defaults to the largest integer so that full releases sort
    after their betas.

    Raises ValueError for an unparsable version string.
    """
    import re
    import sys

    # sys.maxint exists only on Python 2; fall back to sys.maxsize so
    # this also runs on Python 3 (the values are equal on CPython 2).
    largest = getattr(sys, 'maxint', sys.maxsize)

    if version_str == 'HEAD':
        return (largest, largest, largest, largest)

    m = re.match(r'(\d+)\.(\d+)(\.(\d+))?(b(\d+))?', version_str)
    if m is None:
        raise ValueError("Bad version string %r" % version_str)

    major = int(m.group(1))
    minor = int(m.group(2))
    patch = int(m.group(4) or 0)
    # no beta suffix means a release, which must sort after any beta
    beta = int(m.group(6) or largest)

    return (major, minor, patch, beta)
00161
00162
# Version of the running runtime, both as the raw string and parsed
# into a tuple usable for comparisons.
runtime_version_str = __version__
runtime_version = version_str_to_tuple(runtime_version_str)
00165
00166
00167 from constants import *
00168 from dfa import *
00169 from exceptions import *
00170 from recognizers import *
00171 from streams import *
00172 from tokens import *
00173 """ANTLR3 exception hierarchy"""
00174
00175
00176 from antlr3.constants import INVALID_TOKEN_TYPE
00177
00178
00179
00180
class BacktrackingFailed(Exception):
    """Exception used to signal that a backtracking attempt failed."""

    pass
00184
00185
00186
00187
00188
00189
00190
00191
00192
00193
00194
00195
00196
00197
00198
00199
00200
00201
00202
00203
00204
00205
00206
00207
00208
00209
00210
00211
00212
00213
00214
00215
00216
00217
class RecognitionException(Exception):
    """Root of the ANTLR3 exception hierarchy.

    Records where in the input stream an error occurred.  Depending on
    the kind of stream supplied (token stream, tree node stream or
    character stream) different location attributes are filled in.
    """

    def __init__(self, input=None):
        Exception.__init__(self)

        # stream that was being read when the error occurred
        # (TokenStream, TreeNodeStream or CharStream); None if unknown
        self.input = None

        # position in the stream at the time of the error
        self.index = None

        # offending Token, when input is a TokenStream
        self.token = None

        # offending tree node, when input is a TreeNodeStream
        self.node = None

        # offending character, when input is a CharStream
        self.c = None

        # line number of the offending symbol
        self.line = None

        # character position within self.line
        self.charPositionInLine = None

        # True when line/charPositionInLine were taken from a *prior*
        # node because the offending node carried no line information
        self.approximateLineInfo = False

        if input is not None:
            self.input = input
            self.index = input.index()

            # imports are deferred to avoid circular module dependencies
            from antlr3.streams import TokenStream, CharStream
            from antlr3.tree import TreeNodeStream

            if isinstance(self.input, TokenStream):
                self.token = self.input.LT(1)
                self.line = self.token.line
                self.charPositionInLine = self.token.charPositionInLine

            # NOTE: deliberately a second `if`, not `elif` — a stream
            # may satisfy both isinstance checks
            if isinstance(self.input, TreeNodeStream):
                self.extractInformationFromTreeNodeStream(self.input)

            else:
                if isinstance(self.input, CharStream):
                    self.c = self.input.LT(1)
                    self.line = self.input.line
                    self.charPositionInLine = self.input.charPositionInLine

                else:
                    # unknown stream kind: record the raw lookahead value
                    self.c = self.input.LA(1)

    def extractInformationFromTreeNodeStream(self, nodes):
        """Fill in token/line info from the current node of a TreeNodeStream.

        When the current node carries no line info (line <= 0), walk
        backwards through prior nodes until one with real line info is
        found, and flag the result as approximate.
        """
        from antlr3.tree import Tree, CommonTree
        from antlr3.tokens import CommonToken

        self.node = nodes.LT(1)
        adaptor = nodes.adaptor
        payload = adaptor.getToken(self.node)
        if payload is not None:
            self.token = payload
            if payload.line <= 0:
                # current node has no line info; scan backwards
                i = -1
                priorNode = nodes.LT(i)
                while priorNode is not None:
                    priorPayload = adaptor.getToken(priorNode)
                    if priorPayload is not None and priorPayload.line > 0:
                        # found a prior token with usable line info
                        self.line = priorPayload.line
                        self.charPositionInLine = priorPayload.charPositionInLine
                        self.approximateLineInfo = True
                        break

                    i -= 1
                    priorNode = nodes.LT(i)

            else:
                self.line = payload.line
                self.charPositionInLine = payload.charPositionInLine

        elif isinstance(self.node, Tree):
            self.line = self.node.line
            self.charPositionInLine = self.node.charPositionInLine
            if isinstance(self.node, CommonTree):
                self.token = self.node.token

        else:
            # node without a token payload: synthesize a token from the
            # adaptor's view of the node
            type = adaptor.getType(self.node)
            text = adaptor.getText(self.node)
            self.token = CommonToken(type=type, text=text)

    def getUnexpectedType(self):
        """Return the offending token type, node type or character."""
        from antlr3.streams import TokenStream
        from antlr3.tree import TreeNodeStream

        if isinstance(self.input, TokenStream):
            return self.token.type

        elif isinstance(self.input, TreeNodeStream):
            adaptor = self.input.treeAdaptor
            return adaptor.getType(self.node)

        else:
            return self.c

    unexpectedType = property(getUnexpectedType)
00340
00341
00342
00343
class MismatchedTokenException(RecognitionException):
    """The recognizer saw a token other than the one it expected."""

    def __init__(self, expecting, input):
        RecognitionException.__init__(self, input)
        self.expecting = expecting

    def __str__(self):
        found_vs_expected = (self.getUnexpectedType(), self.expecting)
        return "MismatchedTokenException(%r!=%r)" % found_vs_expected

    __repr__ = __str__
00357
00358
00359
00360
class UnwantedTokenException(MismatchedTokenException):
    """An extra token was found in the input stream."""

    def getUnexpectedToken(self):
        return self.token

    def __str__(self):
        if self.expecting == INVALID_TOKEN_TYPE:
            exp = ""
        else:
            exp = ", expected %s" % self.expecting

        found = self.token.text if self.token is not None else None
        return "UnwantedTokenException(found=%s%s)" % (found, exp)

    __repr__ = __str__
00377
00378
00379
00380
00381
00382
00383
class MissingTokenException(MismatchedTokenException):
    """A required token was missing; `inserted` holds the token object
    that was conjured up to keep going."""

    def __init__(self, expecting, input, inserted):
        MismatchedTokenException.__init__(self, expecting, input)

        self.inserted = inserted

    def getMissingType(self):
        return self.expecting

    def __str__(self):
        if self.token is not None:
            if self.inserted is not None:
                return "MissingTokenException(inserted %r at %r)" % (
                    self.inserted, self.token.text)
            return "MissingTokenException(at %r)" % self.token.text
        return "MissingTokenException"

    __repr__ = __str__
00406
00407
00408
00409
class MismatchedRangeException(RecognitionException):
    """The current symbol lies outside an expected [a..b] range."""

    def __init__(self, a, b, input):
        RecognitionException.__init__(self, input)

        self.a = a
        self.b = b

    def __str__(self):
        details = (self.getUnexpectedType(), self.a, self.b)
        return "MismatchedRangeException(%r not in [%r..%r])" % details

    __repr__ = __str__
00424
00425
00426
00427
class MismatchedSetException(RecognitionException):
    """The current symbol is not a member of the expected set."""

    def __init__(self, expecting, input):
        RecognitionException.__init__(self, input)

        self.expecting = expecting

    def __str__(self):
        details = (self.getUnexpectedType(), self.expecting)
        return "MismatchedSetException(%r not in %r)" % details

    __repr__ = __str__
00441
00442
00443
00444
class MismatchedNotSetException(MismatchedSetException):
    """An inverted-set match failed."""

    def __str__(self):
        return ("MismatchedNotSetException(%r!=%r)"
                % (self.getUnexpectedType(), self.expecting))

    __repr__ = __str__
00452
00453
00454
00455
class NoViableAltException(RecognitionException):
    """No alternative of a decision matched the current input."""

    def __init__(
        self, grammarDecisionDescription, decisionNumber, stateNumber, input
        ):
        RecognitionException.__init__(self, input)

        # description of the grammar decision that failed, plus the DFA
        # decision and state numbers where matching gave up
        self.grammarDecisionDescription = grammarDecisionDescription
        self.decisionNumber = decisionNumber
        self.stateNumber = stateNumber

    def __str__(self):
        return ("NoViableAltException(%r!=[%r])"
                % (self.unexpectedType, self.grammarDecisionDescription))

    __repr__ = __str__
00473
00474
00475
00476
class EarlyExitException(RecognitionException):
    """A loop decision exited before matching the expected input;
    records the number of the decision that failed."""

    def __init__(self, decisionNumber, input):
        RecognitionException.__init__(self, input)

        # number of the decision that exited early
        self.decisionNumber = decisionNumber
00483
00484
00485
00486
00487
00488
00489
00490
00491
00492
class FailedPredicateException(RecognitionException):
    """A semantic predicate evaluated to false at runtime."""

    def __init__(self, input, ruleName, predicateText):
        RecognitionException.__init__(self, input)

        # rule the predicate belongs to, and the predicate's source text
        self.ruleName = ruleName
        self.predicateText = predicateText

    def __str__(self):
        # string concatenation kept on purpose (matches original behavior)
        return ("FailedPredicateException("
                + self.ruleName
                + ",{"
                + self.predicateText
                + "}?)")

    __repr__ = __str__
00505
00506
00507
00508
class MismatchedTreeNodeException(RecognitionException):
    """A tree parser saw a node other than the one it expected."""

    def __init__(self, expecting, input):
        RecognitionException.__init__(self, input)

        self.expecting = expecting

    def __str__(self):
        details = (self.getUnexpectedType(), self.expecting)
        return "MismatchedTreeNodeException(%r!=%r)" % details

    __repr__ = __str__
00521 """ANTLR3 runtime package"""
00522
00523
# End-of-input marker returned by streams.
EOF = -1

# Channel on which tokens are delivered to the parser by default.
DEFAULT_CHANNEL = 0

# Channel for tokens the parser does not look at (typically whitespace
# or comments).
HIDDEN_CHANNEL = 99

# End-of-rule marker token type.
EOR_TOKEN_TYPE = 1

# Imaginary token types used as descend/ascend markers in tree node
# streams.
DOWN = 2

UP = 3

# First token type available to user grammars; lower values are reserved.
MIN_TOKEN_TYPE = UP+1

# Token type used for invalid/unset tokens.
INVALID_TOKEN_TYPE = 0
00548
00549 """ANTLR3 runtime package"""
00550
00551 """ANTLR3 runtime package"""
00552
00553
00554 from antlr3.constants import EOF, DEFAULT_CHANNEL, INVALID_TOKEN_TYPE
00555
00556
00557
00558
00559
00560
00561
00562
00563
class Token(object):
    """Abstract token interface.

    Concrete token classes implement these accessors; this base class
    only declares the contract and raises NotImplementedError for
    every method.
    """

    def getText(self):
        """Return the token's text."""
        raise NotImplementedError

    def setText(self, text):
        """Set the token's text."""
        raise NotImplementedError

    def getType(self):
        """Return the token type (an integer)."""
        raise NotImplementedError

    def setType(self, ttype):
        """Set the token type."""
        raise NotImplementedError

    def getLine(self):
        """Return the line number the token starts on."""
        raise NotImplementedError

    def setLine(self, line):
        """Set the line number the token starts on."""
        raise NotImplementedError

    def getCharPositionInLine(self):
        """Return the column of the token's first character."""
        raise NotImplementedError

    def setCharPositionInLine(self, pos):
        """Set the column of the token's first character."""
        raise NotImplementedError

    def getChannel(self):
        """Return the channel this token is delivered on."""
        raise NotImplementedError

    def setChannel(self, channel):
        """Set the channel this token is delivered on."""
        raise NotImplementedError

    def getTokenIndex(self):
        """Return the token's index within its token stream."""
        raise NotImplementedError

    def setTokenIndex(self, index):
        """Set the token's index within its token stream."""
        raise NotImplementedError

    def getInputStream(self):
        """Return the input stream this token originated from, if any."""
        raise NotImplementedError

    def setInputStream(self, input):
        """Set the input stream this token originated from."""
        raise NotImplementedError
00692
00693
00694
00695
00696
00697
00698
00699
00700
00701
00702
00703
00704
00705
00706
00707
00708
00709
00710
00711
class CommonToken(Token):
    """Standard token implementation.

    Holds start/stop character indices into the source stream and only
    materializes its text lazily, unless an explicit text override was
    set via setText().
    """

    def __init__(self, type=None, channel=DEFAULT_CHANNEL, text=None,
                 input=None, start=None, stop=None, oldToken=None):
        Token.__init__(self)

        if oldToken is None:
            # build a fresh token from the keyword arguments
            self.type = type
            self.input = input
            self.charPositionInLine = -1
            self.line = 0
            self.channel = channel

            # position in the token stream; -1 means "not assigned yet"
            self.index = -1

            # explicit text override; when None, the text is extracted
            # lazily from the input stream via start/stop
            self._text = text

            # first and last character indices in the input stream
            self.start = start
            self.stop = stop
        else:
            # clone an existing token
            self.type = oldToken.type
            self.line = oldToken.line
            self.charPositionInLine = oldToken.charPositionInLine
            self.channel = oldToken.channel
            self.index = oldToken.index
            self._text = oldToken._text
            if isinstance(oldToken, CommonToken):
                self.input = oldToken.input
                self.start = oldToken.start
                self.stop = oldToken.stop

    def getText(self):
        # an explicit override wins; otherwise slice the input stream
        if self._text is not None:
            return self._text

        if self.input is None:
            return None

        return self.input.substring(self.start, self.stop)

    def setText(self, text):
        # Setting text to None re-enables lazy extraction from the
        # input stream.
        self._text = text

    text = property(getText, setText)

    def getType(self):
        return self.type

    def setType(self, ttype):
        self.type = ttype

    def getLine(self):
        return self.line

    def setLine(self, line):
        self.line = line

    def getCharPositionInLine(self):
        return self.charPositionInLine

    def setCharPositionInLine(self, pos):
        self.charPositionInLine = pos

    def getChannel(self):
        return self.channel

    def setChannel(self, channel):
        self.channel = channel

    def getTokenIndex(self):
        return self.index

    def setTokenIndex(self, index):
        self.index = index

    def getInputStream(self):
        return self.input

    def setInputStream(self, input):
        self.input = input

    def __str__(self):
        if self.type == EOF:
            return "<EOF>"

        channelStr = ""
        if self.channel > 0:
            channelStr = ",channel=" + str(self.channel)

        txt = self.text
        if txt is None:
            txt = "<no text>"
        else:
            # escape whitespace characters for display
            for plain, escaped in (("\n", "\\\\n"),
                                   ("\r", "\\\\r"),
                                   ("\t", "\\\\t")):
                txt = txt.replace(plain, escaped)

        return "[@%d,%d:%d=%r,<%d>%s,%d:%d]" % (
            self.index,
            self.start, self.stop,
            txt,
            self.type, channelStr,
            self.line, self.charPositionInLine
            )
00841
00842
00843
00844
00845
00846
00847
00848
00849
00850
00851
00852
class ClassicToken(Token):
    """Token implementation that stores its text, line and column
    directly instead of pointing into an input stream."""

    def __init__(self, type=None, text=None, channel=DEFAULT_CHANNEL,
                 oldToken=None
                 ):
        Token.__init__(self)

        if oldToken is not None:
            # clone an existing token
            self.text = oldToken.text
            self.type = oldToken.type
            self.line = oldToken.line
            self.charPositionInLine = oldToken.charPositionInLine
            self.channel = oldToken.channel
        else:
            # BUGFIX: these default assignments used to run
            # unconditionally, clobbering the values just copied from
            # oldToken (compare CommonToken.__init__, which uses an
            # if/else for exactly this).
            self.text = text
            self.type = type
            self.line = None
            self.charPositionInLine = None
            self.channel = channel

        self.index = None

    def getText(self):
        return self.text

    def setText(self, text):
        self.text = text

    def getType(self):
        return self.type

    def setType(self, ttype):
        self.type = ttype

    def getLine(self):
        return self.line

    def setLine(self, line):
        self.line = line

    def getCharPositionInLine(self):
        return self.charPositionInLine

    def setCharPositionInLine(self, pos):
        self.charPositionInLine = pos

    def getChannel(self):
        return self.channel

    def setChannel(self, channel):
        self.channel = channel

    def getTokenIndex(self):
        return self.index

    def setTokenIndex(self, index):
        self.index = index

    def getInputStream(self):
        # classic tokens carry no stream reference
        return None

    def setInputStream(self, input):
        pass

    def toString(self):
        channelStr = ""
        if self.channel > 0:
            channelStr = ",channel=" + str(self.channel)

        txt = self.text
        if txt is None:
            txt = "<no text>"

        return "[@%r,%r,<%r>%s,%r:%r]" % (self.index,
                                          txt,
                                          self.type,
                                          channelStr,
                                          self.line,
                                          self.charPositionInLine
                                          )

    __str__ = toString
    __repr__ = toString
00944
00945
00946
# Shared singleton returned when a token stream is exhausted.
EOF_TOKEN = CommonToken(type=EOF)

# Placeholder for an unset/invalid token.
INVALID_TOKEN = CommonToken(type=INVALID_TOKEN_TYPE)

# Sentinel token used to signal that matched input should be skipped
# rather than emitted as a token.
SKIP_TOKEN = CommonToken(type=INVALID_TOKEN_TYPE)
00954
00955
00956 """ANTLR3 runtime package"""
00957
00958
00959 import codecs
00960 from StringIO import StringIO
00961
00962 from antlr3.constants import DEFAULT_CHANNEL, EOF
00963 from antlr3.tokens import Token, EOF_TOKEN
00964
00965
00966
00967
00968
00969
00970
00971
00972
00973
00974
00975
00976
00977
00978
00979
00980
00981
00982
00983
class IntStream(object):
    """Base interface for a stream of integers (character codes or
    token types).

    Declares the lookahead/consume protocol plus the mark/rewind
    mechanism used for backtracking.
    """

    def consume(self):
        """Advance to the next element of the stream."""
        raise NotImplementedError

    def LA(self, i):
        """Return the int value of the element `i` positions ahead.

        LA(1) is the current element; negative `i` looks backwards.
        """
        raise NotImplementedError

    def mark(self):
        """Record the current position; returns a marker handle that
        can later be passed to rewind()."""
        raise NotImplementedError

    def index(self):
        """Return the index of the current element (the number of
        elements consumed so far)."""
        raise NotImplementedError

    def rewind(self, marker=None):
        """Reset the stream to the given marker, or to the most recent
        mark() when marker is None."""
        raise NotImplementedError

    def release(self, marker=None):
        """Discard the given marker (or the most recent one) without
        changing the stream position."""
        raise NotImplementedError

    def seek(self, index):
        """Set the stream position to `index`."""
        raise NotImplementedError

    def size(self):
        """Return the total number of elements in the stream."""
        raise NotImplementedError

    def getSourceName(self):
        """Return a name describing the underlying source (e.g. a file
        name), or None when unknown."""
        raise NotImplementedError
01115
01116
01117
01118
01119
01120
01121
01122
01123
class CharStream(IntStream):
    """A stream of characters, read as integer code points via LA()."""

    # value returned by LA() at end of input
    EOF = -1

    def substring(self, start, stop):
        """Return the text between absolute indices start..stop, both
        inclusive."""
        raise NotImplementedError

    def LT(self, i):
        """Like LA(), but returns the character itself rather than its
        integer code."""
        raise NotImplementedError

    def getLine(self):
        """Return the current line number (1-based)."""
        raise NotImplementedError

    def setLine(self, line):
        """Set the current line number."""
        raise NotImplementedError

    def getCharPositionInLine(self):
        """Return the 0-based column of the current position."""
        raise NotImplementedError

    def setCharPositionInLine(self, pos):
        """Set the 0-based column of the current position."""
        raise NotImplementedError
01182
01183
01184
01185
01186
01187
01188
01189
01190
01191
class TokenStream(IntStream):
    """A stream of Token objects pulled from a token source (lexer)."""

    def LT(self, k):
        """Return the Token `k` positions ahead (LT(1) is the current
        token); negative `k` looks backwards."""
        raise NotImplementedError

    def get(self, i):
        """Return the Token at absolute index `i` in the buffer."""
        raise NotImplementedError

    def getTokenSource(self):
        """Return the object this stream pulls tokens from."""
        raise NotImplementedError

    def toString(self, start=None, stop=None):
        """Return the concatenated text of the tokens between start
        and stop (given as indices or Token objects)."""
        raise NotImplementedError
01250
01251
01252
01253
01254
01255
01256
01257
01258
01259
01260
01261
01262
01263
01264
01265
01266
01267
01268
class ANTLRStringStream(CharStream):
    """CharStream backed by an in-memory string.

    Tracks line and column while consuming, and supports
    mark/rewind/seek backtracking through an internal marker stack.
    """

    def __init__(self, data):
        # @param data: string to wrap; converted with unicode()
        #   (Python 2 builtin)
        CharStream.__init__(self)

        # the text plus its per-character integer codes (for LA())
        self.strdata = unicode(data)
        self.data = [ord(c) for c in self.strdata]

        # total number of characters
        self.n = len(data)

        # 0-based index of the next character to consume
        self.p = 0

        # current line number, 1-based
        self.line = 1

        # 0-based column within the current line
        self.charPositionInLine = 0

        # stack of (p, line, charPositionInLine) snapshots used by
        # mark()/rewind(); existing slots are reused to avoid churn
        self._markers = [ ]
        self.lastMarker = None
        self.markDepth = 0

        # source name reported by getSourceName()
        self.name = None

    def reset(self):
        """Rewind to the very beginning and drop all markers."""
        self.p = 0
        self.line = 1
        self.charPositionInLine = 0
        self._markers = [ ]

    def consume(self):
        """Advance one character, updating line/column bookkeeping."""
        try:
            if self.data[self.p] == 10:  # '\n'
                self.line += 1
                self.charPositionInLine = 0
            else:
                self.charPositionInLine += 1

            self.p += 1

        except IndexError:
            # consuming past EOF is silently ignored
            pass

    def LA(self, i):
        """Return the character code `i` ahead (LA(1) is the current
        character), or EOF past the end; LA(0) returns 0."""
        if i == 0:
            return 0

        if i < 0:
            i += 1  # compensate: LA(-1) must address the previous char

        try:
            return self.data[self.p+i-1]
        except IndexError:
            return EOF

    def LT(self, i):
        """Like LA(), but returns the character itself."""
        if i == 0:
            return 0

        if i < 0:
            i += 1

        try:
            return self.strdata[self.p+i-1]
        except IndexError:
            return EOF

    def index(self):
        """Return the index of the next character to be consumed."""
        return self.p

    def size(self):
        """Return the total number of characters."""
        return self.n

    def mark(self):
        """Snapshot position/line/column; returns a marker for rewind()."""
        state = (self.p, self.line, self.charPositionInLine)
        try:
            # reuse an existing slot if one is free
            self._markers[self.markDepth] = state
        except IndexError:
            self._markers.append(state)
        self.markDepth += 1

        self.lastMarker = self.markDepth

        return self.lastMarker

    def rewind(self, marker=None):
        """Restore the state captured by `marker` (default: the most
        recent mark) and release it."""
        if marker is None:
            marker = self.lastMarker

        p, line, charPositionInLine = self._markers[marker-1]

        self.seek(p)
        self.line = line
        self.charPositionInLine = charPositionInLine
        self.release(marker)

    def release(self, marker=None):
        """Pop the given marker and any markers nested inside it."""
        if marker is None:
            marker = self.lastMarker

        self.markDepth = marker-1

    def seek(self, index):
        """Move to `index`.

        Seeking backwards is a plain assignment (NOTE: line/column are
        not recomputed here — rewind() restores them from the marker);
        seeking forward consume()s so the bookkeeping stays correct.
        """
        if index <= self.p:
            self.p = index
            return

        while self.p < index:
            self.consume()

    def substring(self, start, stop):
        """Return the text from start to stop, both inclusive."""
        return self.strdata[start:stop+1]

    def getLine(self):
        """Current line number, 1-based."""
        return self.line

    def getCharPositionInLine(self):
        """0-based column within the current line."""
        return self.charPositionInLine

    def setLine(self, line):
        self.line = line

    def setCharPositionInLine(self, pos):
        self.charPositionInLine = pos

    def getSourceName(self):
        return self.name
01468
01469
01470
01471
01472
01473
01474
01475
01476
class ANTLRFileStream(ANTLRStringStream):
    """CharStream that reads its data from a file."""

    def __init__(self, fileName, encoding=None):
        """@param fileName: path of the file to read
        @param encoding: codec used to decode the file; None reads it
            without decoding."""
        self.fileName = fileName

        # read the whole file up front; the stream works in memory
        fp = codecs.open(fileName, 'rb', encoding)
        try:
            content = fp.read()
        finally:
            fp.close()

        ANTLRStringStream.__init__(self, content)

    def getSourceName(self):
        """The file name doubles as the source name."""
        return self.fileName
01506
01507
01508
01509
01510
01511
01512
01513
01514
01515
01516
class ANTLRInputStream(ANTLRStringStream):
    """CharStream that reads from an already-open file-like object."""

    def __init__(self, file, encoding=None):
        """@param file: a readable file-like object
        @param encoding: codec used to wrap the stream; None reads it
            as-is."""
        if encoding is not None:
            # wrap the stream in a decoding reader for the given codec
            reader_factory = codecs.lookup(encoding)[2]
            file = reader_factory(file)

        ANTLRStringStream.__init__(self, file.read())
01538
01539
01540
01541
01542
# Short convenience aliases for the stream classes.
StringStream = ANTLRStringStream
FileStream = ANTLRFileStream
InputStream = ANTLRInputStream
01546
01547
01548
01549
01550
01551
01552
01553
01554
01555
01556
01557
01558
01559
01560
01561
01562
01563
01564
01565
01566
class CommonTokenStream(TokenStream):
    """Buffered TokenStream.

    Drains the whole token source on first access (fillBuffer) and then
    serves lookahead from the buffer, skipping tokens that are not on
    the requested channel.
    """

    def __init__(self, tokenSource=None, channel=DEFAULT_CHANNEL):
        """
        @param tokenSource: object with a nextToken() method (a lexer)
        @param channel: the only channel LT()/LA() will serve
        """
        TokenStream.__init__(self)

        self.tokenSource = tokenSource

        # buffered tokens, filled lazily by fillBuffer()
        self.tokens = []

        # token type -> channel overrides applied while buffering
        self.channelOverrideMap = {}

        # token types dropped entirely while buffering
        self.discardSet = set()

        # channel served to the consumer
        self.channel = channel

        # when True, tokens on other channels are dropped while buffering
        self.discardOffChannelTokens = False

        # index of the current token; -1 means "buffer not filled yet"
        self.p = -1

        self.lastMarker = None

    def setTokenSource(self, tokenSource):
        """Switch to a new token source and reset the buffer."""
        self.tokenSource = tokenSource
        self.tokens = []
        self.p = -1
        self.channel = DEFAULT_CHANNEL

    def reset(self):
        self.p = 0
        self.lastMarker = None

    def fillBuffer(self):
        """Drain the token source into self.tokens, applying discard
        rules and channel overrides, then position p on the first
        on-channel token."""
        index = 0
        t = self.tokenSource.nextToken()
        while t is not None and t.type != EOF:
            discard = False

            if self.discardSet is not None and t.type in self.discardSet:
                discard = True

            elif self.discardOffChannelTokens and t.channel != self.channel:
                discard = True

            try:
                overrideChannel = self.channelOverrideMap[t.type]

            except KeyError:
                # no channel override for this token type
                pass

            else:
                if overrideChannel == self.channel:
                    t.channel = overrideChannel
                else:
                    discard = True

            if not discard:
                t.index = index
                self.tokens.append(t)
                index += 1

            t = self.tokenSource.nextToken()

        # leave p on the first token on our channel
        self.p = 0
        self.p = self.skipOffTokenChannels(self.p)

    def consume(self):
        """Advance to the next on-channel token."""
        if self.p < len(self.tokens):
            self.p += 1

            self.p = self.skipOffTokenChannels(self.p)

    def skipOffTokenChannels(self, i):
        """Return the index of the first on-channel token at or after i."""
        try:
            while self.tokens[i].channel != self.channel:
                i += 1
        except IndexError:
            # hit the end of the buffer; return the out-of-range index
            pass

        return i

    def skipOffTokenChannelsReverse(self, i):
        """Return the index of the first on-channel token at or before
        i (may return -1)."""
        while i >= 0 and self.tokens[i].channel != self.channel:
            i -= 1

        return i

    def setTokenTypeChannel(self, ttype, channel):
        """Force all tokens of the given type onto the given channel
        (takes effect during fillBuffer)."""
        self.channelOverrideMap[ttype] = channel

    def discardTokenType(self, ttype):
        """Drop all tokens of the given type while buffering."""
        self.discardSet.add(ttype)

    def getTokens(self, start=None, stop=None, types=None):
        """Return the tokens in the range start..stop, optionally
        filtered to the given token type(s); None when nothing matches."""
        if self.p == -1:
            self.fillBuffer()

        if stop is None or stop >= len(self.tokens):
            stop = len(self.tokens) - 1

        # FIX: this condition used to test `stop < 0`, so a negative
        # `start` was never clamped to 0.
        if start is None or start < 0:
            start = 0

        if start > stop:
            return None

        if isinstance(types, (int, long)):
            # a single token type was given: wrap it in a set
            types = set([types])

        filteredTokens = [
            token for token in self.tokens[start:stop]
            if types is None or token.type in types
            ]

        if len(filteredTokens) == 0:
            return None

        return filteredTokens

    def LT(self, k):
        """Return the k-th on-channel token ahead (LT(1) is current);
        negative k delegates to LB()."""
        if self.p == -1:
            self.fillBuffer()

        if k == 0:
            return None

        if k < 0:
            return self.LB(-k)

        i = self.p
        n = 1

        while n < k:
            # skip off-channel tokens on the way forward
            i = self.skipOffTokenChannels(i+1)
            n += 1

        try:
            return self.tokens[i]
        except IndexError:
            return EOF_TOKEN

    def LB(self, k):
        """Return the k-th on-channel token behind the current one, or
        None when out of range."""
        if self.p == -1:
            self.fillBuffer()

        if k == 0:
            return None

        if self.p - k < 0:
            return None

        i = self.p
        n = 1

        while n <= k:
            # skip off-channel tokens on the way back
            i = self.skipOffTokenChannelsReverse(i-1)
            n += 1

        if i < 0:
            return None

        return self.tokens[i]

    def get(self, i):
        """Return the token at absolute index i (no channel filtering)."""
        return self.tokens[i]

    def LA(self, i):
        return self.LT(i).type

    def mark(self):
        self.lastMarker = self.index()
        return self.lastMarker

    def release(self, marker=None):
        # fully buffered stream: nothing to free
        pass

    def size(self):
        return len(self.tokens)

    def index(self):
        return self.p

    def rewind(self, marker=None):
        if marker is None:
            marker = self.lastMarker

        self.seek(marker)

    def seek(self, index):
        self.p = index

    def getTokenSource(self):
        return self.tokenSource

    def getSourceName(self):
        return self.tokenSource.getSourceName()

    def toString(self, start=None, stop=None):
        """Return the concatenated text of tokens start..stop (given as
        indices or Token objects), regardless of channel."""
        if self.p == -1:
            self.fillBuffer()

        if start is None:
            start = 0
        elif not isinstance(start, int):
            start = start.index

        if stop is None:
            stop = len(self.tokens) - 1
        elif not isinstance(stop, int):
            stop = stop.index

        if stop >= len(self.tokens):
            stop = len(self.tokens) - 1

        return ''.join([t.text for t in self.tokens[start:stop+1]])
01888
01889
01890
01891
class RewriteOperation(object):
    """Base class for a single queued rewrite instruction."""

    def __init__(self, stream, index, text):
        self.stream = stream
        # token index this operation applies at
        self.index = index
        # insertion/replacement text
        self.text = text

    def execute(self, buf):
        """Execute this operation against buf and return the index of
        the next token to copy.  The base class is a no-op."""
        return self.index

    def toString(self):
        return '<%s@%d:"%s">' % (
            self.__class__.__name__, self.index, self.text)

    __str__ = toString
    __repr__ = toString
01913
01914
01915
01916
class InsertBeforeOp(RewriteOperation):
    """Rewrite operation that inserts its text in front of the token at
    the operation's index."""

    def execute(self, buf):
        buf.write(self.text)
        # then emit the original token and resume after it
        original = self.stream.tokens[self.index].text
        buf.write(original)
        return self.index + 1
01923
01924
01925
01926
01927
01928
01929
01930
01931
class ReplaceOp(RewriteOperation):
    """Replace the token range index..lastIndex with new text; a None
    text makes this a pure delete."""

    def __init__(self, stream, first, last, text):
        RewriteOperation.__init__(self, stream, first, text)
        # index of the last replaced token (inclusive)
        self.lastIndex = last

    def execute(self, buf):
        if self.text is not None:
            buf.write(self.text)

        # resume copying after the replaced range
        return self.lastIndex + 1

    def toString(self):
        return '<ReplaceOp@%d..%d:"%s">' % (
            self.index, self.lastIndex, self.text)

    __str__ = toString
    __repr__ = toString
01952
01953
01954
01955
01956
01957
class DeleteOp(ReplaceOp):
    """A ReplaceOp with no replacement text: drops the token range."""

    def __init__(self, stream, first, last):
        ReplaceOp.__init__(self, stream, first, last, None)

    def toString(self):
        return '<DeleteOp@%d..%d>' % (self.index, self.lastIndex)

    __str__ = toString
    __repr__ = toString
01969
01970
01971
01972
01973
01974
01975
01976
01977
01978
01979
01980
01981
01982
01983
01984
01985
01986
01987
01988
01989
01990
01991
01992
01993
01994
01995
01996
01997
01998
01999
02000
02001
02002
02003
02004
02005
02006
02007
02008
02009
02010
02011
02012
02013
02014
02015
02016
02017
02018
02019
02020
02021
02022
02023
02024
02025 class TokenRewriteStream(CommonTokenStream):
02026
02027 DEFAULT_PROGRAM_NAME = "default"
02028 MIN_TOKEN_INDEX = 0
02029
02030 def __init__(self, tokenSource=None, channel=DEFAULT_CHANNEL):
02031 CommonTokenStream.__init__(self, tokenSource, channel)
02032
02033
02034
02035
02036 self.programs = {}
02037 self.programs[self.DEFAULT_PROGRAM_NAME] = []
02038
02039
02040 self.lastRewriteTokenIndexes = {}
02041
02042
02043
02044
02045
02046
02047
02048
02049 def rollback(self, *args):
02050
02051 if len(args) == 2:
02052 programName = args[0]
02053 instructionIndex = args[1]
02054 elif len(args) == 1:
02055 programName = self.DEFAULT_PROGRAM_NAME
02056 instructionIndex = args[0]
02057 else:
02058 raise TypeError("Invalid arguments")
02059
02060 p = self.programs.get(programName, None)
02061 if p is not None:
02062 self.programs[programName] = (
02063 p[self.MIN_TOKEN_INDEX:instructionIndex])
02064
02065
02066
02067
    def deleteProgram(self, programName=DEFAULT_PROGRAM_NAME):
        """Reset the program so that no instructions exist."""
        self.rollback(programName, self.MIN_TOKEN_INDEX)
02071
02072
02073 def insertAfter(self, *args):
02074 if len(args) == 2:
02075 programName = self.DEFAULT_PROGRAM_NAME
02076 index = args[0]
02077 text = args[1]
02078
02079 elif len(args) == 3:
02080 programName = args[0]
02081 index = args[1]
02082 text = args[2]
02083
02084 else:
02085 raise TypeError("Invalid arguments")
02086
02087 if isinstance(index, Token):
02088
02089 index = index.index
02090
02091
02092 self.insertBefore(programName, index+1, text)
02093
02094
02095 def insertBefore(self, *args):
02096 if len(args) == 2:
02097 programName = self.DEFAULT_PROGRAM_NAME
02098 index = args[0]
02099 text = args[1]
02100
02101 elif len(args) == 3:
02102 programName = args[0]
02103 index = args[1]
02104 text = args[2]
02105
02106 else:
02107 raise TypeError("Invalid arguments")
02108
02109 if isinstance(index, Token):
02110
02111 index = index.index
02112
02113 op = InsertBeforeOp(self, index, text)
02114 rewrites = self.getProgram(programName)
02115 rewrites.append(op)
02116
02117
02118 def replace(self, *args):
02119 if len(args) == 2:
02120 programName = self.DEFAULT_PROGRAM_NAME
02121 first = args[0]
02122 last = args[0]
02123 text = args[1]
02124
02125 elif len(args) == 3:
02126 programName = self.DEFAULT_PROGRAM_NAME
02127 first = args[0]
02128 last = args[1]
02129 text = args[2]
02130
02131 elif len(args) == 4:
02132 programName = args[0]
02133 first = args[1]
02134 last = args[2]
02135 text = args[3]
02136
02137 else:
02138 raise TypeError("Invalid arguments")
02139
02140 if isinstance(first, Token):
02141
02142 first = first.index
02143
02144 if isinstance(last, Token):
02145
02146 last = last.index
02147
02148 if first > last or first < 0 or last < 0 or last >= len(self.tokens):
02149 raise ValueError(
02150 "replace: range invalid: "+first+".."+last+
02151 "(size="+len(self.tokens)+")")
02152
02153 op = ReplaceOp(self, first, last, text)
02154 rewrites = self.getProgram(programName)
02155 rewrites.append(op)
02156
02157
02158 def delete(self, *args):
02159 self.replace(*(list(args) + [None]))
02160
02161
02162 def getLastRewriteTokenIndex(self, programName=DEFAULT_PROGRAM_NAME):
02163 return self.lastRewriteTokenIndexes.get(programName, -1)
02164
02165
02166 def setLastRewriteTokenIndex(self, programName, i):
02167 self.lastRewriteTokenIndexes[programName] = i
02168
02169
02170 def getProgram(self, name):
02171 p = self.programs.get(name, None)
02172 if p is None:
02173 p = self.initializeProgram(name)
02174
02175 return p
02176
02177
02178 def initializeProgram(self, name):
02179 p = []
02180 self.programs[name] = p
02181 return p
02182
02183
02184 def toOriginalString(self, start=None, end=None):
02185 if start is None:
02186 start = self.MIN_TOKEN_INDEX
02187 if end is None:
02188 end = self.size() - 1
02189
02190 buf = StringIO()
02191 i = start
02192 while i >= self.MIN_TOKEN_INDEX and i <= end and i < len(self.tokens):
02193 buf.write(self.get(i).text)
02194 i += 1
02195
02196 return buf.getvalue()
02197
02198
02199 def toString(self, *args):
02200 if len(args) == 0:
02201 programName = self.DEFAULT_PROGRAM_NAME
02202 start = self.MIN_TOKEN_INDEX
02203 end = self.size() - 1
02204
02205 elif len(args) == 1:
02206 programName = args[0]
02207 start = self.MIN_TOKEN_INDEX
02208 end = self.size() - 1
02209
02210 elif len(args) == 2:
02211 programName = self.DEFAULT_PROGRAM_NAME
02212 start = args[0]
02213 end = args[1]
02214
02215 if start is None:
02216 start = self.MIN_TOKEN_INDEX
02217 elif not isinstance(start, int):
02218 start = start.index
02219
02220 if end is None:
02221 end = len(self.tokens) - 1
02222 elif not isinstance(end, int):
02223 end = end.index
02224
02225
02226 if end >= len(self.tokens):
02227 end = len(self.tokens) - 1
02228
02229 if start < 0:
02230 start = 0
02231
02232 rewrites = self.programs.get(programName)
02233 if rewrites is None or len(rewrites) == 0:
02234
02235 return self.toOriginalString(start, end)
02236
02237 buf = StringIO()
02238
02239
02240 indexToOp = self.reduceToSingleOperationPerIndex(rewrites)
02241
02242
02243 i = start
02244 while i <= end and i < len(self.tokens):
02245 op = indexToOp.get(i)
02246
02247 try:
02248 del indexToOp[i]
02249 except KeyError:
02250 pass
02251
02252 t = self.tokens[i]
02253 if op is None:
02254
02255 buf.write(t.text)
02256 i += 1
02257
02258 else:
02259 i = op.execute(buf)
02260
02261
02262
02263
02264 if end == len(self.tokens) - 1:
02265
02266
02267 for i in sorted(indexToOp.keys()):
02268 op = indexToOp[i]
02269 if op.index >= len(self.tokens)-1:
02270 buf.write(op.text)
02271
02272 return buf.getvalue()
02273
02274 __str__ = toString
02275
02276
02277
02278
02279
02280
02281
02282
02283
02284
02285
02286
02287
02288
02289
02290
02291
02292
02293
02294
02295
02296
02297
02298
02299
02300
02301
02302
02303
02304
02305
02306
02307
02308
02309
02310
02311
02312
02313
02314
02315
02316
02317
02318
02319
02320
02321
02322
02323
02324 def reduceToSingleOperationPerIndex(self, rewrites):
02325
02326
02327 for i, rop in enumerate(rewrites):
02328 if rop is None:
02329 continue
02330
02331 if not isinstance(rop, ReplaceOp):
02332 continue
02333
02334
02335 for j, iop in self.getKindOfOps(rewrites, InsertBeforeOp, i):
02336 if iop.index >= rop.index and iop.index <= rop.lastIndex:
02337 rewrites[j] = None
02338
02339
02340 for j, prevRop in self.getKindOfOps(rewrites, ReplaceOp, i):
02341 if (prevRop.index >= rop.index
02342 and prevRop.lastIndex <= rop.lastIndex):
02343 rewrites[j] = None
02344 continue
02345
02346
02347 disjoint = (prevRop.lastIndex < rop.index
02348 or prevRop.index > rop.lastIndex)
02349 same = (prevRop.index == rop.index
02350 and prevRop.lastIndex == rop.lastIndex)
02351 if not disjoint and not same:
02352 raise ValueError(
02353 "replace op boundaries of %s overlap with previous %s"
02354 % (rop, prevRop))
02355
02356
02357 for i, iop in enumerate(rewrites):
02358 if iop is None:
02359 continue
02360
02361 if not isinstance(iop, InsertBeforeOp):
02362 continue
02363
02364
02365 for j, prevIop in self.getKindOfOps(rewrites, InsertBeforeOp, i):
02366 if prevIop.index == iop.index:
02367
02368
02369
02370 iop.text = self.catOpText(iop.text, prevIop.text)
02371 rewrites[j] = None
02372
02373
02374 for j, rop in self.getKindOfOps(rewrites, ReplaceOp, i):
02375 if iop.index == rop.index:
02376 rop.text = self.catOpText(iop.text, rop.text)
02377 rewrites[i] = None
02378 continue
02379
02380 if iop.index >= rop.index and iop.index <= rop.lastIndex:
02381 raise ValueError(
02382 "insert op %s within boundaries of previous %s"
02383 % (iop, rop))
02384
02385 m = {}
02386 for i, op in enumerate(rewrites):
02387 if op is None:
02388 continue
02389
02390 assert op.index not in m, "should only be one op per index"
02391 m[op.index] = op
02392
02393 return m
02394
02395
02396 def catOpText(self, a, b):
02397 x = ""
02398 y = ""
02399 if a is not None:
02400 x = a
02401 if b is not None:
02402 y = b
02403 return x + y
02404
02405
02406 def getKindOfOps(self, rewrites, kind, before=None):
02407 if before is None:
02408 before = len(rewrites)
02409 elif before > len(rewrites):
02410 before = len(rewrites)
02411
02412 for i, op in enumerate(rewrites[:before]):
02413 if op is None:
02414
02415 continue
02416 if op.__class__ == kind:
02417 yield i, op
02418
02419
02420 def toDebugString(self, start=None, end=None):
02421 if start is None:
02422 start = self.MIN_TOKEN_INDEX
02423 if end is None:
02424 end = self.size() - 1
02425
02426 buf = StringIO()
02427 i = start
02428 while i >= self.MIN_TOKEN_INDEX and i <= end and i < len(self.tokens):
02429 buf.write(self.get(i))
02430 i += 1
02431
02432 return buf.getvalue()
02433 """ANTLR3 runtime package"""
02434
02435
02436 import sys
02437 import inspect
02438
02439 from antlr3 import runtime_version, runtime_version_str
02440 from antlr3.constants import DEFAULT_CHANNEL, HIDDEN_CHANNEL, EOF, \
02441 EOR_TOKEN_TYPE, INVALID_TOKEN_TYPE
02442 from antlr3.exceptions import RecognitionException, MismatchedTokenException, \
02443 MismatchedRangeException, MismatchedTreeNodeException, \
02444 NoViableAltException, EarlyExitException, MismatchedSetException, \
02445 MismatchedNotSetException, FailedPredicateException, \
02446 BacktrackingFailed, UnwantedTokenException, MissingTokenException
02447 from antlr3.tokens import CommonToken, EOF_TOKEN, SKIP_TOKEN
02448 from antlr3.compat import set, frozenset, reversed
02449
02450
02451
02452
02453
02454
02455
02456
02457
02458
02459
class RecognizerSharedState(object):
    """Mutable state shared between recognizers that delegate to each
    other (grammar imports): error tracking, backtracking depth, rule
    memoization, and the lexer's token-in-progress fields.
    """

    def __init__(self):
        # Stack of follow sets, one per rule invocation, used for
        # context-sensitive error recovery.
        self.following = []

        # Are we between a reported error and the next successful match,
        # and at which input index did the last error occur?
        self.errorRecovery = False
        self.lastErrorIndex = -1

        # Nesting depth of syntactic predicates (0 = not guessing).
        self.backtracking = 0

        # rule-index -> {start-index: stop-index} memoization table;
        # remains None unless memoization is enabled.
        self.ruleMemo = None

        # Number of syntax errors reported so far.
        self.syntaxErrors = 0

        # --- Lexer fields describing the token being built ---
        # Token to emit (if a rule set one explicitly).
        self.token = None
        # Character index where the current token starts.
        self.tokenStartCharIndex = -1
        # Line / column where the current token starts.
        self.tokenStartLine = None
        self.tokenStartCharPositionInLine = None
        # Channel and type for the token in progress.
        self.channel = None
        self.type = None
        # Text override for the token in progress (None = use input).
        self.text = None
02528
02529
02530
02531
02532
02533
02534
02535
02536
02537
02538
class BaseRecognizer(object):
    """Common machinery shared by lexers, parsers and tree parsers:
    error reporting, error recovery and rule memoization.
    """

    # Memoization sentinels: a rule either failed at an input index, or
    # has not been attempted there yet.
    MEMO_RULE_FAILED = -2
    MEMO_RULE_UNKNOWN = -1

    # Convenience aliases for the token channels.
    DEFAULT_TOKEN_CHANNEL = DEFAULT_CHANNEL
    HIDDEN = HIDDEN_CHANNEL

    # Generated recognizers override this with the grammar's token names.
    tokenNames = None

    # Version of ANTLR that generated this recognizer; generated
    # subclasses override both, and __init__ checks them against the
    # runtime version imported above.
    antlr_version = (3, 0, 1, 0)
    antlr_version_str = "3.0.1"

    def __init__(self, state=None):
        """Create a recognizer, optionally sharing state with another.

        state may be an existing RecognizerSharedState so delegating
        recognizers (grammar imports) share error/memoization state.
        Raises RuntimeError on a generator/runtime version mismatch.
        """
        # Input stream; attached later via setInput/setTokenStream/etc.
        self.input = None

        if state is None:
            state = RecognizerSharedState()
        self._state = state

        if self.antlr_version > runtime_version:
            # Recognizer was generated by a newer ANTLR than this runtime.
            raise RuntimeError(
                "ANTLR version mismatch: "
                "The recognizer has been generated by V%s, but this runtime "
                "is V%s. Please use the V%s runtime or higher."
                % (self.antlr_version_str,
                   runtime_version_str,
                   self.antlr_version_str))
        elif (self.antlr_version < (3, 1, 0, 0) and
              self.antlr_version != runtime_version):
            # Pre-3.1 there was no compatibility guarantee, so generator
            # and runtime versions must match exactly.
            raise RuntimeError(
                "ANTLR version mismatch: "
                "The recognizer has been generated by V%s, but this runtime "
                "is V%s. Please use the V%s runtime."
                % (self.antlr_version_str,
                   runtime_version_str,
                   self.antlr_version_str))

    def setInput(self, input):
        """Attach the stream this recognizer reads from."""
        self.input = input

    def reset(self):
        """Reset the shared state so the recognizer can parse fresh input."""
        if self._state is None:
            # No shared state yet; nothing to reset.
            return

        self._state.following = []
        self._state.errorRecovery = False
        self._state.lastErrorIndex = -1
        self._state.syntaxErrors = 0

        self._state.backtracking = 0
        if self._state.ruleMemo is not None:
            # Wipe memoization cache, keeping it enabled.
            self._state.ruleMemo = {}

    def match(self, input, ttype, follow):
        """Match the current input symbol against token type ttype.

        On success, consume and return the symbol.  On mismatch, raise
        BacktrackingFailed while guessing, otherwise attempt
        single-token recovery via recoverFromMismatchedToken.
        """
        matchedSymbol = self.getCurrentInputSymbol(input)
        if self.input.LA(1) == ttype:
            self.input.consume()
            # A successful match ends any in-progress error recovery.
            self._state.errorRecovery = False
            return matchedSymbol

        if self._state.backtracking > 0:
            # No recovery while backtracking; just signal failure.
            raise BacktrackingFailed

        matchedSymbol = self.recoverFromMismatchedToken(input, ttype, follow)
        return matchedSymbol

    def matchAny(self, input):
        """Consume the current input symbol, whatever it is."""
        self._state.errorRecovery = False
        self.input.consume()

    def mismatchIsUnwantedToken(self, input, ttype):
        # The mismatch is a spurious extra token if the *next* symbol is
        # the one we expected.
        return input.LA(2) == ttype

    def mismatchIsMissingToken(self, input, follow):
        """Decide whether the mismatch looks like one missing token,
        i.e. the current symbol could legally follow the expected one."""
        if follow is None:
            # No information about what can follow; don't assume missing.
            return False

        if EOR_TOKEN_TYPE in follow:
            # The expected token can end a rule: extend the follow set
            # with what may follow the rule in this calling context.
            viableTokensFollowingThisRule = self.computeContextSensitiveRuleFOLLOW()
            follow = follow | viableTokensFollowingThisRule

            if len(self._state.following) > 0:
                # Not a start rule, so end-of-rule-alone is not viable.
                follow = follow - set([EOR_TOKEN_TYPE])

        if input.LA(1) in follow or EOR_TOKEN_TYPE in follow:
            return True

        return False

    def reportError(self, e):
        """Report a recognition error, at most once per recovery cycle."""
        # If an error was already reported and no token has been matched
        # successfully since, stay quiet to avoid cascades.
        if self._state.errorRecovery:
            return

        self._state.syntaxErrors += 1
        self._state.errorRecovery = True

        self.displayRecognitionError(self.tokenNames, e)

    def displayRecognitionError(self, tokenNames, e):
        """Format and emit one error message for exception e."""
        hdr = self.getErrorHeader(e)
        msg = self.getErrorMessage(e, tokenNames)
        self.emitErrorMessage(hdr+" "+msg)

    def getErrorMessage(self, e, tokenNames):
        """Translate a RecognitionException subclass into a readable
        message; override to localize or customize wording."""
        if isinstance(e, UnwantedTokenException):
            tokenName = "<unknown>"
            if e.expecting == EOF:
                tokenName = "EOF"

            else:
                tokenName = self.tokenNames[e.expecting]

            msg = "extraneous input %s expecting %s" % (
                self.getTokenErrorDisplay(e.getUnexpectedToken()),
                tokenName
                )

        elif isinstance(e, MissingTokenException):
            tokenName = "<unknown>"
            if e.expecting == EOF:
                tokenName = "EOF"

            else:
                tokenName = self.tokenNames[e.expecting]

            msg = "missing %s at %s" % (
                tokenName, self.getTokenErrorDisplay(e.token)
                )

        elif isinstance(e, MismatchedTokenException):
            tokenName = "<unknown>"
            if e.expecting == EOF:
                tokenName = "EOF"
            else:
                tokenName = self.tokenNames[e.expecting]

            msg = "mismatched input " \
                  + self.getTokenErrorDisplay(e.token) \
                  + " expecting " \
                  + tokenName

        elif isinstance(e, MismatchedTreeNodeException):
            tokenName = "<unknown>"
            if e.expecting == EOF:
                tokenName = "EOF"
            else:
                tokenName = self.tokenNames[e.expecting]

            msg = "mismatched tree node: %s expecting %s" \
                  % (e.node, tokenName)

        elif isinstance(e, NoViableAltException):
            msg = "no viable alternative at input " \
                  + self.getTokenErrorDisplay(e.token)

        elif isinstance(e, EarlyExitException):
            msg = "required (...)+ loop did not match anything at input " \
                  + self.getTokenErrorDisplay(e.token)

        elif isinstance(e, MismatchedSetException):
            msg = "mismatched input " \
                  + self.getTokenErrorDisplay(e.token) \
                  + " expecting set " \
                  + repr(e.expecting)

        elif isinstance(e, MismatchedNotSetException):
            msg = "mismatched input " \
                  + self.getTokenErrorDisplay(e.token) \
                  + " expecting set " \
                  + repr(e.expecting)

        elif isinstance(e, FailedPredicateException):
            msg = "rule " \
                  + e.ruleName \
                  + " failed predicate: {" \
                  + e.predicateText \
                  + "}?"

        else:
            # Unknown exception type: fall back to its own str().
            msg = str(e)

        return msg

    def getNumberOfSyntaxErrors(self):
        """Number of syntax errors reported so far (survives recovery)."""
        return self._state.syntaxErrors

    def getErrorHeader(self, e):
        """Location prefix ("line x:y") used in error messages."""
        return "line %d:%d" % (e.line, e.charPositionInLine)

    def getTokenErrorDisplay(self, t):
        """How a token is rendered in error messages (repr of its text)."""
        s = t.text
        if s is None:
            if t.type == EOF:
                s = "<EOF>"
            else:
                # NOTE(review): if t.type is an int this concatenation
                # raises TypeError -- confirm this branch is reachable
                # only with string-able types.
                s = "<"+t.type+">"

        return repr(s)

    def emitErrorMessage(self, msg):
        """Write the message to stderr; override to redirect errors."""
        sys.stderr.write(msg + '\n')

    def recover(self, input, re):
        """Panic-mode recovery: consume tokens until one in the computed
        resynchronization set appears."""
        if self._state.lastErrorIndex == input.index():
            # Second error at the same position: LT(1) must be in the
            # recovery set, so force one token of progress to avoid an
            # infinite loop.
            input.consume()

        self._state.lastErrorIndex = input.index()
        followSet = self.computeErrorRecoverySet()

        self.beginResync()
        self.consumeUntil(input, followSet)
        self.endResync()

    def beginResync(self):
        """Hook invoked before resync consumes tokens (for listeners)."""
        pass

    def endResync(self):
        """Hook invoked after resync finished consuming tokens."""
        pass

    def computeErrorRecoverySet(self):
        """Union of follow sets on the rule-invocation stack; used to
        resynchronize after a generic recognition error."""
        return self.combineFollows(False)

    def computeContextSensitiveRuleFOLLOW(self):
        """Exact follow set for the current calling context; used when
        deciding whether a token is merely missing."""
        return self.combineFollows(True)

    def combineFollows(self, exact):
        """Union the follow-set stack top-down.

        With exact=True, stop as soon as a frame cannot end the rule
        (no EOR), and drop EOR from non-start frames.
        """
        followSet = set()
        for idx, localFollowSet in reversed(list(enumerate(self._state.following))):
            followSet |= localFollowSet
            if exact:
                if EOR_TOKEN_TYPE in localFollowSet:
                    # This frame can end its rule: the invoking frame's
                    # follow matters too, so keep walking; EOR itself is
                    # only meaningful for the start rule (idx == 0).
                    if idx > 0:
                        followSet.remove(EOR_TOKEN_TYPE)

                else:
                    # Rule cannot end here: outer follow sets don't apply.
                    break

        return followSet

    def recoverFromMismatchedToken(self, input, ttype, follow):
        """Single-token recovery: delete an unwanted token or fabricate
        a missing one; otherwise raise MismatchedTokenException."""
        e = None

        # If the next token is the one expected, "delete" the current one.
        if self.mismatchIsUnwantedToken(input, ttype):
            e = UnwantedTokenException(ttype, input)

            self.beginResync()
            input.consume()
            self.endResync()

            # Report after consuming so the exception carries the token.
            self.reportError(e)

            # Return the token we are actually matching.
            matchedSymbol = self.getCurrentInputSymbol(input)

            # Step past the expected token as if all were well.
            input.consume()
            return matchedSymbol

        # Deletion didn't apply; try insertion of the missing token.
        if self.mismatchIsMissingToken(input, follow):
            inserted = self.getMissingSymbol(input, e, ttype, follow)
            e = MissingTokenException(ttype, input, inserted)

            self.reportError(e)
            return inserted

        # Neither strategy applies; give up with an exception.
        e = MismatchedTokenException(ttype, input)
        raise e

    def recoverFromMismatchedSet(self, input, e, follow):
        """Recover when the symbol didn't match a set; only missing-token
        insertion is attempted."""
        if self.mismatchIsMissingToken(input, follow):
            self.reportError(e)

            return self.getMissingSymbol(input, e, INVALID_TOKEN_TYPE, follow)

        raise e

    def getCurrentInputSymbol(self, input):
        """Current input symbol (token/char/node); subclasses override."""
        return None

    def getMissingSymbol(self, input, e, expectedTokenType, follow):
        """Conjure a symbol for insertion recovery; subclasses override."""
        return None

    def consumeUntil(self, input, tokenTypes):
        """Consume symbols until EOF or one whose type is in tokenTypes
        (a single type or a set of types)."""
        if not isinstance(tokenTypes, (set, frozenset)):
            tokenTypes = frozenset([tokenTypes])

        ttype = input.LA(1)
        while ttype != EOF and ttype not in tokenTypes:
            input.consume()
            ttype = input.LA(1)

    def getRuleInvocationStack(self):
        """Rule names leading up to the current rule invocation.

        NOTE(review): this definition is immediately shadowed by the one
        below; it survives only as documentation.
        """

    def getRuleInvocationStack(self):
        """Rule names leading up to the current rule invocation,
        derived from the Python call stack of this module."""
        return self._getRuleInvocationStack(self.__module__)

    def _getRuleInvocationStack(cls, module):
        """Walk the interpreter call stack, oldest frame first, and
        collect names of functions defined in the given module (i.e.
        the generated rule methods)."""
        rules = []
        for frame in reversed(inspect.stack()):
            code = frame[0].f_code
            codeMod = inspect.getmodule(code)
            if codeMod is None:
                continue

            # Skip frames that belong to other modules.
            if codeMod.__name__ != module:
                continue

            # Skip non-rule entry points.
            if code.co_name in ('nextToken', '<module>'):
                continue

            rules.append(code.co_name)

        return rules

    # Bound as a classmethod after definition (pre-decorator style).
    _getRuleInvocationStack = classmethod(_getRuleInvocationStack)

    def getBacktrackingLevel(self):
        """Current syntactic-predicate nesting depth."""
        return self._state.backtracking

    def setBacktrackingLevel(self, n):
        """Force the syntactic-predicate nesting depth to n."""
        self._state.backtracking = n

    def failed(self):
        """Whether the recognizer is in a failed state.

        NOTE(review): RecognizerSharedState in this file never assigns a
        'failed' attribute -- confirm it is set elsewhere before relying
        on this accessor.
        """
        return self._state.failed

    def getGrammarFileName(self):
        """Grammar file this recognizer was generated from (attribute is
        set on generated subclasses)."""
        return self.grammarFileName

    def getSourceName(self):
        # Subclasses must report where their input comes from.
        raise NotImplementedError

    def toStrings(self, tokens):
        """Map a list of tokens to their text; None passes through."""
        if tokens is None:
            return None

        return [token.text for token in tokens]

    def getRuleMemoization(self, ruleIndex, ruleStartIndex):
        """Memoized stop-token index for (rule, start), or
        MEMO_RULE_UNKNOWN if the rule was never tried there."""
        if ruleIndex not in self._state.ruleMemo:
            self._state.ruleMemo[ruleIndex] = {}

        return self._state.ruleMemo[ruleIndex].get(
            ruleStartIndex, self.MEMO_RULE_UNKNOWN
            )

    def alreadyParsedRule(self, input, ruleIndex):
        """If ruleIndex was already attempted at the current position,
        replay the memoized outcome: seek past it (success) or raise
        BacktrackingFailed (previous failure).  Returns False if the
        rule must actually be parsed."""
        stopIndex = self.getRuleMemoization(ruleIndex, input.index())
        if stopIndex == self.MEMO_RULE_UNKNOWN:
            return False

        if stopIndex == self.MEMO_RULE_FAILED:
            raise BacktrackingFailed

        else:
            # Jump just past the previously-matched region.
            input.seek(stopIndex + 1)

        return True

    def memoize(self, input, ruleIndex, ruleStartIndex, success):
        """Record the outcome of attempting ruleIndex at ruleStartIndex
        (no-op when memoization is disabled for that rule)."""
        if success:
            stopTokenIndex = input.index() - 1
        else:
            stopTokenIndex = self.MEMO_RULE_FAILED

        if ruleIndex in self._state.ruleMemo:
            self._state.ruleMemo[ruleIndex][ruleStartIndex] = stopTokenIndex

    def traceIn(self, ruleName, ruleIndex, inputSymbol):
        """Debug hook: print rule entry to stdout."""
        sys.stdout.write("enter %s %s" % (ruleName, inputSymbol))

        if self._state.backtracking > 0:
            sys.stdout.write(" backtracking=%s" % self._state.backtracking)

        sys.stdout.write('\n')

    def traceOut(self, ruleName, ruleIndex, inputSymbol):
        """Debug hook: print rule exit to stdout.

        NOTE(review): reads self._state.failed, which
        RecognizerSharedState does not initialize in this file --
        confirm it is set elsewhere.
        """
        sys.stdout.write("exit %s %s" % (ruleName, inputSymbol))

        if self._state.backtracking > 0:
            sys.stdout.write(" backtracking=%s" % self._state.backtracking)

        if self._state.failed:
            sys.stdout.write(" failed")
        else:
            sys.stdout.write(" succeeded")

        sys.stdout.write('\n')
03438
03439
03440
03441
03442
03443
03444
03445
03446
03447
03448
03449
03450
03451
03452
03453
03454
03455
03456
03457 class TokenSource(object):
03458
03459
03460
03461
03462
03463
03464
03465 def nextToken(self):
03466
03467 raise NotImplementedError
03468
03469
03470
03471
03472
03473
03474
03475
03476
03477 def __iter__(self):
03478
03479 return self
03480
03481
03482
03483
03484
03485
03486
03487
03488
03489 def next(self):
03490
03491 token = self.nextToken()
03492 if token is None or token.type == EOF:
03493 raise StopIteration
03494 return token
03495
03496
03497
03498
03499
03500
03501
03502
03503
03504
03505
class Lexer(BaseRecognizer, TokenSource):
    """Recognizer that produces tokens from a character stream."""

    def __init__(self, input, state=None):
        BaseRecognizer.__init__(self, state)
        TokenSource.__init__(self)

        # Character stream the lexer draws from.
        self.input = input

    def reset(self):
        """Reset recognizer state and rewind the char stream."""
        BaseRecognizer.reset(self)

        if self.input is not None:
            # Rewind the input to the beginning.
            self.input.seek(0)

        if self._state is None:
            # No shared state to reset yet.
            return

        # Clear the token-in-progress fields.
        self._state.token = None
        self._state.type = INVALID_TOKEN_TYPE
        self._state.channel = DEFAULT_CHANNEL
        self._state.tokenStartCharIndex = -1
        self._state.tokenStartLine = -1
        self._state.tokenStartCharPositionInLine = -1
        self._state.text = None

    def nextToken(self):
        """Return the next token, skipping SKIP tokens and recovering
        from recognition errors by discarding one character."""
        while 1:
            # Start a fresh token at the current input position.
            self._state.token = None
            self._state.channel = DEFAULT_CHANNEL
            self._state.tokenStartCharIndex = self.input.index()
            self._state.tokenStartCharPositionInLine = self.input.charPositionInLine
            self._state.tokenStartLine = self.input.line
            self._state.text = None
            if self.input.LA(1) == EOF:
                return EOF_TOKEN

            try:
                self.mTokens()

                if self._state.token is None:
                    # Rule did not emit explicitly; build a default token.
                    self.emit()

                elif self._state.token == SKIP_TOKEN:
                    # Token was skipped; try to match another.
                    continue

                return self._state.token

            except NoViableAltException, re:
                self.reportError(re)
                # Throw out the current char and try again.
                self.recover(re)

            except RecognitionException, re:
                self.reportError(re)
                # match() has already recovered; loop to retry.

    def skip(self):
        """Make the token matched by the current rule be ignored."""
        self._state.token = SKIP_TOKEN

    def mTokens(self):
        """Entry into the generated lexer rules; subclasses implement."""
        raise NotImplementedError

    def setCharStream(self, input):
        """Install a new char stream, resetting the lexer first."""
        self.input = None
        self.reset()
        self.input = input

    def getSourceName(self):
        # Delegate to the underlying char stream.
        return self.input.getSourceName()

    def emit(self, token=None):
        """Record token as the rule's output; with no argument, build a
        CommonToken from the current rule state.  Returns the token."""
        if token is None:
            token = CommonToken(
                input=self.input,
                type=self._state.type,
                channel=self._state.channel,
                start=self._state.tokenStartCharIndex,
                stop=self.getCharIndex()-1
                )
            token.line = self._state.tokenStartLine
            token.text = self._state.text
            token.charPositionInLine = self._state.tokenStartCharPositionInLine

        self._state.token = token

        return token

    def match(self, s):
        """Match a single char code (int) or each char of a string.

        Raises BacktrackingFailed while guessing, otherwise
        MismatchedTokenException after consuming one char as recovery.
        """
        if isinstance(s, basestring):
            for c in s:
                if self.input.LA(1) != ord(c):
                    if self._state.backtracking > 0:
                        raise BacktrackingFailed

                    mte = MismatchedTokenException(c, self.input)
                    self.recover(mte)
                    raise mte

                self.input.consume()

        else:
            if self.input.LA(1) != s:
                if self._state.backtracking > 0:
                    raise BacktrackingFailed

                mte = MismatchedTokenException(unichr(s), self.input)
                self.recover(mte)
                raise mte

            self.input.consume()

    def matchAny(self):
        """Consume the current character, whatever it is."""
        self.input.consume()

    def matchRange(self, a, b):
        """Match one char in the inclusive code range a..b."""
        if self.input.LA(1) < a or self.input.LA(1) > b:
            if self._state.backtracking > 0:
                raise BacktrackingFailed

            mre = MismatchedRangeException(unichr(a), unichr(b), self.input)
            self.recover(mre)
            raise mre

        self.input.consume()

    def getLine(self):
        """Current line number of the char stream."""
        return self.input.line

    def getCharPositionInLine(self):
        """Current column of the char stream."""
        return self.input.charPositionInLine

    def getCharIndex(self):
        """Index of the current character of lookahead."""
        return self.input.index()

    def getText(self):
        """Text matched so far for the current token, honoring any
        override set via setText()."""
        if self._state.text is not None:
            return self._state.text

        return self.input.substring(
            self._state.tokenStartCharIndex,
            self.getCharIndex()-1
            )

    def setText(self, text):
        """Override the text emitted for the current token."""
        self._state.text = text

    text = property(getText, setText)

    def reportError(self, e):
        # Unlike the parser, the lexer reports unconditionally here.
        self.displayRecognitionError(self.tokenNames, e)

    def getErrorMessage(self, e, tokenNames):
        """Char-oriented error messages for lexer exceptions; falls back
        to BaseRecognizer for unknown types."""
        msg = None

        if isinstance(e, MismatchedTokenException):
            msg = "mismatched character " \
                  + self.getCharErrorDisplay(e.c) \
                  + " expecting " \
                  + self.getCharErrorDisplay(e.expecting)

        elif isinstance(e, NoViableAltException):
            msg = "no viable alternative at character " \
                  + self.getCharErrorDisplay(e.c)

        elif isinstance(e, EarlyExitException):
            msg = "required (...)+ loop did not match anything at character " \
                  + self.getCharErrorDisplay(e.c)

        elif isinstance(e, MismatchedNotSetException):
            msg = "mismatched character " \
                  + self.getCharErrorDisplay(e.c) \
                  + " expecting set " \
                  + repr(e.expecting)

        elif isinstance(e, MismatchedSetException):
            msg = "mismatched character " \
                  + self.getCharErrorDisplay(e.c) \
                  + " expecting set " \
                  + repr(e.expecting)

        elif isinstance(e, MismatchedRangeException):
            msg = "mismatched character " \
                  + self.getCharErrorDisplay(e.c) \
                  + " expecting set " \
                  + self.getCharErrorDisplay(e.a) \
                  + ".." \
                  + self.getCharErrorDisplay(e.b)

        else:
            msg = BaseRecognizer.getErrorMessage(self, e, tokenNames)

        return msg

    def getCharErrorDisplay(self, c):
        """How a character is shown in error messages."""
        if c == EOF:
            c = '<EOF>'
        return repr(c)

    def recover(self, re):
        """Lexer "recovery": just consume one char so matching can retry.

        NOTE(review): intentionally shadows BaseRecognizer.recover with a
        different arity -- confirm all callers use the one-argument form.
        """
        self.input.consume()

    def traceIn(self, ruleName, ruleIndex):
        """Debug hook: print rule entry with char-stream position."""
        inputSymbol = "%s line=%d:%s" % (self.input.LT(1),
                                         self.getLine(),
                                         self.getCharPositionInLine()
                                         )

        BaseRecognizer.traceIn(self, ruleName, ruleIndex, inputSymbol)

    def traceOut(self, ruleName, ruleIndex):
        """Debug hook: print rule exit with char-stream position."""
        inputSymbol = "%s line=%d:%s" % (self.input.LT(1),
                                         self.getLine(),
                                         self.getCharPositionInLine()
                                         )

        BaseRecognizer.traceOut(self, ruleName, ruleIndex, inputSymbol)
03811
03812
03813
03814
03815
03816
03817
class Parser(BaseRecognizer):
    """Recognizer that consumes a TokenStream (usually fed by a lexer)."""

    def __init__(self, lexer, state=None):
        BaseRecognizer.__init__(self, state)
        self.setTokenStream(lexer)

    def reset(self):
        # Reset shared recognizer state, then rewind the token stream.
        BaseRecognizer.reset(self)
        if self.input is not None:
            self.input.seek(0)

    def getCurrentInputSymbol(self, input):
        # For a parser, the current symbol is the lookahead token.
        return input.LT(1)

    def getMissingSymbol(self, input, e, expectedTokenType, follow):
        """Fabricate a stand-in token for a missing one, borrowing
        position information from the current (or previous) token."""
        if expectedTokenType == EOF:
            tokenText = "<missing EOF>"
        else:
            tokenText = "<missing " + self.tokenNames[expectedTokenType] + ">"

        fabricated = CommonToken(type=expectedTokenType, text=tokenText)

        anchor = input.LT(1)
        if anchor.type == EOF:
            # At EOF, borrow position info from the previous token.
            anchor = input.LT(-1)

        if anchor is not None:
            fabricated.line = anchor.line
            fabricated.charPositionInLine = anchor.charPositionInLine
        fabricated.channel = DEFAULT_CHANNEL
        return fabricated

    def setTokenStream(self, input):
        """Install a new token stream, resetting the parser first."""
        self.input = None
        self.reset()
        self.input = input

    def getTokenStream(self):
        """The token stream this parser reads from."""
        return self.input

    def getSourceName(self):
        # Delegate to the underlying token stream.
        return self.input.getSourceName()

    def traceIn(self, ruleName, ruleIndex):
        BaseRecognizer.traceIn(self, ruleName, ruleIndex, self.input.LT(1))

    def traceOut(self, ruleName, ruleIndex):
        BaseRecognizer.traceOut(self, ruleName, ruleIndex, self.input.LT(1))
03876
03877
03878
03879
03880
03881
class RuleReturnScope(object):
    """Base class for values returned by rules; every accessor defaults
    to None so subclasses only override what they actually track."""

    def getStart(self):
        """First token or tree node matched by the rule, if tracked."""
        return None

    def getStop(self):
        """Last token or tree node matched by the rule, if tracked."""
        return None

    def getTree(self):
        """Tree built for the rule, if tree construction is enabled."""
        return None

    def getTemplate(self):
        """Template produced by the rule, if template output is enabled."""
        return None
03906
03907
03908
03909
03910
03911
03912
03913
03914
03915
03916
03917
03918
03919
03920
03921
03922
03923
03924
03925
03926
03927
class ParserRuleReturnScope(RuleReturnScope):
    """Rule return scope that records the first and last matched tokens."""

    def __init__(self):
        # Boundary tokens of the matched region; filled in by the
        # generated rule code.
        self.start = None
        self.stop = None

    def getStart(self):
        """First token matched by this rule."""
        return self.start

    def getStop(self):
        """Last token matched by this rule."""
        return self.stop
03941
03942
03943