[antlr-interest] Match the start and end of a line
Gary R. Van Sickle
g.r.vansickle at att.net
Thu Dec 25 08:18:05 PST 2008
> From: Gavin Lambert
>
> At 22:17 25/12/2008, Gary R. Van Sickle wrote:
> >translation_unit
> > : (BOL statement EOL)+
> > ;
> >
> >You'd have to be throwing up WS tokens as well though for
> >that to be buying you anything.
>
> Actually even in that case it doesn't really buy you
> anything, unless EOLs can occur in other contexts as well.
> (And even then it's doubtful -- it'd just make parsing harder.)
>
It would make parsing harder no doubt, but I'm thinking of cases such as in
SPICE, where (at least for some definitions of the term "SPICE"), the first
token on a line must start in the first column, i.e.:
OK:
"R1 0 1 1k\n"
Not valid:
" R1 0 1 1k\n"
Some crusty old C preprocessors want the "#" in the first column as well.
Now, the utility of such restrictions may be dubious if you're writing a
recognizer, but maybe you're writing a validator to determine if the given
SPICE deck or C file will get through the crustiest of the crusty old
SPICEes or C preprocessors.
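For what it's worth, the validator half of that is easy to sketch outside of ANTLR. This is only a rough Python illustration, assuming the sole rule is "a non-blank line must not begin with a space or tab"; the function name is made up for the example and says nothing about how a real SPICE front end does it:

```python
def lines_not_in_first_column(deck):
    """Return the 1-based numbers of non-blank lines whose first token is indented."""
    bad = []
    for number, line in enumerate(deck.splitlines(), start=1):
        # A blank line has nothing to misplace; otherwise the first
        # character must not be a space or tab.
        if line.strip() and line[0] in " \t":
            bad.append(number)
    return bad
```

Fed the two example lines above, it would flag only the second one.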
So having slept on it, and given the above rationale, rules like these would
make some sort of sense:
spice_resistor_declaration
: BOL name=ID WS node0=ID WS node1=ID WS value=ID WS EOL
// PS: Yeah, a SPICE deck is a virtually-unparseable atrocity of a
// "language".
// Without some form of context tracking or feedback from the
// parser, it's simply not possible for the lexer to tell a
// component-type-plus-name from a node ID from a literal value.
// Welcome to my world ;-(.
;
cpp_define
: BOL '#' WS 'define' WS ID WS ( '(' WS define_param_list WS ')' )? WS
define_body WS EOL
;
So, yeah, it buys you a mess, that's for sure. Or wait, I think I see your
point, are you saying that if you're explicitly handling WS's in the parser,
the BOL buys you nothing? I think you're right; I believe these would be
equivalent to the above, and require no BOL complications:
spice_resistor_declaration
: name=ID WS node0=ID WS node1=ID WS value=ID WS EOL
;
cpp_define
: '#' WS 'define' WS ID WS ( '(' WS define_param_list WS ')' )? WS
define_body WS EOL
;
>
> But I think the OP would need to explain a bit more about
> *why* they're interested in line beginnings and endings
> before we can be of more help.
>
Indeed. One thing I do know is that it would make parsing SPICE a bit more
tractable. If you could do something like this, it would be one small step
for man:
RESISTOR_ID : ^R[[:alnum:]_]+ ;
spice_resistor_declaration
: RESISTOR_ID node0=ID node1=ID value=ID
// Hey hey, welcome to 1973! We can at least tell our component
// types from our node IDs and literal values now!
;
Wouldn't need explicit EOLs either.
Is there a reason why the ANTLR lexer doesn't/can't support full regexes?
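To show what I mean by the anchored rule, here is a rough sketch using Python's `re` module rather than ANTLR. The token names mirror the rules above, but the `ID` and `WS` patterns are my own guesses, and `[A-Za-z0-9_]` stands in for `[[:alnum:]_]` because Python's `re` has no POSIX character classes:

```python
import re

# Hypothetical token set: RESISTOR_ID is only recognized at the start of a line.
TOKEN_RE = re.compile(
    r"(?P<RESISTOR_ID>^R[A-Za-z0-9_]+)"  # with re.MULTILINE, '^' also matches just after a newline
    r"|(?P<ID>[A-Za-z0-9_.]+)"           # assumed catch-all for node IDs and values
    r"|(?P<WS>[ \t\r\n]+)",              # whitespace, including line ends
    re.MULTILINE,
)

def tokenize(deck):
    """Yield (token_type, text) pairs, skipping whitespace."""
    pos = 0
    while pos < len(deck):
        m = TOKEN_RE.match(deck, pos)
        if m is None:
            raise ValueError(f"unexpected character at offset {pos}")
        if m.lastgroup != "WS":
            yield m.lastgroup, m.group()
        pos = m.end()
```

With this, an "R2" at the start of a line comes out as a RESISTOR_ID, while the same text used as a node name in mid-line comes out as a plain ID, and no EOL token is needed to tell them apart.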
--
Gary R. Van Sickle