Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:43839 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 25619 invoked from network); 4 May 2009 19:25:06 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 4 May 2009 19:25:06 -0000 Authentication-Results: pb1.pair.com smtp.mail=shire@php.net; spf=unknown; sender-id=unknown Authentication-Results: pb1.pair.com header.from=shire@php.net; sender-id=unknown Received-SPF: unknown (pb1.pair.com: domain php.net does not designate 208.43.138.18 as permitted sender) X-PHP-List-Original-Sender: shire@php.net X-Host-Fingerprint: 208.43.138.18 sizzo.org Linux 2.6 Received: from [208.43.138.18] ([208.43.138.18:48940] helo=sizzo.org) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 09/9C-57065-1114FF94 for ; Mon, 04 May 2009 15:25:06 -0400 Received: from shirebook.local (outbound500a.pasd.tfbnw.net [204.15.21.171]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by sizzo.org (Postfix) with ESMTPSA id 9D11ECBE617; Mon, 4 May 2009 12:25:02 -0700 (PDT) Message-ID: <49FF410B.2030807@php.net> Date: Mon, 04 May 2009 12:24:59 -0700 Reply-To: shire@php.net User-Agent: Postbox 1.0b11 (Macintosh/2009041623) MIME-Version: 1.0 To: Matt Wilmas CC: internals@lists.php.net, Nuno Lopes , Lukas Kahwe Smith , Dmitry Stogov References: <6604D94D40FD465F992144110B075BB5@pc1> <9D5D4CBF-5CB1-47EC-81F4-59E3C48EEEEF@pooteeweet.org> <49FE9AE7.4000008@php.net> <6886527DD6D44DFAAC29221BAE374B07@pc1> In-Reply-To: <6886527DD6D44DFAAC29221BAE374B07@pc1> Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc. From: shire@php.net (shire) Hey Matt, Matt Wilmas wrote: >>>> +/* To save initial string length after scanning to first variable, >>>> CG(doc_comment_len) can be reused */ >>>> +#define double_quotes_scanned_len CG(doc_comment_len) >>>> + >> >> (minor) Maybe we should rename this var if we're going to use it for >> other >> purposes, this doesn't really save any typing. Also if we do want the >> define maybe we should upper case it so it's more obvious? > > Yeah, I tried to think of other ways to do it, but just left it trying > to look like another variable (not to save typing). Well, it can easily > be changed later if a "cleaner" way is decided... > Yeah I would just prefer if it was more obvious that it is *not* a variable ;-) >>>> + while (YYCURSOR < YYLIMIT) { >>>> + switch (*YYCURSOR++) { >> >> In the example above, which we have a couple examples of here, we don't >> obey the YYFILL macro to detect if we have exceeded our EOF. This >> *might* be a problem, but only really depends on if we intend to use the >> YYFILL as a solution for exceeding our mmap bounds. > > I don't understand what the problem might be? The YYCURSOR < YYLIMIT > check is what the YYFILL has been doing. If you mean after changes > later, as long as the the whole thing is mmap()'d (which I'm assuming > would be the case?), it just "looks" like a standard string, with > terminating '\0', right? And there's no reading past YYLIMIT. Sorry yeah this wouldn't be a problem currently, but only if we try to fix the mmap issue by using YYFILL to realloc more space into the buffer. Then that macro would change to something more complicated. (per my previous replies with Arnaud) Have you considered using the lexer STATES and regex's instead of the manual C code for scanning the rest. It seems like if we have a one-char regex match for what the C code is doing we could handle this in the lexer without a lot of manual intervention (need to look at it more, just a thought I had earlier, the expressions are clearer now with your patch applied) ;-) -shire