Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:43827 Return-Path: Mailing-List: contact internals-help@lists.php.net; run by ezmlm Delivered-To: mailing list internals@lists.php.net Received: (qmail 21046 invoked from network); 4 May 2009 07:36:15 -0000 Received: from unknown (HELO lists.php.net) (127.0.0.1) by localhost with SMTP; 4 May 2009 07:36:15 -0000 Authentication-Results: pb1.pair.com header.from=shire@php.net; sender-id=unknown Authentication-Results: pb1.pair.com smtp.mail=shire@php.net; spf=unknown; sender-id=unknown Received-SPF: unknown (pb1.pair.com: domain php.net does not designate 208.43.138.18 as permitted sender) X-PHP-List-Original-Sender: shire@php.net X-Host-Fingerprint: 208.43.138.18 sizzo.org Linux 2.6 Received: from [208.43.138.18] ([208.43.138.18:43942] helo=sizzo.org) by pb1.pair.com (ecelerity 2.1.1.9-wez r(12769M)) with ESMTP id 8A/E3-23981-FEA9EF94 for ; Mon, 04 May 2009 03:36:15 -0400 Received: from shirebook.tekrat.linksys (unknown [75.101.56.2]) (using TLSv1 with cipher DHE-RSA-CAMELLIA256-SHA (256/256 bits)) (No client certificate requested) by sizzo.org (Postfix) with ESMTPSA id 9A4B9CBE559; Mon, 4 May 2009 00:36:11 -0700 (PDT) Message-ID: <49FE9AE7.4000008@php.net> Date: Mon, 04 May 2009 00:36:07 -0700 Reply-To: shire@php.net User-Agent: Postbox 1.0b11 (Macintosh/2009041623) MIME-Version: 1.0 To: Matt Wilmas CC: internals@lists.php.net, Nuno Lopes , Lukas Kahwe Smith , Dmitry Stogov References: <6604D94D40FD465F992144110B075BB5@pc1> <9D5D4CBF-5CB1-47EC-81F4-59E3C48EEEEF@pooteeweet.org> In-Reply-To: Content-Type: text/plain; charset=ISO-8859-1; format=flowed Content-Transfer-Encoding: 7bit Subject: Re: [PHP-DEV] [PATCH] Scanner "diet" with fixes, etc. From: shire@php.net (shire) Hey Matt, Thanks for posting, sorry for not having a chance to reply to this sooner. Maybe couple things from the patch, >> +/* To save initial string length after scanning to first variable, CG(doc_comment_len) can be reused */ >> +#define double_quotes_scanned_len CG(doc_comment_len) >> + (minor) Maybe we should rename this var if we're going to use it for other purposes, this doesn't really save any typing. Also if we do want the define maybe we should upper case it so it's more obvious? >> + while (YYCURSOR < YYLIMIT) { >> + switch (*YYCURSOR++) { In the example above, which we have a couple examples of here, we don't obey the YYFILL macro to detect if we have exceeded our EOF. This *might* be a problem, but only really depends on if we intend to use the YYFILL as a solution for exceeding our mmap bounds. Regarding the ZEND_MMAP_AHEAD issue and the temp. fix that Dmitry put in we need to find a solution to that, perhaps I can play with that this week too as I think I'm seeing some related issues in my testing of 5.3. Essentially we abuse ZEND_MMAP_AHEAD by adding it to the file size and passing it to the mmap call which isn't at all valid and only really works up to PAGESIZE. We could possibly use YYFILL to re-allocate more space as necessary past the end of file to fix this. I don't see anything glaring in the patch that's a major issue, I can probably test more on a larger code base in the next 2-3 days. As I've said before this seems to be crossing the line of us writing a scanner by hand rather than letting re2c do the heavy lifting, but without a modification to re2c to handle EOF I don't have an alternative solution currently. (If we had some way to detect which regex we where matching against in the YYFILL that would likely be able to handle these bugs, but I didn't see a way to do that easily). -shire