Newsgroups: php.internals
Message-ID: <49F94BC6.5060904@zend.com>
Date: Thu, 30 Apr 2009 10:57:10 +0400
From: dmitry@zend.com (Dmitry Stogov)
To: Matt Wilmas
CC: internals@lists.php.net, shire@php.net
In-Reply-To: <6604D94D40FD465F992144110B075BB5@pc1>
Subject: Re: [PATCH] Scanner "diet" with fixes, etc.

Hi Matt,

Does this patch fix EOF handling issues related to mmap()? (e.g. parsing of files with size 4096, 8192, ...). Now we have two dirty fixes to handle them correctly.
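To spell out why sizes like 4096 and 8192 are the troublesome ones, here is a minimal sketch. POSIX mmap() zero-fills the rest of the last page when a file does not end on a page boundary, so a scanner that treats a NUL byte as end of input presumably gets its terminator for free, except when the size is an exact multiple of the page size. The helper name mmap_tail_padding and the 4096-byte page size used in the usage note are assumptions for illustration, not anything from the patch.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not part of the PHP scanner): number of
 * zero-filled bytes that follow the file data inside its last
 * mapped page, for a file of file_size bytes and pages of
 * page_size bytes.
 *
 * A result of 0 means the data ends exactly on a page boundary,
 * so the mapping itself provides no trailing NUL byte.
 * (An empty file also yields 0; it cannot be mapped at all.)
 */
size_t mmap_tail_padding(size_t file_size, size_t page_size)
{
    assert(page_size > 0);
    return (page_size - file_size % page_size) % page_size;
}
```

With 4096-byte pages, mmap_tail_padding(4097, 4096) gives 4095, while mmap_tail_padding(4096, 4096) and mmap_tail_padding(8192, 4096) both give 0, which matches the file sizes mentioned above.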
The patch is too big to understand quickly. I'll probably take a look on the weekend.

-ANY_CHAR [^\x00]
+ANY_CHAR [^]

Is [^] a correct regular expression?

Thanks. Dmitry.

Matt Wilmas wrote:
> Hi Dmitry, Brian, all,
>
> Here's a scanner patch that I mentioned a while ago, with a possible way
> to work around the re2c EOF handling issues.
>
> The primary change is to do a "manual scan" like I talked about in areas
> that match large amounts and can contain NULL bytes (strings/comments,
> which are now scanned faster too), as is done for inline HTML. I called
> it a "diet" :-) because it removes my complicated string regex patterns
> from a couple years ago, which doesn't make the .l file much smaller
> after adding the manual scan code (easier to understand...?), but it
> does result in a ~34k reduction of 5.3's generated .c file...
>
> This fixes Bug #46817, as well as a better, more proper fix for the
> older Bug #42767, both related to ending comments.
>
> Now inline HTML chunks aren't broken up when a tag starting with "s" is
> encountered (