Newsgroups: php.internals
Xref: news.php.net php.internals:43812
Message-ID: <49F993FA.2090301@php.net>
Date: Thu, 30 Apr 2009 13:05:14 +0100
To: Dmitry Stogov
CC: Matt Wilmas, internals@lists.php.net, shire@php.net
References: <6604D94D40FD465F992144110B075BB5@pc1> <49F94BC6.5060904@zend.com>
In-Reply-To: <49F94BC6.5060904@zend.com>
Subject: Re: [PHP-DEV] Re: [PATCH] Scanner "diet" with fixes, etc.
From: scottmac@php.net (Scott MacVicar)

[^] is a special case in re2c: a portable way to write "match any
character".

Scott

Dmitry Stogov wrote:
> Hi Matt,
>
> Does this patch fix EOF handling issues related to mmap()? (e.g. parsing
> of files with size 4096, 8192, ...). Now we have two dirty fixes to
> handle them correctly.
>
> The patch is quite big to understand it quickly. I'll probably take a
> look on weekend.
>
> -ANY_CHAR [^\x00]
> +ANY_CHAR [^]
>
> Is [^] a correct regular expression?
>
> Thanks. Dmitry.
>
> Matt Wilmas wrote:
>> Hi Dmitry, Brian, all,
>>
>> Here's a scanner patch that I mentioned awhile ago, with a possible
>> way to work around the re2c EOF handling issues.
>>
>> The primary change is to do a "manual scan" like I talked about in
>> areas that match large amounts and can contain NULL bytes
>> (strings/comments, which are now scanned faster too), as is done for
>> inline HTML. I called it a "diet" :-) because it removes my
>> complicated string regex patterns from a couple years ago, which
>> doesn't make the .l file much smaller after adding the manual scan
>> code (easier to understand...?), but it does result in a ~34k
>> reduction of 5.3's generated .c file...
>>
>> This fixes Bug #46817, as well as a better, more proper fix for the
>> older Bug #42767, both related to ending comments.
>>
>> Now inline HTML chunks aren't broken up when a tag starting with "s"
>> is encountered (