Newsgroups: php.internals
From: rquadling@googlemail.com (Richard Quadling)
Reply-To: RQuadling@googlemail.com
To: Scott MacVicar
Cc: Dmitry Stogov, Matt Wilmas, internals@lists.php.net, shire@php.net
Date: Thu, 30 Apr 2009 13:15:45 +0100
Subject: Re: [PHP-DEV] Re: [PATCH] Scanner "diet" with fixes, etc.
Message-ID: <10845a340904300515k62fe7dbes4e22b318c61be140@mail.gmail.com>
In-Reply-To: <49F993FA.2090301@php.net>
References: <6604D94D40FD465F992144110B075BB5@pc1> <49F94BC6.5060904@zend.com> <49F993FA.2090301@php.net>

2009/4/30 Scott MacVicar:
> [^] is a special case to write a portable match any character in re2c.
>
> Scott
>
> Dmitry Stogov wrote:
>> Hi Matt,
>>
>> Does this patch fix EOF handling issues related to mmap()? (e.g. parsing
>> of files with size 4096, 8192, ...). Now we have two dirty fixes to
>> handle them correctly.
>>
>> The patch is quite big to understand it quickly. I'll probably take a
>> look on weekend.
>>
>> -ANY_CHAR [^\x00]
>> +ANY_CHAR [^]
>>
>> Is [^] a correct regular expression?
>>
>> Thanks. Dmitry.
>>
>> Matt Wilmas wrote:
>>> Hi Dmitry, Brian, all,
>>>
>>> Here's a scanner patch that I mentioned awhile ago, with a possible
>>> way to work around the re2c EOF handling issues.
>>>
>>> The primary change is to do a "manual scan" like I talked about in
>>> areas that match large amounts and can contain NULL bytes
>>> (strings/comments, which are now scanned faster too), as is done for
>>> inline HTML. I called it a "diet" :-) because it removes my
>>> complicated string regex patterns from a couple years ago, which
>>> doesn't make the .l file much smaller after adding the manual scan
>>> code (easier to understand...?), but it does result in a ~34k
>>> reduction of 5.3's generated .c file...
>>>
>>> This fixes Bug #46817, as well as a better, more proper fix for the
>>> older Bug #42767, both related to ending comments.
>>>
>>> Now inline HTML chunks aren't broken up when a tag starting with "s"
>>> is encountered (