Newsgroups: php.internals
Reply-To: RQuadling@googlemail.com
Date: Thu, 30 Apr 2009 13:15:45 +0100
Message-ID: <10845a340904300515k62fe7dbes4e22b318c61be140@mail.gmail.com>
To: Scott MacVicar <scottmac@php.net>
Cc: Dmitry Stogov <dmitry@zend.com>, Matt Wilmas <php_lists@realplain.com>, internals@lists.php.net, 
	shire@php.net
Subject: Re: [PHP-DEV] Re: [PATCH] Scanner "diet" with fixes, etc.
From: rquadling@googlemail.com (Richard Quadling)

2009/4/30 Scott MacVicar <scottmac@php.net>:
> [^] is a special case in re2c: a portable way to write "match any character".
>
> Scott
>
> Dmitry Stogov wrote:
>> Hi Matt,
>>
>> Does this patch fix the EOF handling issues related to mmap() (e.g. parsing
>> files whose size is 4096, 8192, ...)? We currently have two dirty fixes to
>> handle them correctly.
>>
>> The patch is too big to understand quickly. I'll probably take a look over
>> the weekend.
>>
>> -ANY_CHAR [^\x00]
>> +ANY_CHAR [^]
>>
>> Is [^] a correct regular expression?
>>
>> Thanks. Dmitry.
>>
>> Matt Wilmas wrote:
>>> Hi Dmitry, Brian, all,
>>>
>>> Here's a scanner patch that I mentioned a while ago, with a possible
>>> way to work around the re2c EOF handling issues.
>>>
>>> The primary change is to do a "manual scan", like I talked about, in
>>> areas that match large amounts of text and can contain NULL bytes
>>> (strings/comments, which are now scanned faster too), as is already done
>>> for inline HTML. I called it a "diet" :-) because it removes my
>>> complicated string regex patterns from a couple of years ago. That
>>> doesn't make the .l file much smaller once the manual scan code is added
>>> (easier to understand...?), but it does result in a ~34k reduction of
>>> 5.3's generated .c file...
>>>
>>> This fixes Bug #46817 and provides a better, more proper fix for the
>>> older Bug #42767, both related to ending comments.
>>>
>>> Now inline HTML chunks aren't broken up when a tag starting with "s"
>>> is encountered (<script> for JS, <span>, etc.), since it's unlikely to
>>> be a long PHP <script> tag.
>>>
>>> If an opening PHP <SCRIPT> tag was used with a capital "S", it was
>>> missed if it wasn't the first thing scanned:
>>>
>>> var_dump(token_get_all("HTML... <SCRIPT language=php>"));
>>>
>>> Single-line comments with a Windows newline didn't include the full \r\n:
>>>
>>> var_dump(token_get_all("<?php // Comment\r\n?>"));
>>>
>>> Finally, part of the optimized scanning is that, for double quoted
>>> strings, when the first variable is encountered (making it
>>> non-constant), the amount that's been scanned up to that point is
>>> remembered, which can then be skipped over (up to the variable) after
>>> returning the quote token. Previously that initial part of the string
>>> was rescanned -- the cost dependent on how far "into" the string the
>>> first var is.
>>>
>>>
>>> I think that's about all -- I'll send another message if I forgot to
>>> mention anything... Just wanted to send this along quickly for you
>>> guys to look at or whatever. It was basically done last week, I just
>>> had to do a couple of finishing touches and verify that everything was OK.
>>>
>>> http://realplain.com/php/scanner_diet.diff (Merged changes, but didn't
>>> test yet.)
>>> http://realplain.com/php/scanner_diet_5_3.diff
>>>
>>>
>>> Thanks,
>>> Matt
>>
>
>
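
On the "manual scan" Matt describes above: a minimal sketch of the idea, written in Python rather than the scanner's re2c/C for brevity. It is not the code from the patch, and scan_block_comment and its arguments are hypothetical names. What it tries to show is that a scan bounded by the buffer length, rather than by a NUL terminator, simply passes over NUL bytes embedded in a comment.

```python
ASTERISK = 0x2A  # "*"
SLASH = 0x2F     # "/"


def scan_block_comment(buf: bytes, pos: int) -> int:
    """Scan a block comment body by hand (hypothetical helper, not from the patch).

    buf is the whole input; pos is the index just after the opening "/*".
    Returns the index just past the closing "*/", or -1 if the comment is
    unterminated.
    """
    limit = len(buf)
    while pos < limit:
        # A NUL byte is just another byte here; only "*/" ends the scan.
        if buf[pos] == ASTERISK and pos + 1 < limit and buf[pos + 1] == SLASH:
            return pos + 2
        pos += 1
    return -1


source = b"/* a\x00b */ rest"
end = scan_block_comment(source, 2)
print(source[end:])  # the bytes that follow the comment
```

Starting the scan after the opening "/*" means an input such as "/*/" is not mistaken for a closed comment.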

Aha - bottom of section at http://re2c.org/manual.html#lbAJ
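
To make the difference between the old ANY_CHAR [^\x00] and the new ANY_CHAR [^] concrete, here is a rough sketch in Python rather than re2c. Python's re module has no empty negated class, so [\x00-\xff] stands in for re2c's [^] (an assumption for illustration only), and accepted_prefix_length is a hypothetical helper. It should show that [^\x00] stops at a NUL byte while "any character" keeps going.

```python
import re

# Old definition: every byte except NUL.
ANY_EXCEPT_NUL = re.compile(rb"[^\x00]")
# Stand-in for re2c's [^]: every byte, NUL included (assumed equivalent).
ANY_BYTE = re.compile(rb"[\x00-\xff]")


def accepted_prefix_length(pattern, data: bytes) -> int:
    """Count how many leading bytes of data the one-byte pattern accepts."""
    count = 0
    for i in range(len(data)):
        if pattern.fullmatch(data[i:i + 1]) is None:
            break
        count += 1
    return count


sample = b"ab\x00cd"
print(accepted_prefix_length(ANY_EXCEPT_NUL, sample))
print(accepted_prefix_length(ANY_BYTE, sample))
```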



-- 
-----
Richard Quadling
Zend Certified Engineer : http://zend.com/zce.php?c=ZEND002498&r=213474731
"Standing on the shoulders of some very clever giants!"