Newsgroups: php.internals
From: helly@php.net (Marcus Boerger)
Date: Mon, 3 Mar 2008 00:26:51 +0100
To: Stanislav Malyshev
CC: Marcus Boerger, internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] Replace the flex-based scanner with an re2c [1] based lexer
Message-ID: <1642796941.20080303002651@marcus-boerger.de>
In-Reply-To: <47CB2E9D.6010102@zend.com>
References: <1706278209.20080302232134@marcus-boerger.de> <47CB2E9D.6010102@zend.com>

Hello Stanislav,

Sunday, March 2, 2008, 11:47:57 PM, you wrote:

> Hi!

>> be much easier, switching to re2c promises a much faster lexer.
>> Actually, without any specific re2c optimizations we already get around a 20% scanner

> I think 20% faster is very cool.
> However, as I understand re2c is not a standard tool found everywhere.
> So what happens if you wanted to use it on some exotic system where re2c
> is not readily available as maintainer-supported software? Also, flex
> is available on Windows for example as part of cygwin, while I don't see
> re2c there.
> I understand this can be of low importance since we keep generated files
> in our repositories, but I think we still have to keep it in mind.
> I understand also current patch requires non-release version of re2c -
> maybe we should have some release version at least until we make PHP
> depend on it?

Well, re2c works on a very large number of systems, can easily be built, and
comes with a ready-to-download Windows executable. Furthermore, all major
distributions have re2c packages. Given that we also store the generated
files in CVS, I see no issue at all in these regards.

>> Current state:
>> Flex has been fully replaced by re2c in Zend. We have also switched to an
>> mmap-based lexer approach for now. However, we had to drop multibyte support

> Were the stream support issues solved?

We completely dropped multibyte support. The reason is that, the way we were
doing it, we constantly switched between the full original and a recoded
duplicate that simply ignores multibyte (or any encoding at all). Once we
have finished the move to re2c, we can support all of those correctly. The
multibyte support also duplicated the encoding tables otherwise available in
ext/mbstring, ext/iconv, or pecl/intl.

>> as well as the encoding declare. The current state can be checked out from
>> Scott's subversion repository [3] and you can follow the development on his
>> Trac setup [4]. When you want to build php with re2c, then you need to grab
>> re2c from its sourceforge subversion repository [5].
>> You can also check out
>> the changes in a patch created Sunday 2nd March against a PHP checkout from
>> 14th February [6].
>>
>> Further steps:
>> Commit this to PHP 5.3. Synch to HEAD. Add pecl/intl to 5.3. Discuss/recreate
>> multibyte support with libintl.

> Note - pecl/intl does nothing towards multibyte support etc., at least
> for now. If there are volunteers to change that, it can be discussed, but
> so far it is for doing entirely other things (locale-dependent
> functionality mostly).

Yes, I know. However, pecl/intl brings in a PHP/ICU bridge which we can build on.

> So, I think before re2c parser can be merged the issue with multibyte
> compatibility must be solved - otherwise it will make the users that
> rely on it unable to use newer PHP. As cool as 20% faster is, I think we
> can't drop support for such feature, especially not in 5.3.

Rely on an unsupported, undocumented feature? I am rather able to build PHP
and rewrite that support.

>> Commit to 5.3 between the 5th and the 15th of March. Synch to HEAD a couple
>> of days later. Moving pecl/libintl to ext (depends on the 5.3 RMs decision).
>> After that is done, decide about multibyte support. Along with the commit to
>> the 5.3 branch there will be a new re2c version available.

> I think we first need to figure out what happens to multibyte support,
> and not commit anything before we have it figured out. Multibyte support
> is important piece of functionality for some PHP users, and it works
> now. Breaking it without providing any alternative - especially that we
> have now 5.3 mostly ready for the release cycle, and solving multibyte
> problems with re2c may take undefined amount of time, as far as I
> understand.
> I do not think it would be acceptable to release 5.3 without
> multibyte support, so the option here either merge it now and have 5.3
> waiting until MB is figured out, or try to figure it out before commit
> and if we can't in a reasonable term, go forward with 5.3 and defer the
> parser change for 5.4.
> Again, while I think the speedup is great and congratulate Marcus, Nuno
> and Scott on great work, I think we should keep in mind we have working
> parser right now and changing it in an incompatible way is very
> high-risk and should not be taken hastily.

You are free to contribute and make MB support work up front.

Best regards,
Marcus