Newsgroups: php.internals
From: alan@akbkhome.com (Alan Knowles)
Date: Mon, 03 Mar 2008 07:48:44 +0800
To: Marcus Boerger
CC: internals@lists.php.net
Subject: Re: [PHP-DEV] [RFC] Replace the flex-based scanner with an re2c [1] based lexer

Can you clarify the Multibyte issues:
- I presume this means that it can handle ASCII/UTF8/16 etc. but will not handle things like BIG5/GB encoding in source code - this may be a bit of an issue around here..
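
To make the concern concrete, here is a minimal sketch in C. It is not taken from the PHP or re2c sources, and both function names are hypothetical. It assumes Big5 lead bytes fall in 0x81..0xFE and relies on the fact that a Big5 trail byte may be 0x5C, the ASCII backslash. A byte-oriented scanner that knows nothing about the source encoding would see an escape character inside a string literal where the author only typed a Chinese character. UTF-8 does not have this problem because every byte of a multibyte sequence is 0x80 or above.

```c
#include <stddef.h>

/* Byte-wise scan: every 0x5C byte is counted as a backslash, the way a
 * scanner with no knowledge of the source encoding would see it. */
size_t count_backslashes_bytewise(const unsigned char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] == 0x5C) {
            n++;
        }
    }
    return n;
}

/* Big5-aware scan (assumption: lead bytes are 0x81..0xFE). A lead byte
 * consumes the following trail byte, so a 0x5C trail byte is not
 * mistaken for a backslash. */
size_t count_backslashes_big5(const unsigned char *s, size_t len)
{
    size_t n = 0;
    for (size_t i = 0; i < len; i++) {
        if (s[i] >= 0x81 && s[i] <= 0xFE && i + 1 < len) {
            i++; /* skip the trail byte of a two-byte character */
        } else if (s[i] == 0x5C) {
            n++;
        }
    }
    return n;
}
```

For the four bytes `"`, 0xB3, 0x5C, `"` (a quoted string holding one Big5 character), the byte-wise count reports one backslash while the Big5-aware count reports none.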

Regards
Alan

Marcus Boerger wrote:
> RFC: REPLACE THE FLEX-BASED SCANNER WITH AN RE2C [1] BASED LEXER
>
> Situation:
> The current flex-based lexer depends on an outdated and unsupported flex
> version. Alternatives include either updating to a newer version of flex or
> using re2c, which we already use for a variety of things (serializing, pdo sql
> scanning, date/time parsing). While moving towards a newer flex version would
> be much easier, switching to re2c promises a much faster lexer. Actually,
> without any specific re2c optimizations we already get around a 20% scanner
> performance increase. Running the tests gets an overall speedup of 2%. It is
> arguable whether this is enough, but re2c has more advantages. First of all,
> re2c allows one to scan any type of input (ASCII, UTF-8, UTF-16, UTF-32).
> Secondly, it allows for better integration with Lemon [2], which would be the
> next step. And thirdly we can switch to a reentrant scanner.
>
> Current state:
> Flex has been fully replaced by re2c in Zend. We have also switched to an
> mmap-based lexer approach for now. However, we had to drop multibyte support
> as well as the encoding declare. The current state can be checked out from
> Scott's subversion repository [3] and you can follow the development on his
> Trac setup [4]. When you want to build php with re2c, then you need to grab
> re2c from its sourceforge subversion repository [5]. You can also check out
> the changes in a patch created Sunday 2nd March against a PHP checkout from
> 14th February [6].
>
> Further steps:
> Commit this to PHP 5.3. Synch to HEAD. Add pecl/intl to 5.3. Discuss/recreate
> multibyte support with libintl.
>
> Future steps:
> Replace bison with lemon in PHP 5.4 or HEAD.
>
> Time Frame:
> Commit to 5.3 between the 5th and the 15th of March. Synch to HEAD a couple
> of days later. Moving pecl/libintl to ext (depends on the 5.3 RMs decision).
> After that is done, decide about multibyte support. Along with the commit to
> the 5.3 branch there will be a new re2c version available.
>
>
> Marcus Boerger
> Nuno Lopes
> Scott MacVicar
>
>
> [1] http://re2c.org/
> [2] http://www.hwaci.com/sw/lemon/
> [3] svn://whisky.macvicar.net/php-re2c
> [4] http://trac.macvicar.net/php-re2c/
> [5] https://re2c.svn.sourceforge.net/svnroot/re2c/trunk/re2c
> [6] http://php.net/~helly/php-re2c-20080302.diff.txt