Newsgroups: php.internals
Message-ID: <6604D94D40FD465F992144110B075BB5@pc1>
Cc: "Dmitry Stogov"
Date: Wed, 29 Apr 2009 21:50:15 -0500
Subject: [PATCH] Scanner "diet" with fixes, etc.
From: php_lists@realplain.com ("Matt Wilmas")

Hi Dmitry, Brian, all,

Here's a scanner patch that I mentioned a while ago, with a possible way to work around the re2c EOF handling issues.

The primary change is to do a "manual scan", like I talked about, in areas that match large amounts and can contain NULL bytes (strings/comments, which are now scanned faster too), as is done for inline HTML. I called it a "diet" :-) because it removes my complicated string regex patterns from a couple of years ago. That doesn't make the .l file much smaller after adding the manual scan code (easier to understand...?), but it does result in a ~34k reduction of 5.3's generated .c file...

This fixes Bug #46817 and provides a better, more proper fix for the older Bug #42767, both related to ending comments. Now inline HTML chunks aren't broken up when a tag starting with "s" is encountered (