Newsgroups: php.internals
Date: Fri, 9 Sep 2011 10:01:59 +0200
From: tyra3l@gmail.com (Ferenc Kovacs)
To: Nikita Popov
Cc: PHP internals
Subject: Re: [PHP-DEV] Revert Tokenizer behavior for 5.4

On Fri, Sep 9, 2011 at 9:15
AM, Nikita Popov wrote:
> In Bug #54089 [1] a patch was applied that cuts off token_get_all()
> output after a T_HALT_COMPILER token. This was done because otherwise
> PHP would keep on lexing after that and would generate errors because
> of binary data (which is not valid PHP, mostly.)
>
> The problem with the patch is that there are some tokens after
> T_HALT_COMPILER that are of interest, namely the '(' ')' ';'. After
> the patch it is impossible to get those tokens, without either
> relexing the code after T_HALT_COMPILER (that way you get the binary
> data problem back, just with much more complex code) or writing a
> regular expression to match it (which is really hard, as there may be
> any token dropped by the PHP parser in there, i.e. whitespace,
> comments, PHP tags).
>
> This issue was pointed out by the creator of the bug report, but
> those comments were ignored for some reason.
>
> I would like this patch to be reverted on the 5.4 and trunk branches.
> I assume it's too late to revert it on 5.3, as it has been there for
> some time already. It is just counterproductive. (Alternatively one
> could fix token_get_all to return the (); tokens after
> __halt_compiler, too, but that would be hard, probably.)

I think that it wouldn't be too hard.
From a quick glance at the code, we should introduce a new local
variable, set it to true at the point where we break now
( http://svn.php.net/viewvc/php/php-src/trunk/ext/tokenizer/tokenizer.c?view=markup#l155 )
and, instead of breaking there, break at the next ';'.
Another, maybe less confusing, solution would be to explicitly add '(',
')' and ';' to the result in the T_HALT_COMPILER condition before
breaking out of the loop.
I will create a patch for this in the afternoon.

Or could there be other important tokens after the __halt_compiler()
which should be present in the token_get_all() result?

-- 
Ferenc Kovács
@Tyr43l - http://tyrael.hu
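
For illustration only, the first idea in the reply (remember that
T_HALT_COMPILER was seen and stop at the next ';' rather than stopping
immediately) could look roughly like the sketch below. This is not the
code from tokenizer.c: the token ids, the array-based input and the
function name collect_tokens are all hypothetical stand-ins for the real
lexer loop, chosen so the example compiles on its own.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical token ids, for illustration only. The real ids come from
 * the parser header generated in php-src. */
enum {
    TOK_OTHER = 1,
    TOK_OPEN_TAG,
    TOK_WHITESPACE,
    TOK_HALT_COMPILER,
    TOK_LPAREN,
    TOK_RPAREN,
    TOK_SEMICOLON,
    TOK_INLINE_DATA
};

/* Copies tokens from in[] to out[] and returns the number copied.
 *
 * Instead of stopping as soon as TOK_HALT_COMPILER is seen, a flag is
 * set and the loop keeps going until the next ';' has been copied, so
 * '(' ')' ';' and anything between them (whitespace, comments) survive.
 * Nothing after that ';' is copied, so trailing binary data is never
 * touched. */
size_t collect_tokens(const int *in, size_t n_in, int *out, size_t out_cap)
{
    size_t n_out = 0;
    int seen_halt = 0; /* the "new local variable" from the reply */

    assert(in != NULL && out != NULL);

    for (size_t i = 0; i < n_in && n_out < out_cap; i++) {
        out[n_out++] = in[i];

        if (in[i] == TOK_HALT_COMPILER) {
            seen_halt = 1;          /* the current code breaks here */
        } else if (seen_halt && in[i] == TOK_SEMICOLON) {
            break;                  /* proposed: break here instead */
        }
    }
    return n_out;
}
```

A ';' that appears before T_HALT_COMPILER does not end the loop, because
the flag is still unset at that point. Input without any
T_HALT_COMPILER is copied in full, which matches the behaviour before
the patch.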