Newsgroups: php.internals
From: nikita.ppv@googlemail.com (Nikita Popov)
Date: Fri, 9 Sep 2011 09:15:21 +0200
To: PHP internals
Subject: [PHP-DEV] Revert Tokenizer behavior for 5.4

In Bug #54089 [1] a patch was applied that cuts off token_get_all() output after a T_HALT_COMPILER token.
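To check what the tokenizer returns after T_HALT_COMPILER on a given PHP build, a small script along these lines can be used. This is only a sketch: the helper name tokens_after_halt_compiler() and the sample input are made up for illustration, and what gets printed depends on whether the patch is present in the build being tested.

```php
<?php
// Hypothetical helper (not part of PHP): collects every token that
// token_get_all() returns after the first T_HALT_COMPILER token.
// Array tokens are reduced to their token name, single-character
// tokens such as '(' are kept as plain strings.
function tokens_after_halt_compiler($code)
{
    $after = array();
    $seen  = false;

    foreach (token_get_all($code) as $token) {
        if ($seen) {
            $after[] = is_array($token) ? token_name($token[0]) : $token;
        } elseif (is_array($token) && $token[0] === T_HALT_COMPILER) {
            $seen = true;
        }
    }

    return $after;
}

// Made-up sample: whitespace and a comment sit between __halt_compiler
// and the '(' ')' ';' tokens, followed by some trailing data.
$code = "<?php\necho 'stub';\n__halt_compiler /* comment */ ( ) ;raw data here";

print_r(tokens_after_halt_compiler($code));
```

An empty array here means the output was cut off directly after T_HALT_COMPILER.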
This was done because otherwise PHP would keep on lexing after that token and would generate errors on the binary data that follows (which is, mostly, not valid PHP).

The problem with the patch is that some tokens after T_HALT_COMPILER are of interest, namely the '(' ')' ';'. After the patch it is impossible to get those tokens without either relexing the code after T_HALT_COMPILER (which brings the binary data problem back, just with much more complex code) or writing a regular expression to match them (which is really hard, as any token dropped by the PHP parser may appear in there, i.e. whitespace, comments, PHP tags).

This issue was pointed out by the creator of the bug report, but those comments were ignored for some reason.

I would like this patch to be reverted on the 5.4 and trunk branches; it is simply counterproductive. I assume it's too late to revert it on 5.3, as it has been there for some time already.

(Alternatively one could fix token_get_all() to return the '(' ')' ';' tokens after __halt_compiler too, but that would probably be hard.)

Nikita

[1]: https://bugs.php.net/bug.php?id=54089