Newsgroups: php.internals
Date: Fri, 9 Sep 2011 10:46:17 +0200
From: nicolas.grekas+php@gmail.com (Nicolas Grekas)
To: Ferenc Kovacs
Cc: Nikita Popov, PHP internals
Subject: Re: [PHP-DEV] Revert Tokenizer behavior for 5.4

Thank you, Nikita, for bringing this subject here!

On Fri, Sep 9, 2011 at 10:01, Ferenc Kovacs wrote:

> don't break there but for the next ';'.

You can also just count the number of semantic tokens after
T_HALT_COMPILER (i.e. excluding whitespace and comments) and halt once
you hit 3.

> less confusing solution would be to explicitly add '(', ')' and ';' to the
> result in the T_HALT_COMPILER condition before breaking out of the
> loop.

If you mean verifying that '(', ')' and (';' or T_CLOSE_TAG) actually
follow T_HALT_COMPILER, I think that's part of the syntax analyser's
job, not the tokenizer's. If you're OK with this argument, then just
counting 3 tokens is really the most basic "syntax analysis" we have to
do to fix the problem, don't you think?

> could there be other important tokens after the __halt_compiler()
> which should be present in the token_get_all() result?

Maybe the binary data itself, as one big T_INLINE_HTML for example?

Also, if token_get_all() could be made binary safe, that would be very
cool! (no more eating of \x00-\x1F inside regular code) :)

Nicolas
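To make the counting suggestion above concrete, here is a minimal userland sketch of it, not the actual tokenizer patch. It assumes a token list in the shape returned by token_get_all(), and the helper name tokens_until_halt() is hypothetical. It keeps every token up to and including the third semantic token after T_HALT_COMPILER, skipping whitespace and comments while counting, and does not check that those three tokens really are '(', ')' and ';'.

```php
<?php
// Hypothetical helper (not part of PHP): truncate a token list once three
// semantic tokens have been seen after T_HALT_COMPILER.
function tokens_until_halt(array $tokens)
{
    // Token types that do not count as "semantic" tokens.
    $ignored = array(T_WHITESPACE, T_COMMENT, T_DOC_COMMENT);
    $result = array();
    $remaining = -1; // -1 means T_HALT_COMPILER has not been seen yet

    foreach ($tokens as $token) {
        $result[] = $token;
        // Tokens are either array(type, text, line) or a one-character string.
        $type = is_array($token) ? $token[0] : $token;

        if ($remaining < 0) {
            if ($type === T_HALT_COMPILER) {
                $remaining = 3; // expect '(', ')' and ';' (or a close tag)
            }
            continue;
        }

        // Count only semantic tokens; stop right after the third one.
        if (!in_array($type, $ignored, true) && --$remaining === 0) {
            break;
        }
    }

    return $result;
}
```

It would be used as tokens_until_halt(token_get_all($source)). Whether the three tokens form a valid __halt_compiler() call is deliberately left to the syntax analyser, in line with the argument above.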