Newsgroups: php.internals
From: nikita.ppv@gmail.com (Nikita Popov)
Date: Fri, 24 Mar 2017 00:44:11 +0100
Subject: Re: [PHP-DEV] Re: TOKEN_AS_OBJECT for token_get_all()
To: Sara Golemon
Cc: Jan Tvrdík, PHP internals

On Fri, Mar 24, 2017 at 12:33 AM, Sara Golemon wrote:

> On Thu, 23 Mar 2017 18:16:31 +0100, Nikita Popov
> > I'd like to add a new TOKEN_AS_OBJECT flag to token_get_all(), which
> > returns an array of PhpToken objects, rather than the mix of plain strings
> > and arrays we currently have. The PhpToken class is defined as:
> >
> >     class PhpToken {
> >         public $type;
> >         public $text;
> >         public $line;
> >     }
> >
> > This has been previously suggested and implemented by Rouven Weßling [1],
> > I've just ported this feature to master and optimized the implementation
> > [2].
>
> Yeah, IIRC the more recent discussion ended at "Oh, and this! And
> this! And this!" which eventually went nowhere (largely my ADD). IMO
> there's no harm in adding this, and the class format seems entirely
> reasonable.
>
> If I may bikeshed a TINY bit, I'd ask that all tokens return as
> objects, rather than char|PhpToken similar to the current char|array
> format we have. (Maybe that's in the PR, I haven't looked at either)

That's how it's currently implemented, yes. The output is PhpToken[].

> On Thu, Mar 23, 2017 at 5:25 PM, Jan Tvrdík wrote:
> > Regarding memory - would it be possible to return iterator instead of
> > array?
>
> That has some hairy edges to it since the lexer isn't actually
> reentrant within a given thread. Imagine the following:
>
> foreach (token_get_all($code, TOKEN_AS_ITERATOR) as $token) {
>     $moretokens = token_get_all($morecode, $whateverflags); // <--- Here be broken dreams and crying unicorns
> }
>
> Worse still if you have parallel iterators.
>
> That's probably fixable, but it's a much heavier refactor and one that
> should probably have an RFC.

The lexer is reentrant as long as the lexer state is saved and restored
every time the iterator is advanced -- we already need to support reentry
because, for various unpleasant reasons, it is actually possible that a
compilation is triggered while another is still in progress...

But yes, this is definitely a much more complicated change, and one I would
not consider to be particularly useful. In my experience most interesting
uses of token_get_all() are not compatible with a simple iterator, or would
eschew it because the iterator would in all likelihood perform worse.

Nikita
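
To make the char|array versus PhpToken[] distinction concrete, here is a minimal userland sketch that normalizes the existing token_get_all() output into uniform objects. It is not the implementation from the PR. The class name SketchToken and the function sketch_tokens_as_objects() are hypothetical names invented for this example, and using ord() of the character as the type of a single-character token is an assumption, not something stated in this thread.

```php
<?php
// Hypothetical stand-in for the proposed PhpToken class. The three
// properties mirror the proposal; the name SketchToken is an assumption.
final class SketchToken
{
    public $type;
    public $text;
    public $line;

    public function __construct($type, $text, $line)
    {
        $this->type = $type;
        $this->text = $text;
        $this->line = $line;
    }
}

/**
 * Wrap every token from token_get_all() in an object, including the
 * single-character tokens that the function returns as plain strings.
 *
 * @return SketchToken[]
 */
function sketch_tokens_as_objects(string $code): array
{
    $tokens = [];
    $line = 1; // line on which the next token starts

    foreach (token_get_all($code) as $token) {
        if (is_array($token)) {
            // Array form: [token id, source text, starting line].
            [$type, $text, $line] = $token;
        } else {
            // String form: a single character with no line information.
            // Assumption: use the character code as its type.
            $type = ord($token);
            $text = $token;
        }

        $tokens[] = new SketchToken($type, $text, $line);

        // Advance past any newlines contained in this token so that a
        // following single-character token gets the right line number.
        $line += substr_count($text, "\n");
    }

    return $tokens;
}
```

With this shape a consumer can read $token->type, $token->text and $token->line on every element without first checking is_array(), which is the convenience being asked for above. The sketch still materializes the whole array, so it does nothing for the memory concern raised about iterators.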