Newsgroups: php.internals
Date: Wed, 6 Jan 2016 10:43:37 +0100
To: Sara Golemon
Cc: PHP internals
Subject: Re: [PHP-DEV] [RFC] Normalize token_get_all() output (with flag)
From: nikita.ppv@gmail.com (Nikita Popov)

On Tue, Jan 5, 2016 at 8:45 PM, Sara Golemon wrote:

> On Tue, Jan 5, 2016 at 6:16 AM, Nikita Popov wrote:
> > Would be nice if someone could come up with a more explicit name for the
> > flag. TOKEN_FULL is not obvious, at least to me. TOKEN_ALWAYS_ARRAY?
> >
> Yeah, I'm not a huge fan of the name either, but I couldn't come up
> with anything better at the time.
>
> Maybe TOKEN_ASSOC? Since it provides associative array elements (as
> opposed to the current indexed array behavior)
>

I like that one.

> > I'd also like to have a flag TOKEN_NO_LINENOS with deduplication of token
> > arrays, but that's a separate matter...
> >
> Not sure what you're suggesting here. Can you elaborate?
>

Basically: token_get_all() is rather slow. I think it says something that getting the tokens of a script is about as slow as lexing it, parsing it into an internal AST and constructing an object-based userland AST for it.

If you use token_get_all() in a manner that only requires one lookahead token at a time, you don't really care about how nice the token format is; you're only interested in it being efficient. I was hoping that we could optimize it by dropping the line numbers (which are the most volatile part of the structure) and trying to reuse the same array for tokens that have the same ID and content (but likely a different line number). It's very likely that a script contains the T_WHITESPACE(" ") token more than once, and similarly labels and variables tend to repeat, etc.

No idea if that would actually work/help, just an idea.

Nikita
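The deduplication idea above can be sketched roughly as follows. This is a language-neutral illustration in Python, not a patch against the C implementation of token_get_all(). The (id, text, line) input triples, the `dedup_tokens` helper and the numeric token IDs are all hypothetical. Line numbers are dropped, and a dictionary interns each distinct (id, text) pair. Repeated tokens, such as a single-space whitespace token or a recurring variable, then end up as references to one shared object.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical token representation: (token_id, text) with the line number dropped.
Token = Tuple[int, str]

# Made-up IDs for illustration only; these are NOT PHP's real T_* constant values.
T_WHITESPACE = 1
T_VARIABLE = 2
T_STRING = 3


def dedup_tokens(raw: Iterable[Tuple[int, str, int]]) -> List[Token]:
    """Drop line numbers and reuse one tuple per distinct (id, text) pair."""
    cache: Dict[Token, Token] = {}
    out: List[Token] = []
    for token_id, text, _lineno in raw:
        key = (token_id, text)
        # setdefault returns the tuple stored the first time this pair was seen,
        # so every later occurrence of the same token shares that one object.
        out.append(cache.setdefault(key, key))
    return out


if __name__ == "__main__":
    raw = [
        (T_VARIABLE, "$a", 1),
        (T_WHITESPACE, " ", 1),
        (T_VARIABLE, "$a", 2),
        (T_WHITESPACE, " ", 2),
        (T_STRING, "foo", 3),
    ]
    tokens = dedup_tokens(raw)
    print(tokens)
    print(tokens[0] is tokens[2])  # True: both "$a" tokens share one tuple
```

Whether this saves enough work inside the lexer loop to matter is exactly the open question raised in the message. The sketch only demonstrates the sharing, not a measured speedup.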