From: nikita.ppv@gmail.com (Nikita Popov)
To: Larry Garfield
Cc: php internals
Date: Fri, 14 Feb 2020 11:24:59 +0100
Subject: Re: [PHP-DEV] [RFC] token_get_all() TOKEN_AS_OBJECT mode

On Fri, Feb 14, 2020 at 9:48 AM Nikita Popov wrote:

> On Thu, Feb 13, 2020 at 6:06 PM Larry Garfield wrote:
>
>> On Thu, Feb 13, 2020, at 3:47 AM, Nikita Popov wrote:
>> > Hi internals,
>> >
>> > This has been discussed a while ago already, now as a proper proposal:
>> > https://wiki.php.net/rfc/token_as_object
>> >
>> > tl;dr is that it allows you to get token_get_all() output as an array of
>> > PhpToken objects. This reduces memory usage, improves performance, makes
>> > code more uniform and readable... What's not to like?
>> >
>> > An open question is whether (at least to start with) PhpToken should be
>> > just a data container, or whether we want to add some helper methods to it.
>> > If this generates too much bikeshed, I'll drop methods from the proposal.
>> >
>> > Regards,
>> > Nikita
>>
>> I love everything about this.
>>
>> 1) I would agree with Nicolas that a static constructor would be better.
>> I don't know about polyfilling it, but it's definitely more
>> self-descriptive.
>>
>> 2) I'm skeptical about the methods. I can see them being useful, but
>> also being bikeshed material. For instance, if you're doing annotation
>> parsing then docblocks are not ignorable. They're what you're actually
>> looking for.
>>
>> Two possible additions, feel free to ignore if they're too complicated:
>>
>> 1) Should it return an array of token objects, or a lazy iterable? If
>> I'm only interested in certain types (eg, doc strings, classes, etc.) then
>> a lazy iterable would allow me to string some filter and map operations on
>> to it and use even less memory overall, since the whole tree is not in
>> memory at once.
>
> I'm going to take you up on your offer and ignore this one :P Returning
> tokens as an iterator is inefficient because it requires full lexer state
> backups and restores for each token. Could be optimized, but I wouldn't
> bother with it for this feature. I also personally have no use-case for a
> lazy token stream. (It's technically sufficient for parsing, but if you
> want to preserve formatting, you're going to be preserving all the tokens
> anyway.)
>
>> 2) Rather than provide bikesheddable methods, would it be feasible to
>> take a cue from PDO and let users specify a subclass of PhpToken to fetch
>> into? That way the properties are always there, but a user can attach
>> whatever methods make sense for them.
>
> It would be technically feasible. If we go with a static method for
> construction, then one might even say that there's reasonable expectation
> that PhpToken::getAll(...) is going to return PhpToken[] and
> MyPhpTokenExtension::getAll() is going to return MyPhpTokenExtension[].
>
> I'm a bit apprehensive about this though, specifically because you mention
> PDO... which, I think, isn't exactly a success story when it comes to this.
> If we do this, then the behavior would be that the object gets created, the
> properties populated, and *no constructor gets called*. The last part is
> important -- when you start calling constructors and magic methods, that's
> where the mess starts and you get PDO.

After thinking about this a bit more, there's a very nice solution to this:
Something that's missing from the current proposal is a constructor. Right
now, if code wants to insert new tokens, then those would have to be
constructed by creating the object and then manually assigning properties,
so we should definitely have a constructor. Once we have one, we can mark it
final, and thus make the construction behavior well-defined, even if the
class is extended.

Nikita
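
For illustration, a minimal sketch of how a final constructor combined with a
static factory might behave when the class is extended. SketchToken, its
getAll() method and the MyToken subclass are hypothetical stand-ins written
for this example, not the RFC's PhpToken; only token_get_all() is an existing
function. The sketch assumes PHP 7.4 or later (typed properties).

```php
<?php
declare(strict_types=1);

// Hypothetical stand-in for the proposed token class (not the RFC's PhpToken).
class SketchToken
{
    public int $id;
    public string $text;
    public int $line;

    // Final: subclasses may add methods, but cannot change how a token
    // is constructed, so construction stays well-defined.
    final public function __construct(int $id, string $text, int $line = -1)
    {
        $this->id = $id;
        $this->text = $text;
        $this->line = $line;
    }

    /** @return static[] */
    public static function getAll(string $code): array
    {
        $tokens = [];
        $line = 1;
        foreach (token_get_all($code) as $raw) {
            if (is_array($raw)) {
                // Array form is [token id, text, starting line].
                [$id, $text, $line] = $raw;
            } else {
                // Single-character tokens come back as plain strings;
                // using ord() as their id is an assumption of this sketch.
                $id = ord($raw);
                $text = $raw;
            }
            // Late static binding: creates an instance of the called class.
            $tokens[] = new static($id, $text, $line);
            // Keep the line counter current for following string tokens.
            $line += substr_count($text, "\n");
        }
        return $tokens;
    }
}

// A user-defined extension that only attaches behaviour.
class MyToken extends SketchToken
{
    public function isDocComment(): bool
    {
        return $this->id === T_DOC_COMMENT;
    }
}

$tokens = MyToken::getAll("<?php /** doc */ function f() {}");
$docs = array_values(array_filter(
    $tokens,
    fn(MyToken $t): bool => $t->isDocComment()
));
```

Because the constructor is final, MyToken::getAll() can return MyToken
instances through the same, single construction path, without the
"object created, constructor skipped" behaviour discussed for PDO above.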