Newsgroups: php.internals
From: php@fleshgrinder.com (Fleshgrinder)
To: Andrea Faulds, internals@lists.php.net, nikita.ppv@gmail.com
Date: Fri, 24 Mar 2017 19:35:16 +0100
Subject: Re: [PHP-DEV] Re: TOKEN_AS_OBJECT for token_get_all()
Message-ID: <153a6a0f-1c22-6560-8bca-584e92915840@fleshgrinder.com>
In-Reply-To: <16.06.40046.20A35D85@pb1.pair.com>

On 3/24/2017 4:23 PM, Andrea Faulds wrote:
> Hi Nikita,
>
> Nikita Popov wrote:
>
>> I'd like to add a new TOKEN_AS_OBJECT flag to token_get_all(), which
>> returns an array of PhpToken objects, rather than the mix of plain strings
>> and arrays we currently have. The PhpToken class is defined as:
>>
>> class PhpToken {
>>     public $type;
>>     public $text;
>>     public $line;
>> }
>
> Rather than adding a flag to token_get_all() to return objects, you
> could potentially instead make an equivalent static method on PhpToken
> (PhpToken::getAll() perhaps). That would avoid mixing “object-oriented”
> and “procedural” styles, though I don't know if it matters. It seems
> cleaner to me.
>
> Thanks!
>

I am also against adding another flag because it violates the single
responsibility principle. However, adding a `getAll` method to the
`PhpToken` data class also seems very wrong, because it violates the
single responsibility principle once again: a data class should not
also be responsible for lexing.

The real and proper solution is to have an actual PHP Lexer that is
capable of creating a token stream.

    class Lexer {
        public function __construct(string $source);
        public static function fromFile(string $path): self;
        public function tokenize(): TokenStream;
    }

    final class TokenStream implements IteratorAggregate {
        public function getIterator(): Generator;
        public function toArray(): array; // Token[]
    }

    final class Token {
        // Ideally this would be an enum, but...
        public const OPEN_TAG;
        public const CLOSE_TAG;
        // ...

        // I hope these are not mutable!
        public int $category;  // type
        public string $lexeme; // text
        public int $line;
        public int $column;    // we don't have that :(

        /** @see token_name */
        public function getCategoryName(): string;

        // We could add `is*` methods for the various categories here.
        public function isOpenTag(): bool;
        public function isCloseTag(): bool;
        // ...
    }

With this in place, we're ready for the future. The `TokenStream` can
easily use a generator over the internal array. Or, at the beginning,
we offer only the `toArray` method. Users will have to call it, but
that overhead is tiny, considering that we can extend it in the future
without introducing any kind of breaking change.
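
To make the shape above a bit more concrete, here is a rough userland
sketch built on today's token_get_all(). It is only an illustration
under a few assumptions of mine, not an existing API and not a proposed
implementation: the properties are private with getters (to keep tokens
immutable), single-character tokens use their character code as the
category, and `column` is left out because token_get_all() does not
report it.

```php
<?php
declare(strict_types=1);

// Hypothetical sketch: class and method names follow the outline above.
final class Token
{
    private $category;
    private $lexeme;
    private $line;

    public function __construct(int $category, string $lexeme, int $line)
    {
        $this->category = $category;
        $this->lexeme = $lexeme;
        $this->line = $line;
    }

    public function getCategory(): int { return $this->category; }
    public function getLexeme(): string { return $this->lexeme; }
    public function getLine(): int { return $this->line; }

    /** @see token_name */
    public function getCategoryName(): string
    {
        // Assumption of this sketch: categories below 256 are plain
        // characters such as ";" and are named by their own text.
        return $this->category < 256 ? $this->lexeme : token_name($this->category);
    }

    public function isOpenTag(): bool { return $this->category === T_OPEN_TAG; }
    public function isCloseTag(): bool { return $this->category === T_CLOSE_TAG; }
}

final class TokenStream implements IteratorAggregate
{
    private $tokens;

    /** @param Token[] $tokens */
    public function __construct(array $tokens) { $this->tokens = $tokens; }

    public function getIterator(): Generator
    {
        // A generator over the internal array, as described above.
        yield from $this->tokens;
    }

    /** @return Token[] */
    public function toArray(): array { return $this->tokens; }
}

class Lexer
{
    private $source;

    public function __construct(string $source) { $this->source = $source; }

    public static function fromFile(string $path): self
    {
        $source = file_get_contents($path);
        if ($source === false) {
            throw new RuntimeException("Cannot read $path");
        }
        return new self($source);
    }

    public function tokenize(): TokenStream
    {
        $tokens = [];
        $line = 1;
        foreach (token_get_all($this->source) as $raw) {
            if (is_array($raw)) {
                // Array form: [id, text, starting line].
                [$category, $lexeme, $line] = $raw;
            } else {
                // String form: a single character without line info.
                [$category, $lexeme] = [ord($raw), $raw];
            }
            $tokens[] = new Token($category, $lexeme, $line);
            // Keep the line counter current for following string tokens.
            $line += substr_count($lexeme, "\n");
        }
        return new TokenStream($tokens);
    }
}

// Usage: iterate the stream directly, or call toArray() on it.
foreach ((new Lexer('<?php echo 1;'))->tokenize() as $token) {
    printf("%d %s %s\n", $token->getLine(), $token->getCategoryName(), json_encode($token->getLexeme()));
}
```

The point of the split is that `Token` stays a dumb value, `Lexer` owns
the tokenizing, and `TokenStream` is the only thing callers depend on,
so its internals can change later.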

Obviously this could go into the namespace `PHP\Parser`, or we prefix
everything with `PHP` (I'd be for the former).

On a side note, going for `Php` instead of `PHP` is inconsistent with
the naming that is currently dominating the PHP core:

https://secure.php.net/manual/en/indexes.functions.php

There are already some things that violate it, e.g. `Phar` instead of
`PHAR`, but most things keep acronyms intact (`DOM`, `XML`, `PDO`, ...).
Introducing even more inconsistency to PHP seems like a very bad idea to
me.

I know that this is considered bikeshedding by many people, but
consistency is very important and we are already lacking it on too many
levels as it is.

--
Richard "Fleshgrinder" Fussenegger