Newsgroups: php.internals
Xref: news.php.net php.internals:115505
To: PHP internals
Date: Tue, 20 Jul 2021 00:23:47 +0000
Subject: Re: [PHP-DEV] [RFC] [VOTE] is_literal
From: tysonandre775@hotmail.com (tyson andre)

Hi Craig Francis,

> As an aside, only 4 of 23 'no' voters provided any comment as to why they
> voted that way on the mailing list, which I feel undermines the point of
> the Request For Comment process, with an additional 5 responding personally
> off-list after prompting. This makes it harder (or impossible) for points
> to be discussed and addressed.

1. My earlier comments about static analysis, and on behavior depending on whether opcache is enabled
2. This might prevent certain optimizations in the future. For example, currently, 1-byte strings are all interned to save memory.
    If is_literal had been part of php before that optimization was proposed, the optimization might have been rejected.
3. PHP's `string` type is used both for (hopefully) valid unicode strings and for low-level operations on literal byte arrays (e.g. cryptography).
    It seems really, really strange for a type system to track trustedness on a low-level primitive that is also used to hold byte arrays. (the php 6 unicode string refactoring failed)

    Further, if this were to be extended in the future beyond the original proposal (e.g. literal strings that are entirely digits are automatically interned or marked as trusted),
    this might open previously safe code acting on byte arrays to side channel issues such as timing attacks (https://en.wikipedia.org/wiki/Timing_attack)
4. Internal functions and userland polyfills for those functions may unintentionally differ significantly in the resulting taintedness,
    e.g. would base64_decode in userland, built up byte by byte, end up possibly being untainted?
5. The fact that 1-byte strings are almost always interned seems like a noticeable inconsistency (though library authors can deal with it once they're aware of it), though for it to become an issue a library may need to take multiple strings as input
    (e.g. a contrived example `"echo -- " . $trustedPrefix . shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and $notTrusted of "; evaluate command")
6. Including it in core would make it harder to remove later if it interfered with opcache or jit work, or to migrate code to alternative interpreters for php if those were ever implemented (if frameworks were to extensively depend on is_literal)
7. Tracking whether a string is untrusted is a definition only suitable for a few (extremely common) formats for php. But for less common features, even stringified integers may be a problem (e.g. binary file formats, etc)

    This is relatively minor given that php is typically used in a web programming context with json or html or js/css output

    I'd think is_interned()/intern_string() is much closer to tracking something that corresponds with php's internals (and may allow saving memory, e.g. in long-running processes which receive duplicate strings as input), though the 10 people who wanted fully featured trustedness checking would probably want is_literal instead
8. Serializing and unserializing data would lose information about trustedness of inputs, unpredictably (e.g. unserialize() in php 8.0 interns array keys).

    There's no (efficient) way to change trusted strings to untrusted or vice versa, though there are inefficient workarounds (modifying a byte and restoring it to stop trusting it, imploding single characters to create a trusted string)

    This may be done implicitly in frameworks using APCu/memcached/redis as a cache

    (I definitely don't think the serialization data format should track is_literal())

9. Future refactorings, optimizations or deoptimizations (or security fixes) to unserialize(), etc. may unexpectedly break code using is_literal that throws instead of warning (more bug reports, discouraging users from upgrading, etc.)
10. This RFC adds an unknown amount of future work for php-src and PECLs to *intuitively* support mapping trusted inputs to trusted outputs or vice versa - less commonly used or unmaintained functions may not behave as expected for a while
11. https://pecl.php.net/package/taint is available already for a use case with some overlap for setups that need this


Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to make an interned string) would make sense to expose in a PECL as a regular `extension` (not a `zend_extension`) if is_interned also fails.
Unlike the zend_extension for https://www.php.net/manual/en/function.is-tainted.php ,
something simple may be possible without needing the performance hit and
future conflicts with XDebug that I assume https://www.php.net/manual/en/function.is-tainted.php would be prone to.
(https://pecl.php.net/package/taint seems to use a separate bit to track this. The latest release of the Taint pecl fixes XDebug compatibility)

- Other languages, such as Java, have exposed this for memory management purposes (rather than security) though it's rarely used directly or in frameworks, e.g. https://docs.oracle.com/javase/10/docs/api/java/lang/String.html#intern()
  (https://docs.python.org/3/library/sys.html#sys.intern)

Thanks,
Tyson