Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:115542 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 23361 invoked from network); 21 Jul 2021 14:45:11 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 21 Jul 2021 14:45:11 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 3EA851804DA for ; Wed, 21 Jul 2021 08:10:42 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,HTML_MESSAGE,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mail-wm1-f50.google.com (mail-wm1-f50.google.com [209.85.128.50]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Wed, 21 Jul 2021 08:10:41 -0700 (PDT) Received: by mail-wm1-f50.google.com with SMTP id y21-20020a7bc1950000b02902161fccabf1so3632702wmi.2 for ; Wed, 21 Jul 2021 08:10:41 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=craigfrancis.co.uk; s=default; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=caLeS90ZXiqt/Sg1hAreYRvx5fNZDA14spr3cxweoqA=; b=axHZVaSVzyFOQ/+0DsKlSCti9qMGLa1bfEyTfPfeCbZ8AARhb+Hhn0Ine70DaKs0Ih 0YFeobQsRHc6S+1Gd2FAlv69Tcq+YH08Ow6wGpMOC9ExKUhlrHTte7VcAMK9MWICQkYY S8PO/lsJYb/C4QfGQ6385AYl3Ntac4VgHDzv8= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=caLeS90ZXiqt/Sg1hAreYRvx5fNZDA14spr3cxweoqA=; b=qn4MF7NBqq3xCta4O5rdWs4o1b8GL42gYlA0PORwq+cJrvM40KEOY3w34ghSD4g9Jr 7q+U2FebZ2oMqWePfrPt79RoQtQAnh1rpx4R1RrQCn4VVlHzKjD0zQvk904GVmDK0ud7 FosTPL8y8Ty/Um+Rr6WeP46jDBGyq4RXqf+FdBcF7HNGdR5WdtSUFixJmiVkcT0Adpx5 Iif1k0Z24rLuWUPH5E418A/wVal+3XZB6V6w0gkDgQDVf829CWWCBBR2BCEdGh3TUCgE IfUdHOj35VoC8co0Abj0RHidll2sKV+KdKDtKoIAuiMrExcs5d1M6lvZi9WGbYaNx9j6 ME7A== X-Gm-Message-State: AOAM532amOLT5Wa9Vpi7XROMnDuFIMpxCQneXpxSfxp4Tx8Ej2UU9sD/ XfX0Wfyoj5Sz5Aojsz6utAzc3w== X-Google-Smtp-Source: ABdhPJwvj9P1BIfd+/lGKJlZxlaRrxlE35Tw0X1bH0MsNcaJo6WzTMebaMFWExrLGiHqMquVB7Pz0Q== X-Received: by 2002:a05:600c:1c07:: with SMTP id j7mr4647638wms.165.1626880238584; Wed, 21 Jul 2021 08:10:38 -0700 (PDT) Received: from smtpclient.apple ([92.237.247.170]) by smtp.gmail.com with ESMTPSA id u2sm203293wmm.37.2021.07.21.08.10.37 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Wed, 21 Jul 2021 08:10:37 -0700 (PDT) Message-ID: <5D38B75D-9381-44C6-BD4D-86ACC17531BA@craigfrancis.co.uk> Content-Type: multipart/alternative; boundary="Apple-Mail=_C1EFC932-A018-46E8-9DE8-A3258C2D260E" Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.100.0.2.22\)) Date: Wed, 21 Jul 2021 16:10:36 +0100 In-Reply-To: Cc: PHP internals To: tyson andre References: X-Mailer: Apple Mail (2.3654.100.0.2.22) Subject: Re: [PHP-DEV] [RFC] [VOTE] is_literal From: craig@craigfrancis.co.uk (Craig Francis) --Apple-Mail=_C1EFC932-A018-46E8-9DE8-A3258C2D260E Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=us-ascii > On 20 Jul 2021, at 01:23, tyson andre = wrote: >=20 >> As an aside, only 4 of 23 'no' voters provided any comment as to why = they >> voted that way on the mailing list, which I feel undermines the point = of >> the Request For Comment process, with an additional 5 responding = personally >> off-list after prompting. This makes it harder (or impossible) for = points >> to be discussed and addressed. >=20 > 1. My earlier comments about static analysis, and on behavior = depending on whether opcache is enabled Thanks Tyson, Just to check I've not missed anything, are you referring to your = comments from March 2020? https://news-web.php.net/php.internals/109192 = I mentioned your email in the RFC, as I agree that developers should use = Static Analysis, but I think it's highly unlikely we will be able to get = most PHP developers to use it: https://wiki.php.net/rfc/is_literal#static_analysis = As to the OPcache, Joe's implementation already works with its = optimisations, so the results are consistent. https://wiki.php.net/rfc/is_literal#compiler_optimisations = > 2. This might prevent certain optimizations in the future. For = example, currently, 1-byte strings are all interned to save memory. > If is_literal was part of php prior to proposing that optimization, = then that optimization may be rejected. Like the OPcache optimisations, the 1-byte strings and other existing = optimisations work with Joe's implementation. I cannot comment or = predict any future optimisation work, but this argument could be made = for any change to PHP. > 3. PHP's `string` type is used both for (hopefully) valid unicode = strings and for low level operations on literal byte arrays (e.g. = cryptogrophy). > It seems really, really strange for a type system to track = trustedness for a low level primitive to track byte arrays. (the php 6 = unicode string refactoring failed) >=20 > Further, if this were to be extended in the future beyond the = original proposal (e.g. literal strings that are entirely digits are = automatically interned or marked as trusted), > this might open previously safe code acting on byte arrays to side = channel issues such as timing attacks = (https://en.wikipedia.org/wiki/Timing_attack) We're just adding a flag to say if the string was written by the = developer, which is key for identifying Injection Vulnerabilities (it's = similar to the Interned flag). With cryptography and timing attacks, the only part affected would be = during string concatenation, which has already got several optimisations = that introduce variability in timings (e.g. concatenating variables = containing 'a' with 'b' is about twice as slow than 'a' with ''). We did have a discussion about integer support, but the way = integers/floats/booleans are currently stored in memory would require a = big change to note if those values were defined in the developers PHP = source code. We also explored the idea of allowing all integers, and = while they cannot lead to an Injection Vulnerability, the consensus was = that a developer defined string is easier to understand: https://wiki.php.net/rfc/is_literal#integer_values = > 4. Internal functions and userland polyfills for those functions may = unintentionally differ significantly for the resulting taintedness, > e.g. base64_decode in userland being built up byte by byte would = end up being possibly untainted? All of these functions would return a non-literal string, as the values = were not written by the developer (as in, directly defined in their = source code). > 5. The fact that 1-byte strings are almost always interned seems like = a noticeable inconsistency (though library authors can deal with it once = they're aware of it), though for it to become an issue a library may = need to take multiple strings as input > (e.g. a contrived example`"echo -- " . $trustedPrefix . = shell_escape($notTrusted)` for $trustedPrefix of "'" (or "\n") and = $notTrusted of "; evaluate command" 1-byte strings have already been addressed in the implementation. Libraries need to take user values separately from developer defined = strings, because developers frequently make mistakes: = https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification= /mistakes.php?ts=3D4 = Especially when it comes to escaping values: = https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification= /escaping.php?ts=3D4 = In your command example, the library could copy how Parameterised = Queries work in SQL. = https://github.com/craigfrancis/php-is-literal-rfc/blob/main/examples/cli-= basic.php = > 6. Including it in core would make it harder to remove later if it = interfered with opcache or jit work, or to migrate code to alternative = interpreters for php if those were ever implemented (if frameworks were = to extensively depend on is_literal) This is why Joe was very careful with the implementation. Overall the = approach is simple, where it's just providing libraries with the key bit = of information to identify their primary source of Injection = Vulnerabilities (when a developer has incorrectly included user data). > 7. Tracking whether a string is untrusted is a definition only = suitable for a few (extremely common) formats for php. But for less = common features, even stringified integers may be a problem (e.g. binary = file formats, etc) >=20 > This is relatively minor given that php is typically used in a web = programming context with json or html or js/css output >=20 > I'd think is_interned()/intern_string() is much closer to tracking = something that corresponds with php's internals (e.g. and may allow = saving memory in long-running processes which receive duplicate strings = as input), though the 10 people who wanted fully featured trustedness = checking would probably want is_literal instead The implementation is pretty close to matching the Interned flag, but = isn't exactly the same, as the Interned flag can be unpredictable for = normal PHP developers (e.g. the chr() function), and it could be changed = in the future (e.g. Joe noted how there was a suggestion to intern keys = while JSON decoding or unserializing). > 8. Serializing and unserializing data would lose information about = trustedness of inputs, unpredictably (e.g. unserialize() in php 8.0 = interns array keys). >=20 > There's no (efficient) way to change trusted strings to untrusted = or vice versa, though there are inefficient workarounds (modifying a = byte and restoring it to stop trusting it, imploding single characters = to create a trusted string) >=20 > This may done implicitly in frameworks using APCu/memcached/redis = as a cache >=20 > (I definitely don't think the serialization data format should = track is_literal()) Agreed, serializing and unserializing (or any other modifications) does = not set the literal flag. As to providing an easy way to mark a non-literal string to a literal - = in short, we do not want to recreate the biggest flaw of Taint Checking, = where naive developers would see this as a way to mark escaped strings = as "safe", giving them a false sense of security that these strings can = now be trusted. > 9. Future refactorings, optimizations or deoptimizations (or security = fixes) to unserialize(), etc. may unexpectedly break code using = is_literal that throw instead of warn (more bug reports, discourage = users from upgrading, etc.) The implementation already includes several tests, and output from = functions like unserialize() do not set the literal flag. > 10. This RFC adds an unknown amount of future work for php-src and = PECLs to *intuitively* support mapping trusted inputs to trusted outputs = or vice versa - less commonly used or unmaintained functions may not = behave as expected for a while This shouldn't be an issue, as the flag is only being used to identify = literal strings. If new strings are being created or modified, they are = no longer considered literal strings (the flag is off by default). > 11. https://pecl.php.net/package/taint is available already for a use = case with some overlap for setups that need this Libraries need something that's available to everyone (we all make = mistakes; and the developers most likely to introduce these = vulnerabilities are unlikely to install an extension, or use Static = Analysis). And with the Taint extension in particular, it has really helped me = experiment with this concept, but that approach does give a false sense = of security, which is why this RFC did not re-create it: https://wiki.php.net/rfc/is_literal#taint_checking = > Aside: I'd have to wonder if ZSTR_IS_INTERNED (and the function to = make an interned string) would make sense to expose in a PECL as a = regular `extension` (not a `zend_extension`) if is_interned also fails. > Unlike the zend_extension for = https://www.php.net/manual/en/function.is-tainted.php , > something simple may be possible without needing the performance hit = and > future conflicts with XDebug that I assume = https://www.php.net/manual/en/function.is-tainted.php would be prone to. > (https://pecl.php.net/package/taint seems to use a separate bit to = track this. The latest release of the Taint pecl fixes XDebug = compatibility) As noted above, we couldn't expose the Interned flag directly, as it's a = little unpredictable. That's why we added a new flag - one that at first = glance might look the same, but handles the edge cases. Thanks again, Craig --Apple-Mail=_C1EFC932-A018-46E8-9DE8-A3258C2D260E--