Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:115009 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 29265 invoked from network); 22 Jun 2021 08:51:45 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 22 Jun 2021 08:51:45 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 276461804C0 for ; Tue, 22 Jun 2021 02:09:55 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=0.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,GUARANTEED_100_PERCENT,HTML_MESSAGE,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_NONE autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mail-qk1-f175.google.com (mail-qk1-f175.google.com [209.85.222.175]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 22 Jun 2021 02:09:54 -0700 (PDT) Received: by mail-qk1-f175.google.com with SMTP id o6so2258419qkh.4 for ; Tue, 22 Jun 2021 02:09:54 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=newclarity-net.20150623.gappssmtp.com; s=20150623; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=mSkmQq1gKoUG3X8xPhKxEUz4EQVcDwri+tEmAnAYcOc=; b=p0F/O56MbhvtaM+KN6oBkGGypUfH6a5e/msfyfKJ5hAs1AlHHuqEMZRJhQcfCJ9KEQ gPxqmBkmCsu7M3PgQQaAfySwR9jVpo4gTern0ZF53gMX3IZE4DF+gghBYgma99hSOSNa lR1rD1s2UOgXQ7oGTDKJhauLlY2t5SvYP5sgHzVvEN8GzjrGTzNE016FCT3BLNxJccfk J9mTFWcy3FvK4wqFBPFaWmKbjyU+scrKmKoqu8r/SsbJGUokfjZQxPc4e/7ZFzoFXKgU f6sYplw2uCJUJONhkEYGACQNfY5jBUkzXoZSAv9pui5pXj6z8JkcH9KaI7I1MpU5Eh54 2VIw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=mSkmQq1gKoUG3X8xPhKxEUz4EQVcDwri+tEmAnAYcOc=; b=ZFFJmVjD5AK0P89ortkUYeOAVsc8I0fAICkU+zTCiPcwduSNUUmyyhLXSaA/6dea4U bMuc36F5VAqpn+8tdxgVRaAAsOQCvNbB9KqOBHH36tR27U02e9pSO6w1p71jNDFx1VJu NyyuncaLlasa7Lt2LApymOsKz6K0C/xRvbYdb/JjJArjnPOO7I0xjvI0d4qAP+BdOKRt lvovy6EG6hwoWvqeJz0zvWPoAnYTp3RkQ2yx59tMX6UBOkOpsSCtYhnttuYGih+ihP6d OGzc5guFkwNI1NfiKSAwSX1Wff7nn4/JVG2zWr8ru0HdjDKtW+4877MOoLxY7av704Q2 xhKA== X-Gm-Message-State: AOAM530vTjDiuD8DnKGbMMFZfRKX0eHAFayB7VNUFPX0A51391AJnxh1 e9OnX6N/S5xQcrUd1gaQ5/3/6Q== X-Google-Smtp-Source: ABdhPJyfxn3plvfXX/T/KIThr1mo2rI3pv1Pg6WCcBDxhIljgm7qVnurI2es+jHaIJpHOPVYg+XFPQ== X-Received: by 2002:a37:7841:: with SMTP id t62mr3111393qkc.412.1624352992984; Tue, 22 Jun 2021 02:09:52 -0700 (PDT) Received: from [192.168.1.10] (c-24-98-254-8.hsd1.ga.comcast.net. [24.98.254.8]) by smtp.gmail.com with ESMTPSA id j13sm9219833qko.9.2021.06.22.02.09.51 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Jun 2021 02:09:52 -0700 (PDT) Message-ID: <7D9709D1-2607-4CC5-8B99-A2CB5ED22E47@newclarity.net> Content-Type: multipart/alternative; boundary="Apple-Mail=_572FF0EA-8A51-4AEC-8D4F-DE04D6D6D09F" Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\)) Date: Tue, 22 Jun 2021 05:09:50 -0400 In-Reply-To: Cc: Derick Rethans , PHP Internals , Yasuo Ohgaki , Craig Francis To: Benjamin Morel References: <0CD1762E-6094-4DEB-B1B5-22CFBDAAFF44@php.net> X-Mailer: Apple Mail (2.3608.120.23.2.7) Subject: Re: [PHP-DEV] [RFC] is_trusted - was is_literal From: mike@newclarity.net (Mike Schinkel) --Apple-Mail=_572FF0EA-8A51-4AEC-8D4F-DE04D6D6D09F Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Jun 21, 2021, at 7:18 PM, Benjamin Morel = wrote: >=20 > On Tue, 22 Jun 2021 at 01:06, Derick Rethans wrote: >=20 >> On 21 June 2021 23:37:56 BST, Yasuo Ohgaki = wrote: >>>=20 >>> The name "is_trusted" is misleading. >>> Literal is nothing but literal. >>=20 >> I agree with this. The name is_trusted is going to be the same naming >> mistake as "safe mode" was. Developers will put their trust in it = that it >> is 100% guaranteed safe. >=20 >=20 > FWIW, agreed, too. Trusted is vague and may imply some false sense of > security. Literal is literally what it says on the tin. I proposed the name change originally. For my inspiration take a look at Trusted Types API in Javascript: https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API = And also Trusted Types API from the W3C https://w3c.github.io/webappsec-trusted-types/dist/spec/ = The reason is_trusted() would be better than is_literal() is so that the = function could in the future handle other trusted types that are = definitely not literal and allow most code that using is_trusted() to = continue to work. =20 So, for example, in the future we could add a new TrustedInterface and = if an object implemented the trusted interface and a __ToString() method = it could be considered trusted by code that uses the is_trusted() to = guard against untrusted code. Alternately if in the future PHP added a TrustedAttribute we could use = on the __ToString() method to declare an object that could be used when = a trusted string is required. ----- Conversely, I am *strongly* opposed to is_literal() as proposed. The = current RFC proposes half a solution without first envisioning what the = other half of the solution will look like. If an is_literal() RFC were passed I would envision the following things = happening: 1. Bloggers would start blogging about how all PHP users should use = is_literal() when accepting SQL and HTML code, but won't provide any of = the nuances about special-cases because many won't understand them, they = will just see a new PHP feature to hype. 2. PHP listing tools will jump on is_literal() and champion its use. = Just look at Psalm. 3. Coding standards teams in large organizations will see that the = listing tools are flagging non-literals and wil start requiring all SQL = and HTML etc. code to be literals. And since they already bar the use of = eval() there will be no way to work around the requirement when you = actually know what you are doing.=20 And because they are in large organizations, they will be divorced from = the developers in the trenches who 95% of the time will have no problem = with the requirement, but 5% of the time will simply not be able to = achieve their development needs. Unfortunately these trench developers = will not have enough power in the organization to get the coding = standards teams to even list to their issues, especially if they are an = agency or contractor. 4. Library developers will similarly adopt is_literal() as a requirement = to use their library. While the most popular ones will likely be = enlightened enough they will address edge cases. many in the long tail = will not. So lots of the long-tail PHP libraries out there =E2=80=94 = and especially the ones only needed by a small audience so there won't = be alternatives =E2=80=94 will not address exceptions. And for any of = many reasons, many of those library developers will not be responsive to = those who submit PRs to provide workarounds. THIS is the future I do not want to see, and if is_trusted() is renamed = back to is_literal() this is the future I predict. Maybe it won't = happen, but I personally do not want to risk it. The biggest problem with the RFC is it does not address how to bypass = someone else's requirement for use of is_literal() when you *know* what = you are doing and you absolutely need to. Craig and I have had several = long discussions about this, and he said he did not want to allow that = because someone might misuse it (Craig, if I am paraphrasing = incorrectly, please do correct me here.) =20 And I completely understand Craig's misgivings, but then he would not = address mine. His only solution to get around the special-case needs = that *will* exist is to use eval(). which is a lot like saying: "So you = are in pain? Why not try Meth?" =20 Craig shared this talk from Google which inspired him: https://www.youtube.com/watch?v=3DccfEu-Jj0as = I would highly recommend watching this before voting on Craig's RFC. The speaker talks about how they used "CompileTimeConstants" to reduce = security bugs, BUT: 1. They used a @CompileTimeConstant attribute and a static checker to = guard against unsafe strings instead of a function baked into Java. So = convention and process over language enforcement. https://errorprone.info/bugpattern/CompileTimeConstant = =20 = https://github.com/google/error-prone/blob/master/annotations/src/main/jav= a/com/google/errorprone/annotations/CompileTimeConstant.java = =20 2. They developed APIs that allow packaging data to be used in place of = literals. See qb.setParameter() at 9:45. 3. They did this for code in Google's mono-repo, so they have control = over how people will be using it, and they have the ability to address = special cases. If PHP adds is_literal() the cat will be out the bag and = we won't be able to address special cases easily. 4. Even Craig mentioned to me that a literal cannot be trusted in all = contexts. An HTML literal passed to SQL is not what SQL wants, and might = in some rare case being able to bring down a DB. If so, how then to do = we quickly fix PHP so that can't happen? Adding a false sense of = security does not seem like a good idea. 5. Craig complained about escaping because it can't be used universally = without understanding use-case. So maybe what we need instead are = use-case specific tools that can sanitize use-case specific string = content? SqlSanitizer, HtmlSanitizer, etc? And/or templating systems = built into PHP that can handle escaping (I'm not necessarily suggesting = these, but do think they should be considered.) 6. At ~35:45 the speaker states it is their responsibility as API = vendors to sanitize and escape strings. He also said most data is = user-supplied data and not literals. So he is saying you MUST allow for = user-supplied data, but is_literal() does not allow for that. 6. They implement a whitelist of exceptions, because they were Google = they based the whitelist on their internal projects. But PHP cannot do = that with is_literal() because it does not address exceptions. 7. He said you need a "lightweight approval process for exceptions" at = around ~40:00. Again, including is_literal() results in an extremely = heavyweight process for user's exceptions. 8. He mentions "Design for Reviewability" around 44:00; if there were a = make_literal() it would be very easy to find places that need to be = reviewed for literal exceptions. So, IF we add is_trusted() we can plan to add other trusted data types = later. If we add is_literal(), we *should* map out all the issues in advance = and make sure we know that workable solutions at least exist. In = addition we will need some way to create data that can pass as literal = (but then it's not a literal, hence why "is_trusted" makes more sense.) = Google did it with a method attribute. Maybe PHP should too? In summary, is_literal() is a hammer and people will us it to treat = everything like a nail. At least is_trusted() has the future potential = to be treated like a screwdriver. -Mike P.S. If I had my druthers is_trusted() would not pass either, and we'd = hash out the full vision for security before we go implementing = something that could paint us into a corner forever, or worse one that = would create a security hole that will be hard to fix. --Apple-Mail=_572FF0EA-8A51-4AEC-8D4F-DE04D6D6D09F--