Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:115009
Message-ID: <7D9709D1-2607-4CC5-8B99-A2CB5ED22E47@newclarity.net>
Content-Type: multipart/alternative;
	boundary="Apple-Mail=_572FF0EA-8A51-4AEC-8D4F-DE04D6D6D09F"
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\))
Date: Tue, 22 Jun 2021 05:09:50 -0400
In-Reply-To: <CAG9XoMQUUYGhA6QL3Se4FNJgydjzX0MAeR_CWr5O1SH8xgypLA@mail.gmail.com>
Cc: Derick Rethans <derick@php.net>,
 PHP Internals <internals@lists.php.net>,
 Yasuo Ohgaki <yohgaki@ohgaki.net>,
 Craig Francis <craig@craigfrancis.co.uk>
To: Benjamin Morel <benjamin.morel@gmail.com>
References: <CAFv4g+GjXqiZkA2fawt7jHDtDAKjB0rv=Lfbcobu7bn8B=wvpw@mail.gmail.com>
 <CAGa2bXbGv33zHfRpu-hN_wNcNKPpFw_ed1ZRnn96tpDf0ROcLg@mail.gmail.com>
 <0CD1762E-6094-4DEB-B1B5-22CFBDAAFF44@php.net>
 <CAG9XoMQUUYGhA6QL3Se4FNJgydjzX0MAeR_CWr5O1SH8xgypLA@mail.gmail.com>
Subject: Re: [PHP-DEV] [RFC] is_trusted - was is_literal
From: mike@newclarity.net (Mike Schinkel)

--Apple-Mail=_572FF0EA-8A51-4AEC-8D4F-DE04D6D6D09F
Content-Transfer-Encoding: quoted-printable
Content-Type: text/plain;
	charset=utf-8

> On Jun 21, 2021, at 7:18 PM, Benjamin Morel <benjamin.morel@gmail.com> =
wrote:
>=20
> On Tue, 22 Jun 2021 at 01:06, Derick Rethans <derick@php.net> wrote:
>=20
>> On 21 June 2021 23:37:56 BST, Yasuo Ohgaki <yohgaki@ohgaki.net> =
wrote:
>>>=20
>>> The name "is_trusted" is misleading.
>>> Literal is nothing but literal.
>>=20
>> I agree with this. The name is_trusted is going to be the same naming
>> mistake as "safe mode" was. Developers will put their trust in it =
that it
>> is 100% guaranteed safe.
>=20
>=20
> FWIW, agreed, too. Trusted is vague and may imply some false sense of
> security. Literal is literally what it says on the tin.


I proposed the name change originally.

For my inspiration take a look at Trusted Types API in Javascript:

https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API =
<https://developer.mozilla.org/en-US/docs/Web/API/Trusted_Types_API>

And also Trusted Types API from the W3C

https://w3c.github.io/webappsec-trusted-types/dist/spec/ =
<https://w3c.github.io/webappsec-trusted-types/dist/spec/>

The reason is_trusted() would be better than is_literal() is so that the =
function could in the future handle other trusted types that are =
definitely not literal and allow most code that using is_trusted() to =
continue to work. =20

So, for example, in the future we could add a new TrustedInterface and =
if an object implemented the trusted interface and a __ToString() method =
it could be considered trusted by code that uses the is_trusted() to =
guard against untrusted code.

Alternately if in the future PHP added a TrustedAttribute we could use =
on the __ToString() method to declare an object that could be used when =
a trusted string is required.

-----

Conversely, I am *strongly* opposed to is_literal() as proposed.  The =
current RFC proposes half a solution without first envisioning what the =
other half of the solution will look like.

If an is_literal() RFC were passed I would envision the following things =
happening:

1. Bloggers would start blogging about how all PHP users should use =
is_literal() when accepting SQL and HTML code, but won't provide any of =
the nuances about special-cases because many won't understand them, they =
will just see a new PHP feature to hype.

2. PHP listing tools will jump on is_literal() and champion its use.  =
Just look at Psalm.

3. Coding standards teams in large organizations will see that the =
listing tools are flagging non-literals and wil start requiring all SQL =
and HTML etc. code to be literals. And since they already bar the use of =
eval() there will be no way to work around the requirement when you =
actually know what you are doing.=20

And because they are in large organizations, they will be divorced from =
the developers in the trenches who 95% of the time will have no problem =
with the requirement, but 5% of the time will simply not be able to =
achieve their development needs. Unfortunately these trench developers =
will not have enough power in the organization to get the coding =
standards teams to even list to their issues, especially if they are an =
agency or contractor.

4. Library developers will similarly adopt is_literal() as a requirement =
to use their library.  While the most popular ones will likely be =
enlightened enough they will address edge cases. many in the long tail =
will not.  So lots of the long-tail PHP libraries out there =E2=80=94 =
and especially the ones only needed by a small audience so there won't =
be alternatives =E2=80=94 will not address exceptions. And for any of =
many reasons, many of those library developers will not be responsive to =
those who submit PRs to provide workarounds.

THIS is the future I do not want to see, and if is_trusted() is renamed =
back to is_literal() this is the future I predict. Maybe it won't =
happen, but I personally do not want to risk it.

The biggest problem with the RFC is it does not address how to bypass =
someone else's requirement for use of is_literal() when you *know* what =
you are doing and you absolutely need to.  Craig and I have had several =
long discussions about this, and he said he did not want to allow that =
because someone might misuse it (Craig, if I am paraphrasing =
incorrectly, please do correct me here.) =20

And I completely understand Craig's misgivings, but then he would not =
address mine.  His only solution to get around the special-case needs =
that *will* exist is to use eval(). which is a lot like saying: "So you =
are in pain?  Why not try Meth?" =20

Craig shared this talk from Google which inspired him:

https://www.youtube.com/watch?v=3DccfEu-Jj0as =
<https://www.youtube.com/watch?v=3DccfEu-Jj0as>

I would highly recommend watching this before voting on Craig's RFC.

The speaker talks about how they used "CompileTimeConstants" to reduce =
security bugs, BUT:

1. They used a @CompileTimeConstant attribute and a static checker to =
guard against unsafe strings instead of a function baked into Java. So =
convention and process over language enforcement.

https://errorprone.info/bugpattern/CompileTimeConstant =
<https://errorprone.info/bugpattern/CompileTimeConstant>=20
=
https://github.com/google/error-prone/blob/master/annotations/src/main/jav=
a/com/google/errorprone/annotations/CompileTimeConstant.java =
<https://github.com/google/error-prone/blob/master/annotations/src/main/ja=
va/com/google/errorprone/annotations/CompileTimeConstant.java>=20

2. They developed APIs that allow packaging data to be used in place of =
literals. See qb.setParameter() at 9:45.

3. They did this for code in Google's mono-repo, so they have control =
over how people will be using it, and they have the ability to address =
special cases. If PHP adds is_literal() the cat will be out the bag and =
we won't be able to address special cases easily.

4. Even Craig mentioned to me that a literal cannot be trusted in all =
contexts. An HTML literal passed to SQL is not what SQL wants, and might =
in some rare case being able to bring down a DB. If so, how then to do =
we quickly fix PHP so that can't happen? Adding a false sense of =
security does not seem like a good idea.

5. Craig complained about escaping because it can't be used universally =
without understanding use-case.  So maybe what we need instead are =
use-case specific tools that can sanitize use-case specific string =
content?  SqlSanitizer, HtmlSanitizer, etc? And/or templating systems =
built into PHP that can handle escaping  (I'm not necessarily suggesting =
these, but do think they should be considered.)

6. At ~35:45 the speaker states it is their responsibility as API =
vendors to sanitize and escape strings. He also said most data is =
user-supplied data and not literals.  So he is saying you MUST allow for =
user-supplied data, but is_literal() does not allow for that.

6. They implement a whitelist of exceptions,  because they were Google =
they based the whitelist on their internal projects.  But PHP cannot do =
that with is_literal() because it does not address exceptions.

7. He said you need a "lightweight approval process for exceptions" at =
around ~40:00. Again, including is_literal() results in an extremely =
heavyweight process for user's exceptions.

8. He mentions "Design for Reviewability" around 44:00; if there were a =
make_literal() it would be very easy to find places that need to be =
reviewed for literal exceptions.

So, IF we add is_trusted() we can plan to add other trusted data types =
later.

If we add is_literal(), we *should* map out all the issues in advance =
and make sure we know that workable solutions at least exist.  In =
addition we will need some way to create data that can pass as literal =
(but then it's not a literal, hence why "is_trusted" makes more sense.) =
Google did it with a method attribute.  Maybe PHP should too?

In summary, is_literal() is a hammer and people will us it to treat =
everything like a nail.  At least is_trusted() has the future potential =
to be treated like a screwdriver.

-Mike
P.S. If I had my druthers is_trusted() would not pass either, and we'd =
hash out the full vision for security before we go implementing =
something that could paint us into a corner forever, or worse one that =
would create a security hole that will be hard to fix.





--Apple-Mail=_572FF0EA-8A51-4AEC-8D4F-DE04D6D6D09F--