Newsgroups: php.internals
Path: news.php.net
Xref: news.php.net php.internals:114850
Content-Type: text/plain;
	charset=utf-8
Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\))
In-Reply-To: <CFB0B0B2-A646-4DB1-83B9-DEF7907EDD7E@newclarity.net>
Date: Sun, 13 Jun 2021 20:41:09 -0400
Cc: PHP internals <internals@lists.php.net>
Content-Transfer-Encoding: quoted-printable
Message-ID: <5630B850-23F4-4734-B916-D9E181B491AC@newclarity.net>
References: <CAFv4g+F-7F7o6va5wCrojKe+Uysx2Ec3H4yV_mUZYEyNm3MkdQ@mail.gmail.com>
 <D1B12C84-A0FE-4D3E-811C-DF38F1FA4469@newclarity.net>
 <CAFv4g+ES=592UbazDbkZB7AB485hu9gh+=qG-A+KWgOWD4ArVw@mail.gmail.com>
 <CFB0B0B2-A646-4DB1-83B9-DEF7907EDD7E@newclarity.net>
To: Craig Francis <craig@craigfrancis.co.uk>
Subject: Re: [PHP-DEV] [RFC] is_literal
From: mike@newclarity.net (Mike Schinkel)

> On Jun 13, 2021, at 12:51 PM, Craig Francis <craig@craigfrancis.co.uk> =
wrote:
>=20
> On Sun, 13 Jun 2021 at 07:46, Mike Schinkel <mike@newclarity.net> =
wrote:
> =20
> 1. [...] Minimally I'd ask for a separate vote on is_literal() vs. =
is_from_literal().
>=20
> It would definitely be possible to have a vote on the name.

Cool.

> 2. Regarding the sections "Non-Parameterised Values" and "Future =
Scope," if I understand correctly...
>=20
> ...you're right, we do need to support PhpMyAdmin, Adminer, systems =
like Drupal (...)

That is good to know.

> Native functions like mysqli_query() should be covered in a future =
RFC, but still only use warnings. The exact details would need to be =
confirmed, but should provide a simple way to mark certain non-literal =
strings as trusted, and still have a way to switch off those warnings =
entirely (e.g. legacy systems).

Having code that is valid for a given application throw warnings is a =
huge non-starter to me.=20

During development I believe is is a best practice to develop with =
display of errors *and* warnings to eliminate all warnings and all =
errors before moving to testing, staging and production. If warnings are =
not eliminated it is to easy for a developer to miss warnings that are =
important vs. ones that are "OK to ignore."

To illustrate, I just added a single trigger_error() to the project I am =
currently working on.  Take a look at what it does to my project:

https://www.evernote.com/l/AAV2_-KSz2RA9pcwtWo8AQJseoZYHtnD-aw

It would be impossible to program with any library that throws a warning =
and also develop with a "zero warnings" policy.

So again, having legitimate code display warnings is a full-stop "no" in =
my book.

More on this below.

> Ideally as future scope, we would also provide a way for certain =
projects to opt-in to having exceptions raised (the militant option), =
which would not be appropriate for the vast majority of developers, but =
it's the level I'd find useful for some of my projects (dealing with =
medical records and other sensitive information).
=20
Yes, when you are in full control of your source code, being able to use =
is_literal() liberally would be great.=20

BUT.  There needs to be a way to mark certain non-literal strings as =
trusted for when using Other People's Code(tm), and as I will argue in =
the rest of this reply, that way needs to be part of the RFC that also =
introduces is_literal().  Doing otherwise is like manufacturing a poison =
but not also manufacturing its antidote.

> As to how we could (in a future RFC) mark certain non-literal strings =
as trusted, I've briefly touched on how this can work in the "Future =
Scope" section (it's an idea I copied from Trusted Types in JavaScript, =
which follows the same principle). I only did a summary because the =
exact implementation would need to be discussed later (there are a few =
options, but they all rely on is_literal() being implemented first).
> =20
> First, we need libraries to use is_literal() to check inputs from =
developers (i.e. the code they didn't write), which is their main source =
of injection vulnerabilities.=20

And that is where the first problem lies.  Assuming libraries do start =
using is_literal() and there is not yet an existing way to mark a string =
as trusted, then at best-case the developer using that library will have =
to jump through hoops when they need a non-literal use-case, and =
worse-case this will mean the developer will be unable to use said =
library for their use-case.=20

And it might be the only library available, or only one "approved" by =
their company (which would be worse than the above worst case.)

Introducing is_literal() without make_literal() is like manufacturing a =
gun without a safety catch. =20

I assume that implementing make_literal() would not be a technical =
challenge, or would it?

Would there be any reason *not* to include make_literal() as part of =
your RFC?

> What goes on inside the library is entirely up to them (they know that =
they are doing). For their output, they create a stringable =
value-object, which (as future scope) could be marked as trusted for =
certain functions (like mysqli_query).
>=20
> These value-objects could be as simple as a single private property =
(value), a __construct() method that takes the libraries output, and a =
__toString() method to return it; or the library might want to be extra =
sure of its output, and use something like the Query Builder example =
linked in the "Non-Literal Values" section.

Having library developers create a stringable value-object is a nice =
idea, but you are assuming all library vendors will be enlightened =
enough to do so.=20

On one end you will have library developers who are not sufficiently =
educated on this topic (and I will remind you of the RFC's comment =
regarding education) and on the other end you will have library =
developers who are dogmatic in their belief that requiring actual =
literals is the only one true way to program (and here I will remind you =
of the RFC's comment regarding static analysis.) =20

In either extreme you will have the library vendor who has taken control =
away from the developer to use the library and still be able to =
implement their use-case.

And if you think I am being unrealistic I point you to the developers of =
the 60k plugins on WordPress.org.  You have lots of less than fully =
educated developers publishing plugins that are used by unsuspecting =
users there, and a handful of super dogmatic developers too.

> As future scope, this allows functions like mysqli_query() to simply =
check for a literal, or one of these trusted value-objects (or have =
warnings switched off entirely).

How might mysqli_query() check for a "trusted value-object" and who =
could declare a class of that nature?=20

Elaborating on ways this could be done really needs to be added to the =
RFC otherwise how are voters to know what is likely and/or even possible =
in the future?

> 3. I notice your RFC does not grant sprintf()...
>=20
> <snip>
>=20
> It's like the "String Splitting" part in the RFC, it's probably fine, =
and we can do it if there's interest; but we also need to consider what =
issues it might cause (if any). =46rom a security point of view, it's =
always best to keep something as simple as possible (and that may well =
include supporting sprintf), because "You cannot prove security. You can =
only prove insecurity".

If the RFC provides make_literal() then having sprintf() be able to =
return a literal flag would just become a nice-to-have instead of a =
must-have.

> 4. Regarding the section "WHERE IN" you argue that we should push =
everyone to use parameters properly =E2=80=94 which is a great idea in =
theory =E2=80=94 but that would have a significant breakage in existing =
sites online.
>=20
> Hopefully my answer to #2 addresses this, as you're right, that would =
definitely cause more problems than it solves. Certainly for this =
implementation I support libraries providing warnings and think those =
would be the most appropriate, rather than exceptions. (Though obviously =
libraries know their user bases best of course).

A big *NO* on warnings. Full stop.

An alternate could be (something like):

declare(relax_literal_checks=3D1). =20

While I am not a big fan of adding declare() directives, this seems like =
a good use because it would only be needed for special cases, and a =
developer could structure their PHP files such that the files needing it =
could be limited to just the code that needs to relax literal checking.

BTW, I am assuming that setting this one would have the affect of adding =
the literal flag to all strings used in the file.

> In summary of my future scope answer above, libraries could be able to =
mark their output as trusted for functions like mysqli_query(). For SQL =
it would ideally still be a literal string, but that's not always =
possible (especially with HTML), so that's where the trusted stringable =
value-objects would be used.=20

If the library vendor must develop the trusted value object then you are =
depending on an unreliable actor.=20

Further, if every library vendor gets to define their own value object =
it could make life very confusing for application developers who have to =
learn which different value objects for every different library will =
solve the "Sorry, you can't use a non literal here" warning or =
exception. =20

I can already see the avalanche of StackOverflow questions that requires =
lots of questions for the questioner, and the answer is always "It =
depends..."

So ideally there would either be "trusted value-objects" provided by PHP =
core, or they would be able to be easily identified by some consistent =
trait, such as a "as_literal()" method, or similar.

> 5. Given #2 and #4, I think your section "Support Functions" is =
missing [...] as_literal() or make_literal() or even as_unsafe_literal()
>=20
> I'm not against it, but I don't think it's needed (and with it raising =
warnings not exceptions it won=E2=80=99t be necessary to prevent things =
breaking).

Count me as one who thinks that it is absolutely, unequivocally =
essential because warnings are not a valid solution.

See the result of this simple trigger_error() again?  =
https://www.evernote.com/l/AAV2_-KSz2RA9pcwtWo8AQJseoZYHtnD-aw

> In the Example Library I made under the "Try It" section in the RFC:
>=20
> =
https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification=
/example.php?ts=3D4
>=20
> It checks for literals but also accepts a value-object allowing you to =
bypass that check (which I think is probably the best way to implement =
this at the moment). It shows how an `unsafe_value` value-object could =
be used and accepted by the example library, in the same way DB::raw() =
works in Laravel. In the future a similar setup in PHP=E2=80=99s native =
functions could accept literals or value-objects which can be trusted, =
so we shouldn=E2=80=99t need a way to lie to PHP and tell it to treat =
one as the other.

That is a great example showing how a qualified developer would attack =
the requirement.

It is also significantly more elaborate than many developers working on =
libraries will go to the trouble to implement. =20

And I don't mean that most popular libraries, I mean the long tail of =
libraries, including internal-use libraries developed within an =
organization that have to be used by others in that organization where =
internal politics do not allow the developer in need to modify the =
library written by their developer in control of the library.=20

With great power comes great responsibility, but we both know that in =
the real world power corrupts.  In my experience, anyway.

>=20
> 6. Regarding the section "Reflection API," shouldn't their be an =
is_literal()/is_from_literal() method added to the ReflectionParameter() =
and ReflectionProperty() classes?
>=20
> <snip>
> The ReflectionParameter/ReflectionProperty allows us to return their =
respective types (ReflectionType). In this case, it's saying these two =
are "string|int", which isn't the current value, just what they can =
accept.
>=20
> I think this is touching on an area that Joe mentioned yesterday, "a =
possible future where we have first class support" in a more intricate =
way. I think there is value in this, as it would allow PHP itself to =
reject non-literal values, but that should be a separate RFC, as that's =
a bit more of a language change, and will need its own discussion.

Lack of reflection support is not a deal-killer in my eyes, just a =
missing aspect to make the feature fully fleshed out.  But I think it =
could come later without causing major issues for developers in the mean =
time.

-Mike