Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:114850 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 78121 invoked from network); 14 Jun 2021 00:25:05 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 14 Jun 2021 00:25:05 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 7208E1804D8 for ; Sun, 13 Jun 2021 17:41:13 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H3,RCVD_IN_MSPIKE_WL, SPF_HELO_NONE,SPF_NONE autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mail-qt1-f180.google.com (mail-qt1-f180.google.com [209.85.160.180]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Sun, 13 Jun 2021 17:41:13 -0700 (PDT) Received: by mail-qt1-f180.google.com with SMTP id o20so7313028qtr.8 for ; Sun, 13 Jun 2021 17:41:13 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=newclarity-net.20150623.gappssmtp.com; s=20150623; h=mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=hm8XPrC2e465VBLIutzpyLKiIdFT1/WBMcEuCFIdh6k=; b=iBn7nvzvkNGx0iNzEHuJj0weMDeT7lMnkMy2EtXteWXh6M6DaP0WvGFJNJ3nopYDqz 6VWsrrUUU3Qxv0URBQEQ77IzKaIn1q+OJlbLp3PYxZBp0BPvt8qvCsx0NW6uPfb7KvdX lNt+kwRRQTNFD395IeCsJAN3Bq5K2UJjFVfNLvPykGAexXubA+ELFwlVbthJWOGB9DcD dGTPIBQxtPsK8hfkzm7s1azTkJ0ph3imdmBB+xinrM1KgyJZnY56Zc3rsZ5N0U2QY2gW gNoEMjUpdWdhiIQVP9z9SWPFsvZhfRJivgSTI0JY4wGdWuuJItbA/wnD/cK9SAv13wfQ br4w== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:subject:from:in-reply-to:date:cc :content-transfer-encoding:message-id:references:to; bh=hm8XPrC2e465VBLIutzpyLKiIdFT1/WBMcEuCFIdh6k=; b=ZIlwBf1HlYdSzB56eyTKDIXzRGlpm/3022clHtBUGbGP0zI8uoWvmGucU3wdoRJHiO Gt6z//vFFSTsxK6rgUjXkN49meqwj63bxm47+k9G0BJljvSpw9Qjjo3NeM9jrTMGI+xu MYjU0hZyat9yaBbHDQQbqOgEQ+wbZWMGX7arRr+v++8dnTfEu9Z8zI+3dQPwftdsqAsP DPsMgW73yonbUOkOppHRGGmHKyOMAz0e1CkiNDOW3KDp23CGg1jBErs+MYdgVcbrcGS0 ZxLLZDoayvXfzUbsuYyExM07MIJI7Dt/D81ltwzzX0JayWu5hdkqNS7F8pkZyquaIJR1 G/GA== X-Gm-Message-State: AOAM5304PPGrk65J6Ob/9084vJocW+MzvuHTSJf7SFohU8DGlSEH/T1h sQdU8pyyhKrINo5sXGBcAGJVGg== X-Google-Smtp-Source: ABdhPJzyzJQZD2vzJOIRdcSvgaHLl43np8uiI248uD39yjz+QCo4Vyzt7WzOQPaYDWnxCXqEYgA+Dg== X-Received: by 2002:ac8:7b86:: with SMTP id p6mr14082015qtu.48.1623631271410; Sun, 13 Jun 2021 17:41:11 -0700 (PDT) Received: from [192.168.1.10] (c-24-98-254-8.hsd1.ga.comcast.net. [24.98.254.8]) by smtp.gmail.com with ESMTPSA id y16sm7394723qta.72.2021.06.13.17.41.10 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sun, 13 Jun 2021 17:41:11 -0700 (PDT) Content-Type: text/plain; charset=utf-8 Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\)) In-Reply-To: Date: Sun, 13 Jun 2021 20:41:09 -0400 Cc: PHP internals Content-Transfer-Encoding: quoted-printable Message-ID: <5630B850-23F4-4734-B916-D9E181B491AC@newclarity.net> References: To: Craig Francis X-Mailer: Apple Mail (2.3608.120.23.2.7) Subject: Re: [PHP-DEV] [RFC] is_literal From: mike@newclarity.net (Mike Schinkel) > On Jun 13, 2021, at 12:51 PM, Craig Francis = wrote: >=20 > On Sun, 13 Jun 2021 at 07:46, Mike Schinkel = wrote: > =20 > 1. [...] Minimally I'd ask for a separate vote on is_literal() vs. = is_from_literal(). >=20 > It would definitely be possible to have a vote on the name. Cool. > 2. Regarding the sections "Non-Parameterised Values" and "Future = Scope," if I understand correctly... >=20 > ...you're right, we do need to support PhpMyAdmin, Adminer, systems = like Drupal (...) That is good to know. > Native functions like mysqli_query() should be covered in a future = RFC, but still only use warnings. The exact details would need to be = confirmed, but should provide a simple way to mark certain non-literal = strings as trusted, and still have a way to switch off those warnings = entirely (e.g. legacy systems). Having code that is valid for a given application throw warnings is a = huge non-starter to me.=20 During development I believe is is a best practice to develop with = display of errors *and* warnings to eliminate all warnings and all = errors before moving to testing, staging and production. If warnings are = not eliminated it is to easy for a developer to miss warnings that are = important vs. ones that are "OK to ignore." To illustrate, I just added a single trigger_error() to the project I am = currently working on. Take a look at what it does to my project: https://www.evernote.com/l/AAV2_-KSz2RA9pcwtWo8AQJseoZYHtnD-aw It would be impossible to program with any library that throws a warning = and also develop with a "zero warnings" policy. So again, having legitimate code display warnings is a full-stop "no" in = my book. More on this below. > Ideally as future scope, we would also provide a way for certain = projects to opt-in to having exceptions raised (the militant option), = which would not be appropriate for the vast majority of developers, but = it's the level I'd find useful for some of my projects (dealing with = medical records and other sensitive information). =20 Yes, when you are in full control of your source code, being able to use = is_literal() liberally would be great.=20 BUT. There needs to be a way to mark certain non-literal strings as = trusted for when using Other People's Code(tm), and as I will argue in = the rest of this reply, that way needs to be part of the RFC that also = introduces is_literal(). Doing otherwise is like manufacturing a poison = but not also manufacturing its antidote. > As to how we could (in a future RFC) mark certain non-literal strings = as trusted, I've briefly touched on how this can work in the "Future = Scope" section (it's an idea I copied from Trusted Types in JavaScript, = which follows the same principle). I only did a summary because the = exact implementation would need to be discussed later (there are a few = options, but they all rely on is_literal() being implemented first). > =20 > First, we need libraries to use is_literal() to check inputs from = developers (i.e. the code they didn't write), which is their main source = of injection vulnerabilities.=20 And that is where the first problem lies. Assuming libraries do start = using is_literal() and there is not yet an existing way to mark a string = as trusted, then at best-case the developer using that library will have = to jump through hoops when they need a non-literal use-case, and = worse-case this will mean the developer will be unable to use said = library for their use-case.=20 And it might be the only library available, or only one "approved" by = their company (which would be worse than the above worst case.) Introducing is_literal() without make_literal() is like manufacturing a = gun without a safety catch. =20 I assume that implementing make_literal() would not be a technical = challenge, or would it? Would there be any reason *not* to include make_literal() as part of = your RFC? > What goes on inside the library is entirely up to them (they know that = they are doing). For their output, they create a stringable = value-object, which (as future scope) could be marked as trusted for = certain functions (like mysqli_query). >=20 > These value-objects could be as simple as a single private property = (value), a __construct() method that takes the libraries output, and a = __toString() method to return it; or the library might want to be extra = sure of its output, and use something like the Query Builder example = linked in the "Non-Literal Values" section. Having library developers create a stringable value-object is a nice = idea, but you are assuming all library vendors will be enlightened = enough to do so.=20 On one end you will have library developers who are not sufficiently = educated on this topic (and I will remind you of the RFC's comment = regarding education) and on the other end you will have library = developers who are dogmatic in their belief that requiring actual = literals is the only one true way to program (and here I will remind you = of the RFC's comment regarding static analysis.) =20 In either extreme you will have the library vendor who has taken control = away from the developer to use the library and still be able to = implement their use-case. And if you think I am being unrealistic I point you to the developers of = the 60k plugins on WordPress.org. You have lots of less than fully = educated developers publishing plugins that are used by unsuspecting = users there, and a handful of super dogmatic developers too. > As future scope, this allows functions like mysqli_query() to simply = check for a literal, or one of these trusted value-objects (or have = warnings switched off entirely). How might mysqli_query() check for a "trusted value-object" and who = could declare a class of that nature?=20 Elaborating on ways this could be done really needs to be added to the = RFC otherwise how are voters to know what is likely and/or even possible = in the future? > 3. I notice your RFC does not grant sprintf()... >=20 > >=20 > It's like the "String Splitting" part in the RFC, it's probably fine, = and we can do it if there's interest; but we also need to consider what = issues it might cause (if any). =46rom a security point of view, it's = always best to keep something as simple as possible (and that may well = include supporting sprintf), because "You cannot prove security. You can = only prove insecurity". If the RFC provides make_literal() then having sprintf() be able to = return a literal flag would just become a nice-to-have instead of a = must-have. > 4. Regarding the section "WHERE IN" you argue that we should push = everyone to use parameters properly =E2=80=94 which is a great idea in = theory =E2=80=94 but that would have a significant breakage in existing = sites online. >=20 > Hopefully my answer to #2 addresses this, as you're right, that would = definitely cause more problems than it solves. Certainly for this = implementation I support libraries providing warnings and think those = would be the most appropriate, rather than exceptions. (Though obviously = libraries know their user bases best of course). A big *NO* on warnings. Full stop. An alternate could be (something like): declare(relax_literal_checks=3D1). =20 While I am not a big fan of adding declare() directives, this seems like = a good use because it would only be needed for special cases, and a = developer could structure their PHP files such that the files needing it = could be limited to just the code that needs to relax literal checking. BTW, I am assuming that setting this one would have the affect of adding = the literal flag to all strings used in the file. > In summary of my future scope answer above, libraries could be able to = mark their output as trusted for functions like mysqli_query(). For SQL = it would ideally still be a literal string, but that's not always = possible (especially with HTML), so that's where the trusted stringable = value-objects would be used.=20 If the library vendor must develop the trusted value object then you are = depending on an unreliable actor.=20 Further, if every library vendor gets to define their own value object = it could make life very confusing for application developers who have to = learn which different value objects for every different library will = solve the "Sorry, you can't use a non literal here" warning or = exception. =20 I can already see the avalanche of StackOverflow questions that requires = lots of questions for the questioner, and the answer is always "It = depends..." So ideally there would either be "trusted value-objects" provided by PHP = core, or they would be able to be easily identified by some consistent = trait, such as a "as_literal()" method, or similar. > 5. Given #2 and #4, I think your section "Support Functions" is = missing [...] as_literal() or make_literal() or even as_unsafe_literal() >=20 > I'm not against it, but I don't think it's needed (and with it raising = warnings not exceptions it won=E2=80=99t be necessary to prevent things = breaking). Count me as one who thinks that it is absolutely, unequivocally = essential because warnings are not a valid solution. See the result of this simple trigger_error() again? = https://www.evernote.com/l/AAV2_-KSz2RA9pcwtWo8AQJseoZYHtnD-aw > In the Example Library I made under the "Try It" section in the RFC: >=20 > = https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification= /example.php?ts=3D4 >=20 > It checks for literals but also accepts a value-object allowing you to = bypass that check (which I think is probably the best way to implement = this at the moment). It shows how an `unsafe_value` value-object could = be used and accepted by the example library, in the same way DB::raw() = works in Laravel. In the future a similar setup in PHP=E2=80=99s native = functions could accept literals or value-objects which can be trusted, = so we shouldn=E2=80=99t need a way to lie to PHP and tell it to treat = one as the other. That is a great example showing how a qualified developer would attack = the requirement. It is also significantly more elaborate than many developers working on = libraries will go to the trouble to implement. =20 And I don't mean that most popular libraries, I mean the long tail of = libraries, including internal-use libraries developed within an = organization that have to be used by others in that organization where = internal politics do not allow the developer in need to modify the = library written by their developer in control of the library.=20 With great power comes great responsibility, but we both know that in = the real world power corrupts. In my experience, anyway. >=20 > 6. Regarding the section "Reflection API," shouldn't their be an = is_literal()/is_from_literal() method added to the ReflectionParameter() = and ReflectionProperty() classes? >=20 > > The ReflectionParameter/ReflectionProperty allows us to return their = respective types (ReflectionType). In this case, it's saying these two = are "string|int", which isn't the current value, just what they can = accept. >=20 > I think this is touching on an area that Joe mentioned yesterday, "a = possible future where we have first class support" in a more intricate = way. I think there is value in this, as it would allow PHP itself to = reject non-literal values, but that should be a separate RFC, as that's = a bit more of a language change, and will need its own discussion. Lack of reflection support is not a deal-killer in my eyes, just a = missing aspect to make the feature fully fleshed out. But I think it = could come later without causing major issues for developers in the mean = time. -Mike