Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:114842 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 40988 invoked from network); 13 Jun 2021 16:35:46 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 13 Jun 2021 16:35:46 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 75CFB1804C0 for ; Sun, 13 Jun 2021 09:51:49 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-2.1 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,HTML_MESSAGE,RCVD_IN_DNSWL_NONE, RCVD_IN_MSPIKE_H2,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mail-lf1-f47.google.com (mail-lf1-f47.google.com [209.85.167.47]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Sun, 13 Jun 2021 09:51:49 -0700 (PDT) Received: by mail-lf1-f47.google.com with SMTP id h4so2121565lfu.8 for ; Sun, 13 Jun 2021 09:51:48 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=craigfrancis.co.uk; s=default; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=r6RBSfaQWvEDEoQjYNHQX2AbQnHkQboWyv6SIFnMF8U=; b=aXzbZZBy/KBGA8fDkVSkk5RtIBzekxWLh38gzeqA1vUZA4iasBNKRbFHvkDTwVsNth V6uEjDd2Krj8eC0DcXHBzlgycXyeYTBAKyhjNxglFPD0d7JqoQutq2EW3/fumu2xjdu3 HJBuyEl6nXCZxyz/RH6BVNy/+0tfPq4GqGSOA= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=r6RBSfaQWvEDEoQjYNHQX2AbQnHkQboWyv6SIFnMF8U=; b=kemmccuJYzQJFmfTARI93TYeL5TPCtm1uva6N7aOoO0PV+FdPKkiqM3VIWKD2YuKqJ zgVz7syT+gRwSLEQywykkD78UB/nRqu4gFSLcCMUNHKTGBFSuw8NlP1ugr4Al1vq/Ygz Zl4ygTXrrA5ISOczGPINGJ4vxtr8ZLG28ekNqlRspYyubqPNzcxzvyVWUrxtgauBT2K/ alEEQ8KL0Cuw+PFnvRdsTzpYKwXmMHEnt21c/jmPjwzyiuR5BWSpX7W77ULLg22bZzbY RO9dt/APt/bk4B9l8EYvQ/6VqnSXUdAjuxTPJGvQFe+7eFXq/fA7T1lMUx7BOcKNpP3N leDw== X-Gm-Message-State: AOAM530rouiUANwJEspBErg5NDmPpNnOx00F6dgvPbg1aDTUXm7Go3D/ kC1r5SedQoUlQJiZgWZ0Itxhu91rgY+W8NEscYaWPg== X-Google-Smtp-Source: ABdhPJzRmUxAyXVpRq3MYlVfjsWCU1f8LxvFhpwEafAGvBg+VprQo8v2GPm2XKK4w6cDDWF3UnG7HMoFLOPKwCnsJ6k= X-Received: by 2002:a05:6512:3494:: with SMTP id v20mr8833633lfr.519.1623603107216; Sun, 13 Jun 2021 09:51:47 -0700 (PDT) MIME-Version: 1.0 References: In-Reply-To: Date: Sun, 13 Jun 2021 17:51:36 +0100 Message-ID: To: Mike Schinkel Cc: PHP internals Content-Type: multipart/alternative; boundary="0000000000007650d505c4a88ef8" Subject: Re: [PHP-DEV] [RFC] is_literal From: craig@craigfrancis.co.uk (Craig Francis) --0000000000007650d505c4a88ef8 Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable On Sun, 13 Jun 2021 at 07:46, Mike Schinkel wrote: > Nice! There is an awful lot to like here. > Thank you. > 1. [...] Minimally I'd ask for a separate vote on is_literal() vs. > is_from_literal(). > It would definitely be possible to have a vote on the name. I only went with is_literal() because it's shorter (I'm a slow typer), and to match the current convention of existing "is_*" functions (which all use a single word suffix). 2. Regarding the sections "Non-Parameterised Values" and "Future Scope," if > I understand correctly, your vision is to ultimately require all SQL and > other vulnerable literals to be written as literals in the source code fo= r > them to be used or otherwise disallowed by future library code and/or PHP > internal functionality? > No, fortunately that's not the case, as you're right, we do need to support PhpMyAdmin, Adminer, systems like Drupal (which take certain values, like the table name, from user input), and (as noted in your 4th point), there is a lot of existing code that would break if we were militant about requiring is_literal() for mysqli_query() and other functions. Native functions like mysqli_query() should be covered in a future RFC, but still only use warnings. The exact details would need to be confirmed, but should provide a simple way to mark certain non-literal strings as trusted, and still have a way to switch off those warnings entirely (e.g. legacy systems). Ideally as future scope, we would also provide a way for certain projects to opt-in to having exceptions raised (the militant option), which would not be appropriate for the vast majority of developers, but it's the level I'd find useful for some of my projects (dealing with medical records and other sensitive information). As to how we could (in a future RFC) mark certain non-literal strings as trusted, I've briefly touched on how this can work in the "Future Scope" section (it's an idea I copied from Trusted Types in JavaScript, which follows the same principle). I only did a summary because the exact implementation would need to be discussed later (there are a few options, but they all rely on is_literal() being implemented first). First, we need libraries to use is_literal() to check inputs from developers (i.e. the code they didn't write), which is their main source of injection vulnerabilities. What goes on inside the library is entirely up to them (they know that they are doing). For their output, they create a stringable value-object, which (as future scope) could be marked as trusted for certain functions (like mysqli_query). These value-objects could be as simple as a single private property (value), a __construct() method that takes the libraries output, and a __toString() method to return it; or the library might want to be extra sure of its output, and use something like the Query Builder example linked in the "Non-Literal Values" section. As future scope, this allows functions like mysqli_query() to simply check for a literal, or one of these trusted value-objects (or have warnings switched off entirely). 3. I notice your RFC does not grant sprintf() the ability to return a > string with the "literal" flag even though sprintf() can frequently retur= n > strings composed from known literals and safe numeric value. > Joe has created a `literal_sprintf()` function for testing, I think it does exactly what you're suggesting, and I believe that logic can be applied directly to sprintf(). The only reason I've been hesitant is to keep this implementation of is_literal() limited to identifying literals and supporting concatenation (to keep it simple, and easy to use). It's like the "String Splitting" part in the RFC, it's probably fine, and we can do it if there's interest; but we also need to consider what issues it might cause (if any). From a security point of view, it's always best to keep something as simple as possible (and that may well include supporting sprintf), because "You cannot prove security. You can only prove insecurity". > 4. Regarding the section "WHERE IN" you argue that we should push everyon= e > to use parameters properly =E2=80=94 which is a great idea in theory =E2= =80=94 but that > would have a significant breakage in existing sites online. > Hopefully my answer to #2 addresses this, as you're right, that would definitely cause more problems than it solves. Certainly for this implementation I support libraries providing warnings and think those would be the most appropriate, rather than exceptions. (Though obviously libraries know their user bases best of course). In summary of my future scope answer above, libraries could be able to mark their output as trusted for functions like mysqli_query(). For SQL it would ideally still be a literal string, but that's not always possible (especially with HTML), so that's where the trusted stringable value-objects would be used. For legacy systems, developers could simply switch off the warnings entirely (but at least that's a choice they are actively choosing to make). > 5. Given #2 and #4, I think your section "Support Functions" is missing > [...] as_literal() or make_literal() or even as_unsafe_literal() > I'm not against it, but I don't think it's needed (and with it raising warnings not exceptions it won=E2=80=99t be necessary to prevent things bre= aking). In the Example Library I made under the "Try It" section in the RFC: https://github.com/craigfrancis/php-is-literal-rfc/blob/main/justification/= example.php?ts=3D4 It checks for literals but also accepts a value-object allowing you to bypass that check (which I think is probably the best way to implement this at the moment). It shows how an `unsafe_value` value-object could be used and accepted by the example library, in the same way DB::raw() works in Laravel. In the future a similar setup in PHP=E2=80=99s native functions co= uld accept literals or value-objects which can be trusted, so we shouldn=E2=80= =99t need a way to lie to PHP and tell it to treat one as the other. 6. Regarding the section "Reflection API," shouldn't their be an > is_literal()/is_from_literal() method added to the ReflectionParameter() > and ReflectionProperty() classes? > If I can just check we are talking about the same thing, so where we have: class my_class { private string|int $my_property =3D 0; private function my_method(string|int $my_parameter) { } } $reflector =3D new ReflectionClass('my_class'); $property =3D $reflector->getProperty('my_property'); $method =3D $reflector->getMethod('my_method'); $parameter =3D $method->getParameters()[0]; var_dump( $property->getType()->__toString(), $parameter->getType()->__toString(), ); The ReflectionParameter/ReflectionProperty allows us to return their respective types (ReflectionType). In this case, it's saying these two are "string|int", which isn't the current value, just what they can accept. I think this is touching on an area that Joe mentioned yesterday, "a possible future where we have first class support" in a more intricate way. I think there is value in this, as it would allow PHP itself to reject non-literal values, but that should be a separate RFC, as that's a bit more of a language change, and will need its own discussion. So assuming you address at least #5 =E2=80=94 and thus #2 and #4 indirectly= =E2=80=94 I > would really like to see this RFC pass. > > -Mike > Thank you Mike Craig --0000000000007650d505c4a88ef8--