Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:114840 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 98959 invoked from network); 13 Jun 2021 06:30:45 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 13 Jun 2021 06:30:45 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 1E9F71804D9 for ; Sat, 12 Jun 2021 23:46:41 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2, SPF_HELO_NONE,SPF_NONE autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mail-qv1-f52.google.com (mail-qv1-f52.google.com [209.85.219.52]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Sat, 12 Jun 2021 23:46:40 -0700 (PDT) Received: by mail-qv1-f52.google.com with SMTP id x6so13963572qvx.4 for ; Sat, 12 Jun 2021 23:46:40 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=newclarity-net.20150623.gappssmtp.com; s=20150623; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=a/jsIuMwSnpwlsCyDl0xVyDpGLStEkxbtyTkizbODSU=; b=gBt+ztBJxA/+NUVH5qE8QGKvMWDXHw2iZjO7mef0ZHgx8DMXZRg9nw7yUbsc08uZLS nt88yWhTA4reSTl6kAySqCEchjN2H6nao9R4jYy7GN4x04RRkNLxOZ++PkkdykuzViNY I0wYdODV44ZZzIiW8iSzyjKusXN4XfetO873f7HChJMjJ4u5A4zT4CEjwVNJbkqq4YxP DyiqgI8IT1zSmXNN1tBp+xFzR5MdFzjhX63u0riaRS9iPnnZ/6T/U98waVvg5gzA2IZD lkYO4kDfLcE5+QsnTdrKD1uP/B08uqosy+g+ZJr6dhTX69020y/KtJxs3Ogox1aBqAe/ x77g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=a/jsIuMwSnpwlsCyDl0xVyDpGLStEkxbtyTkizbODSU=; b=fvqGhir+tMMr7LVZvNbmxqLBqnfa65c8CNv1jgwO+pBl6EyhQG02V5wf4lJNglHXIe ClOTk7O38O4I43eGYQ7Pe/b1OkEijjgyrCepFoYw7P+AcAK1p5dn4lhyhZGLXrzZS0FE KtyvCwxTy9kzYJt4wz2T+47qZbOp1on4b6eFxukkFz2xYDbbL13I6rZOxa1Y/L4RHItZ LoUHXZne3ifqAgIUIgJlJezFT+4wUxvkQjqQmxIlaKgM+Ksx3S7M8IGhuXHA2I+yZn3O QH3SU7hFEs2/H58jS0V9W3pAdSLyypdiQiHbIUbpNiHAHozIjXqjEi+Z+zs2/rWrz3AF PDCA== X-Gm-Message-State: AOAM531Ea5AJg5ITSWNFhzxqLoqNTX+uEyBmdvjWtYoPTKJOMzET9ffn yq0RhHclnI2xJP20DIY4waqvroHxCjDOSv6I X-Google-Smtp-Source: ABdhPJyhsCB63mV6ouOkB3OpqKgW6Gbbr/5L4GDYDd6+D3QubvxzERsJpyPN6CnXXZtaajLUKeufLw== X-Received: by 2002:ad4:5aeb:: with SMTP id c11mr12681190qvh.34.1623566799936; Sat, 12 Jun 2021 23:46:39 -0700 (PDT) Received: from [192.168.1.10] (c-24-98-254-8.hsd1.ga.comcast.net. [24.98.254.8]) by smtp.gmail.com with ESMTPSA id p199sm7916711qka.128.2021.06.12.23.46.37 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sat, 12 Jun 2021 23:46:38 -0700 (PDT) Message-ID: Content-Type: multipart/alternative; boundary="Apple-Mail=_7365E864-4CC7-44C5-89E3-E0EED5696923" Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\)) Date: Sun, 13 Jun 2021 02:46:35 -0400 In-Reply-To: Cc: PHP internals To: Craig Francis References: X-Mailer: Apple Mail (2.3608.120.23.2.7) Subject: Re: [PHP-DEV] [RFC] is_literal From: mike@newclarity.net (Mike Schinkel) --Apple-Mail=_7365E864-4CC7-44C5-89E3-E0EED5696923 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 Hi Craig, > On Jun 12, 2021, at 1:00 PM, Craig Francis = wrote: >=20 > Hi Internals, >=20 > I'd like to start the discussion on the is_literal() RFC: >=20 > https://wiki.php.net/rfc/is_literal Nice! There is an awful lot to like here. And few bits of concern. What's to like? ------------------- 1. Adding a proactive method for guarding against injection = vulnerabilities would be a huge step forward for PHP. This should not be = underemphasized. 2. That you added the section "Previous Work" and covered what other = languages are doing in this regard. 3. The "Thanks" section and all the external references and prior = comments that provides insight into how this RFC was developed. 4. And I love that you publicly stated these points in an RFC: a. Why not use static analysis? It will never be used by most = developers. b. Why not educate everyone? You can't - developer training = simply does not scale, and mistakes still happen. What's of concern?=20 ------------------------- (Note: the last point could address concerns #2 and #4.) 1. Bike-shedding a bit, but I find a bit of cognitive dissonance over = the semantic incorrectness of is_literal(), similar to Jakob Givoni's = concern.=20 I also fear it will be confusing for developers who know what a literal = is, and thus I agree with Jakob.=20 Minimally I'd ask for a separate vote on is_literal() vs. = is_from_literal().=20 2. Regarding the sections "Non-Parameterised Values" and "Future Scope," = if I understand correctly, your vision is to ultimately require all SQL = and other vulnerable literals to be written as literals in the source = code for them to be used or otherwise disallowed by future library code = and/or PHP internal functionality? If I did not misunderstand I believe you would disallow an important = class of functionality from being implemented in PHP, including several = existing and widely used applications such as but not limited to = PhpMyAdmin and Adminer that open and interact with arbitrary SQL = databases. It would also disallow using any data returned by APIs where = the PHP application cannot know in advance what data will be acquired, = even if that data has been properly escaped and/or sanitized. 3. I notice your RFC does not grant sprintf() the ability to return a = string with the "literal" flag even though sprintf() can frequently = return strings composed from known literals and safe numeric value. And = I have frequently see it used for this purpose, e.g.: $fields =3D ['id','name']; $table =3D 'my_table'; $sql =3D sprintf( 'SELECT %s FROM %s WHERE ID=3D%d',=20 implode( ',', $fields ),=09 $my_table, intval($_GET['id'])); Of course your section "WHERE IN" "addresses" this, but please see #4 = below.=20 I argue sprintf() should be able to pass thru literal flags and validate = integer values to determine that the result is literal. Am I missing = something here? 4. Regarding the section "WHERE IN" you argue that we should push = everyone to use parameters properly =E2=80=94 which is a great idea in = theory =E2=80=94 but that would have a significant breakage in existing = sites online. While many here in Internals seem to look down on = WordPress if does power over 40%[1] of the web at this point and this = would break most if not all those sites if the site owners or their = hosts were to upgrade to a version of PHP that was militant about = requiring is_literal() for mysqli_query()[2]. =20 And yes, WordPress could upgrade their core code[3] to address this, but = that would not also fix the 60k plugins and 10k themes on WordPress.org = , or likely the million+ combined custom plugins = and themes in use on the web.=20 Is it really reasonable to change PHP to just break of all those sites = so that we can force all the site owners to beg and plead with their = plugin and theme developers to better protect them from injection = errors? Seems that medicine might be worse than the cure, at least for = all those site owners.=20 5. Given #2 and #4, I think your section "Support Functions" is missing = what would allow WordPress, PhpMyAdmin, Adminer and all other PHP = applications to address the concerns in #2 and #4 and it is neither = literal_concat() nor literal_implode().=20 The function we would need is the one Lauri Kentt=C3=A4 wrote (in = jest?), but we'd need one that is performant, not a userland version. = Name it as_literal() or make_literal() or even as_unsafe_literal() =E2=80=94= whichever chosen is unimportant IMO =E2=80=94 but applications like the = ones I mentioned need to be able to say "I absolutely know what I am = explicitly doing so do not throw a warning or an error telling me my = value is `not a literal`, dammit!" =20 Certainly most people would not use such a method by accident, = especially if called something like `as_unsafe_literal()`. There might = even be ways to validate that this was not used by a copy-and-paste = artist, but I will leave that validation to other people who are smarter = than me. 6. Regarding the section "Reflection API," shouldn't their be an = is_literal()/is_from_literal() method added to the ReflectionParameter() = and ReflectionProperty() classes? So assuming you address at least #5 =E2=80=94 and thus #2 and #4 = indirectly =E2=80=94 I would really like to see this RFC pass. -Mike [1] https://w3techs.com/blog/entry/40_percent_of_the_web_uses_wordpress = =20 [2] = https://www.php.net/manual/en/mysqli.query.php#refsect1-mysqli.query-param= eters = =20 [3] = https://github.com/WordPress/WordPress/blob/master/wp-includes/wp-db.php#L= 2056 = --Apple-Mail=_7365E864-4CC7-44C5-89E3-E0EED5696923--