Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:115033 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 92567 invoked from network); 22 Jun 2021 16:28:21 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 22 Jun 2021 16:28:21 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 44C521804E3 for ; Tue, 22 Jun 2021 09:46:36 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2, SPF_HELO_NONE,SPF_NONE autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mail-qk1-f171.google.com (mail-qk1-f171.google.com [209.85.222.171]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 22 Jun 2021 09:46:35 -0700 (PDT) Received: by mail-qk1-f171.google.com with SMTP id g4so39954102qkl.1 for ; Tue, 22 Jun 2021 09:46:35 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=newclarity-net.20150623.gappssmtp.com; s=20150623; h=from:message-id:mime-version:subject:date:in-reply-to:cc:to :references; bh=ACMmN0udfdgUtDYeazrNyp6ZYjTaLOBK5ihNyFI2yrg=; b=DIWJFz12g7twu+uP8+gAit4LlD5rCNrELsHuqdNVCLNPCGnyB86zhfBMXde50DCOJT ez2GJRSYCzpQy0WtoQwO+EJ8J1waYRyqJyt+D7UeQJwvETCXZPDqzHcG7ExPwDeflv/Z mehxOWz0pqpurU2qrdfM8Md8pTm3igvllmGdhZyP7HpZ+POx+0tMhP+zXUkeDfRaeNEJ Z2lWUcs4mQOGIBxPehxGEiI16R52Uj7LLNg+pV+zw8mA6cPUmcTWVjn17XeR6QyI7GpO 1Nrc5h6in0UNNphV5lztGNVk0iRk1CXA8LmNnDwZo1TikCfwe0TtytzjXgbVxLCvTkTV Vtbw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:message-id:mime-version:subject:date :in-reply-to:cc:to:references; bh=ACMmN0udfdgUtDYeazrNyp6ZYjTaLOBK5ihNyFI2yrg=; b=A6QTEXIXKb+iQ+ACbEcD5pi7YejW+MxFxMLpdYF88JTq41RQ4znh7VOVu05IQ3d3mz 1m//LUMIXDdX2II5T7ixMPKGZZfxeBPoDYD4DqGCLPUEKgTWhoKtwvJpvDx6bGMtu882 hzqXecRsCasMkAYI6b+my17ELhxu4Q1N94m2iV+L4farmbDnAkVRils40pltH1m5SbWA cZcl9Tt+5Hmr3gLYr2NFOjhnmFbJWpLVrO3wBBN7Scmablt5X59ZCZRWDZVxRdMkEirA N/bKdz8e5UObHyKpLWvtAUXEgdVoBKyOXpQ0hbhfyb15q5nlnwlJvWVe4yzrEPS4f5qB UB2A== X-Gm-Message-State: AOAM531KpfX+OTSD0vFZNJNDU0JGDGBdLfbCL1PVhdeY0oVRbkH7rPWI PWU4tn940C1jgymbnxmbFko+IQ== X-Google-Smtp-Source: ABdhPJypjNvFobxpX1ThzElpEfO0YO6f5ZSsLCMv2weJKtF8yNHq2abDiABZFBHPKPETNv5zZqJB3w== X-Received: by 2002:a37:7c07:: with SMTP id x7mr2961463qkc.417.1624380394773; Tue, 22 Jun 2021 09:46:34 -0700 (PDT) Received: from [192.168.1.10] (c-24-98-254-8.hsd1.ga.comcast.net. [24.98.254.8]) by smtp.gmail.com with ESMTPSA id w2sm12772877qkf.88.2021.06.22.09.46.34 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 22 Jun 2021 09:46:34 -0700 (PDT) Message-ID: <9ACF5579-6721-4C23-B725-1018B7971B20@newclarity.net> Content-Type: multipart/alternative; boundary="Apple-Mail=_13D34866-290A-4CFD-8A60-42A66949E8AD" Mime-Version: 1.0 (Mac OS X Mail 13.4 \(3608.120.23.2.7\)) Date: Tue, 22 Jun 2021 12:46:33 -0400 In-Reply-To: <9e8faf2b-fdc7-49f4-bdc1-0822841616e2@www.fastmail.com> Cc: php internals To: Larry Garfield References: <4FFA0160-1A05-4DA0-9C9A-79F778443A35@newclarity.net> <9e8faf2b-fdc7-49f4-bdc1-0822841616e2@www.fastmail.com> X-Mailer: Apple Mail (2.3608.120.23.2.7) Subject: Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal) From: mike@newclarity.net (Mike Schinkel) --Apple-Mail=_13D34866-290A-4CFD-8A60-42A66949E8AD Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Jun 22, 2021, at 11:41 AM, Larry Garfield = wrote: >=20 > On Tue, Jun 22, 2021, at 8:39 AM, Mike Schinkel wrote: >>> On Jun 22, 2021, at 9:00 AM, Kamil Tekiela = wrote: >>>=20 >>> Hi Mike, >>>=20 >>> Please don't do this. We already have PDO with prepared statements. = The data must be bound. This is the secure way of writing SQL queries.=20= >>=20 >> The problem is that over 40% of the web currently runs on PHP code = that=20 >> using mysqli. That CMS does not support PDO nor prepared statements,=20= >> and is unlikely to switch to it anytime some in the foreseeable = future.=20 >=20 > WordPress is not going to leverage anything we do here until and = unless there is a major change of leadership and culture at that = project. Please don't waste any mental effort on it; they clearly waste = no mental effort on what the rest of the PHP community considers good, = secure practices. Anything involving them is tilting at windmills. You misunderstand. What I am (likely) "wasting" my mental effort is to = discuss features that I would be able to use WITHOUT WordPress having to = make ANY changes. HOWEVER, these would be beneficial beyond WordPress, since parameterized = queries cannot parameterize table names nor field names.=20 This is all trying to address the concerns that Craig Francis brought up = off list when he said that you cannot escape or sanitize without knowing = context when I was asking his to provide a make_literal() function, add = support for a IsLiteral attribute, or support an IsLiteralInterface so = that people don't latch on to using is_literal() and make certain edge = cases impossible.=20 When I am trying to address a problem, I come at it from many angles = until I find a solution. One potential solution then is to have a class = built into PHP that can ensure the SQL returned is indeed safe.=20 > Mike, speaking as someone who has written an SQL abstraction layer and = query builder with significant usage (Drupal 7-9), you are *grossly* = under-estimating the complexity of what you describe. It might be = possible to hack together for SQL92, aka "what most PHP devs actually = use because they haven't noticed that it's not 1992 anymore", but that's = already been done. We have DBTNG in Drupal, we have Doctrine, problem = solved. I think what is happening here is you are making an assumption I am = proposing a much larger scope than I am. Think of the scope I am proposing being on par with $mysqli->prepare(), = but a bit more to be able to handle more than just values. > Modern SQL, though, is a stupidly complex and stupidly inconsistent = beast. Most of the syntax beyond the basics is different on every = damned database. The official spec *is not even publicly available*, = and costs a lot of money to access. And no DB engine actually supports = all of it; they all support different subsets with their own different = extensions that may or may not be comparable. Both Modern SQL and legacy SQL are both still text-based query languages = and they all have grammars that can be represented by BNF rules.=20 https://github.com/ronsavage/SQL Those rules could be abstracted into a form accessible via a "dialect" = interface and that is how these would literally any version of SQL could = be supported. Could we finish all of a given dialect at once? No. Iteration based on = what is found to be supported is how this could be approached. = Remember, these dialects could be implemented in userland. By any PHP = developer.=20 Could we ever get them to be perfect? Probably not. But they would be = good at the start and then very good and there would like to be many = people providing PRs to fix new edge cases for the dialects that get the = most use. > Building a tool that parses an arbitrary string to an AST for a spec = that is inconsistent, inaccessible, and not implemented correctly by = anyone is a fool's errand, and that's just the first part of it. =20 That is not what I am proposing. I am proposing to build a tool that = knows how to parse based on a simplified grammar and then sanitize based = on those rules. There really are only a few contexts in a SQL query to be concerned = about. Keywords, field names, table name, table aliases, and typed = values (plus maybe a few other things I missed?) The grammars themselves = can evolve independently as userland implementations of interfaces. This = is not rocket science. > That's not even getting into designing an API for people to modify it, No API per se, an interface. (Yes that is also an API, but not one that = you call; instead it calls you.) > or questions of performance, Parsing a SQL query should not take nearly as long as executing the = query itself, especially if the SqlObjectModel were written in C. > or compiling the AST back into a DB-specific string Sounds like you are envisioning something I am not. > AND then doing parameter binding which varies from one database to = another. Again, that would be handled by the dialect class. > You're talking about reimplementing major portions of MySQL, = PostgreSQL, Oracle, etc. themselves in PHP, all at the same time. Well, = good luck, you're going to need it. You seem to be focused on that fact you think this would be hard instead = of if it would be a viable solution if implemented? > Personally I've long since concluded that database portability is no = longer an achievable or even desirable feature. SQL is just too = fragmented a language, leaving you with a least common denominator that = is grossly under-whelming for modern needs. If you want more than = SQL92, it's not really viable anymore. I agree with you. That is why portability is not relevant to what I am = proposing. Maybe this will help. There are a billion XML schemas, but DomDocument = and its related classes can process them all. A SqlObjectModel would be = similar; it would know how to process text queries where the dialect = interface implementors would be the equivalent of the XML schema. =20 It is not a perfect analogy, but it is close enough. -Mike P.S. I put this at the end, because it is important. There is no reason = we would have to implement all syntax for a given SQL dialect. I'll bet = 5% of any SQL dialect is used 95% of the time. Implement that 5% and = then tell developers who need the other 95% of edge cases =E2=80=94 like = DML =E2=80=94that they have to sanitize those queries manually. =20 And that ~5% would provide injection protection for that ~95% of most = common use-cases. Why let perfect be the enemy of the good? --Apple-Mail=_13D34866-290A-4CFD-8A60-42A66949E8AD--