Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:115034 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 94202 invoked from network); 22 Jun 2021 16:35:55 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 22 Jun 2021 16:35:55 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 57CE61804D8 for ; Tue, 22 Jun 2021 09:54:13 -0700 (PDT) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=-1.8 required=5.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,DKIM_VALID_EF,FREEMAIL_ENVFROM_END_DIGIT, FREEMAIL_FROM,HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mail-qk1-f175.google.com (mail-qk1-f175.google.com [209.85.222.175]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Tue, 22 Jun 2021 09:54:12 -0700 (PDT) Received: by mail-qk1-f175.google.com with SMTP id i68so41281754qke.3 for ; Tue, 22 Jun 2021 09:54:12 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=OUQHia5MhN2RpQCyIUOMXubYVPcstwAnJPF3M7s9iRU=; b=SemZjdZAEXJs4qqRW9heIEj17226khchLDOuHkMiQSahm411r7za3pWp9nOzlgEmQZ 5RIzDaoMsGZsuWnZ596akUYIOhFJVocPzjXtOxVNwDQjxG9ELAn2iwO3Ei2H6X9dfF4M lMnx4A36cto51Kl3IK6Te6zeLqL4PlUAFJnarLwercBwqR+Lny2GiTh0lg8G3Qz57ir3 pa0e1qIQRFo6sOBiY0S1Qg3sgS4V1ZJbhC7ao/5wgK8esJvPuArQE6fsrwXfPvAg8nLt 281++UP80zNLdfsqyQFk/+I7716dJI+xnShy3MSL9Wbz8mZi+ekyXqkRpLl0vqMhKPOr FbuA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=OUQHia5MhN2RpQCyIUOMXubYVPcstwAnJPF3M7s9iRU=; b=Wq0gCZPdL32v8FvSbmxqilwz+xrjoE8tgNguD5feZhN827quhLsAEJUvXMsuyZypXZ NAwg+CbVzJ9xzyPCIrd1xDvtJtT+A6nq4yXietYKvcUaNWBC9F4ti0qSQMgW61GPWEe5 6JI2O51KzMkXruBHf1hsCFUSiOUpa0zu2TidBnV1DMWpyD4ivlxc77EbIPRo2jEeaTkC xvgTL5gSVxActMwHKKwd4v7oy/r29SQDo+ibvvet5DEHSXkIdCGgrqyKoFFnL20VbItq p1fgnmjzcck/j/mXSPw/3DPWzn/Bq/yqVI14pLZXbzgqq9Q4royshLB24FJ+wgxZ4xXR DFjQ== X-Gm-Message-State: AOAM5339+1UsyQ44c/wi6/hvba7Jn33XB1vDz9lQ/Nf1rUnTRpavOgJL DENck+m/HUpx1R3QhWlpzjqlWlxNzWmbR7MwQEOMh2EjTpm3qSECfao= X-Google-Smtp-Source: ABdhPJzmxEdCpQLQDiYVnekouhOZ6SdJxu2RHOsxbcklchHK87ribzIvA3ZcF8KUe9rwYBdQdRvis4CspGwc6++/isg= X-Received: by 2002:a05:620a:4e9:: with SMTP id b9mr5155742qkh.358.1624380852052; Tue, 22 Jun 2021 09:54:12 -0700 (PDT) MIME-Version: 1.0 References: <4FFA0160-1A05-4DA0-9C9A-79F778443A35@newclarity.net> <9e8faf2b-fdc7-49f4-bdc1-0822841616e2@www.fastmail.com> <9ACF5579-6721-4C23-B725-1018B7971B20@newclarity.net> In-Reply-To: <9ACF5579-6721-4C23-B725-1018B7971B20@newclarity.net> Date: Tue, 22 Jun 2021 17:54:01 +0100 Message-ID: To: Mike Schinkel Cc: Larry Garfield , php internals Content-Type: multipart/alternative; boundary="000000000000aaa40505c55da35e" Subject: Re: [PHP-DEV] Sql Object Model Parser & Sanitizer (was [RFC] is_literal) From: tekiela246@gmail.com (Kamil Tekiela) --000000000000aaa40505c55da35e Content-Type: text/plain; charset="UTF-8" While it's true that a lot of the internet is using mysqli due to WordPress, this doesn't change the fact that PHP already offers a solution to the problem. Both PDO and mysqli support server-side prepared statements. We don't talk about WordPress. They should not hold back PHP. That project is so behind that they still offer mysql_* API in their abstraction library. Nonetheless, whatever solution we offer, they're unlikely to pick it up. They haven't used prepared statements for whatever reason, and they created their own client-side prepared statements. What makes you think that if we write "Parser & Sanitizer", they're going to start using it? The proposal for `is_literal` is going to be completely useless to WordPress as they have no place to use it in their codebase. It's a pity we can't help them, but it's true. To clarify the proposal from Craig, I would like to point out that `is_literal()` will not help with the data being passed to SQL. User input cannot be trusted in any sense, no matter where it comes from. The proposal for `is_literal()` helps libraries that try to build the SQL dynamically, without the user input. For example, it would help to write a secure method like this: function select(mysqli $mysqli, string $table) { if(!is_literal($table)) { throw new RuntimeException('The table name has to be a hardcoded string'); } return $mysqli->query("SELECT * FROM {$table}")->fetch_all(); } This code would prevent the developer from misusing this function by passing user input to it. >That is why the title of this email said "Parse and Sanitizer" not "Builder." Then we definitely don't need that. Both PDO and mysqli can escape string literals in a context aware manner (to some extent at least). PDO tried to implement a parser, but it proved challenging and to this day doesn't work correctly. As Larry explained, SQL is not a standardised language anymore. The flavours diverged and created a widely inconsistent language. To build a context-aware parser for SQL, you would need to know: - The SQL flavour - The database type (MySQL or MariaDB) - The database version - The settings maintained on that database server For example, in MySQL, you need to know which character is used to escape quotes in string literals, and you need to know the SQL_MODE. Then you need to know which syntax is allowed and understood by that specific MySQL version. The SQL syntax differs a lot between MySQL and MariaDB. Building, parsing and compiling the SQL in PHP would involve rebuilding the same parser that MySQL server is using. And you'd have to do it for every single possible database vendor/version. On top of that, you need to have direct access to the database/table/column/SP/functions names, so that your parser can apply the business logic. Not to mention escaping the data if you don't want to use prepared statements. >OTOH, what about when a developer needs to parameterize field and table names in PDO You don't parameterize field and table names. Period. You hardcode these names in your business logic. Here's an example from DBAL: $queryBuilder ->select('id', 'name') // <-- Hardcoded ->from('users') // <-- Hardcoded ->where('email = ?') // <-- Hardcoded ->setParameter(0, $userInputEmail) // <-- Parameterized ; Irrelevant of which parameterization technique is used here, the idea is that you never parameterize SQL code. You parameterize the data. WordPress does exactly the same with their prepare() method. If you need to have dynamic field identifiers, then it's your job as a developer to write the code that can safely handle such requirement. A native PHP "SQL Object Model with parser and sanitizer functionality" wouldn't prevent SQL injection in the same way that userland DB abstraction libraries don't. You can still create SQL injection with wordpress $wpdp class, you can do it with DBAL, and you can do it with PDO. To summarize, we are not looking for a solution to prevent SQL injection or to help Wordpress. We are looking for a solution that would prevent the misuse of libraries like DBAL, so that methods like select(), from(), and where() could generate a warning when used with non-hardcoded data. Regards, Kamil P.S. By "sanitization" I understand removing unwanted information from a piece of data. SQL injection cannot be prevented by removing data. It can only be prevented by strict separation of SQL and data, or by careful formatting of the data when included in the SQL string itself. This includes quoting, type casting and escaping of quotes. Sanitization cannot prevent SQL injection! --000000000000aaa40505c55da35e--