Newsgroups: php.internals
Xref: news.php.net php.internals:118727
Message-ID: <0cfb9a7b-1168-42ef-ae1a-bdc72210de43@app.fastmail.com>
Date: Sun, 02 Oct 2022
 10:09:52 -0500
To: "php internals"
Content-Type: text/plain
Subject: Re: [PHP-DEV] Sanitize filters
From: larry@garfieldtech.com ("Larry Garfield")

On Sat, Oct 1, 2022, at 10:39 AM, Kamil Tekiela wrote:
> Hi Internals,
>
> For quite some time now, PHP's sanitize filters have "Rustled My Jimmies".
> These filters bother me because I can't really justify their existence. I
> can understand that a few of them are sensible and may come in handy, but I
> would like to talk about some of these in particular.
>
> In PHP 8.1, we have deprecated FILTER_SANITIZE_STRING which I deemed to be
> a priority due to its confusing name and behaviour. The rest is slightly
> less dangerous, but as was pointed out to me in a recent conversation with
> a PHP developer, these filters are all very confusing.
>
> I would like to have some opinions on the following filters. What do you
> think we should do with them? Deprecate? Fix? Provide better documentation?
>
> ---
>
> *FILTER_SANITIZE_ENCODED* - "URL-encode string, optionally strip or encode
> special characters."
> Now, what does that mean? PHP has two functions for URL encoding: urlencode
> used for encoding query-string parts, and rawurlencode used for encoding
> any other URL part (two different RFCs are followed by these functions).
> Which of these RFCs is applied in this filter? Furthermore, the description
> says that "special characters" can be stripped or encoded. Is one of these
> actions the default and the other can be selected by a flag or are both
> optional? What are these special characters? Are they special in the
> context of URL? If so, why did we encode them first? If these are HTML
> special characters (there's no single definition of special HTML chars),
> then why does this filter encode them if the filter is for URL
> sanitization? What does backtick have to do with any of this
> (FILTER_FLAG_STRIP_BACKTICK)?
>
> *FILTER_SANITIZE_ADD_SLASHES* - "Apply addslashes().
> (Available as of PHP 7.3.0)"
> This filter was added as a replacement for magic_quotes filter. According
> to PHP documentation, addslashes is supposed to be used when injecting PHP
> variables into eval'd string. Real-life showed that this function is used
> in a lot of places that have nothing to do with PHP's eval. I am not sure
> if the sanitize filter is misused in a similar fashion, but judging from
> the fact that it was meant as a replacement for magic_quotes, my guess is
> that it's very likely still abused.
>
> *FILTER_SANITIZE_EMAIL* - "Remove all characters except letters, digits and
> !#$%&'*+-=?^_`{|}~@.[]."
> Which RFC does this adhere to? It strips slashes and quoted parts, doesn't
> allow IPv6 addresses and doesn't accept RFC 6530 email addresses. This
> filter is ok for simple usage, but it isn't true to any known specification
> AFAIK.
>
> *FILTER_SANITIZE_SPECIAL_CHARS* - "HTML-encode '"<>& and characters with
> ASCII value less than 32, optionally strip or encode other special
> characters."
> What's the intended purpose of this filter? "Special characters" are still
> not clearly defined, but at least it's more clear than
> the FILTER_SANITIZE_ENCODED description. Same question about backticks
> though: why? Why encode ASCII <32 chars?
>
> *FILTER_SANITIZE_FULL_SPECIAL_CHARS* - "Equivalent to calling
> htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled by
> setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this filter
> is aware of the default_charset and if a sequence of bytes is detected that
> makes up an invalid character in the current character set then the entire
> string is rejected resulting in a 0-length string. When using this filter
> as a default filter, see the warning below about setting the default flags
> to 0."
> Not to be mistaken with FILTER_SANITIZE_SPECIAL_CHARS. As long as it's not
> used with filter_input(), it's the least problematic.
> We have htmlspecialchars() though, so how useful is this filter?
>
> *FILTER_UNSAFE_RAW* - What makes it unsafe? Why isn't this just
> called FILTER_RAW_STRING? If the value being filtered is something other
> than a string, what will this filter return? Integers, floats, booleans and
> nulls are converted to a string, Arrays and objects make the filter fail.
>
> ---
>
> Let's quickly mention the filter flags.
>
> The FILTER_FLAG_STRIP_LOW flag will also remove tabs, carriage returns and
> newlines as these are all less than 32 ASCII codes. When is this useful and
> expected?
>
> The FILTER_FLAG_ENCODE_LOW flag "encodes" ASCII <32 codes presumably into
> HTML entities, although that's not specified anywhere in the PHP manual.
> The word HTML does not appear on the
> https://www.php.net/manual/en/filter.filters.flags.php page. What do these
> characters look like when presented by HTML? When is it ever useful to use
> this flag?
>
> FILTER_FLAG_ENCODE_AMP & FILTER_FLAG_STRIP_BACKTICK - why is this even a
> thing?
>
> Due to flags, FILTER_VALIDATE_EMAIL will happily validate email addresses
> that would be otherwise mangled by FILTER_SANITIZE_EMAIL.
>
> These are just the things I found confusing and strange about the sanitize
> filters. Let's try to put ourselves in the shoes of an average PHP
> developer trying to comprehend these filters. It's quite easy to shoot
> yourself in the foot if you try to use them. The PHP manual doesn't do a
> good job of explaining them, but that's probably because they are not easy
> to explain. I can't come up with good examples of when they should be used.
>
> Regards,
> Kamil

The filter extension has always been a stillborn mess. Its API is an absolute disaster and, as you note, its functionality is unclear at best, misleading at worst. Frankly it's worse than SPL.

I'd be entirely on board with jettisoning the entire thing, but barring that, ripping out large swaths of it that are misleading suits me fine.

--Larry Garfield