Newsgroups: php.internals
Xref: news.php.net php.internals:118724
Date: Sun, 2 Oct 2022 09:18:53 +0300
To: Kamil Tekiela
Cc: PHP internals
Subject: Re: [PHP-DEV] Sanitize filters
From: lokrain@gmail.com (Lokrain)

Hello Kamil,

I believe that PHP should not try to act as a “framework” that provides
you with ready solutions for such cases. Being able to actually modify the
default behaviour of some functions through the ini is even scarier.

In 25 years of writing PHP I have never relied on this “magic” for
security :)

Regards,
Dimitar

On Sat, 1 Oct 2022 at 18:39, Kamil Tekiela wrote:

> Hi Internals,
>
> For quite some time now, PHP's sanitize filters have "Rustled My Jimmies".
> These filters bother me because I can't really justify their existence.
> I can understand that a few of them are sensible and may come in handy,
> but I would like to talk about some of these in particular.
>
> In PHP 8.1, we have deprecated FILTER_SANITIZE_STRING which I deemed to be
> a priority due to its confusing name and behaviour. The rest is slightly
> less dangerous, but as was pointed out to me in a recent conversation with
> a PHP developer, these filters are all very confusing.
>
> I would like to have some opinions on the following filters. What do you
> think we should do with them? Deprecate? Fix? Provide better documentation?
>
> ---
>
> *FILTER_SANITIZE_ENCODED* - "URL-encode string, optionally strip or encode
> special characters."
> Now, what does that mean? PHP has two functions for URL encoding: urlencode
> used for encoding query-string parts, and rawurlencode used for encoding
> any other URL part (two different RFCs are followed by these functions).
> Which of these RFCs is applied in this filter? Furthermore, the description
> says that "special characters" can be stripped or encoded. Is one of these
> actions the default and the other can be selected by a flag or are both
> optional? What are these special characters? Are they special in the
> context of URL? If so, why did we encode them first? If these are HTML
> special characters (there's no single definition of special HTML chars),
> then why does this filter encode them if the filter is for URL
> sanitization? What does backtick have to do with any of this
> (FILTER_FLAG_STRIP_BACKTICK)?
>
> *FILTER_SANITIZE_ADD_SLASHES* - "Apply addslashes(). (Available as of PHP
> 7.3.0)"
> This filter was added as a replacement for magic_quotes filter. According
> to PHP documentation, addslashes is supposed to be used when injecting PHP
> variables into eval'd string. Real-life showed that this function is used
> in a lot of places that have nothing to do with PHP's eval.
> I am not sure if the sanitize filter is misused in a similar fashion, but
> judging from the fact that it was meant as a replacement for magic_quotes,
> my guess is that it's very likely still abused.
>
> *FILTER_SANITIZE_EMAIL* - "Remove all characters except letters, digits and
> !#$%&'*+-=?^_`{|}~@.[]."
> Which RFC does this adhere to? It strips slashes and quoted parts, doesn't
> allow IPv6 addresses and doesn't accept RFC 6530 email addresses. This
> filter is ok for simple usage, but it isn't true to any known specification
> AFAIK.
>
> *FILTER_SANITIZE_SPECIAL_CHARS* - "HTML-encode '"<>& and characters with
> ASCII value less than 32, optionally strip or encode other special
> characters."
> What's the intended purpose of this filter? "Special characters" are still
> not clearly defined, but at least it's more clear than
> the FILTER_SANITIZE_ENCODED description. Same question about backticks
> though: why? Why encode ASCII <32 chars?
>
> *FILTER_SANITIZE_FULL_SPECIAL_CHARS* - "Equivalent to calling
> htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled by
> setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this filter
> is aware of the default_charset and if a sequence of bytes is detected that
> makes up an invalid character in the current character set then the entire
> string is rejected resulting in a 0-length string. When using this filter
> as a default filter, see the warning below about setting the default flags
> to 0."
> Not to be mistaken with FILTER_SANITIZE_SPECIAL_CHARS. As long as it's not
> used with filter_input(), it's the least problematic. We
> have htmlspecialchars() though, so how useful is this filter?
>
> *FILTER_UNSAFE_RAW* - What makes it unsafe? Why isn't this just
> called FILTER_RAW_STRING? If the value being filtered is something other
> than a string, what will this filter return?
> Integers, floats, booleans and nulls are converted to a string. Arrays and
> objects make the filter fail.
>
> ---
>
> Let's quickly mention the filter flags.
>
> The FILTER_FLAG_STRIP_LOW flag will also remove tabs, carriage returns and
> newlines as these are all less than 32 ASCII codes. When is this useful and
> expected?
>
> The FILTER_FLAG_ENCODE_LOW flag "encodes" ASCII <32 codes presumably into
> HTML entities, although that's not specified anywhere in the PHP manual.
> The word HTML does not appear on the
> https://www.php.net/manual/en/filter.filters.flags.php page. What do these
> characters look like when presented by HTML? When is it ever useful to use
> this flag?
>
> FILTER_FLAG_ENCODE_AMP & FILTER_FLAG_STRIP_BACKTICK - why is this even a
> thing?
>
> Due to flags, FILTER_VALIDATE_EMAIL will happily validate email addresses
> that would be otherwise mangled by FILTER_SANITIZE_EMAIL.
>
> These are just the things I found confusing and strange about the sanitize
> filters. Let's try to put ourselves in the shoes of an average PHP
> developer trying to comprehend these filters. It's quite easy to shoot
> yourself in the foot if you try to use them. The PHP manual doesn't do a
> good job of explaining them, but that's probably because they are not easy
> to explain. I can't come up with good examples of when they should be used.
>
> Regards,
> Kamil
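
For readers who want to see two of the behaviours from the quoted message in practice, here is a minimal sketch. The input strings are made up for illustration and do not come from the thread; it only assumes a stock PHP build with the filter extension enabled.

```php
<?php
// Hypothetical inputs, chosen only to trigger the behaviour discussed above.
$text  = "line one\r\n\tline two";
$email = '"john doe"@example.com';

// FILTER_FLAG_STRIP_LOW drops every byte below ASCII 32, so the
// carriage return, the newline and the tab are removed along with
// any real control characters.
$stripped = filter_var($text, FILTER_UNSAFE_RAW, FILTER_FLAG_STRIP_LOW);

// FILTER_SANITIZE_EMAIL keeps only letters, digits and the documented
// punctuation set, so the double quotes and the space of a quoted
// local part are silently removed.
$sanitized = filter_var($email, FILTER_SANITIZE_EMAIL);

var_dump($stripped);
var_dump($sanitized);
```

On such a build, the first call should join the two lines with nothing between them, and the second should return a different address from the one passed in. That is the "mangling" the quoted message refers to.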