Newsgroups: php.internals
Xref: news.php.net php.internals:118725
Date: Sun, 2 Oct 2022 02:30:51 -0400
To: Lokrain
Cc: Kamil Tekiela, PHP internals
Subject: Re: [PHP-DEV] Sanitize filters
From: vasilii.b.shpilchin@gmail.com (Vasilii Shpilchin)

All right. If you have been writing PHP for 25 years, you will have noticed
that PHP has always been about high-level, web-focused functionality out of
the box. This is one of PHP's basic advantages over other general-purpose
languages, where you can write everything you want but also have to write it
yourself, since the language itself is very basic.

I'm for PHP keeping built-in solutions for the most common problems in the
context of the web.

I say this having passed the ZCE exam and having written PHP for just 15
years.

On Sun, Oct 2, 2022, 2:19 AM Lokrain wrote:

> Hello Kamil,
>
> I believe that PHP should not try to act as a “framework” that provides
> you with ready solutions for such cases.
>
> Being able to actually modify the default behaviour of some functions
> through the ini .. is even scarier.
>
> In 25 years of writing PHP I have never relied on this “magic” for
> security :)
>
> Regards,
> Dimitar
>
> On Sat, 1 Oct 2022 at 18:39, Kamil Tekiela wrote:
>
> > Hi Internals,
> >
> > For quite some time now, PHP's sanitize filters have "Rustled My
> > Jimmies". These filters bother me because I can't really justify their
> > existence. I can understand that a few of them are sensible and may
> > come in handy, but I would like to talk about some of these in
> > particular.
> >
> > In PHP 8.1, we have deprecated FILTER_SANITIZE_STRING which I deemed to
> > be a priority due to its confusing name and behaviour. The rest is
> > slightly less dangerous, but as was pointed out to me in a recent
> > conversation with a PHP developer, these filters are all very
> > confusing.
> >
> > I would like to have some opinions on the following filters. What do
> > you think we should do with them? Deprecate? Fix? Provide better
> > documentation?
> >
> > ---
> >
> > *FILTER_SANITIZE_ENCODED* - "URL-encode string, optionally strip or
> > encode special characters."
> > Now, what does that mean? PHP has two functions for URL encoding:
> > urlencode used for encoding query-string parts, and rawurlencode used
> > for encoding any other URL part (two different RFCs are followed by
> > these functions). Which of these RFCs is applied in this filter?
> > Furthermore, the description says that "special characters" can be
> > stripped or encoded. Is one of these actions the default and the other
> > can be selected by a flag or are both optional? What are these special
> > characters? Are they special in the context of URL?
> > If so, why did we encode them first? If these are HTML special
> > characters (there's no single definition of special HTML chars), then
> > why does this filter encode them if the filter is for URL
> > sanitization? What does backtick have to do with any of this
> > (FILTER_FLAG_STRIP_BACKTICK)?
> >
> > *FILTER_SANITIZE_ADD_SLASHES* - "Apply addslashes(). (Available as of
> > PHP 7.3.0)"
> > This filter was added as a replacement for magic_quotes filter.
> > According to PHP documentation, addslashes is supposed to be used when
> > injecting PHP variables into eval'd string. Real-life showed that this
> > function is used in a lot of places that have nothing to do with PHP's
> > eval. I am not sure if the sanitize filter is misused in a similar
> > fashion, but judging from the fact that it was meant as a replacement
> > for magic_quotes, my guess is that it's very likely still abused.
> >
> > *FILTER_SANITIZE_EMAIL* - "Remove all characters except letters,
> > digits and !#$%&'*+-=?^_`{|}~@.[]."
> > Which RFC does this adhere to? It strips slashes and quoted parts,
> > doesn't allow IPv6 addresses and doesn't accept RFC 6530 email
> > addresses. This filter is ok for simple usage, but it isn't true to
> > any known specification AFAIK.
> >
> > *FILTER_SANITIZE_SPECIAL_CHARS* - "HTML-encode '"<>& and characters
> > with ASCII value less than 32, optionally strip or encode other
> > special characters."
> > What's the intended purpose of this filter? "Special characters" are
> > still not clearly defined, but at least it's more clear than the
> > FILTER_SANITIZE_ENCODED description. Same question about backticks
> > though: why? Why encode ASCII <32 chars?
> >
> > *FILTER_SANITIZE_FULL_SPECIAL_CHARS* - "Equivalent to calling
> > htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be
> > disabled by setting FILTER_FLAG_NO_ENCODE_QUOTES.
> > Like htmlspecialchars(), this filter is aware of the default_charset
> > and if a sequence of bytes is detected that makes up an invalid
> > character in the current character set then the entire string is
> > rejected resulting in a 0-length string. When using this filter as a
> > default filter, see the warning below about setting the default flags
> > to 0."
> > Not to be mistaken with FILTER_SANITIZE_SPECIAL_CHARS. As long as it's
> > not used with filter_input(), it's the least problematic. We have
> > htmlspecialchars() though, so how useful is this filter?
> >
> > *FILTER_UNSAFE_RAW* - What makes it unsafe? Why isn't this just called
> > FILTER_RAW_STRING? If the value being filtered is something other than
> > a string, what will this filter return? Integers, floats, booleans and
> > nulls are converted to a string; arrays and objects make the filter
> > fail.
> >
> > ---
> >
> > Let's quickly mention the filter flags.
> >
> > The FILTER_FLAG_STRIP_LOW flag will also remove tabs, carriage returns
> > and newlines, as these all have ASCII codes below 32. When is this
> > useful and expected?
> >
> > The FILTER_FLAG_ENCODE_LOW flag "encodes" ASCII <32 codes presumably
> > into HTML entities, although that's not specified anywhere in the PHP
> > manual. The word HTML does not appear on the
> > https://www.php.net/manual/en/filter.filters.flags.php page. What do
> > these characters look like when presented by HTML? When is it ever
> > useful to use this flag?
> >
> > FILTER_FLAG_ENCODE_AMP & FILTER_FLAG_STRIP_BACKTICK - why is this even
> > a thing?
> >
> > Due to flags, FILTER_VALIDATE_EMAIL will happily validate email
> > addresses that would be otherwise mangled by FILTER_SANITIZE_EMAIL.
> >
> > These are just the things I found confusing and strange about the
> > sanitize filters. Let's try to put ourselves in the shoes of an
> > average PHP developer trying to comprehend these filters.
> > It's quite easy to shoot yourself in the foot if you try to use them.
> > The PHP manual doesn't do a good job of explaining them, but that's
> > probably because they are not easy to explain. I can't come up with
> > good examples of when they should be used.
> >
> > Regards,
> > Kamil
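
To make two of the character rules quoted in this thread concrete, here is a
minimal Python sketch. It models only the wording of the manual excerpts (the
FILTER_SANITIZE_EMAIL allowlist and the "below ASCII 32" cutoff used by
FILTER_FLAG_STRIP_LOW). It is not PHP's implementation and may differ from it
in edge cases. The function names are hypothetical, and reading "letters" as
ASCII letters only is an assumption.

```python
import string

# Allowlist as worded in the quoted manual excerpt:
# letters, digits and !#$%&'*+-=?^_`{|}~@.[]
# Assumption: "letters" means ASCII letters only.
EMAIL_ALLOWED = frozenset(
    string.ascii_letters + string.digits + "!#$%&'*+-=?^_`{|}~@.[]"
)


def sanitize_email_model(value: str) -> str:
    """Drop every character outside the quoted allowlist (hypothetical model)."""
    return "".join(ch for ch in value if ch in EMAIL_ALLOWED)


def strip_low_model(value: str) -> str:
    """Drop every character whose code point is below 32 (hypothetical model)."""
    return "".join(ch for ch in value if ord(ch) >= 32)


if __name__ == "__main__":
    # Double quotes and the space are not on the allowlist, so a quoted
    # local part loses them.
    print(sanitize_email_model('"john doe"@example.com'))
    # The colon is not on the allowlist either, so an IPv6 address literal
    # does not survive intact.
    print(sanitize_email_model("user@[IPv6:2001:db8::1]"))
    # Carriage return (13), newline (10) and tab (9) are all below 32.
    print(repr(strip_low_model("line1\r\n\tline2")))
```

Under this model, an address that is syntactically valid with a quoted local
part or an IPv6 literal comes out changed, which is the mismatch with
FILTER_VALIDATE_EMAIL raised above, and ordinary line breaks and tabs
disappear along with the other control characters.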