Newsgroups: php.internals
Xref: news.php.net php.internals:118726
Date: Sun, 2 Oct 2022 10:14:00 +0300
From: lokrain@gmail.com (Lokrain)
To: Vasilii Shpilchin
Cc: Kamil Tekiela, PHP internals
Subject: Re: [PHP-DEV] Sanitize filters

Hello Vasilii,

It’s okay to have a different opinion, I hope. You are missing an important
point here: besides my comments, the current way this is developed brings
confusion. It would be great if you shared your experience on this matter.

Regards,
Dimitar

On Sun, 2 Oct 2022 at 9:31, Vasilii Shpilchin wrote:

> All right, if you have been writing PHP for 25 years, you will have noticed
> that PHP was always about high-level, web-focused functionality out of the
> box.
> This is one of the basic advantages of PHP over other general-purpose
> languages, where you can write everything you want, but you also have to
> write it yourself, since the language itself is very basic. I'm in favor of
> PHP keeping built-in solutions for the most common problems in the context
> of the web. I have passed the ZCE exam and have been writing PHP for just
> 15 years.
>
> On Sun, Oct 2, 2022, 2:19 AM Lokrain wrote:
>
>> Hello Kamil,
>>
>> I believe that PHP should not try to act as a “framework” that provides
>> you with ready solutions for such cases.
>>
>> Being able to actually modify the default behaviour of some functions
>> through the ini .. is even scarier.
>>
>> In 25 years of writing PHP, I have never relied on this “magic” for
>> security :)
>>
>> Regards,
>> Dimitar
>>
>> On Sat, 1 Oct 2022 at 18:39, Kamil Tekiela wrote:
>>
>> > Hi Internals,
>> >
>> > For quite some time now, PHP's sanitize filters have "Rustled My
>> > Jimmies". These filters bother me because I can't really justify their
>> > existence. I can understand that a few of them are sensible and may come
>> > in handy, but I would like to talk about some of them in particular.
>> >
>> > In PHP 8.1, we deprecated FILTER_SANITIZE_STRING, which I deemed to be a
>> > priority due to its confusing name and behaviour. The rest are slightly
>> > less dangerous, but as was pointed out to me in a recent conversation
>> > with a PHP developer, these filters are all very confusing.
>> >
>> > I would like to have some opinions on the following filters. What do you
>> > think we should do with them? Deprecate? Fix? Provide better
>> > documentation?
>> >
>> > ---
>> >
>> > *FILTER_SANITIZE_ENCODED* - "URL-encode string, optionally strip or
>> > encode special characters."
>> > Now, what does that mean?
>> > PHP has two functions for URL encoding: urlencode, used for encoding
>> > query-string parts, and rawurlencode, used for encoding any other URL
>> > part (the two functions follow two different RFCs). Which of these RFCs
>> > is applied in this filter? Furthermore, the description says that
>> > "special characters" can be stripped or encoded. Is one of these actions
>> > the default, with the other selectable by a flag, or are both optional?
>> > What are these special characters? Are they special in the context of
>> > URLs? If so, why did we encode them first? If these are HTML special
>> > characters (there's no single definition of special HTML chars), then
>> > why does this filter encode them if the filter is for URL sanitization?
>> > What does the backtick have to do with any of this
>> > (FILTER_FLAG_STRIP_BACKTICK)?
>> >
>> > *FILTER_SANITIZE_ADD_SLASHES* - "Apply addslashes(). (Available as of
>> > PHP 7.3.0)"
>> > This filter was added as a replacement for the magic_quotes filter
>> > (FILTER_SANITIZE_MAGIC_QUOTES). According to the PHP documentation,
>> > addslashes is supposed to be used when injecting PHP variables into an
>> > eval'd string. Real life has shown that this function is used in a lot
>> > of places that have nothing to do with PHP's eval. I am not sure if the
>> > sanitize filter is misused in a similar fashion, but judging from the
>> > fact that it was meant as a replacement for magic_quotes, my guess is
>> > that it's very likely still abused.
>> >
>> > *FILTER_SANITIZE_EMAIL* - "Remove all characters except letters, digits
>> > and !#$%&'*+-=?^_`{|}~@.[]."
>> > Which RFC does this adhere to? It strips slashes and quoted parts,
>> > doesn't allow IPv6 addresses and doesn't accept RFC 6530 email
>> > addresses. This filter is OK for simple usage, but it isn't true to any
>> > known specification, AFAIK.
>> >
>> > *FILTER_SANITIZE_SPECIAL_CHARS* - "HTML-encode '"<>& and characters with
>> > ASCII value less than 32, optionally strip or encode other special
>> > characters."
>> > What's the intended purpose of this filter? "Special characters" are
>> > still not clearly defined, but at least it's clearer than the
>> > FILTER_SANITIZE_ENCODED description. Same question about backticks,
>> > though: why? Why encode ASCII <32 chars?
>> >
>> > *FILTER_SANITIZE_FULL_SPECIAL_CHARS* - "Equivalent to calling
>> > htmlspecialchars() with ENT_QUOTES set. Encoding quotes can be disabled
>> > by setting FILTER_FLAG_NO_ENCODE_QUOTES. Like htmlspecialchars(), this
>> > filter is aware of the default_charset and if a sequence of bytes is
>> > detected that makes up an invalid character in the current character
>> > set then the entire string is rejected resulting in a 0-length string.
>> > When using this filter as a default filter, see the warning below about
>> > setting the default flags to 0."
>> > Not to be confused with FILTER_SANITIZE_SPECIAL_CHARS. As long as it's
>> > not used with filter_input(), it's the least problematic. We have
>> > htmlspecialchars() though, so how useful is this filter?
>> >
>> > *FILTER_UNSAFE_RAW* - What makes it unsafe? Why isn't this just called
>> > FILTER_RAW_STRING? If the value being filtered is something other than
>> > a string, what will this filter return? Integers, floats, booleans and
>> > nulls are converted to a string; arrays and objects make the filter
>> > fail.
>> >
>> > ---
>> >
>> > Let's quickly mention the filter flags.
>> >
>> > The FILTER_FLAG_STRIP_LOW flag will also remove tabs, carriage returns
>> > and newlines, as these all have ASCII codes below 32. When is this
>> > useful and expected?
>> >
>> > The FILTER_FLAG_ENCODE_LOW flag "encodes" ASCII <32 codes, presumably
>> > into HTML entities, although that's not specified anywhere in the PHP
>> > manual.
>> > The word HTML does not appear on the
>> > https://www.php.net/manual/en/filter.filters.flags.php page. What do
>> > these characters look like when presented by HTML? When is it ever
>> > useful to use this flag?
>> >
>> > FILTER_FLAG_ENCODE_AMP & FILTER_FLAG_STRIP_BACKTICK - why are these even
>> > a thing?
>> >
>> > Due to flags, FILTER_VALIDATE_EMAIL will happily validate email
>> > addresses that would otherwise be mangled by FILTER_SANITIZE_EMAIL.
>> >
>> > These are just the things I found confusing and strange about the
>> > sanitize filters. Let's try to put ourselves in the shoes of an average
>> > PHP developer trying to comprehend these filters. It's quite easy to
>> > shoot yourself in the foot if you try to use them. The PHP manual
>> > doesn't do a good job of explaining them, but that's probably because
>> > they are not easy to explain. I can't come up with good examples of when
>> > they should be used.
>> >
>> > Regards,
>> > Kamil
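To make two of the complaints in the thread concrete, here is a small Python model of the character rules as the PHP manual words them: the FILTER_SANITIZE_EMAIL whitelist and FILTER_FLAG_STRIP_LOW ("strip characters with a value below 32"). It is a sketch written from the documented descriptions, not a port of PHP's C source. It assumes that "letters" means ASCII letters only, and the function names sanitize_email_model and strip_low_model are hypothetical. It shows how a quoted local part, an IPv6 address literal and a non-ASCII local part get mangled, and how multi-line input collapses onto one line.

```python
import string

# Extra characters the PHP manual says FILTER_SANITIZE_EMAIL keeps,
# in addition to letters and digits. Assumption: "letters" means ASCII
# letters only, so every non-ASCII character is dropped.
EMAIL_EXTRA = "!#$%&'*+-=?^_`{|}~@.[]"
EMAIL_ALLOWED = frozenset(string.ascii_letters + string.digits + EMAIL_EXTRA)


def sanitize_email_model(value: str) -> str:
    """Keep only whitelisted characters (hypothetical model of the documented rule)."""
    return "".join(ch for ch in value if ch in EMAIL_ALLOWED)


def strip_low_model(value: str) -> str:
    """Drop every character whose code point is below 32 (hypothetical model).

    Tab (9), line feed (10) and carriage return (13) are in that range,
    so they are removed along with genuine control characters.
    """
    return "".join(ch for ch in value if ord(ch) >= 32)


if __name__ == "__main__":
    # Quoted local part (allowed by RFC 5322): the quotes and space are removed.
    print(sanitize_email_model('"john doe"@example.com'))   # johndoe@example.com
    # IPv6 address literal: the colons are removed.
    print(sanitize_email_model("user@[IPv6:2001:db8::1]"))  # user@[IPv62001db81]
    # RFC 6530 style non-ASCII local part: the accented letter is removed.
    print(sanitize_email_model("josé@example.com"))         # jos@example.com
    # Multi-line input collapses onto a single line; spaces (32) survive.
    print(strip_low_model("line 1\r\nline 2\tend"))         # line 1line 2end
```

A whitelist model fits here because the manual describes both behaviours purely as character-class rules; in each case the "sanitized" output is a different address or text than the one the user supplied, rather than a rejection.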