Newsgroups: php.internals
From: tstarling@wikimedia.org (Tim Starling)
To: Nikita Popov
Cc: PHP internals
Subject: Re: [PHP-DEV] [RFC] Locale-independent case conversion
Date: Mon, 11 Oct 2021 12:33:12 +1100
Message-ID: <88b5171e-48b3-0176-47de-ee1499832b57@wikimedia.org>

On 4/10/21 9:08 pm, Nikita Popov wrote:
>
> Hi Tim,
>
> Thanks for creating this proposal, it looks great!
>
> I think this is a very beneficial change, and the amount of
> incorrect locale-dependent calls we had just in php-src further
> convinced me of this: We're generally aware of the problem, and we
> still made this mistake. Many times.
>
> The only open question I have is regarding the ctype_* functions.
> One might argue that these functions should be locale-independent as
> well. Certainly, whenever I have used ctype_digit() I only intended
> it to match [0-9]. It seems like some people try to use
> ctype_alpha() in a locale-sensitive way
> (https://stackoverflow.com/questions/19929965/php-setlocale-not-working-for-ctype-alpha-check)
> and then fail because it doesn't support UTF-8.
>

OK, I removed ctype_tolower() and ctype_toupper() from the RFC and the
PR since they would be incompatible with a move towards a
locale-independent ctype extension.

The non-controversial parts of the PR were split and merged, so I
rebased the PR and updated the RFC accordingly.

Do you think the RFC is ready for voting now?

> PS: Regarding escapeshellarg(), are you aware of the array command
> support for proc_open() that was added in PHP 7.4? That does away
> the need to escape arguments.

It doesn't really help us. I recently wrote a new shell command
execution system for MediaWiki called Shellbox. As part of that
project, I reviewed how shell execution is used in the MediaWiki
ecosystem. There are a lot of callers which are using shell features,
for example redirecting inputs or outputs, or constructing pipelines.
I didn't really want to break them all or reimplement those features
without the shell. And we have security and containerization wrappers
which depend on construction of a shell command string. So we need to
be able to construct shell command strings safely.

After studying locale sensitivity for this RFC, I decided to get rid
of escapeshellarg() from MediaWiki.
Instead we are doing our own shell escaping:

https://gerrit.wikimedia.org/r/c/mediawiki/libs/Shellbox/+/722548

I also made MediaWiki use a fixed locale, instead of being
configurable.

-- Tim Starling
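
To illustrate the kind of escaping discussed above, here is a minimal sketch of a shell quoting helper that works purely on bytes and never consults the current locale. It is not the Shellbox code from the linked change; the function name posixShellQuote() is hypothetical, and the sketch assumes a POSIX shell (it does not cover Windows cmd.exe quoting).

```php
<?php
// Hypothetical helper, NOT the Shellbox implementation: quote one argument
// for a POSIX shell using byte-wise string functions only, so the result
// does not depend on setlocale().
function posixShellQuote(string $arg): string
{
    // A NUL byte cannot appear inside an argv entry, so reject it outright.
    if (strpos($arg, "\0") !== false) {
        throw new InvalidArgumentException('Argument contains a NUL byte');
    }
    // Inside single quotes a POSIX shell treats every byte literally, so the
    // only byte needing special handling is the single quote itself: close
    // the quoted string, emit a backslash-escaped quote, then reopen it.
    return "'" . str_replace("'", "'\\''", $arg) . "'";
}

// Build a command string that uses input redirection, a shell feature that
// the array form of proc_open() cannot express on its own.
$command = 'grep -c -- ' . posixShellQuote("it's")
    . ' < ' . posixShellQuote('input file.txt');

echo $command, "\n";
```

Because str_replace() and strpos() compare raw bytes, the quoted string is the same under any locale. That is the property a caller needs when the command string is later passed through wrappers that expect it to be built deterministically.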