Newsgroups: php.internals
Message-ID: <93e83a99-8f03-b823-1b4b-a10519d41dd7@gmail.com>
Date: Sun, 20 Feb 2022 23:11:49 +0000
To: PHP Internals
Subject: Re: [PHP-DEV] [RFC] Deprecate and Remove utf8_encode and utf8_decode
From: rowan.collins@gmail.com (Rowan Tommins)

On 20/02/2022 21:24, Craig Francis wrote:
>
> Only query I have is about the availability of different functions...
> not sure why, but the documentation says these are provided by the
> "xml" extension, even though it looks like they are in
> `./standard/string.c` (your pull request seems to correct this)... so
> I assume projects have used these functions on the basis that they are
> always available... I suppose you could argue that "iconv" is enabled
> by default, so that's hopefully reliable (even though it can be
> disabled with `--without-iconv`)... whereas "mbstring" and "intl" are
> non-default extensions.

Yes, since 7.2, utf8_encode and utf8_decode have been always available; before that, they were in ext/xml (which in practice meant *nearly* always available).

The fact that none of the alternatives are guaranteed to be available is unfortunate, but by their nature they are large amounts of code, so moving or replicating them in core is not really an option.

I don't have hard facts to back it up, but my impression is that ext/mbstring is quite commonly installed, and required by apps and libraries. Unlike the other two, it has no system dependencies, because the implementation is entirely in PHP's source tree.

I'm not sure how often iconv is enabled (default according to php.net doesn't necessarily mean default according to Ubuntu / CentOS / cheap shared hosting), but its functionality isn't very portable between systems - for instance, 3v4l.org rejects 'ISO-8859-1' as an encoding [https://3v4l.org/biGa8], but my local system accepts it, although both report ICONV_IMPL as "glibc".

ext/intl is by far the most powerful of the three extensions, albeit extremely poorly documented; but it may not be installed as often, because that power comes from a large external library (ICU).

The bright side is that if you really do only need one encoding pair, implementing in pure PHP is pretty trivial, and there are multiple polyfills already out there.

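As a rough illustration of how small such a pure PHP implementation can be, here is a minimal sketch, assuming only ISO-8859-1 <-> UTF-8 is needed. The function names latin1_to_utf8 and utf8_to_latin1 are made up for this example and are not taken from any existing polyfill. The handling of malformed or unrepresentable input (one '?' per sequence) is a simplification and is not claimed to match utf8_decode byte for byte.

```php
<?php
// Hypothetical helpers for illustration only; not part of PHP or any
// published polyfill.

// ISO-8859-1 -> UTF-8: every byte maps directly to code point U+0000..U+00FF.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $c = ord($s[$i]);
        if ($c < 0x80) {
            // ASCII is identical in both encodings.
            $out .= $s[$i];
        } else {
            // U+0080..U+00FF become two bytes: 110000xx 10xxxxxx.
            $out .= chr(0xC0 | ($c >> 6)) . chr(0x80 | ($c & 0x3F));
        }
    }
    return $out;
}

// UTF-8 -> ISO-8859-1: only code points up to U+00FF are representable.
function utf8_to_latin1(string $s): string
{
    $out = '';
    $len = strlen($s);
    $i = 0;
    while ($i < $len) {
        $c = ord($s[$i]);
        if ($c < 0x80) {
            $out .= $s[$i];
            $i++;
            continue;
        }
        // U+0080..U+00FF are encoded with lead byte 0xC2 or 0xC3
        // followed by exactly one continuation byte (10xxxxxx).
        if (($c === 0xC2 || $c === 0xC3)
            && $i + 1 < $len
            && (ord($s[$i + 1]) & 0xC0) === 0x80
        ) {
            $out .= chr((($c & 0x03) << 6) | (ord($s[$i + 1]) & 0x3F));
            $i += 2;
            continue;
        }
        // Anything else is malformed or outside Latin1 (assumption:
        // replace it with a single '?' and skip its continuation bytes).
        $out .= '?';
        $i++;
        while ($i < $len && (ord($s[$i]) & 0xC0) === 0x80) {
            $i++;
        }
    }
    return $out;
}
```

For example, latin1_to_utf8("caf\xE9") gives "caf\xC3\xA9", and passing that result to utf8_to_latin1 gives back the original four bytes. The byte-at-a-time loop is exactly the kind of code that would be slow on large inputs, which is the performance trade-off mentioned in the next paragraph.
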
That leaves a minority of a minority of a minority, who a) actually need Latin1 <-> UTF-8, and no other encodings; b) can't rely on any of the three listed extensions; AND c) care enough about performance that a pure PHP implementation is problematic.

Regards,

--
Rowan Tommins
[IMSoP]