Newsgroups: php.internals
To: PHP internals
References: <3a4d89fc-c5f8-4720-b2e0-f6f3c28684f9@www.fastmail.com> <5f5fd136-e181-d5d3-fe40-1a4cc5c668f2@gmail.com>
Message-ID: <25680b8d-af02-c1d4-e630-7bf079881f1c@gmail.com>
Date: Sun, 21 Mar 2021 21:39:14 +0000
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?
From: rowan.collins@gmail.com (Rowan Tommins)

On 21/03/2021 21:00, Max Semenik wrote:
> Just a quick reminder that it's possible to compile PHP without
> mbstring and intl, which means that some hosts will provide PHP
> without these extensions, and some packagers make them available as
> separate packages that users can't or don't know how to install. Maybe
> we've got an opportunity to think about making these extensions mandatory?

It's somewhat relevant that until PHP 7.2, it was also possible for
utf8_encode and utf8_decode to be missing, because they were in ext/xml,
which is also optional.

Bundling mbstring sounds great, until you look into the details of
what's in there and how it works. Its origin as a PHP 4 extension for
handling Japanese-specific character encodings is visible in parts of
its design - there's a lot of global state, and very little support for
the nuances of Unicode.

Bundling intl would be great, but it's a wrapper around ICU, which is
huge (because Unicode is complicated). I have read that incorporating
that into core was one of the icebergs that sank PHP 6. It's also
extremely sparsely documented (if someone's looking for a project, it
would be great to fill in all the manual stubs with a few details from
the corresponding ICU documentation).

For what it's worth, it seems these would be the relevant polyfills:

    function utf8_encode(string $string) {
        return UConverter::transcode($string, 'UTF8', 'ISO-8859-1');
    }

    function utf8_decode(string $string) {
        return UConverter::transcode($string, 'ISO-8859-1', 'UTF8');
    }

Regards,

-- 
Rowan Tommins
[IMSoP]