Newsgroups: php.internals
Date: Wed, 22 Dec 2021 09:00:55 +0000
Message-ID: <9e93269e-986f-ffa4-7433-cf2c548a133f@gmail.com>
References: <3a4d89fc-c5f8-4720-b2e0-f6f3c28684f9@www.fastmail.com>
To: internals@lists.php.net
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?
From: rowan.collins@gmail.com (Rowan Tommins)

On 21/12/2021 23:20, Wade Rossmann wrote:
> I would suggest adding optional source/destination encoding parameters to
> the functions, eg:
>
> utf8_encode(string $string, string $source_encoding = "ISO-8859-1")
> utf8_decode(string $string, string $destination_encoding = "ISO-8859-1")

That's an interesting idea, and definitely worth considering. In the much
longer term, we could make the parameter mandatory rather than deprecating
the entire function.

As you say, the challenge is how to implement the other encodings / what to
do if ext/mbstring is not installed. It would be very tempting to support
Windows-1252 directly, because it's just a few characters on top of the
existing mappings, and is so commonly mistaken for ISO-8859-1. Anything else
could then perhaps give a run-time error if ext/mbstring wasn't found.

On 22/12/2021 00:31, Kris Craig wrote:
> Now might be a good time to make this into an RFC. :)

I have a draft kicking around with a lot of analysis of current usage. I
will try to pick it back up after Christmas.

Regards,

-- 
Rowan Tommins
[IMSoP]
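
To make the Windows-1252 idea above concrete, here is a rough userland sketch. It is not a proposed implementation. The names sketch_utf8_encode, code_point_to_utf8 and CP1252_EXTRAS are hypothetical. Matching encoding names case-insensitively, and passing the five unassigned Windows-1252 bytes through as C1 controls, are assumptions made only for illustration.

```php
<?php
// Hypothetical userland sketch; names and behaviour are assumptions, not the proposed API.

// Windows-1252 differs from ISO-8859-1 only in the range 0x80-0x9F.
// Bytes missing from this table (0x81, 0x8D, 0x8F, 0x90, 0x9D) are unassigned
// in Windows-1252; this sketch passes them through as C1 controls (an assumption).
const CP1252_EXTRAS = [
    0x80 => 0x20AC, 0x82 => 0x201A, 0x83 => 0x0192, 0x84 => 0x201E,
    0x85 => 0x2026, 0x86 => 0x2020, 0x87 => 0x2021, 0x88 => 0x02C6,
    0x89 => 0x2030, 0x8A => 0x0160, 0x8B => 0x2039, 0x8C => 0x0152,
    0x8E => 0x017D, 0x91 => 0x2018, 0x92 => 0x2019, 0x93 => 0x201C,
    0x94 => 0x201D, 0x95 => 0x2022, 0x96 => 0x2013, 0x97 => 0x2014,
    0x98 => 0x02DC, 0x99 => 0x2122, 0x9A => 0x0161, 0x9B => 0x203A,
    0x9C => 0x0153, 0x9E => 0x017E, 0x9F => 0x0178,
];

// Encode one code point (U+0000 to U+FFFF is enough here) as UTF-8 bytes.
function code_point_to_utf8(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xE0 | ($cp >> 12))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

function sketch_utf8_encode(string $string, string $source_encoding = "ISO-8859-1"): string
{
    $encoding = strtoupper($source_encoding);

    // Anything other than the two built-in tables is delegated to ext/mbstring,
    // with a run-time error if the extension is not loaded.
    if ($encoding !== "ISO-8859-1" && $encoding !== "WINDOWS-1252") {
        if (!function_exists('mb_convert_encoding')) {
            throw new RuntimeException("Encoding '$source_encoding' requires ext/mbstring");
        }
        return mb_convert_encoding($string, "UTF-8", $source_encoding);
    }

    $out = '';
    $length = strlen($string);
    for ($i = 0; $i < $length; $i++) {
        $byte = ord($string[$i]);
        // ISO-8859-1 maps each byte to the code point of the same value;
        // Windows-1252 overrides only the entries listed in CP1252_EXTRAS.
        $codePoint = ($encoding === "WINDOWS-1252")
            ? (CP1252_EXTRAS[$byte] ?? $byte)
            : $byte;
        $out .= code_point_to_utf8($codePoint);
    }
    return $out;
}

echo bin2hex(sketch_utf8_encode("\x80", "Windows-1252")), "\n"; // e282ac (U+20AC, euro sign)
echo bin2hex(sketch_utf8_encode("\x80")), "\n";                 // c280 (U+0080, C1 control)
```

The two calls at the end show why the distinction matters: the same input byte becomes the euro sign under Windows-1252 but an invisible control character under ISO-8859-1.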