Newsgroups: php.internals
From: pollita@php.net (Sara Golemon)
Date: Mon, 22 Mar 2021 08:10:23 -0500
To: Rowan Tommins
Cc: PHP Internals
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?

On Mon, Mar 22, 2021 at 5:24 AM Rowan Tommins wrote:

> I'm strongly against any concept of "indefinite deprecation". I consider
> any deprecation notice a commitment to remove the feature in the future,
> even if a specific timeline for that removal is not given.
>

I don't feel strongly about indefinite deprecation. If you wanna nuke it
in 9.0, have fun. I'm just saying I don't necessarily see the need to do
so. The problem being addressed here is that *some* users of this
function are probably misusing it, so it's worth putting guiderails on.
I'm hesitant to punish the ones who know exactly what they're doing as a
result of that well-meaning intention.

> * People who just want to replace calls to utf8_decode won't want to go
> through every call and make it exception safe.
>

Then they shouldn't use these replacements, it's not for them. It's for
people using iso-8859-1.

> * People who want to write a polyfill couldn't use it, because they
> wouldn't be able to recover the remainder of the string after an error
> is thrown.
>

If you're writing a polyfill, then write a polyfill. The polyfill for
the old functions is trivial, I could have written it a dozen times in
the course of writing this email reply. So this replacement is also not
for them.

> * People who want transcoding without any optional extensions will be
> disappointed to find only this one encoding supported.
>

This function isn't for them. It's for people using iso-8859-1. There's
a theme in here. :)

> You'd effectively be adding a completely new core function just for
> those people who work with Latin1 text, and are confident that it's not
> Windows-1252 in disguise.
>

Yes. I'm specifically addressing the people who have been using
utf8_en/decode() correctly all this time. They shouldn't be punished for
the stupidity of others.

> It's tempting to make any C1 control characters an error as well -
> although technically valid in Latin1, these are very rarely used, and
> it's much more likely that any bytes in that range are intended as
> characters in Windows-1252. But that would feel very odd without having
> a corresponding utf8_from_windows1252 function to use instead, at which
> point we're into designing a whole new conversion library. And of
> course, once you've got that UTF-8 string, you can't do much with it,
> because PHP's native string functions are all byte-based, so you've
> basically got to re-invent large chunks of ext/mbstring...
>

I disagree that you'd need to add utf8_from/to_windows1252 "for
completeness". The goal isn't to provide all possible conversion
utilities. The goal is only to not punish users by taking away a valid
API that they were using correctly (for those users who were using it
correctly).

-Sara
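
For reference, the "trivial" polyfill mentioned above might look roughly
like the sketch below. It is not taken from the thread: the names
latin1_to_utf8() and utf8_to_latin1() are hypothetical, and the decoder
only approximates utf8_decode() by emitting one "?" for anything that is
not a well-formed encoding of U+0000..U+00FF. The built-in function's
exact handling of malformed input may differ.

```php
<?php

/**
 * Hypothetical stand-in for utf8_encode(): ISO-8859-1 bytes to UTF-8.
 * Every Latin-1 byte maps to the code point with the same number.
 */
function latin1_to_utf8(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($latin1[$i]);
        if ($byte < 0x80) {
            // ASCII is identical in both encodings.
            $out .= $latin1[$i];
        } else {
            // U+0080..U+00FF always encode as exactly two UTF-8 bytes.
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

/**
 * Hypothetical stand-in for utf8_decode(): UTF-8 to ISO-8859-1.
 * Assumption: unrepresentable or malformed sequences become a single "?".
 */
function utf8_to_latin1(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($utf8[$i]);
        $hasContinuation = $i + 1 < $len
            && (ord($utf8[$i + 1]) & 0xC0) === 0x80;

        if ($byte < 0x80) {
            $out .= $utf8[$i];
        } elseif (($byte === 0xC2 || $byte === 0xC3) && $hasContinuation) {
            // Lead bytes C2 and C3 cover exactly U+0080..U+00FF.
            $out .= chr((($byte & 0x03) << 6) | (ord($utf8[$i + 1]) & 0x3F));
            $i++;
        } else {
            // Outside Latin-1 or malformed: substitute once, then skip
            // any continuation bytes that belong to the same sequence.
            $out .= '?';
            while ($i + 1 < $len && (ord($utf8[$i + 1]) & 0xC0) === 0x80) {
                $i++;
            }
        }
    }
    return $out;
}

echo bin2hex(latin1_to_utf8("\xE9")), "\n"; // prints "c3a9" (U+00E9)
```

Neither function depends on an optional extension, which is the property
the thread cares about. Where ext/mbstring is loaded,
mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') performs the same
encoding step.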