Newsgroups: php.internals
Date: Mon, 22 Mar 2021 09:28:51 +0100
From: divinity76@gmail.com (Hans Henrik Bergan)
To: PHP internals
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?

I would prefer to soft-deprecate them, as we did with the mysql_ API:
they would not generate E_DEPRECATED for quite some time, but the
documentation would say "this function is deprecated, instead use
mb_convert_encoding($str, "UTF-8", "ISO-8859-1"); or
iconv("ISO-8859-1", "UTF-8", $str)". Then make them raise E_DEPRECATED
in the distant future.

Rowan said "they are commonly used, both correctly and incorrectly". In
my experience, no, they are not used correctly: the people who use them
are using them incorrectly to convert Windows-1252 to UTF-8, not
ISO-8859-1.

On Mon, 22 Mar 2021 at 02:15, Sara Golemon wrote:

> On Sun, Mar 21, 2021 at 9:18 AM Rowan Tommins wrote:
>
> > A) Raise a deprecation notice in 8.1, and remove in 9.0. Do not provide
> > a specific replacement, but recommend people look at iconv() or
> > mb_convert_encoding(). There is precedent for this, such as
> > convert_cyr_string(), but it may frustrate those who are using the
> > functions correctly.
> >
> > B) Introduce new names, such as utf8_to_iso_8859_1 and
> > iso_8859_1_to_utf8; immediately make those the primary names in the
> > manual, with utf8_encode / utf8_decode as aliases. Raise deprecation
> > notices for the old names, either immediately or in some future release.
> > This gives a smoother upgrade path, but commits us to having these
> > functions as outliers in our standard library.
> >
> > C) Leave them alone forever. Treat it as the user's fault if they mess
> > things up by misunderstanding them.
> >
>
> My preference is for a deprecation notice (but not necessarily removal ever
> -- We can argue that part a little).
>
> As for what users should use instead, obviously there are multiple options
> already in core (which you referenced), but those all have third party deps
> and can't be guaranteed the way utf8_en/decode() can (this was the point of
> moving them from xml).
>
> While I'm normally in favor of userspace things belonging in userspace
> (this particular conversion is trivial since it's a 1:1 mapping), I'm
> actually willing to see this added under a new, clearer name in
> ext/standard since this is something that's in long use, but used
> incorrectly.
>
> As for details, I don't love iso_8859_1_to_utf8(), but we can use the
> common alias for iso-8859-1 known as latin1 and call the new functions:
> utf8_from_latin1() and utf8_to_latin1() with the caveat that the latter will
> throw a ValueError for codepoints which are out of range (one of the more
> problematic issues with utf8_decode()). That makes this not just a simple
> rename for clarity, but what I'd consider a bug-fix for an unfortunately
> unfixable function.
>
> -Sara
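
To make the two technical points above concrete (the Windows-1252 versus
ISO-8859-1 mix-up, and the proposed ValueError for out-of-range code
points), here is a rough userland sketch. The function names
sketch_utf8_from_latin1() and sketch_utf8_to_latin1() are hypothetical,
chosen only for illustration; they are not PHP core functions and not
the implementation proposed in this thread. The sketch assumes PHP 8.0
or later (for ValueError) and uses no extensions.

```php
<?php
// Hypothetical userland sketch, NOT a PHP core function.
// ISO-8859-1 byte N maps 1:1 to code point U+00NN.
function sketch_utf8_from_latin1(string $latin1): string
{
    $out = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($latin1[$i]);
        if ($byte < 0x80) {
            $out .= $latin1[$i];            // ASCII is unchanged
        } else {
            // U+0080..U+00FF become two bytes: 110000xx 10xxxxxx
            $out .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $out;
}

// Hypothetical strict counterpart: throws instead of silently
// substituting a placeholder when the input cannot be represented.
function sketch_utf8_to_latin1(string $utf8): string
{
    $out = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($utf8[$i]);
        if ($byte < 0x80) {
            $out .= $utf8[$i];
            continue;
        }
        // Only lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
        if (($byte === 0xC2 || $byte === 0xC3) && $i + 1 < $len) {
            $next = ord($utf8[$i + 1]);
            if (($next & 0xC0) === 0x80) {  // valid continuation byte
                $out .= chr((($byte & 0x03) << 6) | ($next & 0x3F));
                $i++;
                continue;
            }
        }
        throw new ValueError(
            "Sequence at offset $i is not representable in ISO-8859-1"
        );
    }
    return $out;
}

// 0x80 is the euro sign in Windows-1252, but a C1 control in ISO-8859-1,
// so a Latin-1 conversion yields U+0080 rather than U+20AC.
echo bin2hex(sketch_utf8_from_latin1("\x80")), "\n";   // c280

// The euro sign (U+20AC, bytes E2 82 AC) is out of range for Latin-1.
try {
    sketch_utf8_to_latin1("\xE2\x82\xAC");
} catch (ValueError $e) {
    echo get_class($e), "\n";                          // ValueError
}
```

The first call shows why Windows-1252 input goes wrong: the conversion
succeeds but produces an invisible control character instead of the
euro sign. The second shows the stricter behavior suggested for the
decode direction, where utf8_decode() would instead return "?" for the
unrepresentable character.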