Newsgroups: php.internals
Date: Sun, 21 Mar 2021 20:15:18 -0500
To: Rowan Tommins
Cc: PHP Internals
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?
From: pollita@php.net (Sara Golemon)

On Sun, Mar 21, 2021 at 9:18 AM Rowan Tommins wrote:

> A) Raise a deprecation notice in 8.1, and remove in 9.0. Do not provide
> a specific replacement, but recommend people look at iconv() or
> mb_convert_encoding(). There is precedent for this, such as
> convert_cyr_string(), but it may frustrate those who are using the
> functions correctly.
>
> B) Introduce new names, such as utf8_to_iso_8859_1 and
> iso_8859_1_to_utf8; immediately make those the primary names in the
> manual, with utf8_encode / utf8_decode as aliases. Raise deprecation
> notices for the old names, either immediately or in some future release.
> This gives a smoother upgrade path, but commits us to having these
> functions as outliers in our standard library.
>
> C) Leave them alone forever. Treat it as the user's fault if they mess
> things up by misunderstanding them.
>

My preference is for a deprecation notice (though not necessarily removal, ever; we can argue that part a little). As for what users should use instead, there are obviously multiple options already in core (which you referenced), but those all have third-party dependencies and can't be guaranteed to be available the way utf8_encode() and utf8_decode() can (this was the point of moving them out of ext/xml).

While I'm normally in favor of userspace things belonging in userspace (this particular conversion is trivial, since it's a 1:1 mapping between Latin-1 bytes and the first 256 Unicode code points), I'm actually willing to see this added under a new, clearer name in ext/standard, since this is something that has been in use for a long time but is often used incorrectly.

As for details, I don't love iso_8859_1_to_utf8(), but we can use latin1, the common alias for ISO-8859-1, and call the new functions utf8_from_latin1() and utf8_to_latin1(), with the caveat that the latter will throw a ValueError for codepoints which are out of range (one of the more problematic issues with utf8_decode()). That makes this not just a simple rename for clarity, but what I'd consider a bug fix for an unfortunately unfixable function.

-Sara
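
For illustration only, here is a minimal userspace sketch of the 1:1 mapping and the out-of-range check described above. The names sketch_utf8_from_latin1() and sketch_utf8_to_latin1() are hypothetical stand-ins for the proposed functions, and this is not an actual or planned ext/standard implementation. It assumes PHP 8.0 or later, where ValueError exists.

```php
<?php
// Hypothetical sketch: convert ISO-8859-1 (Latin-1) bytes to UTF-8.
// Each Latin-1 byte is the Unicode code point of the same value.
function sketch_utf8_from_latin1(string $latin1): string
{
    $utf8 = '';
    $len = strlen($latin1);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($latin1[$i]);
        if ($byte < 0x80) {
            // ASCII range is identical in both encodings.
            $utf8 .= $latin1[$i];
        } else {
            // U+0080..U+00FF becomes a two-byte sequence: 110000xx 10xxxxxx.
            $utf8 .= chr(0xC0 | ($byte >> 6)) . chr(0x80 | ($byte & 0x3F));
        }
    }
    return $utf8;
}

// Hypothetical sketch: convert UTF-8 to Latin-1, throwing instead of
// silently substituting when a code point does not fit in one byte.
function sketch_utf8_to_latin1(string $utf8): string
{
    $latin1 = '';
    $len = strlen($utf8);
    for ($i = 0; $i < $len; $i++) {
        $byte = ord($utf8[$i]);
        if ($byte < 0x80) {
            $latin1 .= $utf8[$i];
            continue;
        }
        // Only the lead bytes 0xC2 and 0xC3 encode U+0080..U+00FF.
        if (($byte !== 0xC2 && $byte !== 0xC3) || $i + 1 >= $len) {
            throw new ValueError("Byte at offset $i is invalid UTF-8 or outside Latin-1");
        }
        $next = ord($utf8[$i + 1]);
        if (($next & 0xC0) !== 0x80) {
            throw new ValueError("Invalid UTF-8 continuation byte at offset " . ($i + 1));
        }
        $latin1 .= chr((($byte & 0x03) << 6) | ($next & 0x3F));
        $i++; // Skip the continuation byte just consumed.
    }
    return $latin1;
}
```

With this sketch, "caf\xE9" (Latin-1) round-trips through "caf\xC3\xA9" (UTF-8), while passing the euro sign "\xE2\x82\xAC" (U+20AC) to sketch_utf8_to_latin1() raises a ValueError rather than returning a lossy result.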