Newsgroups: php.internals
From: tekiela246@gmail.com (Kamil Tekiela)
Date: Sun, 21 Mar 2021 19:08:02 +0000
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?
To: Rowan Tommins
Cc: PHP internals

Option A, please.

I have never had a reason to use either of these two functions. I assume there are plenty of valid applications for converting between ISO-8859-1 and UTF-8, but these functions cause more harm than good. I have seen plenty of people use them, but I have never seen anyone use them properly. Most of the time people use them to fix their mojibake text when they forget to set the connection charset in PDO or mysqli. I was a little surprised to learn that these functions had something to do with XML.
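To make the mojibake case concrete, here is a rough byte-level sketch, written in Python rather than PHP. The helpers latin1_to_utf8 and utf8_to_latin1 are hypothetical stand-ins that only approximate what utf8_encode and utf8_decode do; they are not the PHP implementations.

```python
# Hypothetical stand-ins that approximate PHP's utf8_encode / utf8_decode.
# Assumption: both treat the "other" side strictly as ISO-8859-1 (Latin-1).

def latin1_to_utf8(data: bytes) -> bytes:
    """Interpret every input byte as a Latin-1 character and emit UTF-8."""
    return data.decode("latin-1").encode("utf-8")


def utf8_to_latin1(data: bytes) -> bytes:
    """Convert UTF-8 to Latin-1; anything outside Latin-1 becomes '?'."""
    text = data.decode("utf-8", errors="replace")
    return text.encode("latin-1", errors="replace")


# The text is already valid UTF-8 (for example, read over a connection
# whose charset was never configured).
original = "\u00e9".encode("utf-8")      # e-acute, bytes C3 A9

# Encoding it "again" treats C3 and A9 as two separate Latin-1 characters.
double = latin1_to_utf8(original)        # bytes C3 83 C2 A9

# Reversing the conversion appears to fix the text, which hides the real bug.
restored = utf8_to_latin1(double)

# A character with no Latin-1 equivalent is replaced without any warning.
lost = utf8_to_latin1("\u20ac".encode("utf-8"))   # euro sign

print(original, double, restored, lost)
```

The round trip only looks harmless because every byte maps to some Latin-1 character. The underlying mistake, text sitting in the wrong encoding in the first place, is still there.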
The reason why I consider them dangerous is that people using them are most likely solving the wrong problem. The problem isn't the conversion from ISO-8859-1 to UTF-8 but having the text in the wrong encoding in the first place. They are used as some kind of magical solution that fixes an annoying problem. I would have no quarrel with them if they were named correctly, though.

Another reason why I do not like these functions is that they let you shoot yourself in the foot very easily. They don't warn about invalid or missing code points, which often leads to more data corruption. When doing the same with iconv you at least get a notice.

I think we really do not need to keep these functions. As for the alternatives we can offer, iconv seems to do exactly the same thing, and does it better. mb_convert_encoding does the same but also silently ignores invalid characters. So we already offer plenty of alternatives. We don't need to add anything new.

--
Kamil