Newsgroups: php.internals
Xref: news.php.net php.internals:113697
Date: Mon, 22 Mar 2021 14:18:18 -0400
From: chasepeeler@gmail.com (Chase Peeler)
To: Rowan Tommins
Cc: PHP internals
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?

On Mon, Mar 22, 2021 at 1:22 PM Rowan Tommins wrote:

> On 22/03/2021 16:52, Aleksander Machniak wrote:
> > On 22.03.2021 16:41, Rowan Tommins wrote:
> >> That code will never do anything useful.
> > I already proved it is useful, regardless of its name/intention.
> >
>
> You have proven no such thing. If that function is saving you from
> errors, it is completely by accident.
>

Even if it is by accident, removing or changing the behavior of the function is guaranteed to take something that currently works (by skill or by luck) and put it at risk of no longer working.

> The same effect can be achieved using base64_encode() and
> base64_decode(), or bin2hex() and hex2bin(), or any other function that
> takes a series of bytes and applies an arbitrary encoding to it.
>
> It could also be achieved by using a binary column type in the database,
> because the values you have stored are not useful as strings; they might
> as well be encrypted.
>
> Given the sequence of bytes "\xE3\x82\xB0", which is a valid UTF-8
> string representing U+30B0 KATAKANA LETTER GU グ, calling utf8_encode()
> will result in the sequence of bytes "\xC3\xA3\xC2\x82\xC2\xB0", which
> is the UTF-8 representation of the following Unicode code points:
>
> - U+00E3 LATIN SMALL LETTER A WITH TILDE ã
> - U+0082 CONTROL: BREAK PERMITTED HERE
> - U+00B0 DEGREE SIGN °
>
> This is clearly gibberish, and bears no relationship to the original
> string; it is what is generally referred to as "mojibake".
>
> Regards,
>
> --
> Rowan Tommins
> [IMSoP]
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: https://www.php.net/unsub.php

--
Chase Peeler
chasepeeler@gmail.com
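To make the quoted byte arithmetic concrete, here is a minimal sketch in Python (chosen only for illustration; PHP is the language under discussion). It assumes, per the PHP manual's description, that utf8_encode() treats each input byte as one ISO-8859-1 (Latin-1) code point and emits UTF-8, and that utf8_decode() does the reverse, substituting "?" for characters outside Latin-1. The helper names utf8_encode_like and utf8_decode_like are hypothetical stand-ins, not PHP APIs, and the invalid-input handling is only an approximation of PHP's. The sketch shows both sides of the argument: the output for U+30B0 is mojibake, yet the pair still round-trips any byte string, much like base64_encode()/base64_decode().

```python
"""Approximate PHP's utf8_encode()/utf8_decode() to show their byte-level effect.

Assumption: utf8_encode() reads each byte as an ISO-8859-1 (Latin-1) code
point and writes UTF-8; utf8_decode() does the reverse. The helpers below are
hypothetical stand-ins, not PHP APIs.
"""


def utf8_encode_like(data: bytes) -> bytes:
    # Latin-1 maps every byte 0x00-0xFF to U+0000-U+00FF, so decoding never fails.
    return data.decode("latin-1").encode("utf-8")


def utf8_decode_like(data: bytes) -> bytes:
    # Decode UTF-8 (invalid sequences become U+FFFD), then map each code point
    # back to one Latin-1 byte; anything above U+00FF becomes "?".
    return data.decode("utf-8", errors="replace").encode("latin-1", errors="replace")


if __name__ == "__main__":
    gu = b"\xE3\x82\xB0"  # valid UTF-8 for U+30B0 KATAKANA LETTER GU
    encoded = utf8_encode_like(gu)
    print(encoded)  # b'\xc3\xa3\xc2\x82\xc2\xb0'

    # The "text" now consists of three unrelated code points (mojibake).
    print([f"U+{ord(c):04X}" for c in encoded.decode("utf-8")])
    # ['U+00E3', 'U+0082', 'U+00B0']

    # Yet the pair is a lossless byte encoding, like base64 or hex.
    all_bytes = bytes(range(256))
    assert utf8_decode_like(utf8_encode_like(all_bytes)) == all_bytes

    # Decoding genuine UTF-8 text outside Latin-1 loses information.
    print(utf8_decode_like(gu))  # b'?'
```

The round trip works for arbitrary bytes only because Latin-1 assigns a code point to every byte value, which is why base64, hex, or a binary column would serve equally well for storage.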