Newsgroups: php.internals
To: internals@lists.php.net
Date: Mon, 22 Mar 2021 17:22:15 +0000
Subject: Re: [PHP-DEV] What should we do with utf8_encode and utf8_decode?
From: rowan.collins@gmail.com (Rowan Tommins)

On 22/03/2021 16:52, Aleksander Machniak wrote:
> On 22.03.2021 16:41, Rowan Tommins wrote:
>> That code will never do anything useful.
> I already proved it is useful, regardless of it's name/intention.

You have proven no such thing. If that function is saving you from
errors, it is completely by accident.

The same effect can be achieved using base64_encode() and
base64_decode(), or bin2hex() and hex2bin(), or any other pair of
functions that takes a series of bytes and applies an arbitrary,
reversible encoding to it.

It could also be achieved by using a binary column type in the database,
because the values you have stored are not useful as strings; they might
as well be encrypted.

Given the sequence of bytes "\xE3\x82\xB0", which is a valid UTF-8 string
representing U+30B0 KATAKANA LETTER GU グ, calling utf8_encode() will
result in the sequence of bytes "\xC3\xA3\xC2\x82\xC2\xB0", which is the
UTF-8 representation of the following Unicode code points:

- U+00E3 LATIN SMALL LETTER A WITH TILDE ã
- U+0082 CONTROL: BREAK PERMITTED HERE
- U+00B0 DEGREE SIGN °

This is clearly gibberish, and bears no relationship to the original
string; it is what is generally referred to as "mojibake".

Regards,

-- 
Rowan Tommins
[IMSoP]
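P.S. For anyone who wants to reproduce the byte sequences in the GU example, here is a minimal sketch. The helper latin1_to_utf8() is a hypothetical name, not part of PHP; it assumes the documented behaviour of utf8_encode(), namely that every input byte is treated as an ISO-8859-1 character and re-encoded as UTF-8. It is written out by hand only to make the mechanism visible.

```php
<?php
// Hypothetical helper (not a PHP built-in): treats each input byte as an
// ISO-8859-1 character and emits that character's UTF-8 encoding.
// This is assumed to mirror what utf8_encode() does.
function latin1_to_utf8(string $bytes): string
{
    $out = '';
    for ($i = 0, $n = strlen($bytes); $i < $n; $i++) {
        $b = ord($bytes[$i]);
        if ($b < 0x80) {
            // ASCII bytes are unchanged in UTF-8.
            $out .= chr($b);
        } else {
            // U+0080..U+00FF need two bytes: 110xxxxx 10xxxxxx.
            $out .= chr(0xC0 | ($b >> 6)) . chr(0x80 | ($b & 0x3F));
        }
    }
    return $out;
}

$gu      = "\xE3\x82\xB0";        // already valid UTF-8 for U+30B0
$mangled = latin1_to_utf8($gu);   // each byte re-encoded separately

echo bin2hex($gu), "\n";          // e382b0
echo bin2hex($mangled), "\n";     // c3a3c282c2b0
```

The three input bytes are never interpreted as one character. Each is widened on its own, which is why the output is six bytes long and decodes to three unrelated code points.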