Newsgroups: php.internals
Xref: news.php.net php.internals:105577
From: bjorn.x.larsson@telia.com (Björn Larsson)
To: "Christoph M. Becker", Nicolai Scheer
Cc: PHP Internals
Subject: Re: [PHP-DEV] Re: default_charset and mb_internal_encoding
Date: Fri, 3 May 2019 01:18:32 +0200
Message-ID: <7041d66d-1c50-aa36-5427-47555db7259a@telia.com>
In-Reply-To: <0036318e-84fb-fe16-2ce1-150d7039ecf4@gmx.de>

On 2019-04-11 at 15:41, Christoph M. Becker wrote:
> On 02.04.2019 at 11:42, Nicolai Scheer wrote:
>
>> I'm currently in the process of migrating an old application from php 5.6
>> to 7.2.
>> In the process, I fiddled with the default_charset ini setting.
>>
>> The documentation states (c.f.
>> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
>>
>> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
>> default_charset
>> will also be used to set the default character set for [...]
>> and for mbstring functions
>> if the mbstring.http_input mbstring.http_output mbstring.internal_encoding
>> configuration option is unset."
>>
>> As such, I'd expect to be able to set default_charset to iso-8859-1 and
>> mbstring to pick that same setting for its internal encoding (if the
>> mentioned directives are unset, that is).
>>
>> This seems not to be the case:
>>
>> ini_set( 'default_charset', 'iso-8859-1' );
>> var_dump( ini_get("mbstring.internal_encoding") );
>> var_dump( ini_get("mbstring.http_input") );
>> var_dump( ini_get("mbstring.http_output") );
>> echo mb_internal_encoding() . "\n";
>> echo mb_strlen( "\xc3\xb6" ) . "\n";
>> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
>>
>> This outputs (7.2.15 on a CentOS box):
>> string(0) ""
>> string(0) ""
>> string(0) ""
>> UTF-8
>> 1
>> 2
>>
>> The default_charset is set but mbstring settings are not, so I'd expect to
>> get 2 as the character/byte count in both cases.
>>
>> If I throw a mb_internal_encoding("iso-8859-1") in the mix, both string
>> lengths are equal.
>>
>> Since the mentioned mbstring directives are deprecated as of 5.6.0 - do I
>> really need to use mb_internal_encoding() instead?
>> Is the documentation wrong or am I just misinterpreting it? I thought that
>> default_charset should act as some kind of "master setting" in order not to
>> have to set all specific settings as well (e.g. iconv, mbstring).
>>
>> Usually we use UTF-8, so I did not come across this before...
>>
>> Any insight?
>
> confirms the reported behavior. A quick look
> at the code, too. I suggest you file a ticket on .
>
> Thanks,
> Christoph M. Becker

Hi,

Did this lead to a bug report? It led to a bug in Smarty 3.1.33 for me:
I got a warning about "mbregex compile err: invalid code point value"
in mb_split().

My content is in ISO-8859-1, and Smarty's normal procedure for setting
the encoding, together with a php.ini setting of ISO-8859-1, failed.
However, mb_regex_encoding('ISO-8859-1') did the trick!

r//Björn L
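
A minimal sketch of the workaround discussed in this thread, assuming the mbstring extension (with mbregex support) is loaded and the content really is ISO-8859-1. It sets the internal and regex encodings explicitly instead of relying on default_charset. The sample strings and variable names are made up for illustration and do not come from Smarty or from the original report.

```php
<?php
// Sketch only: assumes ext/mbstring with mbregex support is available.
// Set both encodings explicitly rather than relying on default_charset.
mb_internal_encoding('ISO-8859-1');
mb_regex_encoding('ISO-8859-1');

// Same two bytes as in the quoted test script ("ö" encoded as UTF-8).
$twoBytes = "\xc3\xb6";
$length = mb_strlen($twoBytes);             // counted as ISO-8859-1 characters
$byteLength = mb_strlen($twoBytes, '8bit'); // raw byte count

// Hypothetical ISO-8859-1 input: "smör,bröd" with ö as the single byte 0xF6.
$parts = mb_split(',', "sm\xf6r,br\xf6d");

echo $length, "\n";       // 2
echo $byteLength, "\n";   // 2
echo count($parts), "\n"; // 2
```

With both encodings set this way, the two length calls agree, which matches what Nicolai observed after calling mb_internal_encoding("iso-8859-1").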