Newsgroups: php.internals
Date: Fri, 3 May 2019 11:43:38 +0200
To: Björn Larsson, Nicolai Scheer
Cc: PHP Internals
Subject: Re: [PHP-DEV] Re: default_charset and mb_internal_encoding
From: cmbecker69@gmx.de ("Christoph M. Becker")

On 03.05.2019 at 01:18, Björn Larsson wrote:

> On 2019-04-11 at 15:41, Christoph M. Becker wrote:
>
>> On 02.04.2019 at 11:42, Nicolai Scheer wrote:
>>
>>> I'm currently in the process of migrating an old application from
>>> PHP 5.6 to 7.2.
>>> In the process, I fiddled with the default_charset ini setting.
>>>
>>> The documentation states (cf.
>>> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
>>>
>>> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
>>> default_charset
>>> will also be used to set the default character set for [...] and for
>>> mbstring functions
>>> if the mbstring.http_input mbstring.http_output
>>> mbstring.internal_encoding
>>> configuration option is unset."
>>>
>>> As such, I'd expect to be able to set default_charset to iso-8859-1 and
>>> have mbstring pick up that same setting for its internal encoding (if the
>>> mentioned directives are unset, that is).
>>>
>>> This seems not to be the case:
>>>
>>> <?php
>>> ini_set( 'default_charset', 'iso-8859-1' );
>>> var_dump( ini_get("mbstring.internal_encoding") );
>>> var_dump( ini_get("mbstring.http_input") );
>>> var_dump( ini_get("mbstring.http_output") );
>>> echo mb_internal_encoding() . "\n";
>>> echo mb_strlen( "\xc3\xb6" ) . "\n";
>>> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
>>>
>>> This outputs (7.2.15 on a CentOS box):
>>> string(0) ""
>>> string(0) ""
>>> string(0) ""
>>> UTF-8
>>> 1
>>> 2
>>>
>>> The default_charset is set but the mbstring settings are not, so I'd
>>> expect to get 2 as the character/byte count in both cases.
>>>
>>> If I throw a mb_internal_encoding("iso-8859-1") into the mix, both
>>> string lengths are equal.
>>>
>>> Since the mentioned mbstring directives are deprecated as of 5.6.0,
>>> do I really need to use mb_internal_encoding() instead?
>>> Is the documentation wrong or am I just misinterpreting it? I thought
>>> that default_charset should act as some kind of "master setting" so
>>> that one does not have to set all the specific settings as well
>>> (e.g. iconv, mbstring).
>>>
>>> Usually we use UTF-8, so I did not come across this before...
>>>
>>> Any insight?
>>
>> confirms the reported behavior.  A quick look
>> at the code does, too.  I suggest you file a ticket on
>> .
>
> Did this lead to a bug report?

Hmm, apparently not.

> It led to a bug in Smarty 3.1.33 for me. I got a warning about
> "mbregex compile err: invalid code point value" in mb_split().
> I have content in ISO-8859-1, and the normal Smarty procedure of
> setting the encoding and the php.ini setting to ISO-8859-1 failed.
>
> However, mb_regex_encoding('ISO-8859-1') did the trick!

While the RFC[1] states

| all functions that take encoding option use php.internal_encoding as
| default (e.g. htmlentities/mb_strlen/mb_regex/etc)

apparently this has not been implemented (yet).

[1]

-- 
Christoph M. Becker
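
For reference, a minimal sketch of the workaround discussed in this thread,
assuming the mbstring extension is loaded. Since mbstring did not pick up the
runtime change to default_charset in the reported case, the script sets the
mbstring encodings explicitly. The helper name sync_mbstring_encodings() is
hypothetical; it is not part of PHP or Smarty.

```php
<?php
// Hypothetical helper: copy default_charset into the mbstring runtime
// settings explicitly instead of relying on an implicit fallback.
function sync_mbstring_encodings(): string
{
    $charset = ini_get('default_charset');
    if ($charset === false || $charset === '') {
        $charset = 'UTF-8'; // assumption: fall back to PHP's documented default
    }

    // Note: an encoding name unknown to mbstring makes this call fail.
    mb_internal_encoding($charset);

    // mb_regex_encoding() only exists if mbstring was built with mbregex.
    if (function_exists('mb_regex_encoding')) {
        mb_regex_encoding($charset);
    }

    return $charset;
}

ini_set('default_charset', 'iso-8859-1');
sync_mbstring_encodings();

echo mb_internal_encoding() . "\n";           // ISO-8859-1
echo mb_strlen("\xc3\xb6") . "\n";            // 2
echo mb_strlen("\xc3\xb6", '8bit') . "\n";    // 2
```

With the internal encoding set explicitly, both lengths agree, which matches
the observation above about adding mb_internal_encoding("iso-8859-1").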