Newsgroups: php.internals
Date: Thu, 11 Apr 2019 15:41:26 +0200
To: Nicolai Scheer, PHP Internals
Message-ID: <0036318e-84fb-fe16-2ce1-150d7039ecf4@gmx.de>
Subject: Re: default_charset and mb_internal_encoding
From: cmbecker69@gmx.de ("Christoph M. Becker")

On 02.04.2019 at 11:42, Nicolai Scheer wrote:

> I'm currently in the process of migrating an old application from php 5.6
> to 7.2.
> In the process, I fiddled with the default_charset ini setting.
>
> The documentation states (c.f.
> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
>
> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The value of
> default_charset
> will also be used to set the default character set for [...] and for
> mbstring functions
> if the mbstring.http_input mbstring.http_output mbstring.internal_encoding
> configuration option is unset."
>
> As such, I'd expect to be able to set default_charset to iso-8859-1 and
> mbstring to pick that same setting for its internal encoding (if the
> mentioned directives are unset, that is).
>
> This seems not to be the case:
>
> ini_set( 'default_charset', 'iso-8859-1' );
> var_dump( ini_get("mbstring.internal_encoding") );
> var_dump( ini_get("mbstring.http_input") );
> var_dump( ini_get("mbstring.http_output") );
> echo mb_internal_encoding() . "\n";
> echo mb_strlen( "\xc3\xb6" ) . "\n";
> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
>
> This outputs (7.2.15 on a CentOS box):
> string(0) ""
> string(0) ""
> string(0) ""
> UTF-8
> 1
> 2
>
> The default_charset is set but mbstring settings are not, so I'd expect to
> get 2 as the character/byte count in both cases.
>
> If I throw a mb_internal_encoding("iso-8859-1") in the mix, both string
> lengths are equal.
>
> Since the mentioned mbstring directives are deprecated as of 5.6.0 - do I
> really need to use mb_internal_encoding() instead?
> Is the documentation wrong or am I just misinterpreting it? I thought that
> default_charset should act as some kind of "master setting" in order not to
> have to set all specific settings as well (e.g. iconv, mbstring).
>
> Usually we use UTF-8, so I did not come across this before...
>
> Any insight?

[...] confirms the reported behavior. A quick look at the code does, too.
I suggest you file a ticket on [...].

Thanks,
Christoph M. Becker