Newsgroups: php.internals
Xref: news.php.net php.internals:105624
References:
 <0036318e-84fb-fe16-2ce1-150d7039ecf4@gmx.de> <7041d66d-1c50-aa36-5427-47555db7259a@telia.com>
Date: Tue, 7 May 2019 16:09:20 +0200
To: "Christoph M. Becker"
Cc: Björn Larsson, Nicolai Scheer, PHP Internals
Subject: Re: [PHP-DEV] Re: default_charset and mb_internal_encoding
From: nikita.ppv@gmail.com (Nikita Popov)

On Fri, May 3, 2019 at 11:44 AM Christoph M. Becker wrote:

> On 03.05.2019 at 01:18, Björn Larsson wrote:
>
> > On 2019-04-11 at 15:41, Christoph M. Becker wrote:
> >
> >> On 02.04.2019 at 11:42, Nicolai Scheer wrote:
> >>
> >>> I'm currently in the process of migrating an old application from
> >>> PHP 5.6 to 7.2.
> >>> In the process, I fiddled with the default_charset ini setting.
> >>>
> >>> The documentation states (cf.
> >>> https://www.php.net/manual/en/ini.core.php#ini.default-charset):
> >>>
> >>> "In PHP 5.6 onwards, "UTF-8" is the default value and [...] The
> >>> value of default_charset will also be used to set the default
> >>> character set for [...] and for mbstring functions if the
> >>> mbstring.http_input mbstring.http_output
> >>> mbstring.internal_encoding configuration option is unset."
> >>>
> >>> As such, I'd expect to be able to set default_charset to
> >>> iso-8859-1 and mbstring to pick that same setting for its internal
> >>> encoding (if the mentioned directives are unset, that is).
> >>>
> >>> This seems not to be the case:
> >>>
> >>> <?php
> >>> ini_set( 'default_charset', 'iso-8859-1' );
> >>> var_dump( ini_get("mbstring.internal_encoding") );
> >>> var_dump( ini_get("mbstring.http_input") );
> >>> var_dump( ini_get("mbstring.http_output") );
> >>> echo mb_internal_encoding() . "\n";
> >>> echo mb_strlen( "\xc3\xb6" ) . "\n";
> >>> echo mb_strlen( "\xc3\xb6", '8bit' ) . "\n";
> >>>
> >>> This outputs (7.2.15 on a CentOS box):
> >>> string(0) ""
> >>> string(0) ""
> >>> string(0) ""
> >>> UTF-8
> >>> 1
> >>> 2
> >>>
> >>> The default_charset is set but the mbstring settings are not, so
> >>> I'd expect to get 2 as the character/byte count in both cases.
> >>>
> >>> If I throw a mb_internal_encoding("iso-8859-1") in the mix, both
> >>> string lengths are equal.
> >>>
> >>> Since the mentioned mbstring directives are deprecated as of
> >>> 5.6.0 - do I really need to use mb_internal_encoding() instead?
> >>> Is the documentation wrong or am I just misinterpreting it? I
> >>> thought that default_charset should act as some kind of "master
> >>> setting" in order not to have to set all specific settings as well
> >>> (e.g. iconv, mbstring).
> >>>
> >>> Usually we use UTF-8, so I did not come across this before...
> >>>
> >>> Any insight?
> >>
> >> confirms the reported behavior. A quick look
> >> at the code, too. I suggest you file a ticket on
> >> .
> >
> > Did this lead to a bug report?
>
> Hmm, apparently not.
>

This was reported as https://bugs.php.net/bug.php?id=77907 and will be
fixed in 7.4.

Nikita

> > It led to a bug in Smarty 3.1.33 for me. I got a warning about
> > "mbregex compile err: invalid code point value" in mb_split().
> > I have content in ISO-8859-1, and Smarty's normal procedure of
> > setting the encoding and the php.ini setting to ISO-8859-1 flunked.
> >
> > However mb_regex_encoding('ISO-8859-1') did the trick!
>
> While the RFC[1] states
>
> | all functions that take encoding option use php.internal_encoding as
> | default (e.g. htmlentities/mb_strlen/mb_regex/etc)
>
> apparently this has not been implemented (yet).
>
> [1]
>
> --
> Christoph M. Becker
>
> --
> PHP Internals - PHP Runtime Development Mailing List
> To unsubscribe, visit: http://www.php.net/unsub.php
>