Newsgroups: php.internals
Date: Mon, 20 Jan 2014 07:48:08 +0900
To: Pierre Joye
Cc: Nikita Popov, "internals@lists.php.net"
Subject: Re: [PHP-DEV] [RFC] Multibyte char handling
From: yohgaki@ohgaki.net (Yasuo Ohgaki)

Hi Pierre,

On Mon, Jan 20, 2014 at 1:09 AM, Pierre Joye wrote:

> On Thu, Jan 16, 2014 at 11:47 PM, Yasuo Ohgaki wrote:
> > Hi Nikita,
> >
> > On Fri, Jan 17, 2014 at 7:38 AM, Nikita Popov wrote:
> >
> >> No, I don't want a locale-based approach. I want the string functions to
> >> stay as is. Multibyte variants of the functions can be added to the
> >> multibyte extension.
> >
> >
> > Creating mb_*() function would not solve security issues of
> > multibyte char handling since multibyte aware functions are
> > optional feature.
>
> We never supported nor claimed that these functions are multi bytes
> safe. However I actually fully understand that we should solve this
> problem, in one way or another.
>
> > However, it may work if PHP compiles mbstring by default and
> > discourage use of addslashes()/var_export()/stripslashes()
> > in favor of mb_*() variants.
>
> I do not think we should discourage the use of these functions but
> clearly document to rely on mb_* APIs as long as multi bytes support
> is required.
>
> I join other about not making any optional arguments in the existing
> APIs, for a couple of reasons:
>
> 1. it does not solve anything as people still have to update their
> code, and they won't unless maybe if they read the doc/changelog
> 2. It is really not a clean solution
> 3. we already have many duplicate functions in mb, it has worked well
> so far and we can add the ones discussed here
>

I'll leave the existing ext/standard functions alone.

> The last question was about relying on locale. This is absolutely not
> a solution. Locale has been proven to be totally unreliable, buggy and
> unsafe.
> Let alone the total lack of real posix locale support on
> Windows.
>

mb_escape_shell_arg()/mb_escape_shell_cmd() need a locale-based solution, since there is no good way to detect the terminal encoding. I'll make the mb versions override this behavior when the encoding is specified explicitly.

On UNIX-like systems, UTF-8 is the most popular terminal encoding, but some systems use other encodings such as EUC, or even SJIS or BIG5. On Windows, the terminal encoding depends on the locale, so it is much more complex. This is why I would use the locale.

However, this implementation is debatable. We could say "Users should explicitly specify the terminal encoding themselves." In fact, I prefer this, even though I am about to implement mb_escape_shell_*() with locale-based automatic encoding detection. It may be better to at least raise an E_NOTICE if the encoding parameter is omitted from mb_escape_shell_*().

> For anything related to locale, formats or encoding, we should rely on
> intl (ICU) and not on systems's locale. This is the only way to be
> portable, safe and updated.

I agree. I would also like to propose https://wiki.php.net/rfc/altmbstring, an ICU-based version of mbstring, for a future release. Most of the work has been done by Moriyoshi. We could try to switch to it now, but I suppose there is not enough time for 5.6. It is supposed to work mostly the same as the current mbstring. It may be better to make mbstring an optional compile-time feature in favor of the ICU implementation.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net
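The encoding problems in this thread can be illustrated outside PHP. Below is a minimal Python sketch that makes no claims about PHP's internals. `byte_addslashes()` only mimics the byte-by-byte behaviour of addslashes(). `mb_addslashes()` and `escape_shell_arg()` are hypothetical names that illustrate the two ideas from the message: an explicitly specified encoding that takes precedence over any guess, and a warning (standing in for E_NOTICE) when the encoding is omitted and the locale has to be used. The Shift_JIS character "ソ" (0x83 0x5C) and the GBK sequence 0xBF 0x27 are classic cases. In the first, a byte-wise escape corrupts the character. In the second, the inserted backslash is absorbed into a multibyte character.

```python
import locale
import warnings
from typing import Optional

# Character-level escapes used by the encoding-aware sketch.
_ESCAPES = {"\0": "\\0", "'": "\\'", '"': '\\"', "\\": "\\\\"}


def byte_addslashes(data: bytes) -> bytes:
    """Escape quotes, backslash and NUL one byte at a time, in the spirit
    of addslashes(). It never looks at the encoding, so it cannot tell a
    real quote or backslash from the trailing byte of a multibyte char."""
    out = bytearray()
    for b in data:
        if b == 0x00:
            out += b"\\0"
        elif b in (0x22, 0x27, 0x5C):  # double quote, single quote, backslash
            out += b"\\" + bytes([b])
        else:
            out.append(b)
    return bytes(out)


def mb_addslashes(data: bytes, encoding: str) -> bytes:
    """Hypothetical encoding-aware variant: decode strictly, escape whole
    characters, then re-encode. Malformed input raises UnicodeDecodeError
    instead of being escaped byte by byte."""
    text = data.decode(encoding)  # strict mode rejects invalid sequences
    return "".join(_ESCAPES.get(ch, ch) for ch in text).encode(encoding)


def escape_shell_arg(arg: bytes, encoding: Optional[str] = None) -> bytes:
    """Hypothetical POSIX single-quote escaping with an explicit encoding.
    If no encoding is given, fall back to the locale's preferred encoding
    and emit a warning (the E_NOTICE idea from the message)."""
    if encoding is None:
        encoding = locale.getpreferredencoding(False)
        warnings.warn("no encoding given; guessing from locale: " + encoding)
    text = arg.decode(encoding)  # validate the bytes against the encoding
    quoted = "'" + text.replace("'", "'\\''") + "'"
    return quoted.encode(encoding)


if __name__ == "__main__":
    so = "ソ".encode("shift_jis")  # 0x83 0x5C: the trailing byte is 0x5C
    print(byte_addslashes(so))  # a stray backslash is appended
    print(mb_addslashes(so, "shift_jis"))  # the character is left intact

    attack = b"\xbf'"  # 0xBF followed by a single quote
    # 0xBF 0x5C forms one GBK character, so the inserted backslash is
    # swallowed and the quote ends up unescaped.
    print(ascii(byte_addslashes(attack).decode("gbk")))
    try:
        mb_addslashes(attack, "gbk")
    except UnicodeDecodeError:
        print("rejected: not valid GBK")

    print(escape_shell_arg(b"it's", "utf-8"))
```

The sketch decodes strictly, so input that is not valid in the declared encoding is rejected rather than escaped byte by byte. UTF-8 does not suffer from this class of bug, because every byte of a multibyte UTF-8 sequence is 0x80 or higher and can never collide with a quote or backslash.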