Newsgroups: php.internals
Date: Mon, 20 Jan 2014 07:38:12 +0100
From: pierre.php@gmail.com (Pierre Joye)
To: Yasuo Ohgaki
Cc: PHP internals, Nikita Popov
Subject: Re: [PHP-DEV] [RFC] Multibyte char handling

On Jan 19, 2014 11:48 PM, "Yasuo Ohgaki" wrote:
>
> Hi Pierre,
>
> On Mon, Jan 20, 2014 at 1:09 AM, Pierre Joye wrote:
>>
>> On Thu, Jan 16, 2014 at 11:47 PM, Yasuo Ohgaki wrote:
>> > Hi Nikita,
>> >
>> > On Fri, Jan 17, 2014 at 7:38 AM, Nikita Popov wrote:
>> >
>> >> No, I don't want a locale-based approach. I want the string functions to
>> >> stay as is. Multibyte variants of the functions can be added to the
>> >> multibyte extension.
>> >
>> >
>> > Creating mb_*() functions would not solve the security issues of
>> > multibyte char handling, since multibyte-aware functions are
>> > an optional feature.
>>
>> We never supported nor claimed that these functions are multibyte
>> safe. However, I fully understand that we should solve this
>> problem, one way or another.
>>
>> > However, it may work if PHP compiles mbstring by default and
>> > discourages use of addslashes()/var_export()/stripslashes()
>> > in favor of mb_*() variants.
>>
>> I do not think we should discourage the use of these functions, but
>> we should clearly document that the mb_* APIs are the ones to rely on
>> whenever multibyte support is required.
>>
>> I agree with the others about not adding any optional arguments to the
>> existing APIs, for a couple of reasons:
>>
>> 1. It does not solve anything, as people still have to update their
>> code, and they won't unless, maybe, they read the doc/changelog
>> 2. It is really not a clean solution
>> 3. We already have many duplicate functions in mb, it has worked well
>> so far, and we can add the ones discussed here
>
>
> I'll leave existing ext/standard functions alone. :)

>> The last question was about relying on locale. This is absolutely not
>> a solution.
>> Locale has been proven to be totally unreliable, buggy and
>> unsafe. Let alone the total lack of real POSIX locale support on
>> Windows.
>
>
> mb_escape_shell_arg()/mb_escape_shell_cmd() need a locale-based
> solution, since there is no good way to detect the terminal encoding. I'll
> make the mb version override this behavior when the encoding is
> specified explicitly.
>

Sounds good

> On UNIXes, UTF-8 is a popular terminal encoding, but there
> would be systems using other encodings such as EUC, or even SJIS or BIG5.

Right, and as far as I remember UTF-8 does not suffer from this problem.

> Windows uses a different terminal encoding depending on the locale,
> so it's much more complex.
>

Let me provide a function to detect it, but we need something to
normalize the names. Do we have such a thing in mbstring?

> This is the reason why I would use locale. However, this implementation
> is debatable.
>

Yes :)

> We could say "Users should explicitly specify terminal encoding
> by themselves". In fact, I prefer this even if I am about to implement
> mb_escape_shell_*() using locale for automatic encoding detection.
>
> It may be better to raise at least an E_NOTICE if the encoding parameter
> is omitted for mb_escape_shell_*().

Notice sounds good too.

>
>> For anything related to locale, formats or encoding, we should rely on
>> intl (ICU) and not on the system's locale. This is the only way to be
>> portable, safe and up to date.
>
>
> I agree.
> I would also like to propose
>
> https://wiki.php.net/rfc/altmbstring - ICU version of mbstring
>

Oh, very nice.

> for a future release. Most of the work has been done by Moriyoshi. We may
> try to switch to it now, but I suppose there is not enough time for 5.6.

What's the status? We still have some time :)

Cheers,
Pierre
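
Note on the encoding point raised in the thread: the remark that UTF-8 "does not suffer from this problem" can be checked with a small sketch. It is written in Python rather than PHP, and `naive_escape` and `has_ascii_trail_byte` are hypothetical helpers invented for this illustration. They only stand in for a byte-oriented escaper and are not PHP's actual addslashes() implementation. The sample character is an assumption chosen because its Shift_JIS encoding ends in the byte 0x5C (backslash).

```python
# Illustrative sketch only: a naive byte-level escaper standing in for a
# byte-oriented function. This is NOT PHP's real implementation.
def naive_escape(data: bytes) -> bytes:
    """Prefix every backslash, single-quote and double-quote byte with a backslash."""
    out = bytearray()
    for b in data:
        if b in (0x5C, 0x27, 0x22):  # \  '  "
            out.append(0x5C)
        out.append(b)
    return bytes(out)


def has_ascii_trail_byte(text: str, encoding: str) -> bool:
    """True if any multibyte character in text encodes to a sequence containing a byte < 0x80."""
    for ch in text:
        encoded = ch.encode(encoding)
        if len(encoded) > 1 and any(b < 0x80 for b in encoded):
            return True
    return False


sample = "\u8868"  # a kanji whose Shift_JIS encoding is the two bytes 0x95 0x5C

sjis = sample.encode("shift_jis")
utf8 = sample.encode("utf-8")

# The byte-level escaper sees the 0x5C trail byte and inserts a second
# backslash in the middle of the Shift_JIS character.
sjis_escaped = naive_escape(sjis)

# Every byte of a multibyte UTF-8 sequence is >= 0x80, so nothing is touched.
utf8_escaped = naive_escape(utf8)

print(sjis.hex(), "->", sjis_escaped.hex())
print(utf8.hex(), "->", utf8_escaped.hex())
```

In this sketch the Shift_JIS bytes change while the UTF-8 bytes pass through untouched, which is why a byte-oriented escaper needs to know the encoding for SJIS-style input but not for UTF-8.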