Newsgroups: php.internals
Xref: news.php.net php.internals:71618
Date: Mon, 27 Jan 2014 09:09:44 +0900
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
To: Dan Ackroyd
Cc: "internals@lists.php.net"
Subject: Re: [PHP-DEV] [VOTE] RFC: Multibyte Char Handling

Hi Dan,

On Mon, Jan 27, 2014 at 7:28 AM, Dan Ackroyd wrote:

> Sorry, for rapid posting but I realise I should have been explicit in
> my last message. The RFC being voted on has "Backward Incompatible
> Changes: None."
>
> The referenced RFC which apparently would be included has:
>
> "Removed (deprecated) functions and reasons behind it
>
> mb_check_encoding() – Not that usable as it is advertised, period.
> First of all, validation in terms of encoding is just as same as
> filtering through the converter supplied with the same value for the
> input and output encoding. Thus just use mb_convert_encoding().
> mb_convert_case() – Use mb_strtoupper(), mb_strtolower() and
> mb_strtotitle()
> mb_convert_kana() – This can't be standard-compliant. In addition,
> part of the functionality is already covered by Normalizer of intl
> extension, so we need to carefully consider what is actually needed
> here again.
> mb_convert_variables() – This can be implemented as a script.
> mb_decode_mimeheader() and mb_encode_mimeheader() – Non-standard
> compliancy.
> mb_decode_numericentity() – Removed in favor of html_entity_decode().
> mb_encode_numericentity() – Removed in favor of htmlentities() and
> htmlspecialchars().
> mb_encoding_aliases() – Just unnecessary.
> mb_ereg_match() – Use mb_ereg()
> mb_ereg_search(), mb_ereg_search_getpos(), mb_ereg_search_getregs(),
> mb_ereg_search_init(), mb_ereg_search_pos(), mb_ereg_search_regs() and
> mb_ereg_search_setpos() – I rarely heard a script that actively uses
> these functions. They involve an internal state that is not visible to
> users, and thus it most likely causes confusion when used across the
> function calls. Need to be reimplemented as a class.
> mb_eregi() – Use mb_regex_options() and mb_ereg()
> mb_eregi_replace() – I wonder why this function was added in the first
> place because giving 'i' option to mb_ereg_replace() works in the same
> way.
> mb_detect_order(), mb_get_info(), mb_http_input(), mb_http_output(),
> mb_language() and mb_substitute_character() – ini_set() and ini_get()
> are your friends, I guess…
> mb_regex_encoding() – It is really confusing that the current mbstring
> allows two different encoding defaults for regex functions and the
> rest. Those settings are unified in the alternative version and so
> this is no longer necessary.
> mb_send_mail() – The behavior of this function relies on the
> pseudo-locale setting called “mbstring.language” that supports just a
> limited set of possible locales. As not everyone can benefit from the
> function and most significant applications implement their own mail
> functions, I suppose this is no longer wanted.
> mb_strrchr() – Use mb_strrpos().
> mb_strrichr() – Use mb_strripos()."
>
> None is not the same as a huge number of function changes.
>

I just didn't want to touch a five-year-old RFC. As I wrote in the
parent RFC, the implementation is subject to change.

The objective of this RFC is to eliminate the vulnerability completely,
and it is better to have a road map for that.

As I wrote, there is a licensing difficulty with compiling the current
mbstring by default. There is mbstring-ng, but it is incomplete.
This RFC only proposes a feasible option.

I'm going to copy all mbstring features to mbstring-ng, but there may be
some compatibility issues, e.g. in the handling of non-character
encodings.

There will be another vote when we actually replace mbstring with
mbstring-ng, since this RFC only proposes the way to go. I don't think
this RFC amounts to approval of the replacement.

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net