Newsgroups: php.internals
Xref: news.php.net php.internals:71210
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
Date: Sat, 18 Jan 2014 04:53:58 +0900
Subject: Re: [PHP-DEV] [RFC] Multibyte char handling
To: Mike
Cc: Nikita Popov, "internals@lists.php.net"
In-Reply-To: <1389976678.2057.8.camel@smugmug>
References: <1389976678.2057.8.camel@smugmug>

Hi Mike,

On Sat, Jan 18, 2014 at 1:37 AM, Mike wrote:

> On Fri, 2014-01-17 at 07:34 +0900, Yasuo Ohgaki wrote:
>
> > This discussion circulate discussion.
>
> I'm also in favor of putting those things in ext/mbstring.
>

I'll make this a vote option.

> > At first, I proposed locale based solution using php_mblen().
> > This approach does not require additional encoding parameter
> > since encoding is specified by locale.
>
> Meh, but would be okay for me.
>

It's a feasible solution for older versions. I would like to remove the
locale-based code in future releases, though.

Functions that use php_mblen() could be modified to use mbstring when
PHP is built with mbstring. These functions could use internal_encoding.

Using internal_encoding requires changes to user code in some cases.
For instance, the Japanese Windows command line uses Shift_JIS as its
terminal encoding, while many users write their scripts in UTF-8. Those
users have to add code that changes internal_encoding, e.g. around
escapeshellarg()/escapeshellcmd(). They could use a simple wrapper for
escapeshellarg()/escapeshellcmd(), though.

Although users have to modify their code a little, fgetcsv() and the
like would be more usable, because internal_encoding is more reliable
than the locale.

If we are not going to modify these functions, it may be better to add
mb versions of them and deprecate the originals, as with addslashes().

> > However, some people don't like the solution (in security ML)
> > because it is locale based solution. It may have unwanted side
> > effects. Locale is unreliable and most user just don't care about it.
> >
> > Therefore, I proposed this approach that introduce encoding
> > parameter just like htmlspecialchars()/htmlentities().
> >
> > Encoding parameter (or some way to specify encoding) for security
> > related string function is mandatory. We should provide some way
> > to specify encoding.
> >
>
> Many of us do not have access to that mailing list, so could you shed
> some light on the actual issue?
>

There are two classes of security issue in php_addslashes().

The first is PHP script execution. Suppose a script saves application
config to a PHP file and later reads that file back as a PHP script.

If '表' is encoded in SJIS, its char code is 0x955c (0x5c = \). Since
addslashes() is not multibyte aware, it escapes the char as 0x95, 0x5c,
0x5c. This makes it possible to break out of the string quoting and
write attack code. The contents of myconfig.php become