Newsgroups: php.internals
From: yohgaki@ohgaki.net (Yasuo Ohgaki)
To: Lester Caine
Cc: internals@lists.php.net
Date: Sun, 19 Jan 2014 03:52:01 +0900
Subject: Re: [PHP-DEV] [RFC] Multibyte char handling
In-Reply-To: <52DA84A2.2050305@lsces.co.uk>

Hi Lester,

On Sat, Jan 18, 2014 at 10:41 PM, Lester Caine wrote:

> Multibyte characters are still a contentious area, and the current
> compromise of supporting multibyte content, but being essentially 'single
> byte' for the programming structure, has been a solution adopted in a few
> projects. Firebird is once again debating the same point that they and PHP
> last discussed 10 years ago; it was too difficult, so PHP6 floundered and
> Firebird remained essentially single byte strings in the metadata.

Making a product that only works with single-byte chars is completely OK.
The issue is that there is no proper function/method/feature that
correctly escapes PHP strings containing multibyte chars. PHP needs to
provide an API that handles such data properly and safely.

It's awful that reading var_export()ed data could execute arbitrary PHP
script and/or terminate script execution, isn't it? That cannot be
ignored.

> 10 years on, isn't it time to re-open the debate on making the core unicode,
> since 32 bit processors are more likely to be the norm these days?
> Certainly if everything internal is UTF8, then all of the encoding problems
> are moved to the client interface?

I'm not proposing a transition like Python 2.x to 3.x. The RFC proposes a
feature that is required for proper/safe coding.

Anyway, it seems Python's approach is not working well. We could learn
from it.

The server side should never expect clients to send proper data, so
proper encoding handling is mandatory on the server side.

Adopting UTF-8 makes things easier, but there are ways to exploit UTF-8
encoding, too.
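A minimal Python sketch of why escaping has to be encoding-aware on the server side. Python is used only for illustration, and `naive_escape`, `safe_escape` and `first_unescaped_quote` are hypothetical helpers: they are not how var_export() is implemented and not the RFC's proposed mb functions. In Shift_JIS the character "ソ" is encoded as 0x83 0x5C, so its second byte equals an ASCII backslash. A byte-by-byte escaper doubles that byte, and an encoding-aware reader then pairs the extra backslash with the one meant to escape the quote, so the quote ends the string literal. Validating the input with a strict decoder first and escaping whole characters avoids this, and also rejects malformed input such as an invalid Shift_JIS trail byte or an overlong UTF-8 sequence.

```python
# Illustrative sketch only: naive_escape, safe_escape and first_unescaped_quote
# are hypothetical helpers, not PHP's var_export() or the RFC's mb functions.


def naive_escape(data: bytes) -> bytes:
    """Escape backslash and single quote byte by byte, ignoring the encoding."""
    return data.replace(b"\\", b"\\\\").replace(b"'", b"\\'")


def safe_escape(data: bytes, encoding: str) -> bytes:
    """Validate the encoding first, then escape whole characters."""
    # Strict decoding raises UnicodeDecodeError on malformed input, e.g. an
    # invalid Shift_JIS trail byte or an overlong UTF-8 sequence.
    text = data.decode(encoding, errors="strict")
    escaped = text.replace("\\", "\\\\").replace("'", "\\'")
    return escaped.encode(encoding)


def first_unescaped_quote(body: bytes, encoding: str):
    """Index of the first quote an encoding-aware reader would treat as the
    end of a single-quoted literal whose body is `body`, or None."""
    chars = body.decode(encoding)
    i = 0
    while i < len(chars):
        if chars[i] == "\\":
            i += 2  # a backslash escapes the next character
        elif chars[i] == "'":
            return i
        else:
            i += 1
    return None


if __name__ == "__main__":
    # "ソ" is 0x83 0x5C in Shift_JIS: its trail byte equals an ASCII backslash.
    payload = "ソ'".encode("shift_jis")
    print(first_unescaped_quote(naive_escape(payload), "shift_jis"))  # 3
    print(first_unescaped_quote(safe_escape(payload, "shift_jis"), "shift_jis"))  # None

    # Malformed input is rejected instead of being escaped.
    for bad, enc in [(b"\x81'", "shift_jis"), (b"\xc0\xaf", "utf-8")]:
        try:
            safe_escape(bad, enc)
        except UnicodeDecodeError:
            print(f"rejected malformed {enc} input: {bad!r}")
```

Rejecting malformed input outright is one policy choice; substituting invalid sequences is another, and the sketch does not cover it.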

As an example of such abuse, recent versions of Chrome may display a blank
page when the page contains malformed chars, which could be used for a DoS
attack. Likewise, mixing systems that validate encoding with systems that
do not could make the whole setup vulnerable to DoS.

The new mb functions handle encoding properly, not only for SJIS-like
encodings but also for any encoding supported by mbstring.

Did I make a typo? My Chrome did not report a spelling error. I would
appreciate it if you could point it out.

I sent the correct URL right after the first mail, but that probably
didn't help; I wouldn't check the second mail in a long thread either :(

https://wiki.php.net/rfc/multibyte_char_handling

Regards,

--
Yasuo Ohgaki
yohgaki@ohgaki.net