Newsgroups: php.internals
Date: Tue, 22 Apr 2014 19:36:31 +0100
Subject: Re: [PHP-DEV] utf-8 filenames in phar files.
From: bukka@php.net (Jakub Zelenka)
To: Stas Malyshev
Cc: Yasuo Ohgaki, Lester Caine, PHP internals
In-Reply-To: <5355A48D.7050600@sugarcrm.com>
References: <52FF3BB7.8030408@lsces.co.uk> <52FF465E.4040400@lsces.co.uk> <5355A48D.7050600@sugarcrm.com>

On Tue, Apr 22, 2014 at 12:06 AM, Stas Malyshev wrote:

> Hi!
>
> > I have created a quick PR: https://github.com/php/php-src/pull/649 that
> > is fixing the ill-formed UTF-8 paths.
>
> Thanks for the patch. One thing I'd like to understand is what is the
> added value of being so strict in checking UTF-8. I.e. what would happen
> if we allow some path with weird chars in?
>

I think that validation is important to prevent user errors. The problem is
that the currently accepted implementation (the MB2, MB3 and MB4 checks) only
pretends to validate UTF-8; the validation itself is incorrect. It does not
let arbitrary bytes through (there is already a check for UTF-8 sequences),
but it does accept surrogate code points (U+D800 to U+DFFF), which is wrong
IMHO. The PR fixes that: it simply checks for ill-formed sequences correctly.

Regarding normalization, I think Yasuo is right that there will be unhappy
users. On the other hand, I think there are more users who would like to be
able to use UTF-8 paths at all. Normalization is a bit tricky, and the only
solution that comes to mind at the moment is a dependency on ICU, which
wouldn't be right IMHO.

Jakub
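
To make the surrogate point concrete, here is a minimal sketch of a strict
well-formedness check. It is not the code from the PR; the function name
utf8_is_well_formed is hypothetical, and the byte ranges are the ones the
Unicode Standard lists for well-formed UTF-8 byte sequences. The key detail
is that a lead byte of 0xED only allows a second byte up to 0x9F, which is
what excludes U+D800 to U+DFFF.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only (not taken from the PR).
 * Returns 1 if buf[0..len) is well-formed UTF-8, 0 otherwise. */
int utf8_is_well_formed(const unsigned char *buf, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char b = buf[i];
        size_t need;                         /* continuation bytes expected */
        unsigned char lo = 0x80, hi = 0xBF;  /* allowed range of 2nd byte */

        if (b <= 0x7F) {                     /* ASCII */
            i++;
            continue;
        } else if (b >= 0xC2 && b <= 0xDF) { /* 2 bytes; C0/C1 are overlong */
            need = 1;
        } else if (b >= 0xE0 && b <= 0xEF) { /* 3 bytes */
            need = 2;
            if (b == 0xE0) lo = 0xA0;        /* reject overlong forms */
            if (b == 0xED) hi = 0x9F;        /* reject surrogates */
        } else if (b >= 0xF0 && b <= 0xF4) { /* 4 bytes */
            need = 3;
            if (b == 0xF0) lo = 0x90;        /* reject overlong forms */
            if (b == 0xF4) hi = 0x8F;        /* reject above U+10FFFF */
        } else {
            return 0;                        /* stray continuation or bad lead */
        }

        if (len - i - 1 < need)
            return 0;                        /* truncated sequence */
        if (buf[i + 1] < lo || buf[i + 1] > hi)
            return 0;
        for (size_t k = 2; k <= need; k++) {
            if (buf[i + k] < 0x80 || buf[i + k] > 0xBF)
                return 0;
        }
        i += need + 1;
    }
    return 1;
}
```

With a check of this shape, the bytes ED A0 80 (an encoded U+D800) are
rejected, while ED 9F BF (U+D7FF) is still accepted. A looser check that only
looks at lead and continuation bit patterns would accept both.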