Newsgroups: php.internals
Xref: news.php.net php.internals:73751
References: <52FF3BB7.8030408@lsces.co.uk> <52FF465E.4040400@lsces.co.uk>
Date:
Mon, 21 Apr 2014 13:09:54 +0100
To: Yasuo Ohgaki, Stas Malyshev
Cc: Lester Caine, PHP internals
Subject: Re: [PHP-DEV] utf-8 filenames in phar files.
From: bukka@php.net (Jakub Zelenka)

On Mon, Apr 21, 2014 at 11:44 AM, Jakub Zelenka wrote:

> On Sat, Feb 15, 2014 at 10:53 AM, Yasuo Ohgaki wrote:
>
>> On Sat, Feb 15, 2014 at 7:50 PM, Lester Caine wrote:
>>
>> > My previous post did not appear on the list ;)
>> >
>> > Yasuo Ohgaki wrote:
>> >
>> >> A lot of the current confusion does seem to be based around the
>> >> Windows Wide-API as documented in 'The Problem' section of that
>> >> document. It would seem that my 'naive' view of simply using UTF-8
>> >> strings is thwarted by these problems?
>> >>
>> >> Unicode is like one name with several encodings. We cannot get away
>> >> from conversions, normalization especially.
>> >
>> > That is why personally I'm just looking at UTF8. Which is enough of a
>> > minefield on its own, but since a large swath of what we are working
>> > with now is only UTF8 it does seem to be the right base going forward?
>>
>> I have a problem, too. It seems someone is working on DKIM.
>> Anyway, there are problems, but UTF-8 is the way to go. We just cannot
>> remove conversions. Normalization has a number of issues, including a
>> security-related one.
>>
>> Regards,
>>
>> --
>> Yasuo Ohgaki
>> yohgaki@ohgaki.net
>
> Hi Stas,
>
> I just saw that the patch has been merged. There was an objection from
> Yasuo about normalization in the past...
>
> What's worse, the patch doesn't check UTF-8 correctly. It accepts invalid
> UTF-8. The correct spec is in
> https://tools.ietf.org/html/rfc3629#section-4
> I have a re2c implementation in the jsond scanner:
> https://github.com/bukka/php-jsond/blob/master/jsond_scanner.re#L122-L134
> As you can see, it's very different from the provided implementation.
>
> I think that accepting ill-formed UTF-8 would be a mistake and as such the
> patch should be reverted.
>
> Thanks
>
> Jakub

I have created a quick PR: https://github.com/php/php-src/pull/649 that
fixes the ill-formed UTF-8 paths.

Jakub
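
For readers who want to see what "well-formed" means in the RFC 3629 section 4 grammar referenced above, here is a minimal sketch in C. It is not the code from the PR or from jsond_scanner.re, and the names utf8_is_well_formed and in_range are hypothetical, chosen only for this illustration. It is a plain hand-written loop rather than a re2c scanner. The point it tries to show is that the second byte of a sequence has a restricted range after certain lead bytes, which is what rejects overlong forms, surrogates and code points above U+10FFFF.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper for this sketch: true if lo <= b <= hi. */
static bool in_range(unsigned char b, unsigned char lo, unsigned char hi)
{
    return b >= lo && b <= hi;
}

/*
 * Hypothetical validator following the ABNF in RFC 3629, section 4.
 * Returns true only if s[0..len) consists entirely of well-formed
 * UTF-8 sequences.
 */
bool utf8_is_well_formed(const unsigned char *s, size_t len)
{
    assert(s != NULL || len == 0);

    size_t i = 0;
    while (i < len) {
        unsigned char b0 = s[i];
        /* Allowed range for the second byte; narrowed for some lead bytes. */
        unsigned char lo = 0x80, hi = 0xBF;
        size_t tail; /* number of bytes that must follow the lead byte */

        if (b0 <= 0x7F) {                     /* UTF8-1 */
            i++;
            continue;
        } else if (in_range(b0, 0xC2, 0xDF)) { /* UTF8-2 */
            tail = 1;
        } else if (b0 == 0xE0) {               /* UTF8-3, no overlong forms */
            tail = 2; lo = 0xA0;
        } else if (in_range(b0, 0xE1, 0xEC) || in_range(b0, 0xEE, 0xEF)) {
            tail = 2;
        } else if (b0 == 0xED) {               /* UTF8-3, no surrogates */
            tail = 2; hi = 0x9F;
        } else if (b0 == 0xF0) {               /* UTF8-4, no overlong forms */
            tail = 3; lo = 0x90;
        } else if (in_range(b0, 0xF1, 0xF3)) {
            tail = 3;
        } else if (b0 == 0xF4) {               /* UTF8-4, at most U+10FFFF */
            tail = 3; hi = 0x8F;
        } else {
            /* 0x80-0xC1 and 0xF5-0xFF never start a sequence. */
            return false;
        }

        if (len - i <= tail) {
            return false; /* sequence is truncated */
        }
        if (!in_range(s[i + 1], lo, hi)) {
            return false;
        }
        for (size_t k = 2; k <= tail; k++) {
            if (!in_range(s[i + k], 0x80, 0xBF)) {
                return false;
            }
        }
        i += tail + 1;
    }
    return true;
}
```

With this sketch, the two-byte sequence C0 AF (an overlong encoding of "/") and the three-byte sequence ED A0 80 (a UTF-16 surrogate) are both rejected, while C3 A9 is accepted. A check that only looks at the lead byte and counts continuation bytes would let the first two through.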