Newsgroups: php.internals
References: <52FF3BB7.8030408@lsces.co.uk> <52FF465E.4040400@lsces.co.uk>
Date: Mon, 21 Apr 2014 11:44:08 +0100
To: Yasuo Ohgaki, Stas Malyshev
Cc: Lester Caine, PHP internals
Subject: Re: [PHP-DEV] utf-8 filenames in phar files.
From: bukka@php.net (Jakub Zelenka)

On Sat, Feb 15, 2014 at 10:53 AM, Yasuo Ohgaki wrote:

> On Sat, Feb 15, 2014 at 7:50 PM, Lester Caine wrote:
>
> > My previous post did not appear on the list ;)
> >
> > Yasuo Ohgaki wrote:
> >
> >> A lot of the current confusion does seem to be based around the Windows
> >> Wide-API as documented in 'The Problem' section of that document. It would
> >> seem that my 'naive' view of simply using UTF-8 strings is thwarted by these
> >> problems?--
> >>
> >> Unicode is like one name with several encoding. We cannot get away from
> >> conversions, normalization especially.
> >
> > That is why personally I'm just looking at UTF8. Which is enough of a mine
> > field on it's own, but since a large swath of what we are working with now
> > is only UTF8 it does seem to be the right base going forward?
>
> I have problem, too. It seems someone is working on DKIM.
> Anyway, there are problems, but UTF-8 is way to go. We just cannot remove
> conversions. Normalization has number of issues, including security related
> one.
>
> Regards,
>
> --
> Yasuo Ohgaki
> yohgaki@ohgaki.net

Hi Stas,

I just saw that the patch has been merged. Yasuo raised an objection about
normalization in the past... What's worse, the patch doesn't check UTF-8
correctly: it accepts invalid UTF-8. The correct syntax is specified in
https://tools.ietf.org/html/rfc3629#section-4. I have an re2c implementation
of it in the jsond scanner:
https://github.com/bukka/php-jsond/blob/master/jsond_scanner.re#L122-L134 .
As you can see, it's very different from the implementation in the patch.
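
To make concrete what the grammar in RFC 3629 section 4 demands, here is a
minimal hand-written C sketch of a validator. It is only an illustration: it
is neither the re2c rules from jsond nor the code from the merged patch, and
the names utf8_is_well_formed and is_tail are made up for this example. The
byte ranges are the ones from the RFC's ABNF (UTF8-1 through UTF8-4).

```c
#include <stddef.h>

/* UTF8-tail = %x80-BF (hypothetical helper name) */
static int is_tail(unsigned char b)
{
    return b >= 0x80 && b <= 0xBF;
}

/*
 * Illustrative sketch: returns 1 if s[0..len) is well-formed UTF-8
 * according to the ABNF in RFC 3629 section 4, and 0 otherwise.
 */
int utf8_is_well_formed(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len) {
        unsigned char c = s[i];
        unsigned char lo = 0x80, hi = 0xBF; /* allowed range of the 2nd byte */
        size_t n;                           /* bytes following the lead byte */
        size_t k;

        if (c <= 0x7F) {                    /* UTF8-1 */
            i++;
            continue;
        } else if (c >= 0xC2 && c <= 0xDF) { /* UTF8-2 */
            n = 1;
        } else if (c == 0xE0) {             /* UTF8-3, no overlong forms */
            n = 2; lo = 0xA0;
        } else if (c >= 0xE1 && c <= 0xEC) {
            n = 2;
        } else if (c == 0xED) {             /* UTF8-3, no surrogates */
            n = 2; hi = 0x9F;
        } else if (c == 0xEE || c == 0xEF) {
            n = 2;
        } else if (c == 0xF0) {             /* UTF8-4, no overlong forms */
            n = 3; lo = 0x90;
        } else if (c >= 0xF1 && c <= 0xF3) {
            n = 3;
        } else if (c == 0xF4) {             /* UTF8-4, nothing above U+10FFFF */
            n = 3; hi = 0x8F;
        } else {
            return 0;   /* 0x80-0xC1 and 0xF5-0xFF never start a sequence */
        }

        if (len - i <= n) {
            return 0;   /* sequence is truncated */
        }
        if (s[i + 1] < lo || s[i + 1] > hi) {
            return 0;   /* 2nd byte outside the range allowed for this lead */
        }
        for (k = 2; k <= n; k++) {
            if (!is_tail(s[i + k])) {
                return 0;
            }
        }
        i += n + 1;
    }
    return 1;
}
```

With this sketch, sequences such as C0 80 (overlong), ED A0 80 (surrogate)
and F4 90 80 80 (above U+10FFFF) are rejected, which are the kinds of
ill-formed input a looser check tends to let through.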

I think that accepting ill-formed UTF-8 would be a mistake and as such the
patch should be reverted.

Thanks

Jakub