Newsgroups: php.internals
Date: Fri, 19 Sep 2014 15:18:25 +0100
To: Kévin Dunglas
Cc: Chris Wright, Pierre Joye, PHP internals
Subject: Re: [PHP-DEV] Internationalized Domain Name support in FILTER_VALIDATE_URL
From: cw@daverandom.com (Chris Wright)

On 19 September 2014 14:48, Kévin Dunglas wrote:
> Support of IDN in streams is a must have.
> But there are a lot of other use cases for URL with IDN validation. The most
> common is probably form validation (test if a user-submitted URL has a
> valid format and can be used to create an HTML link...).
>
> I'm ok making IDN validation optional and not used by default until PHP
> natively supports IDN in other features such as streams.
> But IDNs are used more and more in the wild, and from a user point of view it
> is disappointing that a valid URL, working in browsers and even displayed by
> Google Search, is not considered a valid URL by a PHP-based website using
> filter_var() without a specific flag.
>
> Even some TLDs are using non-ASCII characters, for example: http://旅游气象.中国
> (popular Chinese weather site).
>
> About the library, I have no preference between libidn and icu. If the
> licence of libidn fits better with the PHP one, libidn is probably the better
> choice. Having a PHP-specific implementation of STRINGPREP and Punycode
> does not sound like a good idea (reinventing the wheel, more code to maintain).
>
> Chris, is there a chance to have your work on streams merged in PHP 7?

It's very hacky and PoC at the moment. I've got a bunch of
time-consuming personal things going on right now, but within the next
couple of weeks I will try and polish it up into something serviceable,
maintainable and tested/less likely to explode with edge-cases and then
I'll put it up for discussion.
I'm also fine if someone else wants to have a crack in the meantime; I
can push my work so far to GitHub early next week when I get access to
the machine.

I'd certainly like the functionality to be in 7 if it's viable from a
licensing and dependency PoV - I had been holding off bringing it up to
see what happened with the more general unicode support discussion
(which I somewhat lost track of and seems to have died out) as there was
talk of introducing a hard dependency on ICU-or-similar at one point,
which would have made this a no-brainer.

> What do you think about the following plan:
> - 5.7 (if it exists): add IDN support in filter, disabled by default. Use
> libidn if selected to be used for streams too.
> - 7 (if IDN support for streams is completed): validate IDN by default (what
> the user expects), add a flag to disable IDN validation. Of course we'll
> update the doc explaining the new behavior.
>
> 2014-09-19 12:28 GMT+02:00 Chris Wright :
>>
>> On 19 September 2014 10:58, Pierre Joye wrote:
>> > Hi,
>> >
>> > On Sep 19, 2014 4:03 PM, "Chris Wright" wrote:
>> >>
>> >> Kévin
>> >>
>> >> On 18 September 2014 21:26, Kévin Dunglas wrote:
>> >> > Hello,
>> >> >
>> >> > I'm working on enhancing the FILTER_VALIDATE_URL filter (
>> >> > https://github.com/php/php-src/pull/826).
>> >> > The current implementation does not support validation of
>> >> > internationalized domain names (e.g.
>> >> > http://www.académie-française.fr/).
>> >> >
>> >> > Support of IDN validation can be easily added using ICU's
>> >> > uidna_toASCII() function.
>> >> >
>> >> > Is it acceptable to add a dependency to ICU for ext/filter?
>> >> > Another option is to add a HAVE_ICU constant in main/php_config.h
>> >> > and to validate IDN only if ICU is present.
>> >> >
>> >> > What strategy is preferred?
>> >>
>> >> I've done some work around this area previously, and all I will say
>> >> is: be careful with what you do with this from a userland PoV.
>> >>
>> >> PHP does not natively support IDN in stream open routines or SSL
>> >> verification routines. It will never support these things without at
>> >> least one of:
>> >> - a core dependency on ICU, libidn or similar
>> >> - moving streams into an extension so a dependency can be introduced
>> >> there (probably not sanely possible)
>> >> - an in-house NAMEPREP implementation (this is the hard part of IDN,
>> >> punycode itself is pretty trivial to implement once you have a
>> >> canonical set of codepoints)
>> >>
>> >> These things can be implemented with *a lot* of boilerplate in
>> >> userland when you have ext/intl, but it's not pretty. libcurl *can*
>> >> support IDN if it was built against libidn; I'm not sure if this is
>> >> currently the case in common distributions or not. Since one almost
>> >> never just validates a URL string (it's usually a precursor to
>> >> attempting to open it), this could lead to some pretty hefty wtfs.
>> >>
>> >> In short, while I'm generally for ext/filter being able to handle IDN,
>> >> I *do not* believe it should do it implicitly; it should require an
>> >> explicit flag, because it will break *a lot* of code if IDN is
>> >> suddenly treated as valid where it previously wasn't.
>> >
>> > I am really not sure about that, especially the enabling by default part.
>> >
>> > The doc is pretty clear about what this filter supports, and allowing
>> > IDN may break a lot of code out there.
>> >
>> > From an implementation point of view we may not need ICU to support IDN.
>> > Windows does not use it and there are license-friendly decoder
>> > implementations too.
>>
>> If we can agree on adding a core dependency on ,
>> I already have an experimental local branch that adds full IDN support
>> to streams.
>> It's based on libidn but it would be easy enough to swap
>> it out for something else that provides the same functionality.
>>
>> In my (biased) opinion, streams are a far more important element of
>> IDN support. Filter validation is just polish/a nicety on top.
>
> --
> Kévin Dunglas
>
> http://dunglas.fr
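
For illustration only, here is a minimal sketch of the approach discussed in this thread: convert the host to its ASCII form (NAMEPREP followed by punycode) and accept a non-ASCII host only when the caller opts in with an explicit flag. It is written in Python using the standard library's built-in "idna" codec (IDNA 2003), not PHP, ICU or libidn, and it is not the code from the pull request or from the streams branch. The names host_to_ascii, accepts_url and allow_idn are hypothetical.

```python
from urllib.parse import urlsplit


def host_to_ascii(url: str) -> str:
    """Return the host of `url` in ASCII (punycode) form.

    Relies on Python's built-in "idna" codec, which applies nameprep
    and then punycode to each label. Hypothetical helper, for
    illustration only.
    """
    host = urlsplit(url).hostname
    if not host:
        raise ValueError("URL has no host component")
    return host.encode("idna").decode("ascii")


def accepts_url(url: str, allow_idn: bool = False) -> bool:
    """Check only the host part of a URL.

    Non-ASCII hosts are rejected unless `allow_idn` is set, mirroring
    the opt-in flag argued for in the thread. A real validator would
    also check the scheme, port, path and so on.
    """
    host = urlsplit(url).hostname
    if not host:
        return False
    if not host.isascii() and not allow_idn:
        return False
    try:
        host_to_ascii(url)
    except UnicodeError:
        # Raised by the codec for labels it cannot convert.
        return False
    return True


if __name__ == "__main__":
    url = "http://www.académie-française.fr/"
    print(accepts_url(url))                  # default: IDN not accepted
    print(accepts_url(url, allow_idn=True))  # opt-in: IDN accepted
    print(host_to_ascii(url))                # ASCII form of the host
```

Keeping the flag off by default means existing callers see no change in behaviour, which is the compatibility concern raised above; whether the underlying transport can then open the converted host is a separate question.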