Newsgroups: php.internals
Date: Fri, 19 Sep 2014 15:48:49 +0200
To: Chris Wright
Cc: Pierre Joye, PHP internals
Subject: Re: [PHP-DEV] Internationalized Domain Name support in FILTER_VALIDATE_URL
From: dunglas@gmail.com (Kévin Dunglas)

IDN support in streams is a must-have, but there are many other use cases for validating URLs that contain an IDN. The most common is probably form validation (testing whether a user-submitted URL has a valid format and can be used to create an HTML link...).

I'm OK with making IDN validation optional and disabled by default until PHP natively supports IDN in other features such as streams. But IDNs are used more and more in the wild, and from a user's point of view it is disappointing that a valid URL, which works in browsers and is even displayed by Google Search, is not considered valid by a PHP-based website using filter_var() without a specific flag. Even some TLDs use non-ASCII characters, for example http://旅游气象.中国 (a popular Chinese weather site).

About the library, I have no preference between libidn and ICU. If libidn's licence fits better with PHP's, libidn is probably the better choice. Having a PHP-specific implementation of Stringprep and Punycode does not sound like a good idea (reinventing the wheel, more code to maintain).

Chris, is there a chance of getting your work on streams merged into PHP 7?

What do you think about the following plan:

- 5.7 (if it exists): add IDN support to the filter, disabled by default. Use libidn if it is also selected for streams.
- 7 (if IDN support for streams is completed): validate IDN by default (what the user expects), and add a flag to disable IDN validation.

Of course, we'll update the doc to explain the new behavior.
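To make the opt-in behaviour concrete, here is a minimal sketch of "convert the host to ASCII first, then apply the existing ASCII-only rules". It is written in Python purely for illustration and is not the ext/filter patch: `validate_url`, `allow_idn` and `host_is_ascii_valid` are hypothetical names, the hostname regex is only a stand-in for the real FILTER_VALIDATE_URL rules, and Python's built-in `idna` codec (Nameprep plus Punycode, IDNA 2003) plays the role that `uidna_toASCII()` or libidn would play in C.

```python
import re
from urllib.parse import urlsplit

# Hypothetical ASCII-only label rule: letters, digits and hyphens,
# 1 to 63 characters, no leading or trailing hyphen.
_ASCII_LABEL = re.compile(r"^(?!-)[A-Za-z0-9-]{1,63}(?<!-)$")


def host_is_ascii_valid(host: str) -> bool:
    """Check every dot-separated label against the ASCII-only rule."""
    return all(_ASCII_LABEL.match(label) for label in host.split("."))


def validate_url(url: str, allow_idn: bool = False) -> bool:
    """Return True if the URL has a scheme and an acceptable host.

    allow_idn is a hypothetical opt-in flag: internationalized hosts
    are only accepted when it is set, so existing callers see no change.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if not parts.scheme or not host:
        return False
    if allow_idn:
        try:
            # ToASCII step: Nameprep normalization, then Punycode ("xn--...").
            host = host.encode("idna").decode("ascii")
        except UnicodeError:
            return False
    return host_is_ascii_valid(host)


if __name__ == "__main__":
    url = "http://www.académie-française.fr/"
    print(validate_url(url))                  # rejected without the flag
    print(validate_url(url, allow_idn=True))  # accepted with the flag
```

Keeping the flag off by default matches the 5.7 step of the plan above; flipping the default would correspond to the PHP 7 step.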
2014-09-19 12:28 GMT+02:00 Chris Wright:

> On 19 September 2014 10:58, Pierre Joye wrote:
> > Hi,
> >
> > On Sep 19, 2014 4:03 PM, "Chris Wright" wrote:
> >>
> >> Kévin
> >>
> >> On 18 September 2014 21:26, Kévin Dunglas wrote:
> >> > Hello,
> >> >
> >> > I'm working on enhancing the FILTER_VALIDATE_URL filter
> >> > (https://github.com/php/php-src/pull/826).
> >> > The current implementation does not support validation of
> >> > internationalized domain names
> >> > (i.e: http://www.académie-française.fr/).
> >> >
> >> > Support of IDN validation can be easily added using ICU's
> >> > uidna_toASCII() function.
> >> >
> >> > Is it acceptable to add a dependency to ICU for ext/filter?
> >> > Another option is to add a HAVE_ICU constant in main/php_config.h
> >> > and to validate IDN only if ICU is present.
> >> >
> >> > What strategy is preferred?
> >>
> >> I've done some work around this area previously, and all I will say
> >> is: be careful with what you do with this from a userland PoV.
> >>
> >> PHP does not natively support IDN in stream open routines or SSL
> >> verification routines. It will never support these things without at
> >> least one of:
> >> - a core dependency on ICU, libidn or similar
> >> - moving streams into an extension so a dependency can be introduced
> >>   there (probably not sanely possible)
> >> - an in-house NAMEPREP implementation (this is the hard part of IDN,
> >>   punycode itself is pretty trivial to implement once you have a
> >>   canonical set of codepoints)
> >>
> >> These things can be implemented with *a lot* of boilerplate in
> >> userland when you have ext/intl, but it's not pretty. libcurl *can*
> >> support IDN if it was built against libidn, I'm not sure if this is
> >> currently the case in common distributions or not. Since one almost
> >> never just validates a URL string, it's usually a precursor to
> >> attempting to open it, this could lead to some pretty hefty wtfs.
> >>
> >> In short, while I'm generally for ext/filter being able to handle IDN,
> >> I *do not* believe it should do it implicitly, it should require an
> >> explicit flag, because it will break *a lot* of code if IDN is
> >> suddenly treated as valid where it previously wasn't.
> >
> > I am really not sure about that especially the enabling by default part.
> >
> > The doc is pretty clear about what this filter supports and allowing idn
> > may break a lot of codes out there.
> >
> > From an implementation point of view we may not need ICU to support IDN.
> > Windows does not use it and there are license friendly decoder
> > implementations too.
>
> If we can agree on adding a core dependency on ,
> I already have an experimental local branch that adds full IDN support
> to streams. It's based on libidn but it would be easy enough to swap
> it out for something else that provides the same functionality.
>
> In my (biased) opinion, streams are a far more important element of
> IDN support. Filter validation is just polish/a nicety on top.
>

-- 
Kévin Dunglas
http://dunglas.fr