Newsgroups: php.internals
From: cw@daverandom.com (Chris Wright)
To: Kévin Dunglas
Cc: "internals@lists.php.net"
Date: Fri, 19 Sep 2014 10:03:01 +0100
Subject: Re: [PHP-DEV] Internationalized Domain Name support in FILTER_VALIDATE_URL

Kévin

On 18 September 2014 21:26, Kévin Dunglas wrote:
> Hello,
>
> I'm working on enhancing the FILTER_VALIDATE_URL filter (
> https://github.com/php/php-src/pull/826).
> The current implementation does not support validation of internationalized
> domain names (i.e: http://www.académie-française.fr/
> ).
>
> Support of IDN validation can be easily added using ICU's uidna_toASCII()
> function.
>
> Is it acceptable to add a dependency to ICU for ext/filter?
> Another option is to add a HAVE_ICU constant in main/php_config.h and to
> validate IDN only if ICU is present.
>
> What strategy is preferred?

I've done some work around this area previously, and all I will say
is: be careful with what you do with this from a userland PoV.

PHP does not natively support IDN in stream open routines or SSL
verification routines. It will never support these things without at
least one of:

- a core dependency on ICU, libidn or similar
- moving streams into an extension so a dependency can be introduced
  there (probably not sanely possible)
- an in-house NAMEPREP implementation (this is the hard part of IDN;
  punycode itself is pretty trivial to implement once you have a
  canonical set of codepoints)

These things can be implemented with *a lot* of boilerplate in
userland when you have ext/intl, but it's not pretty. libcurl *can*
support IDN if it was built against libidn; I'm not sure whether this
is currently the case in common distributions.

Since one almost never just validates a URL string (it's usually a
precursor to attempting to open it), this could lead to some pretty
hefty WTFs.
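
To make the "validate, then open" problem concrete, here is a minimal sketch of the conversion step a caller has to do before an ASCII-only consumer can open an IDN URL. It is written in Python purely for illustration and is not PHP's API: it relies on Python's built-in "idna" codec (IDNA 2003, i.e. nameprep plus punycode), and the helper name url_to_ascii is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def url_to_ascii(url: str) -> str:
    """Return url with its hostname rewritten in ASCII-compatible (xn--) form.

    Hypothetical helper for illustration; only the host is converted,
    the path, query and fragment are passed through untouched.
    """
    parts = urlsplit(url)
    host = parts.hostname
    if host is None:
        raise ValueError("URL has no host component")

    # The built-in "idna" codec applies nameprep and punycode per label.
    # It raises UnicodeError for labels it cannot encode.
    ascii_host = host.encode("idna").decode("ascii")
    if ":" in ascii_host:
        # IPv6 literal: urlsplit() strips the brackets, so restore them.
        ascii_host = f"[{ascii_host}]"

    # Rebuild the network location: [user[:password]@]host[:port]
    netloc = ascii_host
    if parts.port is not None:
        netloc += f":{parts.port}"
    if parts.username is not None:
        userinfo = parts.username
        if parts.password is not None:
            userinfo += ":" + parts.password
        netloc = userinfo + "@" + netloc

    return urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
```

Calling url_to_ascii("http://www.académie-française.fr/") yields a URL whose host labels are pure ASCII, with the non-ASCII label in xn-- form. Code that validates the Unicode form but passes it unconverted to a stream opener is exactly where the surprises would come from.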

In short, while I'm generally for ext/filter being able to handle IDN,
I *do not* believe it should do it implicitly. It should require an
explicit flag, because it will break *a lot* of code if IDN is suddenly
treated as valid where it previously wasn't.

Thanks, Chris
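
As an illustration of the opt-in behaviour argued for above, the following is a rough sketch of a validator where IDN hosts are rejected unless the caller explicitly asks for them. It is Python, not ext/filter code: the function name validate_url, the allow_idn parameter and the strict hostname pattern are all assumptions made for the example, and the host check is much simpler than what FILTER_VALIDATE_URL actually does.

```python
import re
from urllib.parse import urlsplit

# Deliberately strict ASCII hostname pattern: dot-separated labels of
# letters, digits and hyphens, with no leading or trailing hyphen.
_ASCII_HOST = re.compile(
    r"(?!-)[A-Za-z0-9-]{1,63}(?<!-)(?:\.(?!-)[A-Za-z0-9-]{1,63}(?<!-))*"
)


def validate_url(url: str, allow_idn: bool = False) -> bool:
    """Hypothetical validator: non-ASCII hosts pass only when allow_idn is set."""
    try:
        parts = urlsplit(url)
        host = parts.hostname
    except ValueError:
        return False
    if not parts.scheme or not host:
        return False

    if not host.isascii():
        if not allow_idn:
            # Default behaviour is unchanged: IDN stays invalid.
            return False
        try:
            # Convert to the ASCII-compatible form before checking it.
            host = host.encode("idna").decode("ascii")
        except UnicodeError:
            return False

    return _ASCII_HOST.fullmatch(host) is not None
```

With this shape, existing callers keep getting False for http://www.académie-française.fr/, and only code that passes allow_idn=True, and is therefore presumably prepared to convert the host before opening it, sees the URL as valid.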