Newsgroups: php.internals
From: pollita@php.net (Sara Golemon)
To: Stas Malyshev
Cc: PHP internals
Date: Tue, 30 Oct 2012 10:26:36 -0700
Subject: Re: [PHP-DEV] [RFC] ICU UConverter implementation for ext/intl
In-Reply-To: <509006DA.6030200@sugarcrm.com>
References: <509006DA.6030200@sugarcrm.com>

> 1. transcode() accepts options, but there's no comparable way to set
> options to the object. I think these APIs should be synchronized.
> Imagine code keeping options in array/config object - it'd be really
> annoying to have two separate procedures to feed these to object and to
> transcode().
>
transcode() has an $options parameter to make up for the fact that the instance version (convert()) can set the same things via instance functions (setSubstChars()). I don't picture a given app using both convert() and transcode(); the latter only exists to placate those who are objectophobic.

> Also, description of options would be helpful.
>
They're covered in the RFC: to_subst and from_subst under "Simple Use".

> 2. Shouldn't "Enumeration and lookup" methods be static? They look
> independent of encodings and don't use the object.
>
They are static in the patch; I just forgot to note that in the RFC. Updated.

> 3. For "Advanced Use", I think "no error" condition should be the
> default and not requiring explicit action.
>
If you take no action at all, then an error still exists. This is consistent with the underlying API.

> 4. I think error reporting should match other intl functions. It'd not
> really be good if intl submodules would be all different in error
> reporting.
>
This was mentioned in previous feedback; I plan to look at it again.

> 5. What is $source parameter for callbacks?
>
It's context for where in the conversion we are.
$codeunit/$codepoint is the specific element causing the problem; $source is the string from that point forward.

> 6. Why toUCallback returns string but fromUCallback gets codepoint as
> long? Shouldn't those be the same - i.e., if toU returns unicode
> codepoint, it should be long? Or it can return multiple codepoints? In
> which case it becomes confusing as we represent codepoints as both
> string and long in the same API.
>
Actually (I left this out of the RFC), both can return any of several types.

In the case of toUCallback, you can return a UTF-8 string (the most reasonable Unicode representation to return as a char* string) and the callback mechanism will convert it to UChars to put into the target string. You can return a long and it'll be treated as a single Unicode codepoint (one UChar for the BMP, two for higher planes). You can also return an array of either of these types to specify a string in a readable but Unicode-friendly format, e.g. array("Espa", 0x00F1 /* LATIN SMALL LETTER N WITH TILDE */, "ol") would be equivalent to "Espa\xC3\xB1ol".

The same is true for fromUCallback(), with the exception that the values being returned are assumed to be in the target encoding. For longs this means a single-byte unsigned char which is appended to the target as-is. Similarly, strings are appended as-is.

As for input parameters: for toUCallback, $source and $codeUnits are still in their original encoding and presented as-is for that encoding. For fromUCallback(), the $source/$codePoint are in Unicode (UChar/UTF-16 internally) and can't be directly offered to PHP without running into endianness issues. So the codepoint is provided as a single UChar32 (avoiding the surrogate problem in the process), and $source is given as a series of UChar32 codepoints in a numerically indexed array.

I'll add a section about callback input/return types to clarify this.

> 7. Link to ICU API from the RFC would be helpful for reviewers and later
> docs, I think.
>
Added!
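
To make the return types from point 6 concrete, here is a rough userland sketch of how a callback return value could be flattened. It is an illustration only, not the code in the patch: the function names codepoint_to_utf8(), flatten_to_u_return() and flatten_from_u_return() are hypothetical, the toU side is flattened to UTF-8 here (the extension itself works in UChars), and rejecting longs outside 0..255 on the fromU side is an assumption of this sketch rather than documented behaviour.

```php
<?php
// Hypothetical helper (not part of ext/intl): encode one code point as UTF-8.
function codepoint_to_utf8($cp)
{
    if ($cp < 0 || $cp > 0x10FFFF || ($cp >= 0xD800 && $cp <= 0xDFFF)) {
        throw new InvalidArgumentException("Invalid code point: $cp");
    }
    if ($cp < 0x80) {
        return chr($cp);
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

// Hypothetical: flatten a toUCallback-style return value to a UTF-8 string.
// Strings are taken as UTF-8, longs as single code points, arrays as a mix.
function flatten_to_u_return($value)
{
    $out = '';
    foreach (is_array($value) ? $value : array($value) as $part) {
        if (is_string($part)) {
            $out .= $part;
        } elseif (is_int($part)) {
            $out .= codepoint_to_utf8($part);
        } else {
            throw new InvalidArgumentException('Expected string or long');
        }
    }
    return $out;
}

// Hypothetical: flatten a fromUCallback-style return value. Everything is
// already in the target encoding, so a long is one raw byte appended as-is.
function flatten_from_u_return($value)
{
    $out = '';
    foreach (is_array($value) ? $value : array($value) as $part) {
        if (is_string($part)) {
            $out .= $part;
        } elseif (is_int($part) && $part >= 0 && $part <= 0xFF) {
            $out .= chr($part); // assumption: only 0..255 is accepted here
        } else {
            throw new InvalidArgumentException('Expected string or byte value');
        }
    }
    return $out;
}
```

With these helpers, flatten_to_u_return(array("Espa", 0x00F1, "ol")) gives the same bytes as "Espa\xC3\xB1ol", matching the example above, while flatten_from_u_return(0xF1) gives the single byte "\xF1" (which is what you'd want for an ISO-8859-1 target).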