Newsgroups: php.internals
Message-ID: <4582F5B3.3060201@php.net>
Date: Fri, 15 Dec 2006 11:21:23 -0800
To: internals@lists.php.net
Subject: Re: Unicode support for *printf()
From: pollita@php.net (Sara Golemon)

> I know we had discussion about *fprintf() on IRC,
> but I'm still not clear on some stuff.
>
> Sara, could you please explain again why UG(unicode)
> should not be used as the selection for
> php_formatted_print/php_u_formatted_print? I
> bet a few of us are hazy on the streams Unicode/binary
> details, so let's please clarify this and make sure
> we're on the same page.
>

UG(unicode) only determines what kind of data the runtime should give back to the script. But we're not giving any data to the script here, we're giving data to a stream. Whether this stream expects unicode or binary data has nothing to do with the setting of unicode.semantics. I could quite easily run the following in non-unicode semantics mode:

    $fp = fopen("somefile.txt", "w");
    stream_encoding($fp, "utf8");
    fwrite($fp, u"\C{SNOWMAN}");

Or this in unicode.semantics mode:

    $fp = fopen("somefile.bin", "wb");
    fwrite($fp, b"\xFF\xFE");

Let's look at how fwrite() works (maxchars logic stripped out for simplicity):

    if (Z_TYPE_P(zstring) == IS_UNICODE) {
        ret = php_stream_write_unicode(stream, Z_USTRVAL_P(zstring), write_len);
    } else {
        convert_to_string(zstring);
        ret = php_stream_write(stream, Z_STRVAL_P(zstring), write_len);
    }

Here, we rely on the user to know what kind of data they should be pushing at the stream. If they push unicode, it's written as unicode; if they push binary, it's written as binary. Sending the wrong type is dealt with by the streams layer, potentially raising an error.

What I was proposing for (v)fprintf(), since they involve multiple parameters, was to use the format string as the type hint. If that arg is unicode, then generate the string as a whole as unicode; if that arg is binary, then generate the string as a whole as binary.
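As a rough illustration of that selection rule, here is a small self-contained C mock. Every name in it (mock_zval, mock_stream, mock_fprintf and friends) is hypothetical and only stands in for engine internals; it is not the real php_formatted_print() / php_u_formatted_print() code. The unicode_semantics parameter merely plays the role of UG(unicode), to show that the rule ignores it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the engine's two string types (not real PHP API). */
typedef enum { MOCK_IS_STRING, MOCK_IS_UNICODE } mock_type;

typedef struct {
    mock_type type;    /* binary or unicode payload */
    const char *text;  /* payload, kept as plain chars for this sketch */
} mock_zval;

/* Records which write path the last call took, so it can be inspected. */
typedef enum { PATH_NONE, PATH_BINARY, PATH_UNICODE } mock_path;

typedef struct {
    mock_path last_path;
} mock_stream;

/* Mock write functions: they only note which path was used. */
static void mock_write_binary(mock_stream *s)  { s->last_path = PATH_BINARY; }
static void mock_write_unicode(mock_stream *s) { s->last_path = PATH_UNICODE; }

/*
 * Sketch of the proposed rule: the type of the format argument alone picks
 * the formatter and the write path. unicode_semantics (assumed to mirror
 * UG(unicode)) is accepted but deliberately not consulted.
 */
static mock_path mock_fprintf(mock_stream *stream, const mock_zval *format,
                              int unicode_semantics)
{
    assert(stream != NULL && format != NULL);
    (void)unicode_semantics;

    if (format->type == MOCK_IS_UNICODE) {
        /* A real version would build the whole string as unicode here. */
        mock_write_unicode(stream);
    } else {
        /* A real version would build the whole string as binary here. */
        mock_write_binary(stream);
    }
    return stream->last_path;
}
```

Calling mock_fprintf() with a unicode format takes the unicode path even when unicode_semantics is 0, and a binary format takes the binary path even when it is 1.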
Of course, it'd be even more comprehensive to do smaller writes as the string is processed (dispatching to write() or write_unicode() as determined by the arg), but that's going a bit far in my opinion.

This approach keeps the responsibility (and the flexibility) of generating and sending the proper types with the script author, where it belongs.

-Sara