Unicode support for *printf()

19 years ago by Sara Golemon

I know we had a discussion about *fprintf() on IRC,
but I'm still not clear on some stuff.

Sara, could you please explain again why UG(unicode)
should not be used as the selection for
php_formatted_print/php_u_formatted_print? I
bet a few of us are hazy on the streams Unicode/binary
details, so let's please clarify this and make sure
we're on the same page.

UG(unicode) only determines what kind of data the runtime should give
back to the script. But we're not giving any data to the script here;
we're giving data to a stream. Whether this stream expects unicode or
binary data has nothing to do with the unicode.semantics setting. I could
quite easily run the following in non-unicode semantics mode:

$fp = fopen("somefile.txt", "w");
stream_encoding($fp, "utf8");
fwrite($fp, u"\C{SNOWMAN}");

Or this in unicode.semantics mode:

$fp = fopen("somefile.bin", "wb");
fwrite($fp, b"\xFF\xFE");

Let's look at how fwrite() works (maxchars logic stripped out for
simplicity):

if (Z_TYPE_P(zstring) == IS_UNICODE) {
    ret = php_stream_write_unicode(stream, Z_USTRVAL_P(zstring), write_len);
} else {
    convert_to_string(zstring);
    ret = php_stream_write(stream, Z_STRVAL_P(zstring), write_len);
}

Here, we rely on the user to know what kind of data they should be
pushing at the stream. If they push unicode, it's written as unicode;
if they push binary, it's written as binary. Sending the wrong type is
dealt with by the streams layer, potentially raising an error.

What I was proposing for (v)fprintf(), since they involve multiple
parameters, was to use the format string argument as the type hint. If
that arg is unicode, then generate the whole string as unicode; if that
arg is binary, then generate the whole string as binary. Of course,
it'd be even more comprehensive to do smaller writes as the string is
processed (dispatching to write() or write_unicode() as determined by
the arg), but that's going a bit far in my opinion.

This keeps the responsibility (and the flexibility) of generating and
sending the proper types on the script author, where it belongs.
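
The dispatch rule proposed above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the patch under discussion: fmt_arg, mock_stream, mock_write(), mock_write_unicode() and sketch_fprintf() are all hypothetical stand-ins invented here, the format text is kept as a plain char* for simplicity, and the actual formatting of the arguments is left out.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for illustration; none of these are real PHP APIs. */
typedef enum { FMT_BINARY, FMT_UNICODE } fmt_type;

typedef struct {
    fmt_type type;     /* type of the format argument passed by the script */
    const char *text;  /* format text, kept as plain char* in this sketch */
} fmt_arg;

typedef struct {
    int binary_writes;   /* calls routed to the binary write path */
    int unicode_writes;  /* calls routed to the unicode write path */
} mock_stream;

static size_t mock_write(mock_stream *s, const char *buf, size_t len)
{
    (void)buf;
    s->binary_writes++;
    return len;
}

static size_t mock_write_unicode(mock_stream *s, const char *buf, size_t len)
{
    (void)buf;
    s->unicode_writes++;
    return len;
}

/* Pick the write path from the format argument's type alone.
 * No global setting (such as UG(unicode)) is consulted.
 * Real formatting of the arguments is omitted here. */
size_t sketch_fprintf(mock_stream *s, const fmt_arg *fmt)
{
    size_t len = strlen(fmt->text);

    if (fmt->type == FMT_UNICODE) {
        return mock_write_unicode(s, fmt->text, len);
    }
    return mock_write(s, fmt->text, len);
}
```

Calling sketch_fprintf() with a FMT_UNICODE format bumps only the unicode counter, and a FMT_BINARY format bumps only the binary counter, whatever the global mode happens to be. The finer-grained variant (one small write per argument) would move the same type check inside the formatting loop.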

-Sara

19 years ago by Andrei Zmievski

The update patch is attached. If there are no objections, I'd like to
commit.

-Andrei

19 years ago by Matt Wilmas

Hi Andrei,

Just a couple things I noticed in _appenddouble... In the first switch (),
'F' is being changed to 'f', and in the second switch, 'F' needs to be moved
down with 'f'. Those changes were just made in v1.89 of the file.

Matt

----- Original Message -----
From: "Andrei Zmievski"
Sent: Monday, December 18, 2006

The update patch is attached. If there are no objections, I'd like to
commit.

-Andrei

19 years ago by Andrei Zmievski

Is this better?

19 years ago by Matt Wilmas

Hi Andrei,

Yeah, I see the patch you committed also included the changes that were made
since my reply yesterday... I've attached a patch that removes a few lines
that aren't present in the non-Unicode version.

Matt

----- Original Message -----
From: "Andrei Zmievski"
Sent: Tuesday, December 19, 2006

Is this better?

-Andrei

19 years ago by Andrei Zmievski

Committed.

Attachment: formatted_print.diff.txt