Hi,
Is there a way (planned or already implemented) to accept an ASCII string in
zend_parse_parameters()?
There is the 't' specifier, but then I have to convert the string to ASCII by
hand and mess around with the types. Isn't there a simpler way?
Nuno
There is no way to do it right now. I could see it being useful,
though. Anyone else have an opinion on this?
I think we should have a look at it when there is a real need for it as
I think it's going to be a pain for the users...
Derick
--
Derick Rethans
http://derickrethans.nl | http://ez.no | http://xdebug.org
Check out ext/unicode/property.c where zend_unicode_to_ascii() is being
used.
-Andrei
Sorry for the delay.
But I think that a new type specifier could be introduced. If not, you are
telling extension writers to duplicate the code below a hundred times:

    if (zend_parse_parameters(ZEND_NUM_ARGS() TSRMLS_CC, "t", &name, &name_len,
                              &name_type) == FAILURE) {
        return;
    }
    if (name_type == IS_UNICODE) {
        /* Unicode input: a NULL result means the string is not ASCII-only */
        buf = zend_unicode_to_ascii(name, name_len TSRMLS_CC);
        if (buf == NULL) {
            php_error(E_WARNING, "my_var has to consist only of ASCII characters");
            RETURN_FALSE;
        }
    } else {
        /* binary string: used as-is */
        buf = (char *) name;
    }

With a new specifier you would be sure that the string you received was
ASCII-only and wouldn't have to care about conversions and such.
Nuno
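
For illustration only, here is a minimal standalone sketch of what such a specifier would have to do internally: look at the type tag, narrow a Unicode string to ASCII or fail, and pass a binary string through. It does not use the Zend API. The names sketch_str_type, sketch_unicode_to_ascii() and sketch_arg_to_ascii() are hypothetical, uint16_t stands in for the engine's UTF-16 code unit type, and malloc()/free() stand in for the engine allocator.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-ins for the engine's string type tags. */
enum sketch_str_type { SKETCH_IS_STRING, SKETCH_IS_UNICODE };

/* Stand-in for zend_unicode_to_ascii(): narrows UTF-16 code units to a
 * NUL-terminated char buffer. Returns NULL if any code unit is outside
 * the ASCII range (or if allocation fails). The caller frees the result. */
char *sketch_unicode_to_ascii(const uint16_t *us, size_t len)
{
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    for (size_t i = 0; i < len; i++) {
        if (us[i] > 0x7F) {
            free(out);
            return NULL;
        }
        out[i] = (char) us[i];
    }
    out[len] = '\0';
    return out;
}

/* One helper in place of the repeated if/else: Unicode input is converted
 * (and rejected if it is not ASCII-only); binary input is copied unchecked,
 * mirroring the cast in the snippet above. The caller always frees. */
char *sketch_arg_to_ascii(const void *str, size_t len, enum sketch_str_type type)
{
    if (type == SKETCH_IS_UNICODE) {
        return sketch_unicode_to_ascii((const uint16_t *) str, len);
    }
    char *out = malloc(len + 1);
    if (out == NULL) {
        return NULL;
    }
    memcpy(out, str, len);
    out[len] = '\0';
    return out;
}
```

A caller would then test a single return value against NULL and raise its warning there, instead of repeating the type check in every function.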
That assumes there are a hundred places where you want to receive an
ASCII string. Are they really that prevalent?
-Andrei
Looking only at the tidy extension:
tidy_parse_string
tidy_parse_file
tidy_repair_string
tidy_repair_file
tidy_getopt
tidy::__construct
tidy::parseFile
tidy::parseString
I would say that other extensions will need it too. Think of charset names,
option names, option values, etc.
Nuno
Are you sure you want to generate a hard error if tidy_parse_string()
doesn't get an ASCII string?
-Andrei
I was not referring to the html/xhtml/xml input. I was talking about the
charset parameter, for example. I don't want a Chinese string passed as the
charset name (the libtidy API isn't localized yet :) ). The same applies to
the other functions.
Nuno
Yeah, but that doesn't mean you need to throw a hard error on passing a
Unicode string here. They can also contain just ASCII data, which is
perfectly acceptable.
regards,
Derick
Yep, in that case it wouldn't throw any error. It would call
zend_unicode_to_ascii() and throw an error only if that fails.
Nuno
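
To make the distinction concrete, here is a small standalone sketch, assuming UTF-16 code units held in uint16_t. The function name sketch_is_ascii_only() is hypothetical and not part of the Zend API. It shows that acceptance depends on the content of the string and not on its type: a Unicode string passes as long as every code unit is in the ASCII range.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: returns 1 when every UTF-16 code unit is in the
 * ASCII range (0x00..0x7F), 0 otherwise. An empty string counts as ASCII. */
int sketch_is_ascii_only(const uint16_t *us, size_t len)
{
    for (size_t i = 0; i < len; i++) {
        if (us[i] > 0x7F) {
            return 0;
        }
    }
    return 1;
}
```

Under this check a Unicode charset name such as "utf8" is accepted, while a name made of Chinese characters is rejected.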
Derick,
The case we're talking about here is where a Unicode string containing
only ASCII characters is passed to an extension, not a binary string
with the same characters.
-Andrei
> That assumes there are a hundred places where you want to receive an
> ASCII string. Are they really that prevalent?
How many of the extension libraries are Unicode-ready?
You see an awful lot of users with quasi-Unicode data that gets into
their database from un-scrubbed (or minimally-scrubbed) form data, and
then they just blindly pass it off to the extension library.
This usually results in "bug reports" of "weird characters" or cries
for help in PHP-General -- And one look at User Contributed comments
on http://php.net/str_replace will tell you that your average PHP
Developer has NO CLUE about Unicode, and, to be honest, doesn't really
care that much.
They're writing a boutique site for a client who doesn't have any
aspirations to a World Market, despite being "on" the World Wide Web.
So when all these internal strings change overnight into Unicode and
the multi-byte data is blindly shoved out to tidy, libpdf, mysql,
postgresql, GD, imagemagick, shell scripts, and no less than four (4)
different XML parsing libraries over the years, you tell me:
What's gonna happen?
Is it going to magically "fix" all those weird-looking characters
because PHP is shoving a Unicode string out to the extension library
and all the extension libraries are ready for Unicode?
Is it going to "just work" albeit with funky-looking characters just
like before because the extension libraries aren't ready for Unicode
yet, just like they aren't thread-safe so Apache 2 is kinda pointless?
Or is it going to blow up in their faces because PHP is suddenly
assuming that all these extension libraries can cope with Unicode
strings?
Personally, I'm terrified as a PHP Developer by the Unicode change. I
know I don't have the skillset to handle Unicode issues.
If you guys don't make it "just work" -- I'm screwed, along with HUGE
segments of your install-base.
How hard do you really want to make it for extensions to be written to
deal with Unicode strings if they're not Unicode-ready?
I have NO IDEA what the answers to these questions are. But I sure
hope you guys do... :-)
--
Like Music?
http://l-i-e.com/artists.htm