Hi internals,
I've created a pull request (https://github.com/php/php-src/pull/33) that changes json_encode to fall back to ASCII for strings that are not valid UTF-8.
I ran into an issue in a production application involving PayPal IPN callbacks (which are sent encoded as windows-1252) and json_encode(). If there was an accented character present in the data, json_encode() would fail to encode the string and serialize it as 'null'.
I've modified the behaviour of the underlying json_escape_string() implementation to attempt to encode strings anyway while still producing a warning.
Thanks
--
Charlie Somerville
JSON with non-Unicode strings is no longer JSON. The spec is explicit
that all strings must be Unicode. The default encoding is UTF-8, but it
could be UTF-16/32 as well.
See http://www.ietf.org/rfc/rfc4627.txt
-Rasmus
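
For the PayPal case described at the top of the thread, one way to stay within the spec is to convert the incoming data to UTF-8 before calling json_encode(). The following is only a minimal sketch: to_utf8() is a hypothetical helper, the array keys are illustrative sample data, and it assumes the mbstring extension is available and that every incoming string really is windows-1252.

```php
<?php
// Hypothetical helper: recursively convert windows-1252 strings to UTF-8.
// Assumes the mbstring extension is loaded.
function to_utf8($value) {
    if (is_array($value)) {
        // array_map() with a single array preserves string keys.
        return array_map('to_utf8', $value);
    }
    if (is_string($value)) {
        return mb_convert_encoding($value, 'UTF-8', 'Windows-1252');
    }
    return $value;
}

// "\xE9" is "é" in windows-1252; on its own it is not valid UTF-8.
$ipn = array('first_name' => "Ren\xE9e", 'mc_gross' => '10.00');

$json = json_encode(to_utf8($ipn));
echo $json, "\n";
```

Once the bytes are valid UTF-8, json_encode() emits the accented character as a \u escape by default, so the resulting document is plain ASCII and valid JSON.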
Agreed. I have a patch lying around for bug #61537 that I need to
finish up that actually goes the other way and changes the default
behaviour of json_encode() to match the documentation and return false
if a string is invalid UTF-8, rather than just nulling that string.
-1 from me on this. It's a regression from the current behaviour, IMO.
Adam
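
Whichever default applies, calling code can check for the failure explicitly instead of relying on the output. A minimal sketch, where encode_or_fail() is a hypothetical wrapper and not part of PHP; it treats both a false return value and a non-zero json_last_error() as failure, so it should behave the same whether the invalid string is nulled or the whole call returns false:

```php
<?php
// Hypothetical wrapper: refuse to hand back JSON if anything went wrong.
function encode_or_fail($value) {
    $json = json_encode($value);
    if ($json === false || json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException(
            'json_encode failed, error code ' . json_last_error()
        );
    }
    return $json;
}

try {
    // "\xE9" followed by "e" is not a valid UTF-8 sequence.
    encode_or_fail(array('name' => "Ren\xE9e"));
    $failed = false;
} catch (RuntimeException $e) {
    $failed = true;
}

$ok = encode_or_fail(array('name' => 'Renee'));
```

Depending on the PHP version, the failing call may also emit a warning before the exception is thrown.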
Agreed. Allowing JSON in a non-UTF-8 charset is the wrong way to go.
Especially with JSON-RPC, it would make a mess of things.
thanks
--
Laruence Xinchen Hui
http://www.laruence.com/