Still new here so what I say probably doesn't amount to much, but I too see this as encouraging the generally-less-appropriate sequence of: parse to determine if it's valid JSON via is_json(), and then parse again using json_decode().
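To make that concrete, here's roughly the pattern I worry the name invites, contrasted with decoding once (is_json() below is the proposed function, shown purely for illustration):

    // Validate, then parse again: the string is walked twice.
    if (is_json($input)) {
        $data = json_decode($input);
        // ... use $data
    }

    // Versus decoding once and checking the result.
    $data = json_decode($input);
    if (json_last_error() !== JSON_ERROR_NONE) {
        // handle invalid JSON
    }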
The name is_valid_json() rings truer to me because we are dealing with values that may or may not be "JSON" even if they can parse as such. For example, the string "false" is valid JSON if interpreted as JSON but need not be JSON in truth. The same is true for "[1, 2, 3]", and I suppose it's even possible to have a string value that works as a JSON map if interpreted as JSON, yet was never created with the purpose of being JSON. Again, it's valid JSON if interpreted that way, but may not in fact be JSON.
That being said, the idea of putting this behind filter_var() seems like it would do less to encourage poor practice (which would be a good thing). That one is likely harder to find, doesn't sound as appealing as is_json() or is_valid_json(), and would stick out more if someone were to use it casually where json_decode() alone is more appropriate. Also, that function already deals with validation and provides a way to pass the options we would need (max depth, JSON_INVALID_UTF8_SUBSTITUTE, and JSON_INVALID_UTF8_IGNORE) if we wanted this to match the behavior of json_decode().
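Just to sketch what I'm imagining (FILTER_VALIDATE_JSON is hypothetical and does not exist; this only illustrates how filter_var()'s existing options array could carry the same knobs json_decode() accepts):

    $ok = filter_var($input, FILTER_VALIDATE_JSON, [
        'options' => [
            'depth' => 512,
            'flags' => JSON_INVALID_UTF8_IGNORE,
        ],
    ]);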
The examples you found are good motivation for a leaner validation check, but I wonder whether they actually represent a great need. If a good user-space solution exists, those few frameworks could use it effectively, and presumably the need for a core function would disappear. Personally, I would rather see this go through only if the performance improved substantially more than the posted benchmarks show. While I realize that what you are doing so far is reusing the existing JSON parser, so we shouldn't expect an entirely different performance profile, I agree with the others that the implementation here is somewhat intrinsically bound up with the idea itself.
By comparison, I have created a naive version in PHP itself with PCRE calls advancing the JSON tokenization: https://pastebin.com/Cf8BZn1H
There are certainly glaring bugs in it because I tossed it together while on a train and only wanted a reasonable approximation of the performance characteristics of in-PHP code. In your benchmark it runs about twice as slow as json_decode() but uses essentially no memory (matching your results with is_json()). I'm pretty sure there are easy ways to eliminate some low-hanging performance bottlenecks, though it doesn't check for valid UTF-8, validate escape sequences, or enforce a maximum depth. I feel that if we are going to add a new native function for this, it should be considerably more than four times as fast as a primitive function written in PHP user-space.
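For anyone who doesn't want to follow the link, a minimal sketch in the same spirit (not the pastebin code itself) looks something like this: one token regex anchored with \G, repeated preg_match() calls advancing an offset, and a small stack/state machine for the grammar. Like the pasted version, it skips UTF-8 validation, strict escape checking, and any depth limit.

    function is_json_naive(string $json): bool
    {
        // One regex matches the next token at the current offset (\G), so each
        // preg_match() call advances the tokenization by exactly one token.
        static $token = '/\G[ \t\r\n]*+(
            [\[\]{},:]
          | "(?:[^"\\\\]|\\\\.)*+"
          | -?(?:0|[1-9]\d*+)(?:\.\d++)?(?:[eE][+-]?\d++)?
          | true | false | null
        )/x';

        $offset = 0;
        $length = strlen($json);
        $stack  = [];       // one '[' or '{' per open container
        $state  = 'value';  // value | firstval | key | firstkey | colon | after

        while ($offset < $length) {
            if (!preg_match($token, $json, $m, 0, $offset)) {
                // Only trailing whitespace may remain after the last token.
                return trim(substr($json, $offset), " \t\r\n") === ''
                    && $stack === [] && $state === 'after';
            }
            $offset += strlen($m[0]);
            $t = $m[1];

            if ($t === '[' || $t === '{') {
                if ($state !== 'value' && $state !== 'firstval') return false;
                $stack[] = $t;
                $state   = ($t === '{') ? 'firstkey' : 'firstval';
            } elseif ($t === ']' || $t === '}') {
                $open  = ($t === ']') ? '[' : '{';
                $empty = ($t === ']') ? 'firstval' : 'firstkey'; // "[]" / "{}"
                if (end($stack) !== $open) return false;
                if ($state !== 'after' && $state !== $empty) return false;
                array_pop($stack);
                $state = 'after';
            } elseif ($t === ',') {
                if ($state !== 'after' || $stack === []) return false;
                $state = (end($stack) === '{') ? 'key' : 'value';
            } elseif ($t === ':') {
                if ($state !== 'colon') return false;
                $state = 'value';
            } elseif ($t[0] === '"') {
                if ($state === 'key' || $state === 'firstkey') {
                    $state = 'colon';
                } elseif ($state === 'value' || $state === 'firstval') {
                    $state = 'after';
                } else {
                    return false;
                }
            } else { // number, true, false, null
                if ($state !== 'value' && $state !== 'firstval') return false;
                $state = 'after';
            }
        }

        return $stack === [] && $state === 'after';
    }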
Warmly,
Dennis Snell
Hey,
While I'm not opposed to the idea, I'm struggling to think of a way to accurately determine whether a string is true JSON that doesn't involve some sort of parsing.
Sure, some regexes can be run on it, but I'm not sure that will ever be 100% accurate. Any parsing of the JSON to determine whether or not it's valid would, in most situations, lead down a code path that then parses the JSON again, essentially repeating the same work for little benefit.
I personally like to use the JSON_THROW_ON_ERROR flag on json_decode().
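For example, something along these lines:

    try {
        $data = json_decode($input, true, 512, JSON_THROW_ON_ERROR);
    } catch (JsonException $e) {
        // handle invalid JSON
    }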
Not trying to rain on your parade, so sorry if it comes across that way; I'm just not sure of a way to do it where the benefit outweighs the cons. But then again, there are people a lot smarter than me on this list.
Best Regards,
Ollie Read
Hello Ollie.
Don't be sorry. Actually, I appreciate all the feedback I get.
I promise I will not disappoint anyone.
I will prepare the RFC next week.
Regards.
Juan