Hello Internals.
I am thinking about improving the json_validate() function developed
for PHP 8.3.
The current signature looks like this:
json_validate(string $json, int $depth = 512, int $flags = 0): bool
I am thinking about enhancing this function so that it can also
validate against a JSON Schema, giving us something like this:
json_validate(string $json, int $depth = 512, int $flags = 0, string $json_schema = null): bool
So, if the string is valid JSON and also conforms to the schema, it returns true.
What do you think?
Thanks in advance, your team mate .... Juan
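To make the idea concrete, here is a minimal user-space sketch of what "valid JSON that also respects the schema" could mean. The function name json_validate_with_schema() and its parameter order are hypothetical, and only the top-level "type": "object" and "required" keywords are checked; it is an approximation for illustration, not the proposed implementation.

```php
<?php
// Hypothetical user-space stand-in for the proposed extra parameter.
// Only the top-level "type": "object" and "required" keywords are checked;
// a real JSON Schema validator covers far more.
function json_validate_with_schema(string $json, int $depth = 512, ?string $jsonSchema = null): bool
{
    $data = json_decode($json, false, $depth);
    if (json_last_error() !== JSON_ERROR_NONE) {
        return false; // not valid JSON at all
    }
    if ($jsonSchema === null) {
        return true; // no schema given: plain syntax validation only
    }

    $schema = json_decode($jsonSchema, true);
    if (!is_array($schema)) {
        return false; // unusable schema: treated as a failure in this sketch
    }
    if (($schema['type'] ?? null) === 'object' && !is_object($data)) {
        return false;
    }
    foreach ($schema['required'] ?? [] as $property) {
        if (!is_object($data) || !property_exists($data, $property)) {
            return false; // a required property is missing
        }
    }
    return true;
}

$schema = '{"type": "object", "required": ["name"]}';
var_dump(json_validate_with_schema('{"name": "Juan"}', 512, $schema));
var_dump(json_validate_with_schema('{"id": 1}', 512, $schema));
```

The first call passes both checks; the second is valid JSON but lacks the required property, so the sketch rejects it.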
On Wed, Mar 1, 2023 at 11:44 AM juan carlos morales <
dev.juan.morales@gmail.com> wrote:
What do you think?
I'm actually working on this. I'm currently developing the schema parsing as
a pure C implementation in my playground C tool called jso. You can see
progress here: https://github.com/bukka/jso/commits/next . The plan is to
develop it inside jso, then port it to jsond, and then propose it for
inclusion in the json extension (that's how I developed the current parser).
There is a lot to do, as JSON Schema is quite complex (composition, JSON
pointers, stream integration for external pointers and more tricky bits), so
this is unlikely to be ready for 8.3 but should be ready for 8.4. I plan to
introduce some smaller things for 8.3, like better error reporting (error
location, which I already have working in jso) and some other small
additions. By the way, schema support won't be useful just for validation but
also for decoding and possibly encoding (a sort of replacement for
JsonSerializable). For decoding in particular, it can be further extended to
allow class mapping. We could also provide automatic generation of a schema
from a class and support attributes. I plan to propose all of this later as
well, but that might take some time.
Regards
Jakub
Jakub, wow, great to know this. Thanks for writing.
OK, then I will assume that this feature will come from you
sometime in the future.
Since json_validate() was announced, I have been receiving
messages (a lot) asking for the ability to use JSON Schemas as
well. Hopefully we will have this in PHP 8.4 (this will be a
shock!)
Question: are you planning to incorporate this by enhancing
json_validate()?
--
To unsubscribe, visit: https://www.php.net/unsub.php
Great addition folks, looking forward to it!
If there is anything that I can do to help, please let me know!
--
Kind regards,
Flávio Heleno
Question ... are you planning to incorporate this by enhancing
json_validate() ???
Yes, the plan is to initially enhance json_decode and json_validate, which
would get a new $schema argument. I plan to create a class for the actual
schema, as it needs to be parsed into its own representation, so it is
convenient to have it in an object. Later it could also be created from
sources other than a JSON string (e.g. an assoc array / stdClass, or the
automatic generation from a class that I mentioned before), so it will be
better to have it in a class.
Regards
Jakub
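As a rough illustration of why a dedicated schema object is convenient, the sketch below parses the schema once and accepts it from several sources. The class name SchemaSketch and all of its methods are hypothetical and are not part of any proposed or existing API.

```php
<?php
// Hypothetical class: holds a schema in one internal representation,
// regardless of which source it was built from.
final class SchemaSketch
{
    private function __construct(private array $definition)
    {
    }

    // Build from a JSON string (throws JsonException on malformed input).
    public static function fromJson(string $json): self
    {
        $decoded = json_decode($json, true, 512, JSON_THROW_ON_ERROR);
        if (!is_array($decoded)) {
            throw new InvalidArgumentException('Schema must decode to an array.');
        }
        return new self($decoded);
    }

    // Build from an associative array.
    public static function fromArray(array $definition): self
    {
        return new self($definition);
    }

    // Build from stdClass; a JSON round trip turns nested objects into arrays.
    public static function fromObject(stdClass $definition): self
    {
        return self::fromJson(json_encode($definition, JSON_THROW_ON_ERROR));
    }

    public function definition(): array
    {
        return $this->definition;
    }
}

$fromJson  = SchemaSketch::fromJson('{"type": "object"}');
$fromArray = SchemaSketch::fromArray(['type' => 'object']);
var_dump($fromJson->definition() === $fromArray->definition());
```

Because every constructor path ends in the same representation, a validator or decoder would only ever need to accept the object, not each source format.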
Hi Jakub
Do we really need this in core? What makes it less usable as an extension?
Cheers,
Michał Marcin Brzuchalski
Hi,
Do we really need this in core? What makes it less usable as an extension?
The primary motivation is that this allows stopping decoding / validation
as soon as the first invalid part is found - basically, the input is validated
as it is parsed. It means this will eliminate all currently possible DoS attacks
on the actual JSON parsing. There are other reasons that we can discuss in
more detail once it is proposed, like better availability for users, but those
are just secondary reasons and sort of side effects.
Regards
Jakub
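The "validated as parsed" idea can be sketched in user-space with a deliberately tiny scanner. The function scan_int_array() is hypothetical: it only understands a flat array of non-negative integers and a single maxItems-style limit, but it shows how checking a constraint during the scan lets the validator stop before reading the rest of a large input.

```php
<?php
// Illustrative only: scans a flat JSON array of non-negative integers such
// as "[1, 2, 3]" and enforces a maxItems-style limit while scanning.
// Returns [bool $valid, int $bytesRead].
function scan_int_array(string $json, int $maxItems): array
{
    $len = strlen($json);
    $i = 0;
    $skipWs = function () use ($json, $len, &$i): void {
        while ($i < $len && strpos(" \t\r\n", $json[$i]) !== false) {
            $i++;
        }
    };

    $skipWs();
    if ($i >= $len || $json[$i] !== '[') {
        return [false, $i];
    }
    $i++;
    $skipWs();
    if ($i < $len && $json[$i] === ']') { // empty array
        $i++;
        $skipWs();
        return [$i === $len, $i];
    }

    $items = 0;
    while (true) {
        $skipWs();
        $digits = strspn($json, '0123456789', $i);
        if ($digits === 0 || ($digits > 1 && $json[$i] === '0')) {
            return [false, $i]; // not a number, or a leading zero
        }
        $i += $digits;
        if (++$items > $maxItems) {
            return [false, $i]; // limit exceeded: the rest is never read
        }
        $skipWs();
        if ($i >= $len) {
            return [false, $i]; // unterminated array
        }
        if ($json[$i] === ',') {
            $i++;
            continue;
        }
        if ($json[$i] === ']') {
            $i++;
            $skipWs();
            return [$i === $len, $i]; // valid only if nothing trails the array
        }
        return [false, $i];
    }
}

$big = '[' . implode(',', range(1, 100000)) . ']';
[$valid, $bytesRead] = scan_int_array($big, 3);
printf("valid=%s, read %d of %d bytes\n", var_export($valid, true), $bytesRead, strlen($big));
```

A validator that first decodes the whole document and only then applies the schema would have to process every byte of $big before rejecting it.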
Excellent Jakub, amazing, happy to know all this.
Let's wait for your proposal. Good luck, make it happen!
On Wed, Mar 1, 2023 at 9:36 AM Michał Marcin Brzuchalski <
michal.brzuchalski@gmail.com> wrote:
Do we really need this in core? What makes it less usable as an extension?
Extensions are not easy to install and have a complex distribution system
that differs greatly between Windows, Debian, Ubuntu, Alpine Linux, AWS
Lambda, etc. I wish one day we could have something as simple and
ubiquitous as Composer installing PHP extensions, but until then, the
fewer extensions the better for end users.
--
Marco Deleu
Hi Deleu,
I agree with your last thought. The fewer extensions the better for end
users, but what I have a problem with is constantly adding functions to the
standard library instead of writing a library that fulfills the need.
As with extensions, the fewer functions/classes are bundled, the better
for end users.
Cheers,
Michał Marcin Brzuchalski
If we're being practical here, json_validate has been proposed and accepted
already, so the discussion is not whether to add a new function, but
whether to improve it to also validate against a schema. So the concern
doesn't seem relevant.
But for the sake of argument, adding it has no negative impact on
me, whereas extensions do, so it's a no-brainer for me.
Ooo... This would be super useful, especially for some of the ideas I have floating about in the back of my head for new libraries. :-) I'm no help on the C implementation side but I'd be happy to collaborate on the user-space API design. That's something we'd want to get very-right the first time out, or else have just the primitives that allow us to do the not-slow bits in user-space. (Which will likely mean something more than just tacking an extra parameter onto json_validate().)
--Larry Garfield
Hi internals,
wish one day we could have something as simple and
ubiquitous as Composer installing PHP extensions
When it comes to Docker containers and PHP, I've found
https://github.com/mlocati/docker-php-extension-installer to be
invaluable. I wish it were part of the official Docker image because
it is so incredibly painless. Usually, there's even support to apply
upstream patches and get extensions working in later versions of PHP
(for example, Memcached and PHP 8+) before they are officially
released.
On Wed, Mar 1, 2023 at 5:05 PM Larry Garfield larry@garfieldtech.com
wrote:
Ooo... This would be super useful, especially for some of the ideas I
have floating about in the back of my head for new libraries. :-) I'm no
help on the C implementation side but I'd be happy to collaborate on the
user-space API design. That's something we'd want to get very-right the
first time out, or else have just the primitives that allow us to do the
not-slow bits in user-space. (Which will likely mean something more than
just tacking an extra parameter onto json_validate().)
Agreed that we'd want to get it right the first time out, so help with the
API will certainly be appreciated. I will announce it here when I have
a draft version, and we can go from there.
Cheers
Jakub
On Wed, 1 Mar 2023 at 11:44, juan carlos morales dev.juan.morales@gmail.com
wrote:
I am thinking about enhancing this function to also be able to
validate against a JSON SCHEMA, giving us something like this:
json_validate(string $json, int $depth = 512, int $flags = 0, string $json_schema = null): bool
Functionally, I think this sounds great. My only concern is that JSON
Schema is a bit of a moving target. Unlike XSD, which has only ever had two
revisions, 10 years apart (1.0 from 2001, and 1.1 from 2012), JSON Schema
is in active development, with "Draft-07", "2019-09", and "2020-12" all
seeing deployment as stable releases, and work in progress on a new
version: https://github.com/json-schema-org/json-schema-spec/milestone/7
That raises two questions:
- Which existing version(s) of the specification will be implemented? For
instance, if I have a schema written for 2019-09 rather than 2020-12, will
I be able to use this validator?
- How will new versions of the specification be added to the parser? If PHP
8.4 is released in November 2024, and JSON Schema 2025-01 is published two
months later, will I need to wait for PHP 8.5 to use the new specification?
I guess the combination of those leads to a third question: will support
for some versions be removed in later PHP versions, so that each PHP
version will have a minimum and maximum schema version it supports?
That's the big disadvantage of anything shipped in core, rather than in an
extension or library - there is no way to make even minor changes outside
of PHP's release cycle.
I wonder if there's some way this can be made pluggable - ship the core
mechanisms needed by the validator in core, but make them accessible to
libraries to implement specific specifications in an efficient way. This is
the approach taken by the MongoDB PHP driver - there's a stable extension
providing efficient low-level routines, then a decoupled PHP library which
can be installed at the project level, and follow a much more agile release
process. I'm not sure what it would look like in this case, but it may be
worth considering.
Regards,
Rowan Tommins
[IMSoP]
On Thu, Mar 2, 2023 at 12:00 PM Rowan Tommins rowan.collins@gmail.com
wrote:
- Which existing version(s) of the specification will be implemented? For
instance, if I have a schema written for 2019-09 rather than 2020-12, will
I be able to use this validator?
The plan is to support draft-04 and all later ones. So yes, you should be
able to use both once they are implemented. I'm starting with draft-04
and will add the others in the order they were created.
- How will new versions of the specification be added to the parser? If PHP
8.4 is released in November 2024, and JSON Schema 2025-01 is published two
months later, will I need to wait for PHP 8.5 to use the new specification?
It would be a feature, so it would come in the next minor version.
I guess the combination of those leads to a third question: will support
for some versions be removed in later PHP versions, so that each PHP
version will have a minimum and maximum schema version it supports?
It's possible that we might decide to stop supporting some drafts if the
maintenance burden is too big and usage small but I wouldn't see that as
something that happens often. But essentially you are right that there will
be minimum (draft-04 initially) and maximum (latest implemented draft).
That's the big disadvantage of anything shipped in core, rather than in an
extension or library - there is no way to make even minor changes outside
of PHP's release cycle.
The changes are going to be added first to PECL jsond, which will support
PHP 7.2+ for some time. I will most likely keep it updated with later draft
additions so it can still be used with older versions of PHP.
I wonder if there's some way this can be made pluggable - ship the core
mechanisms needed by the validator in core, but make them accessible to
libraries to implement specific specifications in an efficient way. This is
the approach taken by the MongoDB PHP driver - there's a stable extension
providing efficient low-level routines, then a decoupled PHP library which
can be installed at the project level, and follow a much more agile release
process. I'm not sure what it would look like in this case, but may be
worth considering.
This might be quite technically challenging and could also significantly
impact parsing performance, so I don't see this happening - at
least not initially.
Regards
Jakub
This may be a place where good OO design helps. Thinking aloud:
$schema = new JsonSchemaDraft4($schema_string);
$schema->validate($json_string); // returns bool
$otherSchema = new JsonSchemaDraft5($other_schema);
etc.
If there's a common JsonSchema interface for them all, it would also be possible to polyfill the classes in user-space that way. The C-provided classes could of course share whatever underlying code makes sense for them to share.
--Larry Garfield
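A small user-space sketch of that shape might look like the following. All names here (JsonSchemaSketch, NumericSchemaSketch, Draft4Sketch, Draft6Sketch) are hypothetical. It handles only numeric "minimum" / "exclusiveMinimum", chosen because draft-04 treats exclusiveMinimum as a boolean modifier of minimum while draft-06 and later treat it as a number, so the two classes share code but genuinely differ.

```php
<?php
// Hypothetical names throughout; only "minimum"/"exclusiveMinimum" on
// top-level JSON numbers are handled.
interface JsonSchemaSketch
{
    public function validate(string $json): bool;
}

// Shared code lives in a base class; each draft overrides what differs.
abstract class NumericSchemaSketch implements JsonSchemaSketch
{
    protected array $schema;

    public function __construct(string $schemaJson)
    {
        $decoded = json_decode($schemaJson, true, 512, JSON_THROW_ON_ERROR);
        if (!is_array($decoded)) {
            throw new InvalidArgumentException('Schema must decode to an array.');
        }
        $this->schema = $decoded;
    }

    public function validate(string $json): bool
    {
        $value = json_decode($json, true);
        if (json_last_error() !== JSON_ERROR_NONE || !(is_int($value) || is_float($value))) {
            return false; // invalid JSON, or not a number
        }
        return $this->checkMinimum($value);
    }

    abstract protected function checkMinimum(int|float $value): bool;
}

final class Draft4Sketch extends NumericSchemaSketch
{
    // draft-04: exclusiveMinimum is a boolean that modifies "minimum".
    protected function checkMinimum(int|float $value): bool
    {
        if (!isset($this->schema['minimum'])) {
            return true;
        }
        $exclusive = ($this->schema['exclusiveMinimum'] ?? false) === true;
        return $exclusive ? $value > $this->schema['minimum'] : $value >= $this->schema['minimum'];
    }
}

final class Draft6Sketch extends NumericSchemaSketch
{
    // draft-06 and later: exclusiveMinimum is itself a number.
    protected function checkMinimum(int|float $value): bool
    {
        if (isset($this->schema['minimum']) && $value < $this->schema['minimum']) {
            return false;
        }
        if (isset($this->schema['exclusiveMinimum']) && $value <= $this->schema['exclusiveMinimum']) {
            return false;
        }
        return true;
    }
}

$draft4 = new Draft4Sketch('{"minimum": 5, "exclusiveMinimum": true}');
var_dump($draft4->validate('5'), $draft4->validate('6'));
```

Code that type-hints against the interface would not care whether a given class is provided by C or by a user-space polyfill.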
On Thu, Mar 2, 2023 at 5:11 PM Larry Garfield larry@garfieldtech.com
wrote:
This may be a place where good OO design helps. Thinking aloud:
$schema = new JsonSchemaDraft4($schema_string);
$schema->validate($json_string): bool;
$otherSchema = new JsonSchemaDraft5($other_schema);
The schema version should be specified by $schema keyword so I don't think
we need multiple classes for that. We should probably default to the latest
version (and maybe allow changing default using argument) if it is not
present but it's not something that users should specify outside the schema
if they control it so it seems better not to do it on the class level.
Cheers
Jakub
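A minimal sketch of the rule described above (use the $schema keyword when present, otherwise fall back to a default). The function name detect_draft(), the KNOWN_DRAFTS table and the short draft labels are hypothetical and exist only for illustration; nothing like this is in the json extension today.

```php
<?php
// Hypothetical helper: detect_draft() and KNOWN_DRAFTS are illustrative names,
// not part of PHP or the json extension.
const KNOWN_DRAFTS = [
    'http://json-schema.org/draft-04/schema' => 'draft-04',
    'http://json-schema.org/draft-06/schema' => 'draft-06',
    'http://json-schema.org/draft-07/schema' => 'draft-07',
];

function detect_draft(string $schemaJson, string $default = 'draft-07'): string
{
    $schema = json_decode($schemaJson, true, 512, JSON_THROW_ON_ERROR);

    // No "$schema" keyword (or a non-object schema): use the default draft.
    if (!is_array($schema) || !array_key_exists('$schema', $schema)) {
        return $default;
    }

    // "$schema" is present, so it always wins over the default.
    $uri = $schema['$schema'];
    $key = is_string($uri) ? rtrim($uri, '#') : null;
    if ($key === null || !array_key_exists($key, KNOWN_DRAFTS)) {
        throw new InvalidArgumentException('Unsupported $schema value');
    }

    return KNOWN_DRAFTS[$key];
}
```

With this shape the default is only consulted when the schema itself is silent, which is the behaviour argued for in the message above.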
The schema version should be specified by $schema keyword
Unfortunately, it's not what happens in the wild. Some schemas even
forbid $schema
(e.g. Composer's one).
--
Best regards,
Bruce Weirdan mailto:weirdan@gmail.com
It's possible that we might decide to stop supporting some drafts if the
maintenance burden is too big and usage small but I wouldn't see that as
something that happens often. But essentially you are right that there will
be minimum (draft-04 initially) and maximum (latest implemented draft).
Thinking about this, particularly if there is the ability to install a PECL
extension which supports different versions from the core PHP version,
perhaps it would be useful to expose a function or constant that lists the
supported versions, so that code needing a particular version could check
for support directly, rather than having to attempt and catch an exception?
I guess in Larry's suggestion, that use case would be filled by
class_exists('JsonSchema_2026_10')
Regards,
Rowan Tommins
[IMSoP]
On Thu, Mar 2, 2023 at 5:53 PM Rowan Tommins rowan.collins@gmail.com
wrote:
It's possible that we might decide to stop supporting some drafts if the
maintenance burden is too big and usage small but I wouldn't see that as
something that happens often. But essentially you are right that there
will
be minimum (draft-04 initially) and maximum (latest implemented draft).
Thinking about this, particularly if there is the ability to install a PECL
extension which supports different versions from the core PHP version,
perhaps it would be useful to expose a function or constant that lists the
supported versions, so that code needing a particular version could check
for support directly, rather than having to attempt and catch an exception?
That's a good point. I think a constant should be sufficient. I think the
different classes make less sense if $schema is specified as I would think
that the $schema should have a priority. So having just default argument
using the constant for the cases where it's not specified should be
sufficient IMHO.
Cheers
Jakub
You mean using the version from the JSON string, and allowing an override? Like this?
new JsonSchema($schema_string, version: JsonSchema::DRAFT_4);
I see two issues there.
- If I want to see if DRAFT_6 is available, I have to use defined() [1] with strings. This is fugly.
- I don't know how to polyfill newer spec versions if I don't want to wait for internals to get around to adding a new version.
--Larry Garfield
Hi,
On Thu, Mar 2, 2023 at 8:30 PM Larry Garfield larry@garfieldtech.com
wrote:
You mean using the version from the JSON string, and allowing an
override? Like this?
It should never allow overriding the $schema as it would go against spec so
this would be just default if $schema is not specified. Just the default
could be overridden.
new JsonSchema($schema_string, version: JsonSchema::DRAFT_4);
Would be probably called more JsonSchema::VERSION_DRAFT_04 but essentially
yeah I was thinking either that or something like just global constant
JSON_SCHEMA_VERSION_DRAFT_04 which is currently more convention in json
extension. I wouldn't really mind using the class constant though.
As I said my main point was that custom defined schema should contain
version in the schema - I realise that it might not be used in wild but
this is defined as SHOULD in all drafts so we should follow that
recommendation in our API design. When this is the case, there is no point
for user to explicitly pick the schema class IMO.
Another thing to note is that we might want to introduce some sort of a
factory method or factory class instead of using constructor because as I
said before we would probably like to introduce more sources for schema
than just string in the future. It means it could be automatically
generated schema from a class so only the class name would be passed or for
convenience it could be just passed directly from the assoc array. It is
basically pointless to always convert it to string because internally it
will just decode the json string to object (stdClass) or more likely array
and parse the schema internal representation from that. If we had this, we
could maybe introduce a different schema classes as well but it would be
more invisible for users and could be just subclasses of JsonSchema or
JsonSchema would be just an interface.
I see two issues there.
- If I want to see if DRAFT_6 is available, I have to use
defined()
[1]
with strings. This is fugly.
This is a good point as for class with autoloader you don't need strings.
Maybe we could also introduce JsonSchema::VERSION_LATEST which would have
value of the last supported draft. Then you could check if draft-06 is
supported by just doing something like
if (JsonSchema::VERSION_LATEST > JsonSchema::VERSION_DRAFT_06) {
// at least draft 07
} else if (JsonSchema::VERSION_LATEST > JsonSchema::VERSION_DRAFT_04) {
// at least draft 06
} else ...
We could also support a string for the actual version, allowing the
well-defined values for $schema to be passed. Then it would throw an
exception if it's not supported. We could even have a static method to check
whether a version is supported and then, for example, do something like
if (JsonSchema::isVersionSupported("https://json-schema.org/draft/2020-12/schema")) { ... }
- I don't know how to polyfill newer spec versions if I don't want to
wait for internals to get around to adding a new version.
I guess it could be possible to support custom user validators (e.g.
instances implementing JsonSchema interface if above concept is used) in
the future but it would be of course more limited. That's not something
that would happen initially but it might be good to design an API with that in
mind.
So to sum it up, maybe the rough structure could be something like
interface JsonSchema {
    const VERSION_DRAFT_04 = 1;
    const VERSION_DRAFT_06 = 2;
    const VERSION_DRAFT_07 = 3;
    const VERSION_LATEST = 3;
    public function validate(array|stdClass $data): bool;
}
class JsonSchemaForVersionDraft04 implements JsonSchema {
    public function validate(array|stdClass $data): bool {}
}
class JsonSchemaForVersionDraft06 implements JsonSchema {
    public function validate(array|stdClass $data): bool {}
}
class JsonSchemaForVersionDraft07 implements JsonSchema {
    public function validate(array|stdClass $data): bool {}
}
class JsonSchemaFactory {
    public static function createFromJsonString(string $jsonString, int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
    public static function createFromClass(string $className, int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
    public static function createFromArray(array $schemaData, int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
    public static function isVersionSupported(int|string $version): bool {}
}
Just a draft of course so any ideas how to improve it are welcome.
What do you think?
Cheers
Jakub
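To make the version check in the draft above concrete, here is a small runnable sketch of how isVersionSupported() could accept either an int constant or a $schema URL. The class name JsonSchemaVersions, the resolve() helper and the BY_URI table are assumptions made for illustration; the thread only proposes the constants and the isVersionSupported() signature.

```php
<?php
// Hypothetical sketch: JsonSchemaVersions, resolve() and BY_URI are
// illustrative names, not an existing or agreed PHP API.
final class JsonSchemaVersions
{
    public const VERSION_DRAFT_04 = 1;
    public const VERSION_DRAFT_06 = 2;
    public const VERSION_DRAFT_07 = 3;
    public const VERSION_LATEST = self::VERSION_DRAFT_07;

    // Maps well-known "$schema" URLs (without the trailing "#") to constants.
    private const BY_URI = [
        'http://json-schema.org/draft-04/schema' => self::VERSION_DRAFT_04,
        'http://json-schema.org/draft-06/schema' => self::VERSION_DRAFT_06,
        'http://json-schema.org/draft-07/schema' => self::VERSION_DRAFT_07,
    ];

    // Returns the int constant for a supported version, or null otherwise.
    public static function resolve(int|string $version): ?int
    {
        if (is_int($version)) {
            return in_array($version, self::BY_URI, true) ? $version : null;
        }

        return self::BY_URI[rtrim($version, '#')] ?? null;
    }

    public static function isVersionSupported(int|string $version): bool
    {
        return self::resolve($version) !== null;
    }
}
```

A caller could then write `JsonSchemaVersions::isVersionSupported('https://json-schema.org/draft/2020-12/schema')` and get false from this sketch, without catching an exception.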
You mean using the version from the JSON string, and allowing an
override? Like this?
It should never allow overriding the $schema as it would go against spec so
this would be just default if $schema is not specified. Just the default
could be overridden.
That sounds like it could be hard to explain why a parameter only sometimes does something... Which suggests we should take a different approach.
new JsonSchema($schema_string, version: JsonSchema::DRAFT_4);
Would be probably called more JsonSchema::VERSION_DRAFT_04 but essentially
yeah I was thinking either that or something like just global constant
JSON_SCHEMA_VERSION_DRAFT_04 which is currently more convention in json
extension. I wouldn't really mind using the class constant though.
Class constants FTW.
As I said my main point was that custom defined schema should contain
version in the schema - I realise that it might not be used in wild but
this is defined as SHOULD in all drafts so we should follow that
recommendation in our API design. When this is the case, there is no point
for user to explicitly pick the schema class IMO.
Another thing to note is that we might want to introduce some sort of a
factory method or factory class instead of using constructor because as I
said before we would probably like to introduce more sources for schema
than just string in the future. It means it could be automatically
generated schema from a class so only the class name would be passed or for
convenience it could be just passed directly from the assoc array. It is
basically pointless to always convert it to string because internally it
will just decode the json string to object (stdClass) or more likely array
and parse the schema internal representation from that. If we had this, we
could maybe introduce a different schema classes as well but it would be
more invisible for users and could be just subclasses of JsonSchema or
JsonSchema would be just an interface.
I'm totally fine with factory methods. I'm less enamored with a factory class, but open to discussing it.
I see two issues there.
- If I want to see if DRAFT_6 is available, I have to use defined() [1]
with strings. This is fugly.
This is a good point as for class with autoloader you don't need strings.
Maybe we could also introduce JsonSchema::VERSION_LATEST which would have
value of the last supported draft. Then you could check if draft-06 is
supported by just doing something like
if (JsonSchema::VERSION_LATEST > JsonSchema::VERSION_DRAFT_06) {
// at least draft 07
} else if (JsonSchema::VERSION_LATEST > JsonSchema::VERSION_DRAFT_04) {
// at least draft 06
} else ...
We could also support string for the actual version and allowing passing
the well defined values for $schema. Then it would throw exception if it's
not supported. We could even have static method to check whether version is
supported and then for example doing something like
if (JsonSchema::isVersionSupported("https://json-schema.org/draft/2020-12/schema")) { ... }
thinking-face.gif
That could work. For simplicity someone in user space (FIG?) could release a set of constants that can be rapidly updated, but the actual PHP API is just looking at the URL. If we expect it to be a rarely-used feature, I'd be on board with that.
- I don't know how to polyfill newer spec versions if I don't want to
wait for internals to get around to adding a new version.
I guess it could be possible to support custom user validators (e.g.
instances implementing JsonSchema interface if above concept is used) in
the future but it would be of course more limited. That's not something
that would happen initially but it might be good to design an API with that in
mind.
So to sum it up, maybe the rough structure could be something like
interface JsonSchema {
    const VERSION_DRAFT_04 = 1;
    const VERSION_DRAFT_06 = 2;
    const VERSION_DRAFT_07 = 3;
    const VERSION_LATEST = 3;
    public function validate(array|stdClass $data): bool;
}
class JsonSchemaForVersionDraft04 implements JsonSchema {
    public function validate(array|stdClass $data): bool {}
}
class JsonSchemaForVersionDraft06 implements JsonSchema {
    public function validate(array|stdClass $data): bool {}
}
class JsonSchemaForVersionDraft07 implements JsonSchema {
    public function validate(array|stdClass $data): bool {}
}
class JsonSchemaFactory {
    public static function createFromJsonString(string $jsonString, int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
    public static function createFromClass(string $className, int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
    public static function createFromArray(array $schemaData, int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
    public static function isVersionSupported(int|string $version): bool {}
}
Just a draft of course so any ideas how to improve it are welcome.
What do you think?
Hm. So your idea is that you'd parse the JSON string with json_decode() first, then pass that to the validator? Is that really the most performant/convenient approach? (I don't know, but I question if it is.)
Thinking about it another way, I see three base primitives that are better done in C than in PHP.
1. validate(SchemaDefinition $schema_definition, $json_value): bool
2. make_schema(string $jsonSchemaString): SchemaDefinition (and possibly alternates with other typed parameters)
3. make_schema_from_class(string $className, string $version = latest available): SchemaDefinition
Everything else could be done in user-space without any significant performance impact. Even 3 could technically be done in user space if SchemaDefinition had a robust API that could be populated from user space, rather than being opaque. So a major question is what level of exposed API we want the schema object to have. (I could probably argue both ways here.)
(If you want to continue this real-time, I'm available in most of the PHP chat fora these days.)
--Larry Garfield
Class constants FTW.
s/Class constants/enum/
:P
--
Rowan Tommins
[IMSoP]
Class constants FTW.
s/Class constants/enum/
:P
Well yes, but if they're being used to just represent a string, rather than being their own value, actually no. :-)
Though, I suppose if the list of supported versions is provided as an enum, that's slightly easier to check for from user-space than class constants?
--Larry Garfield
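For comparison, a sketch of what the enum variant might look like from user-space. The enum name JsonSchemaDraft and its case names are hypothetical; only the backed-enum mechanics (tryFrom() and cases(), PHP 8.1+) are standard PHP.

```php
<?php
// Hypothetical enum (requires PHP 8.1+); JsonSchemaDraft is an illustrative
// name, not part of any existing extension.
enum JsonSchemaDraft: string
{
    case Draft04 = 'http://json-schema.org/draft-04/schema#';
    case Draft06 = 'http://json-schema.org/draft-06/schema#';
    case Draft07 = 'http://json-schema.org/draft-07/schema#';
}

// Checking for a version is a lookup by "$schema" URL, no string-based defined().
$has2020 = JsonSchemaDraft::tryFrom('https://json-schema.org/draft/2020-12/schema') !== null;
$has07   = JsonSchemaDraft::tryFrom('http://json-schema.org/draft-07/schema#') !== null;

// The full list of supported drafts is available for free via cases().
$supportedNames = array_map(
    fn (JsonSchemaDraft $draft): string => $draft->name,
    JsonSchemaDraft::cases()
);
```

Because the cases are backed by the URL string, the enum still "just represents a string", which is the trade-off raised in the reply above.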
On Fri, Mar 3, 2023 at 4:31 PM Larry Garfield larry@garfieldtech.com
wrote:
You mean using the version from the JSON string, and allowing an
override? Like this?
It should never allow overriding the $schema as it would go against spec so
this would be just default if $schema is not specified. Just the default
could be overridden.
That sounds like it could be hard to explain why a parameter only
sometimes does something... Which suggests we should take a different
approach.
We cannot override $schema because that would not be compliant with the
spec of any draft. If it's missing, then we can select the default as it
is not exactly specified what draft should be selected. At least that's how
I understand it. If you understand it differently, please can you explain
it?
new JsonSchema($schema_string, version: JsonSchema::DRAFT_4);
Would be probably called more JsonSchema::VERSION_DRAFT_04 but
essentially
yeah I was thinking either that or something like just global constant
JSON_SCHEMA_VERSION_DRAFT_04 which is currently more convention in json
extension. I wouldn't really mind using the class constant though.
Class constants FTW.
As I said my main point was that custom defined schema should contain
version in the schema - I realise that it might not be used in wild but
this is defined as SHOULD in all drafts so we should follow that
recommendation in our API design. When this is the case, there is no
point
for user to explicitly pick the schema class IMO.
Another thing to note is that we might want to introduce some sort of a
factory method or factory class instead of using constructor because as I
said before we would probably like to introduce more sources for schema
than just string in the future. It means it could be automatically
generated schema from a class so only the class name would be passed or
for
convenience it could be just passed directly from the assoc array. It is
basically pointless to always convert it to string because internally it
will just decode the json string to object (stdClass) or more likely
array
and parse the schema internal representation from that. If we had this,
we
could maybe introduce a different schema classes as well but it would be
more invisible for users and could be just subclasses of JsonSchema or
JsonSchema would be just an interface.
I'm totally fine with factory methods. I'm less enamored with a factory
class, but open to discussing it.
Would you prefer them to be in a separate class as I proposed or part of
JsonSchema (which would need to be a class in such case)?
I see two issues there.
- If I want to see if DRAFT_6 is available, I have to use defined() [1]
with strings. This is fugly.
This is a good point as for class with autoloader you don't need strings.
Maybe we could also introduce JsonSchema::VERSION_LATEST which would have
value of the last supported draft. Then you could check if draft-06 is
supported by just doing something like
if (JsonSchema::VERSION_LATEST > JsonSchema::VERSION_DRAFT_06) {
// at least draft 07
} else if (JsonSchema::VERSION_LATEST > JsonSchema::VERSION_DRAFT_04) {
// at least draft 06
} else ...
We could also support string for the actual version and allowing passing
the well defined values for $schema. Then it would throw exception if it's
not supported. We could even have static method to check whether version is
supported and then for example doing something like
if (JsonSchema::isVersionSupported("https://json-schema.org/draft/2020-12/schema")) { ... }
thinking-face.gif
That could work. For simplicity someone in user space (FIG?) could
release a set of constants that can be rapidly updated, but the actual PHP
API is just looking at the URL. If we expect it to be a rarely-used
feature, I'd be on board with that.
Sounds good. Would you be still for int constants in addition to that or do
you think that just version as a string is enough? Int is quicker to check
and allow comparison with that latest value but not sure if that's really
that useful...
Hm. So your idea is that you'd parse the JSON string with json_decode()
first, then pass that to the validator? Is that really the most
performant/convenient approach? (I don't know, but I question if it is.)
So internally there will need to be 2 passes actually.
The first pass will be happening during the actual json parsing which
should cover all type / pattern / format issues, all maximum / minimum
properties or items checks, additional properties or items checks and some
other things. This is important in preventing the hash DOS attacks and
better error reporting. This can be however a bit harder to expose to user
space but could be done using a set of callbacks - sort of event based
parsing. Basically something like SAX parser for XML. This is more a
generic thing that would be potentially possible to add but it would need
the whole new interface. It is also something that would be possible to add
before the actual schema introduction so if you have some idea about API,
that would be appreciated. Then we could also think how to integrate it to
the validation.
In addition to that there will need to be a second pass that will be done
on the parsed data and it is for things like applying subschemas
conditionally that need to have all properties accessible during
validation. This is what validate method would get and if the first pass is
not done in user space, then it could also cover validation of all keywords.
Thinking about it another way, I see three base primitives that are better
done in C than in PHP.
- validate(SchemaDefinition $schema_definition, $json_value): bool
As I mentioned above this is not enough for internal validation -
specifically the first pass.
- make_schema(string $jsonSchemaString): SchemaDefinition (and possibly
alternates with other typed parameters)
- make_schema_from_class(string $className, string $version = latest
available): SchemaDefinition
Everything else could be done in user-space without any significant
performance impact. Even 3 could technically be done in user space if
SchemaDefinition had a robust API that could be populated from user space,
rather than being opaque. So a major question is what level of exposed API
we want the schema object to have. (I could probably argue both ways here.)
This wouldn't work internally. That SchemaDefinition (or JsonSchema, as I call
it) needs to hold an internal generic representation (sort of compiled schema
to C structs) that drives the whole validation. There are different
constructs. Composition also complicates things significantly so this would
be difficult to abstract in some way and support new features. But maybe I
didn't get what you exactly meant.
(If you want to continue this real-time, I'm available in most of the PHP
chat fora these days.)
Mailing list is fine for the main ideas but happy to discuss / clarify some
details privately before posting it here so we don't spam the list if you
find it better. The PHP Foundation slack is what I use so feel free to ping
me there (I might not reply immediately but eventually I will).
Cheers
Jakub
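As a starting point for the API question about event-based parsing raised above, here is one possible shape for the callbacks. Everything in it is hypothetical (JsonEventHandler, emit_events(), MaxPropertiesGuard). Note also that this sketch fires the events while walking an already-decoded value, purely to show the callback shape; a real first pass would fire them from inside the parser so that bad input is rejected before it is fully built in memory.

```php
<?php
// Hypothetical callback interface; nothing here exists in ext/json.
interface JsonEventHandler
{
    public function startObject(): void;
    public function key(string $name): void;
    public function endObject(): void;
    public function startArray(): void;
    public function endArray(): void;
    public function scalar(string|int|float|bool|null $value): void;
}

// Illustration only: walks a decoded value and fires SAX-like events.
function emit_events(mixed $value, JsonEventHandler $handler): void
{
    if ($value instanceof stdClass) {
        $handler->startObject();
        foreach (get_object_vars($value) as $name => $member) {
            $handler->key((string) $name);
            emit_events($member, $handler);
        }
        $handler->endObject();
    } elseif (is_array($value)) {
        $handler->startArray();
        foreach ($value as $item) {
            emit_events($item, $handler);
        }
        $handler->endArray();
    } else {
        $handler->scalar($value);
    }
}

// Example handler: rejects any object with more than $max properties,
// similar in spirit to a maxProperties check done during the first pass.
final class MaxPropertiesGuard implements JsonEventHandler
{
    /** @var int[] one property counter per currently open object */
    private array $counts = [];

    public function __construct(private int $max) {}

    public function startObject(): void { $this->counts[] = 0; }

    public function key(string $name): void
    {
        $top = array_key_last($this->counts);
        if (++$this->counts[$top] > $this->max) {
            throw new DomainException('too many properties');
        }
    }

    public function endObject(): void { array_pop($this->counts); }
    public function startArray(): void {}
    public function endArray(): void {}
    public function scalar(string|int|float|bool|null $value): void {}
}
```

Checks that need the whole object at once (conditional subschemas, for example) would still belong to the second pass described above.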