Newsgroups: php.internals
From: bukka@php.net (Jakub Zelenka)
To: Larry Garfield
Cc: php internals
Date: Sat, 4 Mar 2023 00:09:11 +0000
Subject: Re: [PHP-DEV] RFC Idea - json_validate() validate schema

On Fri, Mar 3, 2023 at 4:31 PM Larry Garfield wrote:

> On Fri, Mar 3, 2023, at 6:19 AM, Jakub Zelenka wrote:
>
> >> You mean using the version from the JSON string, and allowing an
> >> override?  Like this?
> >>
> > It should never allow overriding the $schema as it would go against spec
> > so this would be just default if $schema is not specified. Just the
> > default could be overridden.
>
> That sounds like it could be hard to explain why a parameter only
> sometimes does something...  Which suggests we should take a different
> approach.
>

We cannot override $schema because that would not be compliant with the
spec of any draft. If it's missing, then we can select the default as it is
not exactly specified what draft should be selected. At least that's how I
understand it. If you understand it differently, please can you explain it?
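
A minimal sketch of the rule described above, assuming a hypothetical helper
resolveSchemaDraft() that is not part of any proposed API: a "$schema"
keyword in the document always wins, and the caller-supplied default is
consulted only when the keyword is absent. The draft URIs are the published
draft-04 and draft-07 meta-schema identifiers; everything else is
illustrative.

```php
<?php
declare(strict_types=1);

// Hypothetical helper (an assumption for illustration, not a proposed API):
// decides which draft a decoded schema should be validated against.
function resolveSchemaDraft(array $schema, string $defaultDraft): string
{
    // A "$schema" keyword in the document takes precedence; it is never overridden.
    if (isset($schema['$schema']) && is_string($schema['$schema'])) {
        return $schema['$schema'];
    }

    // Only a schema without the keyword falls back to the caller's default.
    return $defaultDraft;
}

$default = 'http://json-schema.org/draft-07/schema#';

$withKeyword = json_decode(
    '{"$schema": "http://json-schema.org/draft-04/schema#", "type": "object"}',
    true
);
$withoutKeyword = json_decode('{"type": "object"}', true);

echo resolveSchemaDraft($withKeyword, $default), "\n";    // draft named in the schema
echo resolveSchemaDraft($withoutKeyword, $default), "\n"; // falls back to the default
```

Under this reading the version parameter never "sometimes does something":
it is purely a fallback for schemas that omit the keyword.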

> >> new JsonSchema($schema_string, version: JsonSchema::DRAFT_4);
> >>
> > Would be probably called more JsonSchema::VERSION_DRAFT_04 but
> > essentially yeah I was thinking either that or something like just global
> > constant JSON_SCHEMA_VERSION_DRAFT_04 which is currently more convention
> > in json extension. I wouldn't really mind using the class constant though.
>
> Class constants FTW.
>
> > As I said my main point was that custom defined schema should contain
> > version in the schema - I realise that it might not be used in wild but
> > this is defined as SHOULD in all drafts so we should follow that
> > recommendation in our API design. When this is the case, there is no
> > point for user to explicitly pick the schema class IMO.
> >
> > Another thing to note is that we might want to introduce some sort of a
> > factory method or factory class instead of using constructor because as I
> > said before we would probably like to introduce more sources for schema
> > than just string in the future. It means it could be automatically
> > generated schema from a class so only the class name would be passed or
> > for convenience it could be just passed directly from the assoc array. It
> > is basically pointless to always convert it to string because internally
> > it will just decode the json string to object (stdClass) or more likely
> > array and parse the schema internal representation from that. If we had
> > this, we could maybe introduce a different schema classes as well but it
> > would be more invisible for users and could be just subclasses of
> > JsonSchema or JsonSchema would be just an interface.
>
> I'm totally fine with factory methods.  I'm less enamored with a factory
> class, but open to discussing it.
>

Would you prefer them to be in a separate class as I proposed or part of
JsonSchema (which would need to be a class in such case)?

> >> I see two issues there.
> >>
> >> 1. If I want to see if DRAFT_6 is available, I have to use defined()[1]
> >> with strings.  This is fugly.
> >>
> > This is a good point as for class with autoloader you don't need strings.
> >
> > Maybe we could also introduce JsonSchema::VERSION_LATEST which would have
> > value of the last supported draft. Then you could check if draft-06 is
> > supported by just doing something like
> >
> > if (JsonSchema::VERSION_LATEST > JsonSchema::VERSION_DRAFT_06) {
> >     // at least draft 07
> > } else if (JsonSchema::VERSION_LATEST > JsonSchema::VERSION_DRAFT_04) {
> >     // at least draft 06
> > } else ...
> >
> > We could also support string for the actual version and allowing passing
> > the well defined values for $schema. Then it would throw exception if it's
> > not supported. We could even have static method to check whether version
> > is supported and then for example doing something like
> >
> > if (JsonSchema::isVersionSupported("https://json-schema.org/draft/2020-12/schema")) { ... }
>
> thinking-face.gif
>
> That could work.  For simplicity someone in user space (FIG?) could
> release a set of constants that can be rapidly updated, but the actual PHP
> API is just looking at the URL.  If we expect it to be a rarely-used
> feature, I'd be on board with that.
>

Sounds good. Would you still be for int constants in addition to that or do
you think that just the version as a string is enough? Int is quicker to
check and allows comparison with the latest value but I'm not sure if that's
really that useful...

> >> 2. I don't know how to polyfill newer spec versions if I don't want to
> >> wait for internals to get around to adding a new version.
> >>
> > I guess it could be possible to support custom user validators (e.g.
> > instances implementing JsonSchema interface if above concept is used) in
> > the future but it would be of course more limited.
> > That's not something that would happen initially but it might be good
> > to design an API with that in mind.
> >
> > So to sum it up, maybe the rough structure could be something like
> >
> > interface JsonSchema {
> >     const VERSION_DRAFT_04 = 1;
> >     const VERSION_DRAFT_06 = 2;
> >     const VERSION_DRAFT_07 = 3;
> >     const VERSION_LATEST = 3;
> >
> >     public function validate(array|stdClass $data): bool;
> > }
> >
> > class JsonSchemaForVersionDraft04 implements JsonSchema {
> >     public function validate(array|stdClass $data): bool {}
> > }
> >
> > class JsonSchemaForVersionDraft06 implements JsonSchema {
> >     public function validate(array|stdClass $data): bool {}
> > }
> >
> > class JsonSchemaForVersionDraft07 implements JsonSchema {
> >     public function validate(array|stdClass $data): bool {}
> > }
> >
> > class JsonSchemaFactory {
> >     public static function createFromJsonString(string $jsonString,
> >         int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
> >
> >     public static function createFromClass(string $className,
> >         int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
> >
> >     public static function createFromArray(array $schemaData,
> >         int|string $version = JsonSchema::VERSION_LATEST): JsonSchema {}
> >
> >     public static function isVersionSupported(int|string $version): bool {}
> > }
> >
> > Just a draft of course so any ideas how to improve it are welcome.
> >
> > What do you think?
>
> Hm.  So your idea is that you'd parse the JSON string with json_decode()
> first, then pass that to the validator?  Is that really the most
> performant/convenient approach?  (I don't know, but I question if it is.)
>

So internally there will actually need to be 2 passes. The first pass will
happen during the actual JSON parsing, which should cover all type /
pattern / format issues, all maximum / minimum properties or items checks,
additional properties or items checks and some other things.
This is important for preventing hash DoS attacks and for better error
reporting. It can however be a bit harder to expose to user space, but it
could be done using a set of callbacks - sort of event-based parsing,
basically something like a SAX parser for XML. This is a more generic thing
that would potentially be possible to add but it would need a whole new
interface. It is also something that could be added before the actual
schema introduction so if you have some idea about the API, that would be
appreciated. Then we could also think about how to integrate it into the
validation.

In addition to that there will need to be a second pass that is done on
the parsed data. It is for things like applying subschemas conditionally,
which need to have all properties accessible during validation. This is what
the validate method would get and if the first pass is not done in user
space, then it could also cover validation of all keywords.

> Thinking about it another way, I see three base primitives that are better
> done in C than in PHP.
>
> 1) validate(SchemaDefinition $schema_definition, $json_value): bool
>

As I mentioned above this is not enough for internal validation -
specifically the first pass.

> 2) make_schema(string $jsonSchemaString): SchemaDefinition (and possibly
> alternates with other typed parameters)
> 3) make_schema_from_class(string $className, string $version = latest
> available): SchemaDefinition
>
> Everything else could be done in user-space without any significant
> performance impact.  Even 3 could technically be done in user space if
> SchemaDefinition had a robust API that could be populated from user space,
> rather than being opaque.  So a major question is what level of exposed API
> we want the schema object to have.  (I could probably argue both ways here.)
>

This wouldn't work internally.
That SchemaDefinition, or JsonSchema as I call it, needs to hold an internal
generic representation (sort of a schema compiled to C structs) that drives
the whole validation. There are different constructs. Composition also
complicates things significantly so this would be difficult to abstract in
some way and support new features. But maybe I didn't get what you exactly
meant.

> (If you want to continue this real-time, I'm available in most of the PHP
> chat fora these days.)
>

Mailing list is fine for the main ideas but happy to discuss / clarify some
details privately before posting it here so we don't spam the list if you
find it better. The PHP Foundation slack is what I use so feel free to ping
me there (I might not reply immediately but eventually I will).

Cheers

Jakub