Newsgroups: php.internals Path: news.php.net Xref: news.php.net php.internals:112740 Return-Path: Delivered-To: mailing list internals@lists.php.net Received: (qmail 22255 invoked from network); 3 Jan 2021 21:44:10 -0000 Received: from unknown (HELO php-smtp4.php.net) (45.112.84.5) by pb1.pair.com with SMTP; 3 Jan 2021 21:44:10 -0000 Received: from php-smtp4.php.net (localhost [127.0.0.1]) by php-smtp4.php.net (Postfix) with ESMTP id 126371804DD for ; Sun, 3 Jan 2021 13:19:57 -0800 (PST) X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on php-smtp4.php.net X-Spam-Level: X-Spam-Status: No, score=0.0 required=5.0 tests=BAYES_05, FREEMAIL_FORGED_FROMDOMAIN,FREEMAIL_FROM,HEADER_FROM_DIFFERENT_DOMAINS, HTML_MESSAGE,RCVD_IN_DNSWL_NONE,RCVD_IN_MSPIKE_H2,SPF_HELO_NONE, SPF_PASS autolearn=no autolearn_force=no version=3.4.2 X-Spam-Virus: No X-Envelope-From: Received: from mail-lf1-f44.google.com (mail-lf1-f44.google.com [209.85.167.44]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits) key-exchange ECDHE (P-256) server-signature RSA-PSS (4096 bits) server-digest SHA256) (No client certificate requested) by php-smtp4.php.net (Postfix) with ESMTPS for ; Sun, 3 Jan 2021 13:19:56 -0800 (PST) Received: by mail-lf1-f44.google.com with SMTP id h22so60182788lfu.2 for ; Sun, 03 Jan 2021 13:19:56 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=bTcLzGq02EmK+seA+gJ45HKnEZ8W4rliRDavFU853OI=; b=gAjJInTwaba2lmSug3O1I7c6uz/WiKl6xYEH6IEhAFePwpCm8p3osmh4lErlSgNBeB zkna0IkluvT84taco+bI32OaIR5xIWF6VcTvXTFlT7KyTjHTo/9eOFFYLiNFnc5UBXW2 bzB0ck+0bYZ0cubzlnU96DKoShrA2cE5Q/1TKthS3rwq1agUscMECc39r9ZOr7uvvaen xh4t2jPecShWuyiSa58qvJfTLtiOs0u3ESZZFnbh86sb829T3Kb1MmC931gJ5k8Q5x9J VXp22CTfjZ362SJX+kfPgr7AvXVo2oUQuNnPmGAiu8mRzYLVWuk8rO8yik4XvANN5aS0 YwWg== X-Gm-Message-State: AOAM532BNI7ze+FQJ2wAHK1dcnp5TED+PzxcAkQe4L4OtAhnI5UhdcNx C7ESLzkKJF/HxA8QtDLf6m93OuwgO58uYz0Bz1k= X-Google-Smtp-Source: ABdhPJxNSIEziYzJJEwQyMUqnJWPkO4IZolXp/JYrdmvTkNTixsny8xkrlvgv6fYNzUx28scztKN0CpgWsNWOGhCUNE= X-Received: by 2002:a2e:7806:: with SMTP id t6mr36886836ljc.298.1609708794674; Sun, 03 Jan 2021 13:19:54 -0800 (PST) MIME-Version: 1.0 References: <8c1f2d5a-6888-0787-06ad-095a06dd4e7a@php.net> In-Reply-To: Date: Sun, 3 Jan 2021 21:19:43 +0000 Message-ID: To: =?UTF-8?B?TcOhdMOpIEtvY3Npcw==?= Cc: Remi Collet , PHP internals list Content-Type: multipart/alternative; boundary="000000000000e6021705b805884d" Subject: Re: [PHP-DEV] [RFC] Bundling ext/simdjson into core From: bukka@php.net (Jakub Zelenka) --000000000000e6021705b805884d Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Hi, On Thu, Dec 31, 2020 at 1:30 AM M=C3=A1t=C3=A9 Kocsis wrote: > Hi Remi and Jakub, > >> > I agree it's too early as the library is young and won't be available in >>> many distros. The PECL path is better in this case IMO as it will allow >>> some time . >> >> > In my opinion, this is a case where making an > exception is worth considering. > > The PECL path doesn't mean that the extension won't be used. From the user point of view, it doesn't really matter that much as distro packages are available even for PECL extensions and it can easily added to Docker image as well. The advantage is that it will give some time for the library to be available in distros and possibly more stable. I think it would actually help with stabilizing the API and introducing new features quickly. It means all the mentioned features could be provided to users in the extension release cycle and not waiting a year for new PHP version. > Should the simdjson library be written in C, > I'd propose to add the new API + parser to > ext/json directly, since ext/simdjson is just a > very small wrapper around the parser, and > not a complex piece of code in itself (compared to other parts of php-src= ). > > I think this wouldn't be really an option even if it was in C because ext/json is enabled by default so you couldn't have external dependency on such a new library. Otherwise it would mean that you couldn't install PHP if that library is not available. Of course unless the library is bundled but that is not a good idea for the maintanance. > Also, I think the performance benefit of using > the simdjson parser is so major that it would > be a pity if people had to wait for years until > the functionality becomes generally available > as a core extension. As json_encode() and > json_decode() are very easy to use, my guess > is that a 3rd party JSON-related extension > would never get an adoption large enough, > because only those people would install it > who have really reached the limitations of ext/json. > > But this proposal is not about changing json_decode. It introduces a new API that can be in the same way introduce in the PECL extension. As I said above, having that in PECL doesn't mean that it's not available and people have to wait for it. It would be great to actually see what the performance benefit is in the real applications. The benchmarks relies on repeated calls which is not always the way how it's in the application (e.g. due to processor cache). Also it might not be such a huge perf increase for the most apps as the actual parsing is not usually the app bottleneck. That said I think it will bring some considerable improvements anyway in the apps especially in those doing lots of parsing but would be great to see how much it is in reality. > By the way, it has just come to my mind that > our company is also affected by these > limitations. Sometimes we have to parse > very large JSON documents, and in some > cases these can end up being truncated. > Fortunately we only need a specific part of > the data, so someone wrote a partial "parser" > (this is euphemism) tailored for the schema > in question. Rather than having to use > custom hackery, it would be so much better > if PHP would offer partial parsing out of the > box, like what the proposed > JsonParser::getKeyValue() does. > As you mentiened, this could be possible in ondemand API which looks really useful indeed. There are more things that are pretty useful like JSON Pointer, better error reporting and UTF8 validation that could be potentially also re-used in encoder. I think it would be great to have at least some of the features in the extension before it gets to the core. Especially thinking about the error reporting which should no longer depend on global state. One note about the proposed API. As it's not part of the ext/json, it shouldn't be called JsonParser but rather SimdJsonParser to reflect that it's part of the simdjson extension. That's the convention that is used for other exts and it's also less confusing for users because that class won't be available for many users initially - at least until the library is available or extension installed / enabled. The methods also shouldn't be all static but rather instance should be provided that would allow getting errors or using the ondemand mode. > That said, the cost-benefit ratio of having > simdjson in core seems advantageous for me. > > Was thinking that it would be good to consider some kind of plugable >> decoder where another extension could register a parsing callback. >> Something similar to what we have for parser but instead for the whole >> decoding. That would allow to still use current parser in json_decode bu= t >> if simdjson available / configured in ini, then it would used instead an= d >> would be just faster. Not sure if all options are supported though - for >> example don't see any note about UTF8 substitution >> (JSON_INVALID_UTF8_SUBSTITUTE). >> > > This is a very interesting approach, and it reminds me about the hashing > registry RFC > in certain extent. > > And you are right, as far as I saw, these flags > are not supported by ext/simdjson. But to be > honest, I haven't analyzed yet how difficult it > would be to have a reasonably full compatibility with ext/json. > > Yeah it would be really cool to be able to natively speed json_decode to get instant improvement for the apps as the new API usually takes time to get used. This pluggable API is in my eyes the way how to do it because we might still need to have the current parser available for the cases where simdjson lib is not available. But as you say it requires some analysis first. I think for now it would be great to create a repo for that extension under php org in Github, get it to PECL and then add more features into it. If I manage to find some time, I would be happy to contribute some bits or at least help with reviews. Once it gets a bit more stable, we could consider it for inclusion to the core. Cheers Jakub --000000000000e6021705b805884d--