Newsgroups: php.internals
Xref: news.php.net php.internals:108103
Date: Sun, 12 Jan 2020 12:57:42 -0600
To: "php internals"
Subject: Re: [PHP-DEV] Introducing compile time code execution to PHP preloading
From: larry@garfieldtech.com ("Larry Garfield")

On Sat, Jan 11, 2020, at 6:40 AM, Robert Hickman wrote:
> With PHP having recently introduced preloading, I have been thinking
> about the possibility of adding a system whereby arbitrary PHP code
> can run during this step. Essentially, this would serve the same
> function as 'compile time execution' in many programming languages. It
> should be noted that my thoughts below are mostly inspired by the
> in-development language JAI, demos of which are included at the end of
> this email.
>
> While PHP is an interpreted language, code is first parsed, which
> generates an AST, and this AST is then used to generate bytecode that
> is stored in opcache. With preloading, the generation of this bytecode
> is done only once on server startup.
> Compile time code would run
> during this stage as a 'shim' between parsing and bytecode generation,
> allowing arbitrary modifications to the AST.
>
> I can think of numerous examples of ways this could be advantageous.
> For one, frameworks often want to store configuration data in a
> database or some other external source, and accessing it on every request
> is needless overhead, given that the data tends never to change in
> production. So you could do something like the following, which runs
> once during preload and caches the constant in opcache.
>
>
> ------------
> static_run {
>     $link = mysqli_connect("127.0.0.1", "my_user", "my_password", "my_db");
>     $res = mysqli_query($link, 'select * from sometable');
>
>     $array = [];
>     while ($row = mysqli_fetch_assoc($res)) {
>         $array[] = $row;
>     }
>
>     define('CONST_ARRAY', $array);
> }
> ------------
>
> static_run being a new keyword that allows an expression to be
> evaluated at compile time.
>
> I foresee this being able to do far more than simply define constants,
> though. In my opinion, it should allow arbitrary
> modifications to the AST, and arbitrary programmatic code generation.
> For example, static code could register a callback which receives the
> AST of a file during import:
>
>
> ------------
> static_run {
>     on_file_load(function ($file_ast) {
>
>         // Do something with the AST of the file
>
>         return $file_ast;
>     });
> }
> ------------
>
> As noted above, I can think of numerous things that this could do, and
> as a flexible and far-reaching facility, I am sure many more things
> are possible that I have not considered. To give a few examples:
>
> * Choose a database interface once instead of during every request.
>
> * Check that the types defined in an ORM actually match the database.
>
> * Inverting the above, programmatically generate types from a database table.
>
> * Compile templating languages like Twig into PHP statically,
> eliminating runtime overhead.
>
> * Convert syntactically pretty code into a more optimised form.
>
> * Statically generate efficient code for mapping URLs to handler functions.
>
> * Validate the usage of callback systems such as WordPress 'shortcodes'.
>
> * Arbitrary code validation, such as to implement corporate
> programming standards.
>
>
> ==== Why not a preprocessor?
>
> While things like this can be implemented as a preprocessor, I can see
> considerable advantages of implementation as a native feature of the
> language itself. A big one is that it would be aware of the semantics
> of the language, like namespaces and scope, which is a big downside of
> rudimentary preprocessors like the one in C/C++. Implementing it in
> the language runtime also eliminates the need for a build step, and
> means that everyone using the language has access to the same tools.
>
> I also think that, given that these data structures already exist
> during compilation to bytecode, why not just give programmers access
> to them?
>
> This concept is not that unusual; Python, for example, allows Python
> code to modify the AST of files as they are being loaded. However,
> directly modifying the AST won't be very user-friendly. Due to this,
> syntax could be created which allows the more common operations to be
> done more easily. Rust has a macro system that is based on this kind
> of idea, and JAI has recently introduced something comparable. As
> should be obvious from the above, I am not talking about macros in
> the C sense. These should be 'hygienic macros'.
>
>
> ==== How it runs
>
> On the web, compile time code is run during preloading. When running
> PHP code at the CLI, compile time code could just be run every time,
> before run time code. Caching the opcodes in a file, automatically
> detecting changes and recompiling this as Python does, could be a
> worthwhile optimisation.
>
>
> ==== Inspirations
>
> The general idea with this was inspired by the in-development
> programming language JAI, which has full compile time execution.
> Literally, the entire programming language can be run at compile time
> with very few restrictions. See the following two videos for a
> demonstration of what it can do:
>
> https://www.youtube.com/watch?v=UTqZNujQOlA
> https://www.youtube.com/watch?v=59lKAlb6cRg&list=PLmV5I2fxaiCKfxMBrNsU1kgKJXD3PkyxO&index=20&t=0s
>
> There is also a programming language called 'Zig' that is based on
> similar ideas to JAI, and also has compile time execution. Unlike JAI,
> it has been released and is available to try today. My suggested
> syntax for static_run was inspired by Zig.

While I'd love to be able to leverage preloading to do "compile-like stuff", I have a lot of concerns with it. Most notably, *not all code will be run in a preload context*. Language features that only sometimes work scare me greatly.

Doing one-time optimizations in preload that make the code faster, that's great. Preload optimizations that make the code behave differently, that's extremely dangerous.

It also makes development much harder, in part because it means you have to consider two kinds of users (preloaded and not), but also because the most important "not preloaded" user is yourself, during development. "I changed one character and now I have to restart my webserver to see if it did anything" is a bad place for PHP to be.

So while I'm very open to engine additions to do known-quantity preload-only optimizations (e.g., could generics be implemented in a way that is moderately performant normally, but full speed in preload, with the same behavior?), I am highly skeptical about allowing arbitrary preload/compile-time behavior, as it makes development harder and bifurcates the ecosystem.

To your specific examples, many are already possible today.
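To illustrate with the configuration example from the quoted message: a minimal sketch, assuming a build or deploy step that can still reach the database, would export the query result to a plain PHP file once and `require` it at runtime. The functions `dump_config()` and `load_config()`, the cache path, and the sample row are hypothetical placeholders, not part of PHP, Symfony, or any existing library.

```php
<?php
// Sketch of build-time code generation; all names and paths are hypothetical.

/**
 * Build step: run once (e.g. from a deploy script), not per request.
 * $rows stands in for the result of the database query in the quoted
 * example; a real build would fetch them with mysqli or PDO.
 */
function dump_config(array $rows, string $path): void
{
    // var_export(..., true) returns valid PHP source for the array,
    // so the generated file can be loaded with a plain require.
    $code = "<?php\n// Generated at build time; do not edit.\nreturn "
        . var_export($rows, true) . ";\n";

    // Write to a temporary file, then rename it into place so that a
    // concurrent request does not include a half-written file.
    $tmp = $path . '.tmp';
    if (file_put_contents($tmp, $code) === false || !rename($tmp, $path)) {
        throw new RuntimeException("Could not write $path");
    }
}

/**
 * Runtime: load the precomputed data. The generated file is ordinary PHP,
 * so opcache can cache its compiled form like any other script.
 */
function load_config(string $path): array
{
    return require $path;
}

// Example run with a hard-coded row in place of real query results.
$path = sys_get_temp_dir() . '/config_cache.php';
dump_config([['key' => 'site_name', 'value' => 'Example']], $path);
$config = load_config($path);
```

The catch is the one raised here: the export still needs database access when the build runs, and the generated file has to be written before the file system goes read-only.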
Code generation in a pre-execute build step is increasingly common; the Symfony ecosystem does a ton of it, I've implemented a compiled version of a PSR-14 Event Dispatcher, etc. The ones we cannot do today are those that require DB or other service access; that's frequently not available in a PaaS environment, where you're doing your build before you connect a container to an environment with services. Once you are in a live environment, you want a read-only file system for security and auditability, so code generation at that point is impossible. Moving that code gen to a preloader wouldn't help with that.

I appreciate the intent here, but in practice I'd much rather we limit preload optimization to things the engine can do for us, and reliably know that it can do so without changing behavior. There's actually quite a lot that could be done there, if we were able to give the engine the information to do so. For example, there are a ton of optimizations that rely on working only with pure functions, but the engine today cannot know whether a function is pure. (Or I don't think it's able to figure it out for itself, anyway.) I'd be fully in favor of ways that we could indicate to the engine "this is safe to do more computer-science-y optimizations on, do your thing", and then implementing those optimizations in the engine rather than in user space.

--Larry Garfield