Newsgroups: php.internals
Date: Sat, 15 Aug 2009 21:52:04 +0200
From: php@stefan-marr.de (Stefan Marr)
To: Paul Biggar
Cc: internals@lists.php.net
Subject: Re: [PHP-DEV] Design of the Zend Engine's Instruction Set
Message-ID: <36262A27-538B-487C-9C36-10E18DDBED22@stefan-marr.de>

Hi Paul:

> To start with, the best reference about the Zend engine that I know of
> is a presentation by Andy Wharmby at IBM:
> www.zapt.info/PHPOpcodes_Sep2008.odp. It should answer a lot of your
> questions.

Thanks a lot, I was not aware of that one. And, well, it helps with
reading and understanding the code.

>> So, the basic design of the Zend Engine is
>> a stack-based interpreter for a fixed length
>
> No, it's a register based interpreter. There is a stack, but that's used
> for calling functions only. The operands to the opcodes are pointed to
> by the opcodes in the case of compiled variables, or in symbol tables
> otherwise.
> That's as close to a register machine as we can get I
> think, but it's not very close to a stack machine. In a stack-based VM,
> the operands to an opcode would be implicit, with add for example
> using the top two stack operands, and that's not the case at all.

The encoding of constants or addresses in symbol tables alone does not
disqualify it as a stack-based machine model per se. However, since
there seem to be no traditional instructions which rely on a stack, I
agree that it's not a stack machine.

struct _zend_execute_data also makes it look a bit like a stack,
especially with its struct _zend_execute_data *prev_execute_data. But
knowing that union _temp_variable *Ts; is addressed directly, it looks
more like a CISC-like register-memory model with an "infinite" number
of registers.

>> instruction set (76byte on a 32bit architecture),
>
> Andy's presentation says 96 bytes, but that might be 64 bit. I presume
> this means sizeof(struct _zend_op)?

Yes, it gives 76 bytes on my OS X, but that's a detail which just
illustrates the significance of the different approaches. As another
example, Self has a real bytecode set. Each instruction is encoded in
just 8 bits, but that encoding is not optimized for interpretation.

>> the rest of the instruction has
>> many similarities with a AST representation.
>
> Are you referring to the IS_TMP_VAR type of a znode?

Actually, I was more concerned about the op_array, and whether there is
any place in the interpreter where it is used directly, i.e., by using
it in a C function call as an argument and thus using the implicit C
stack.
If this were used to initiate interpretation of the op_array, I think
it would resemble a tree walker. But I have not found anything hinting
at that; in particular, the global data structures do not support such
a thing, from what I can tell by reading the code. I am just cautious;
for instance, the Lua implementation provides some interesting
mechanisms in this direction.

>> However, it's not a simple, single stack model,
>> but uses several purpose-specific stacks.
>
> How so?

Ah, thanks, you are right, I was looking at the wrong struct definition
(_zend_compiler_globals). Indeed, _zend_executor_globals defines only an
argument stack, an argument type stack, and
struct _zend_execute_data *current_execute_data (which also is a stack).

>> What I am not so sure about is especially the
>> semantics of the result field and the pointer
>> to the other function (op_array).
>>
>> Would be grateful if someone could comment on that.
>
> I'm not sure what's confusing about the result field? It points to a
> zval, same as op1 and op2.

Ah, well, ok, now I see how it is meant. Under the assumption of a
stack model, it does not make much sense, but in a register-memory
model, it just specifies the location for the result, sure.

> I _think_ that op_array is used to attach extra information to the
> opcode by special extensions. I can't think of an example off the top
> of my head.

Well, I was a bit imprecise here: it is part of _znode, i.e., of the
operands and the result, but that no longer causes any misunderstanding
for me.

>> I am also not really sure with this complexity,
>> whether it is not actually some kind of abstract syntax
>> tree instead of an instruction set like Java
>> bytecode. That's not a technical problem, but merely
>> an academic question to categorize/characterize PHP.
>
> I think the result field of a znode can make it seem like that, but I
> would characterize it as you have before. An instruction set just like
> Java bytecode.
> Way more complicated, obviously, but I don't think it's
> very close to an AST. Certainly the interpreter does not really
> resemble an AST walker.

Sometimes, it would be really interesting to know where some of the
ideas used are coming from and what the reasoning was. I tend to think
that it's rather unlikely that they were pulled out of thin air. Some
parts of the model remind me of CISC instruction sets... 3-address
form, register-memory model...

> I hope I answered what you were looking for. I'm not certain about a
> few of my answers, since I've really avoided the interpreter in my
> work, but I think most of it is OK.

Your answers were really helpful in guiding the code reading.

Thanks a lot
Stefan
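
For illustration, the distinction discussed in this thread (implicit
stack operands versus explicit 3-address operands with a result slot)
can be sketched in a few lines of C. This is only a toy under stated
assumptions: every name below (stack_insn, reg_insn, run_stack, run_reg,
the demo functions) is hypothetical, and none of it is taken from the
Zend Engine sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy code; not Zend Engine code. */

/* Stack model: operands are implicit (the top of the stack). */
enum stack_opcode { S_PUSH, S_ADD };

struct stack_insn {
    enum stack_opcode opcode;
    int immediate;              /* only used by S_PUSH */
};

/* Runs a stack program and returns the value left on top of the stack. */
int run_stack(const struct stack_insn *code, size_t n)
{
    int stack[16];
    size_t sp = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (code[i].opcode) {
        case S_PUSH:
            assert(sp < 16);
            stack[sp++] = code[i].immediate;
            break;
        case S_ADD:             /* pops two operands, pushes their sum */
            assert(sp >= 2);
            sp--;
            stack[sp - 1] = stack[sp - 1] + stack[sp];
            break;
        }
    }
    assert(sp >= 1);
    return stack[sp - 1];
}

/* Register-memory model: 3-address form, operands are explicit. */
enum reg_opcode { R_LOADI, R_ADD };

struct reg_insn {
    enum reg_opcode opcode;
    int result;                 /* slot that receives the result */
    int op1;                    /* slot index, or the immediate for R_LOADI */
    int op2;                    /* slot index (unused by R_LOADI) */
};

/* Runs a register program over `slots`; returns the slot written last. */
int run_reg(const struct reg_insn *code, size_t n, int *slots)
{
    int last = 0;
    size_t i;

    for (i = 0; i < n; i++) {
        switch (code[i].opcode) {
        case R_LOADI:
            slots[code[i].result] = code[i].op1;
            break;
        case R_ADD:             /* reads two named slots, writes a third */
            slots[code[i].result] = slots[code[i].op1] + slots[code[i].op2];
            break;
        }
        last = code[i].result;
    }
    return slots[last];
}

/* Computes 2 + 3 with implicit stack operands. */
int demo_stack(void)
{
    const struct stack_insn code[] = {
        { S_PUSH, 2 }, { S_PUSH, 3 }, { S_ADD, 0 }
    };
    return run_stack(code, sizeof code / sizeof code[0]);
}

/* Computes 2 + 3 with explicit operand and result slots. */
int demo_reg(void)
{
    int slots[3] = { 0, 0, 0 };
    const struct reg_insn code[] = {
        { R_LOADI, 0, 2, 0 }, { R_LOADI, 1, 3, 0 }, { R_ADD, 2, 0, 1 }
    };
    return run_reg(code, sizeof code / sizeof code[0], slots);
}
```

Both demo functions compute the same sum. The point of the contrast is
that the stack instruction carries no operand locations at all, while
the 3-address instruction names both inputs and the place where the
result goes, which is the role the result field plays in the
register-memory reading above.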