Newsgroups: php.internals
Date: Thu, 13 Aug 2009 14:42:26 +0200
From: php@stefan-marr.de (Stefan Marr)
To: internals@lists.php.net
Subject: Design of the Zend Engine's Instruction Set

Hello internals:

I had a look at the Zend Engine to understand some details about its internal design with respect to its opcodes and machine model.
I would like to ask for your comments in case any of the following sounds wrong or misinterpreted to you.

The basic design of the Zend Engine is a stack-based interpreter for a fixed-length instruction set (76 bytes per instruction on a 32-bit architecture), where the instruction encoding is much more complex than, for instance, that of the JVM, Python, or Smalltalk. Even though the source code is compiled to a linearized instruction stream, the instructions themselves contain more than just an opcode and operands. The version I looked at had some 136 opcodes encoded in one byte, but the rest of the instruction has many similarities with an AST representation.

Instructions encode:

- a line number
- a function pointer to the actual handler that is used to execute the instruction
- two operands, which encode constant values, object references, jump addresses, or pointers to other functions
- 64 bits for an extended operand value
- a field for results, which is used for the return values of some operations

However, it is not a simple, single-stack model; it uses several purpose-specific stacks.

What I am not so sure about is the semantics of the result field and of the pointer to the other function (op_array). I would be grateful if someone could comment on that.

Given this complexity, I am also not really sure whether it is actually some kind of abstract syntax tree rather than an instruction set like Java bytecode. That is not a technical problem, merely an academic question about how to categorize/characterize PHP.

All comments are welcome.

Many thanks
Stefan

-- 
Stefan Marr
Software Languages Lab
Former Programming Technology Lab
Vrije Universiteit Brussel
Pleinlaan 2 / B-1050 Brussels / Belgium
http://prog.vub.ac.be/~smarr
Phone: +32 2 629 3956
Fax: +32 2 629 3525
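
P.S. To make the field list above more concrete, here is a rough C sketch of how I picture such an instruction and its handler-based dispatch. This is only an illustration under my own assumptions: every type and function name (operand, instruction, machine, handle_add, handle_jump, run, sketch_add_after_jump) is hypothetical and is not taken from the engine's headers, the field widths are not meant to reproduce the 76-byte layout, and the single program counter stands in for the several purpose-specific stacks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical names throughout; these are not the engine's declarations. */
typedef enum { OPERAND_UNUSED, OPERAND_CONST, OPERAND_JUMP } operand_kind;

struct instruction;

/* One operand slot: a tag plus a union of the things it may encode. */
typedef struct operand {
    operand_kind kind;
    union {
        long constant;                /* constant value */
        size_t jump_target;           /* index of the instruction to jump to */
        struct instruction *function; /* pointer to another function's code */
    } u;
} operand;

typedef struct machine {
    size_t pc;        /* index of the next instruction to execute */
    long last_result; /* most recent value written to a result field */
} machine;

typedef void (*handler_fn)(machine *m, struct instruction *ins);

/* Fixed-length instruction carrying the fields from the list above. */
typedef struct instruction {
    handler_fn handler;      /* function pointer to the executing handler */
    operand result;          /* field for results */
    operand op1;             /* first operand */
    operand op2;             /* second operand */
    uint64_t extended_value; /* extended operand value */
    unsigned int lineno;     /* source line number */
    unsigned char opcode;    /* opcode, one byte */
} instruction;

/* Adds the two constant operands and stores the sum in the result field. */
static void handle_add(machine *m, instruction *ins) {
    ins->result.kind = OPERAND_CONST;
    ins->result.u.constant = ins->op1.u.constant + ins->op2.u.constant;
    m->last_result = ins->result.u.constant;
    m->pc += 1;
}

/* Continues execution at the instruction index stored in op1. */
static void handle_jump(machine *m, instruction *ins) {
    m->pc = ins->op1.u.jump_target;
}

/* Dispatch loop: each instruction is executed through its own handler. */
static long run(instruction *code, size_t count) {
    machine m = { 0, 0 };
    while (m.pc < count) {
        instruction *ins = &code[m.pc];
        assert(ins->handler != NULL);
        ins->handler(&m, ins);
    }
    return m.last_result;
}

/* Builds "jump over instruction 1, then add a and b" and runs it. */
long sketch_add_after_jump(long a, long b) {
    instruction code[3];
    memset(code, 0, sizeof code);

    code[0].opcode = 1; /* hypothetical opcode number for a jump */
    code[0].handler = handle_jump;
    code[0].op1.kind = OPERAND_JUMP;
    code[0].op1.u.jump_target = 2;

    code[1].opcode = 2; /* hypothetical opcode number for an addition */
    code[1].handler = handle_add; /* skipped by the jump */
    code[1].op1.kind = OPERAND_CONST;
    code[1].op1.u.constant = 1;
    code[1].op2.kind = OPERAND_CONST;
    code[1].op2.u.constant = 1;

    code[2].opcode = 2;
    code[2].handler = handle_add;
    code[2].op1.kind = OPERAND_CONST;
    code[2].op1.u.constant = a;
    code[2].op2.kind = OPERAND_CONST;
    code[2].op2.u.constant = b;

    return run(code, 3);
}
```

With this sketch, sketch_add_after_jump(2, 3) executes the jump, skips the middle instruction, and returns 5. Each instruction carries its own handler and result slot, which is the part that makes me think of AST nodes rather than of compact bytecode.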