Newsgroups: php.internals
From: ard.biesheuvel@linaro.org (Ard Biesheuvel)
To: internals@lists.php.net
CC: dmitry@zend.com, rasmus@lerdorf.com, pollita@php.net
Date: Thu, 20 Jun 2013 15:55:26 +0200
Message-ID: <51C309CE.7080009@linaro.org>
Subject: ARM performance and GOTO executor

Hello all,

I am working on ARM server performance tuning, and I have been playing
around a bit with the various executor modes and zend_vm_gen.php.

As it turns out (scroll down for numbers), the GOTO executor is much
faster than the default CALL executor on ARM, partly due to fewer branch
mispredictions (as perf tells me), but there are probably other factors at
play here as well.

My question to you is whether we could parametrize this in the build
system, for instance by adding alternate files zend_vm_opcodes-goto.h and
zend_vm_execute-goto.h to the tree, and selecting those when targeting ARM
(and perhaps other archs that may prefer GOTO over CALL as well). Or is
there a better way of including/selecting alternate executors?

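For readers less familiar with the two executor modes, the difference can be sketched with a toy interpreter. This is not the code zend_vm_gen.php generates: the opcodes, the toy_vm struct and the run_call/run_goto names are all made up for illustration, and the GOTO variant relies on the GNU C "labels as values" extension (GCC and Clang). A commonly cited explanation for the misprediction gap is that the CALL style funnels every dispatch through one shared indirect call, while the GOTO style ends each handler with its own indirect jump; whether that fully accounts for the numbers in this post is an assumption.

```c
#include <stddef.h>

/* Toy opcodes, hypothetical; not PHP's real opcode set. */
enum { OP_INC, OP_DEC, OP_DOUBLE, OP_HALT, OP_COUNT };

/* Sample program: ((0 + 1 + 1) * 2) - 1 */
static const unsigned char toy_program[] = {
    OP_INC, OP_INC, OP_DOUBLE, OP_DEC, OP_HALT
};

typedef struct {
    const unsigned char *ip; /* next opcode to execute */
    long acc;                /* single accumulator register */
    int running;             /* cleared by the HALT handler */
} toy_vm;

/* CALL style: one function per opcode, invoked from a central loop. */
static void h_inc(toy_vm *vm)    { vm->acc += 1; vm->ip++; }
static void h_dec(toy_vm *vm)    { vm->acc -= 1; vm->ip++; }
static void h_double(toy_vm *vm) { vm->acc *= 2; vm->ip++; }
static void h_halt(toy_vm *vm)   { vm->running = 0; }

typedef void (*toy_handler)(toy_vm *);

static const toy_handler handlers[OP_COUNT] = {
    h_inc, h_dec, h_double, h_halt
};

long run_call(const unsigned char *code)
{
    toy_vm vm = { code, 0, 1 };

    while (vm.running) {
        /* Every opcode is dispatched through this one indirect call. */
        handlers[*vm.ip](&vm);
    }
    return vm.acc;
}

/* GOTO style: all handlers live in one function and each one jumps
 * directly to the next handler (GNU C computed goto). */
long run_goto(const unsigned char *code)
{
    static void *const labels[OP_COUNT] = {
        &&do_inc, &&do_dec, &&do_double, &&do_halt
    };
    const unsigned char *ip = code;
    long acc = 0;

#define DISPATCH() goto *labels[*ip]

    DISPATCH();
do_inc:
    acc += 1; ip++; DISPATCH();
do_dec:
    acc -= 1; ip++; DISPATCH();
do_double:
    acc *= 2; ip++; DISPATCH();
do_halt:
    return acc;

#undef DISPATCH
}
```

Calling run_call(toy_program) and run_goto(toy_program) should give the same accumulator value; only the dispatch mechanism differs. In the GOTO variant the interpreter state also stays in local variables instead of being passed through a struct pointer on every handler call, which may be one of the "other factors".
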
Also, when playing around, I noticed that building the executor without
specialization is broken, as there are erroneous FREE_OP2() calls left
behind in the handlers for 'break' and 'continue'. If nobody objects, I
will remove them (zend_vm_def.h lines 3302 and 3314).

Regards,
Ard.


ARM Cortex-A15 @ 1.7 GHz with default executor (specialized CALL)
=================================================================

simple             0.358
simplecall         0.396
simpleucall        0.419
simpleudcall       0.458
mandel             0.839
mandel2            1.038
ackermann(7)       0.400
ary(50000)         0.096
ary2(50000)        0.087
ary3(2000)         0.490
fibo(30)           1.157
hash1(50000)       0.135
hash2(500)         0.096
heapsort(20000)    0.266
matrix(20)         0.309
nestedloop(12)     0.499
sieve(30)          0.363
strcat(200000)     0.046
------------------------
Total              7.449

 Performance counter stats for 'php Zend/bench.php':

       7444.535230 task-clock          #    0.983 CPUs utilized
               103 context-switches    #    0.014 K/sec
                 9 cpu-migrations      #    0.001 K/sec
              5963 page-faults         #    0.801 K/sec
       12728701964 cycles              #    1.710 GHz
       13603248229 instructions        #    1.07  insns per cycle
        2633774500 branches            #  353.786 M/sec
         118799433 branch-misses       #    4.51% of all branches

       7.570311211 seconds time elapsed


ARM Cortex-A15 @ 1.7 GHz with specialized GOTO executor
=======================================================

simple             0.185
simplecall         0.295
simpleucall        0.249
simpleudcall       0.257
mandel             0.349
mandel2            0.529
ackermann(7)       0.252
ary(50000)         0.061
ary2(50000)        0.060
ary3(2000)         0.393
fibo(30)           0.798
hash1(50000)       0.092
hash2(500)         0.079
heapsort(20000)    0.195
matrix(20)         0.206
nestedloop(12)     0.214
sieve(30)          0.241
strcat(200000)     0.025
------------------------
Total              4.479

 Performance counter stats for '~/php Zend/bench.php':

       4468.040559 task-clock          #    0.983 CPUs utilized
                79 context-switches    #    0.018 K/sec
                 9 cpu-migrations      #    0.002 K/sec
              5062 page-faults         #    0.001 M/sec
        7561345552 cycles              #    1.692 GHz
       11297962039 instructions        #    1.49  insns per cycle
        2121936756 branches            #  474.914 M/sec
          22190686 branch-misses       #    1.05% of all branches

       4.545350085 seconds time elapsed
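
To put the two runs side by side, the CALL/GOTO ratios can be worked out directly from the figures above. The small helper below is only a convenience sketch; the metric struct and the function names are made up here, and the values are copied from the bench.php totals and perf counters in this post. By this arithmetic the GOTO build is roughly 1.66x faster on the bench.php total and takes roughly 5.35x fewer branch misses, while executing only about 1.2x fewer instructions.

```c
#include <stdio.h>

/* One measurement taken under both executors (values from this post). */
typedef struct {
    const char *name;
    double call_value; /* specialized CALL executor */
    double goto_value; /* specialized GOTO executor */
} metric;

static const metric metrics[] = {
    { "bench.php total (s)", 7.449,         4.479         },
    { "cycles",              12728701964.0, 7561345552.0  },
    { "instructions",        13603248229.0, 11297962039.0 },
    { "branches",            2633774500.0,  2121936756.0  },
    { "branch-misses",       118799433.0,   22190686.0    },
};

#define METRIC_COUNT (sizeof metrics / sizeof metrics[0])

/* CALL value divided by GOTO value: above 1.0 means GOTO needed less. */
double call_to_goto_ratio(const metric *m)
{
    return m->call_value / m->goto_value;
}

/* Print one "name ratio" line per metric to the given stream. */
void print_summary(FILE *out)
{
    for (size_t i = 0; i < METRIC_COUNT; i++) {
        fprintf(out, "%-22s %.2fx\n",
                metrics[i].name, call_to_goto_ratio(&metrics[i]));
    }
}
```

Calling print_summary(stdout) from a main() prints one ratio per line. Note that these are single runs of one benchmark on one machine, so the ratios are indicative rather than precise.
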