Wednesday, December 02, 2009

So those previous benchmarks weren't benchmarks:

The first is the 1.2 GHz ARMv5 processor from Marvell (the one in the SheevaPlug):


# nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 354.24 : 9.08 : 2.98
STRING SORT : 33.613 : 15.02 : 2.32
BITFIELD : 8.4763e+07 : 14.54 : 3.04
FP EMULATION : 38.723 : 18.58 : 4.29
FOURIER : 362.87 : 0.41 : 0.23
ASSIGNMENT : 4.6711 : 17.77 : 4.61
IDEA : 1201.9 : 18.38 : 5.46
HUFFMAN : 462.25 : 12.82 : 4.09
NEURAL NET : 0.51582 : 0.83 : 0.35
LU DECOMPOSITION : 16.481 : 0.85 : 0.62
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 14.784
FLOATING-POINT INDEX: 0.663
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU :
L2 Cache :
OS : Linux 2.6.31-gentoo-r6
C compiler : armv5tel-softfloat-linux-gnueabi-gcc
libc :
MEMORY INDEX : 3.193
INTEGER INDEX : 4.112
FLOATING-POINT INDEX: 0.368
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


Now the Geode...


$ nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 124.28 : 3.19 : 1.05
STRING SORT : 15.581 : 6.96 : 1.08
BITFIELD : 4.4277e+07 : 7.60 : 1.59
FP EMULATION : 13.392 : 6.43 : 1.48
FOURIER : 1999.4 : 2.27 : 1.28
ASSIGNMENT : 3.3664 : 12.81 : 3.32
IDEA : 519.83 : 7.95 : 2.36
HUFFMAN : 215.18 : 5.97 : 1.91
NEURAL NET : 1.6729 : 2.69 : 1.13
LU DECOMPOSITION : 71.611 : 3.71 : 2.68
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 6.779
FLOATING-POINT INDEX: 2.830
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : AuthenticAMD Geode(TM) Integrated Processor by AMD PCS 498MHz
L2 Cache : 128 KB
OS : Linux 2.6.32-rc8
C compiler : i586-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 1.784
INTEGER INDEX : 1.625
FLOATING-POINT INDEX: 1.570
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


AMD Athlon 64 X2...


# nbench

BYTEmark* Native Mode Benchmark ver. 2 (10/95)
Index-split by Andrew D. Balsa (11/97)
Linux/Unix* port by Uwe F. Mayer (12/96,11/97)

TEST : Iterations/sec. : Old Index : New Index
: : Pentium 90* : AMD K6/233*
--------------------:------------------:-------------:------------
NUMERIC SORT : 719.36 : 18.45 : 6.06
STRING SORT : 119.58 : 53.43 : 8.27
BITFIELD : 3.2184e+08 : 55.21 : 11.53
FP EMULATION : 84.806 : 40.69 : 9.39
FOURIER : 11684 : 13.29 : 7.46
ASSIGNMENT : 15.326 : 58.32 : 15.13
IDEA : 3096.3 : 47.36 : 14.06
HUFFMAN : 1190 : 33.00 : 10.54
NEURAL NET : 24.162 : 38.81 : 16.33
LU DECOMPOSITION : 850.16 : 44.04 : 31.80
==========================ORIGINAL BYTEMARK RESULTS==========================
INTEGER INDEX : 41.210
FLOATING-POINT INDEX: 28.320
Baseline (MSDOS*) : Pentium* 90, 256 KB L2-cache, Watcom* compiler 10.0
==============================LINUX DATA BELOW===============================
CPU : Dual AuthenticAMD AMD Athlon(tm) 64 X2 Dual Core Processor 6000+ 2993MHz
L2 Cache : 1024 KB
OS : Linux 2.6.31
C compiler : x86_64-pc-linux-gnu-gcc
libc :
MEMORY INDEX : 11.299
INTEGER INDEX : 9.582
FLOATING-POINT INDEX: 15.707
Baseline (LINUX) : AMD K6/233*, 512 KB L2-cache, gcc 2.7.2.3, libc-5.4.38
* Trademarks are property of their respective holder.


Going by this, the Marvell ARM board (essentially the SheevaPlug with more NICs) has very poor floating-point performance: it has no FPU, so everything goes through soft-float. Still, in integer performance it does all right, at about 2.5 times the Geode's Linux integer index (4.112 vs 1.625).
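To put numbers on that comparison, here is a minimal sketch that divides the Linux-baseline indexes from the three runs above. The names `results`, `ratio` and `geometric_mean` are made up for illustration. The last check assumes nbench's Linux INTEGER INDEX is the geometric mean of the "New Index" column for NUMERIC SORT, FP EMULATION, IDEA and HUFFMAN; that assumption reproduces the printed 4.112 to within rounding, but treat it as a sanity check rather than a statement about nbench's source.

```python
from math import prod

# Linux-baseline indexes (AMD K6/233 = 1.0), copied from the nbench output above.
results = {
    "Marvell ARM": {"memory": 3.193, "integer": 4.112, "float": 0.368},
    "Geode": {"memory": 1.784, "integer": 1.625, "float": 1.570},
    "Athlon 64 X2": {"memory": 11.299, "integer": 9.582, "float": 15.707},
}


def ratio(a, b, index):
    """How many times faster machine a is than machine b on one index."""
    return results[a][index] / results[b][index]


def geometric_mean(values):
    """Geometric mean of a non-empty list of positive numbers."""
    return prod(values) ** (1.0 / len(values))


if __name__ == "__main__":
    for index in ("memory", "integer", "float"):
        print(f"ARM / Geode, {index}: {ratio('Marvell ARM', 'Geode', index):.2f}x")

    # Assumed composition of the Linux integer index (ARM run, New Index column):
    # NUMERIC SORT, FP EMULATION, IDEA, HUFFMAN.
    arm_integer_tests = [2.98, 4.29, 5.46, 4.09]
    print(f"recomputed ARM integer index: {geometric_mean(arm_integer_tests):.3f}")
```

The integer ratio comes out at roughly 2.5, and the floating-point ratio is well below 1, which is the soft-float penalty showing up.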

Unfortunately my SheevaPlug is slower than other people's for some reason. And it would seem that yes, the Athlon 64 X2 does kick ass.
More basic benchmarks (running stress with -c, which apparently just spins each worker on sqrt()):


# stress -t 30s -c 8 -m 1 && cat /proc/loadavg
stress: info: [6975] dispatching hogs: 8 cpu, 0 io, 1 vm, 0 hdd
stress: info: [6975] successful run completed in 30s
3.64 1.23 0.53 1/47 6985
# cat /proc/cpuinfo
Processor : Feroceon 88FR131 rev 1 (v5l)
BogoMIPS : 1192.75
Features : swp half thumb fastmult edsp
CPU implementer : 0x56
CPU architecture: 5TE
CPU variant : 0x2
CPU part : 0x131
CPU revision : 1

Hardware : Marvell RD-88F6281 Reference Board
Revision : 0000
Serial : 0000000000000000



$ stress -t 30s -c 8 -m 1 && cat /proc/loadavg
stress: info: [10077] dispatching hogs: 8 cpu, 0 io, 1 vm, 0 hdd
stress: info: [10077] successful run completed in 30s
3.78 1.28 0.85 1/126 10087
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 5
model : 10
model name : Geode(TM) Integrated Processor by AMD PCS
stepping : 2
cpu MHz : 498.091
cache size : 128 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu de pse tsc msr cx8 sep pge cmov clflush mmx mmxext 3dnowext 3dnow
bogomips : 996.18
clflush size : 32
cache_alignment : 32
address sizes : 32 bits physical, 32 bits virtual
power management:


The Marvell board has a 1200 MHz Feroceon (ARMv5TE) versus the 500 MHz AMD Geode (x86). With about the same amount of RAM (some more things are running on the Geode device), the ARM board appears to win. But not by much, and this is supposedly a fast ARM board. I'm very curious to see how a Cortex-A8 does, but I don't have a BeagleBoard to test on.
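For reading the last line of each stress run, here is a small sketch that splits a /proc/loadavg line into its fields: the 1, 5 and 15 minute load averages, then runnable/total scheduling entities, then the most recently created PID. The `LoadAvg` type and `parse_loadavg` function are hypothetical helpers, not part of stress or any library.

```python
from typing import NamedTuple


class LoadAvg(NamedTuple):
    one_min: float      # load average over the last minute
    five_min: float     # ... over the last 5 minutes
    fifteen_min: float  # ... over the last 15 minutes
    runnable: int       # currently runnable scheduling entities
    total: int          # scheduling entities that exist on the system
    last_pid: int       # PID of the most recently created process


def parse_loadavg(line: str) -> LoadAvg:
    """Parse one line in the format of /proc/loadavg."""
    one, five, fifteen, entities, last_pid = line.split()
    runnable, total = entities.split("/")
    return LoadAvg(float(one), float(five), float(fifteen),
                   int(runnable), int(total), int(last_pid))


# The two lines captured above.
arm = parse_loadavg("3.64 1.23 0.53 1/47 6985")
geode = parse_loadavg("3.78 1.28 0.85 1/126 10087")

if __name__ == "__main__":
    print("ARM   1-minute load:", arm.one_min)
    print("Geode 1-minute load:", geode.one_min)
```

Note that these figures describe how long the run queue was, not how much work got done.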

Oh, but wait...


$ stress -t 30s -c 8 -m 1 && cat /proc/loadavg
stress: info: [20281] dispatching hogs: 8 cpu, 0 io, 1 vm, 0 hdd
stress: info: [20281] successful run completed in 30s
4.00 0.99 0.32 1/231 20291
$ cat /proc/cpuinfo
processor : 0
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
stepping : 3
cpu MHz : 2992.907
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 0
cpu cores : 2
apicid : 0
initial apicid : 0
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5985.81
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc

processor : 1
vendor_id : AuthenticAMD
cpu family : 15
model : 67
model name : AMD Athlon(tm) 64 X2 Dual Core Processor 6000+
stepping : 3
cpu MHz : 2992.907
cache size : 1024 KB
physical id : 0
siblings : 2
core id : 1
cpu cores : 2
apicid : 1
initial apicid : 1
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt rdtscp lm 3dnowext 3dnow rep_good extd_apicid pni cx16 lahf_lm cmp_legacy svm extapic cr8_legacy
bogomips : 5985.51
TLB size : 1024 4K pages
clflush size : 64
cache_alignment : 64
address sizes : 40 bits physical, 48 bits virtual
power management: ts fid vid ttp tm stc


So it just goes as fast as it can: the 1-minute load average lands somewhere near 4 on every machine, whatever the CPU. WTF is the point of stress again?
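If the goal is a number that actually differs between machines, one option is to count how much work gets done in a fixed window instead of looking at the load average. The following is a rough sketch of that idea, not what stress does internally. The function name `sqrt_rate` and the batch size of 1000 are arbitrary choices, and in Python the interpreter overhead dominates, so the result is only comparable between machines running the same interpreter version.

```python
import math
import time


def sqrt_rate(seconds: float = 1.0) -> float:
    """Return math.sqrt() calls per second over roughly `seconds` of wall time."""
    start = time.perf_counter()
    deadline = start + seconds
    calls = 0
    sink = 0.0  # accumulate results so the work is not trivially discardable
    while time.perf_counter() < deadline:
        # Batch the calls so the clock check does not dominate the loop.
        for i in range(1000):
            sink += math.sqrt(i)
        calls += 1000
    elapsed = time.perf_counter() - start
    return calls / elapsed


if __name__ == "__main__":
    print(f"{sqrt_rate(1.0):.0f} sqrt() calls per second (single worker)")
```

Running this once per core, and comparing the totals, would give a crude throughput figure for each board.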