Bench
author: Giorgio Tani, 2004 11
license: GPL2
1) What’s a benchmark
2) What’s Bench
2.1) Beware!
2.2) Tests
2.3) Results
1) What’s a benchmark
A benchmark is simply a measure of a computer’s performance while
executing some tasks.
It follows that a benchmark is only as significant for a user as the
tasks it performs are similar to the ones he/she will perform, so the
best benchmark would be to try everyday operations on different
platforms in order to choose the one that best fits the user’s needs.
Since doing so would not be easy and is sometimes impossible,
and since those results would hardly be useful for other users,
benchmark programs usually execute a wide range of generic tasks, often
more or less targeted to one of the possible usages of the machine
(server, multimedia desktop, office production desktop, graphics, 3D,
video editing…). Benchmarks can be focused on the CPU, on RAM, on other
single components (e.g. for gaming and multimedia the video card is
usually more important than the rest of the system) or can be a
weighted overall system test.
Tasks performed may be basic arithmetic/logic operations (after all,
those are the basis for any more complex program), but more advanced
benchmarks will also run engines similar to, or picked from, commercial
applications.
The “advanced” approach brings the benchmark results closer to what
a user running those applications will perceive in terms of performance,
but it makes the benchmark harder to port to other software
and hardware platforms, where equivalents of those engines may be
optimized differently or may not exist for technical or commercial
reasons.
The “basic” approach is easier to write and port, but the results will
be more theoretical and hard to weigh so that they describe the
performance of the machine from a possible user’s point of view.
Moreover, many CPUs nowadays have special features for power-demanding
tasks (AltiVec, MMX, SSE, SSE2, SSE3…) that can be tested only by
writing specific code, so those tests cannot be ported “as is” from one
platform to another, which makes it hard to obtain a significant
comparison between different platforms even for a simple, theoretical
test; the possibility of running some very power-demanding tasks on
special hardware components adds further difficulties.
Most existing applications can be divided into those for which current
hardware power is more than enough (e.g. for text writing, CPUs have
been idle most of the time for decades…) and those for which the power
is never enough. A user dealing mainly with the first kind of
applications can gladly ignore benchmarks and focus more
on other aspects of the machine (price, ergonomics, security,
stability, look ‘n feel, availability…); a user dealing with
applications of the second kind should look at benchmarks only as a
very general source of information, and focus more on directly
benchmarking what he/she will really use.
2) What’s Bench
Bench is a small and simple CPU and memory benchmark suite,
single-task and single-threaded, free and open source (released under
the GPL), written in FreePascal, that can be compiled to run on
different operating systems and hardware architectures.
It submits several simple tasks to the machine, evaluating performance
in different situations to profile some of the weak and strong points
of the system and to give arbitrary ratings to the machine’s CPU, CPU
cache and RAM.
It does not explicitly call AltiVec, MMX, SSE, SSE2, SSE3,
Hyper-Threading or other platform-specific features, does not use ASM
routines and uses only cross-platform standard fpc libraries, so it’s
fully portable across the supported platforms (see www.freepascal.org).
For the same reason the results can be considered significant for
general, non-optimized, cross-platform applications, but less realistic
for specific applications or routines written to take advantage of
hardware-specific features and optimizations, especially for arithmetic
tasks, as is the case for most games and encoding programs.
2.1) Beware!
2.1.1 - Tests will last only a few minutes on most machines but, like
almost all benchmarks, Bench will try to use 100% of your system’s CPU
and RAM. This is not so uncommon: you may have other software, like
games and encoders, that uses them at 100% and for even
longer times. However, remember that this tends to overheat and stress
the system, and it’s dangerous especially if you have (perhaps without
knowing it) overclocked or faulty components.
The mean time between failures of those components is usually hundreds
of thousands or millions of hours; however, on an overclocked system
or a subtly faulty one (even with no macroscopic symptoms) intense
usage may result in hardware damage instantly or in a much shorter
amount of time, especially if power-demanding tasks are repeated for
many hours.
This is a standard recommendation that holds for any benchmark or
otherwise power-demanding software; in any case, as stated in
the GPL license that applies to Bench, the author will not be
responsible for, among other things, any kind of system damage
resulting from the use of Bench, which is provided “as is”.
2.1.2 - Bench needs about 150MB of free physical memory
to run at its best (300MB on 64 bit systems). If your
machine has less free RAM it will rank poorly in memory-related tests,
since the tasks will be performed via virtual memory and will be
limited by the speed of the disk, which is several orders of magnitude
lower than the speed of RAM.
2.1.3 - The process should be set to real-time priority if your
operating system and user profile allow it. This gives a good
improvement if your system is running many other processes or
applications; otherwise the improvement is marginal.
It is good practice not to run anything else while running benchmarks,
so as not to slow them down. On the other hand, depending on your
operating system, running such a power-demanding task may slow down any
other operation, especially if set to real-time priority, so the system
may respond slowly, or not at all, until the tests are finished.
2.1.4 - The timer resolution may differ between operating
systems, or it may even be modified by the user.
Moreover, some target systems may have problems handling the timer
calls used by Bench, especially with such high CPU usage, because a
generic model of timer call is used for portability.
If the timer tick is too long, or if the generic timer call model is
not well supported, some of Bench’s tests may be measured with poor
accuracy and may even be timed as 0; this condition generates
an appropriate error message.
You can avoid those problems by modifying the system timer resolution or
by running Bench in non-automatic mode and specifying a number of
reiterations for each test (this will lead to longer test times). Under
Win32 the timer works fine without reiteration; under some Linux
distributions with 2.4 and 2.6 kernels, 4 or 8 reiterations should be
used.
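As an illustration of why reiteration helps with a coarse timer, here
is a minimal Python sketch. It is not Bench’s FreePascal code: the
function name measure, its clock parameter and the error text are
hypothetical, and it only assumes that a test is timed by reading a
timer before and after the run.

```python
import time


def measure(task, reiterations=1, clock=time.perf_counter):
    """Time `reiterations` back-to-back runs of `task`.

    Returns the average time per run in the clock's units.
    `clock` is injectable so a coarse timer can be simulated.
    """
    if reiterations < 1:
        raise ValueError("reiterations must be at least 1")
    start = clock()
    for _ in range(reiterations):
        task()
    elapsed = clock() - start
    if elapsed <= 0:
        # Both timer reads fell inside the same tick: the test was "timed 0".
        raise RuntimeError("test timed 0: use more reiterations or a finer timer")
    return elapsed / reiterations


if __name__ == "__main__":
    # Hypothetical workload standing in for one of the tests.
    per_run = measure(lambda: sum(range(100_000)), reiterations=8)
    print(f"average time per run: {per_run:.6f} s")
```

If the timer tick is longer than one run of a test, both timer reads
can return the same value and the run is timed 0; measuring 4 or 8
reiterations as a single interval stretches it over more ticks, at the
cost of a longer test.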
2.2) Tests
1) Small random structure test: a classic
ARCFOUR implementation that performs integer +, logical and, and access
to elements at random positions in a randomly evolving 256-element
table; elements are valued 0..255 and are sized to be aligned to the
processor’s native bit width. This should evaluate the efficiency of
the processor in managing a very small set of data that should fit in
the fastest cache level of the CPU without cache misses.
2) Conditional test: performs an if..else biased
toward then, as programmers usually do for performance, a nested
if..else biased toward the last then, and a big case..of table. It works
on the random output of the first test, which is why it’s executed as
the second test.
3) Medium random structure test: an ARCFOUR/16
implementation; it uses a 64K-element table (256KB on 32 bit
processors). This should take advantage both of the size and speed of
lower level caches and of efficiency in avoiding cache misses.
4) Large random structure test: an ARCFOUR
modification using a 1M-element table (4MB on 32 bit processors), with
elements valued 0..1M. This should evaluate how the processor manages a
large set of elements that will probably not fit into the processor
caches and cannot reasonably be predicted, causing frequent cache
misses.
5) Integer arithmetic/logic test: performs highly
repetitive fast operations on integers coming from the result of the
previous test; the operations are +, -, * (arithmetic) and not, and, or,
xor, shl, shr (logical).
6) Integer slow test: performs integer div and mod
operations, which are usually implemented as very slow instructions.
Since most programmers avoid them whenever it’s possible to find an
alternative or a faster workaround, the weight of this test is very
low.
7) Floating Point arithmetic test: performs highly
repetitive fast operations, +, -, *, between floating point variables.
8) Floating Point slow test: like the integer slow test, it
performs operations that are not implemented as fast on many
processors, so they are usually avoided and the weight of this test is
very low. The operations performed are /, trunc, int (usually the same
thing, but resulting in different ASM code on most compilers) and round.
9) Move 1KB chunks test: reiterates the move
instruction on several 1KB chunks from the random output of the
1M-element ARC4; such chunks usually fit in the fastest CPU cache of
most processors.
10) Move 32KB chunks test: reiterates the move
instruction on several 32KB chunks, which may not fit in the fastest
cache of some processors but in most cases easily fit in a lower level
cache; it also benefits greatly if the second or third
level cache is not too slow compared to L1, as it was on older
processors where the additional cache was clocked at ½ of the CPU speed
or less.
11) Move 1MB chunks test: reiterates the move
instruction on some 1MB chunks; it benefits greatly from a large, fast
and efficient lower level cache or, alternatively, from
fast access to RAM.
12) Move 16 MB chunks test: move instruction on 16MB
chunks; evaluates mainly the RAM speed, but the huge cache of some
high end server processors can be of some help.
13) Move 64 MB chunks test: moves 64 MB chunks,
evaluating mainly the RAM speed.
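To make the “random structure” tests more concrete, here is a minimal
Python sketch of the classic ARCFOUR (RC4) generator that test 1 is
based on. It is an illustration, not Bench’s FreePascal source: the
function name arcfour_keystream and the key used in the example are
assumptions, since this document does not say how Bench seeds the
table. The larger variants (ARCFOUR/16, 1M-element table) are not shown.

```python
def arcfour_keystream(key, count):
    """Return `count` keystream bytes of classic ARCFOUR for `key` (bytes)."""
    if not key:
        raise ValueError("key must not be empty")

    # Key scheduling: start from the identity table and shuffle it.
    table = list(range(256))
    j = 0
    for i in range(256):
        j = (j + table[i] + key[i % len(key)]) & 0xFF
        table[i], table[j] = table[j], table[i]

    # Generation: every step uses integer +, a logical "and" to wrap the
    # index, and reads/swaps at data-dependent (random) table positions.
    i = j = 0
    out = bytearray()
    for _ in range(count):
        i = (i + 1) & 0xFF
        j = (j + table[i]) & 0xFF
        table[i], table[j] = table[j], table[i]
        out.append(table[(table[i] + table[j]) & 0xFF])
    return bytes(out)


if __name__ == "__main__":
    # Hypothetical key, chosen only for the example.
    print(arcfour_keystream(b"Key", 10).hex())
```

The whole working set is the 256-element table, so every access should
hit the fastest cache; growing the table, as tests 3 and 4 do, keeps
the same access pattern but spreads it over memory that no longer fits
there.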
2.3) Results:
Bench provides 3 ratings for the system:
1) Tests 1-11 are used to calculate the CPU rating; the results of tests
6 and 8 are weighted 1/3 as much as the other tests, since they perform
operations that are quite slow on most systems and so are avoided
whenever possible.
2) Tests 1, 3, 4, 9, 10, 11 are used to calculate the cache rating.
3) Tests 4, 11, 12, 13 are used to calculate the memory rating.
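The grouping above can be sketched as follows in Python. This document
only states which tests feed each rating and that tests 6 and 8 count
1/3; how the results are combined is not stated, so the weighted
arithmetic mean used here is an assumption, as are the names
weighted_mean and ratings.

```python
CPU_TESTS = tuple(range(1, 12))       # tests 1-11
CACHE_TESTS = (1, 3, 4, 9, 10, 11)
MEMORY_TESTS = (4, 11, 12, 13)
SLOW_TESTS = (6, 8)                   # weighted 1/3 in the CPU rating


def weighted_mean(scores, weights):
    """Weighted arithmetic mean of scores[test] over the tests in `weights`.

    Assumption: the document does not say how results are combined.
    """
    total_weight = sum(weights.values())
    return sum(scores[t] * w for t, w in weights.items()) / total_weight


def ratings(scores):
    """Return the three ratings from a dict {test number: normalized score}."""
    cpu_weights = {t: (1 / 3 if t in SLOW_TESTS else 1.0) for t in CPU_TESTS}
    return {
        "cpu": weighted_mean(scores, cpu_weights),
        "cache": weighted_mean(scores, {t: 1.0 for t in CACHE_TESTS}),
        "memory": weighted_mean(scores, {t: 1.0 for t in MEMORY_TESTS}),
    }


if __name__ == "__main__":
    # Hypothetical scores: every test rated 100 except the two slow tests.
    scores = {t: 100.0 for t in range(1, 14)}
    scores[6] = scores[8] = 10.0
    print(ratings(scores))
```

With the 1/3 weight, a poor result in the div/mod or floating point
slow tests lowers the CPU rating much less than the same result in any
other test would.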
When Bench starts, it asks whether to run in automatic mode or not:
- answering yes (y or any other letter but n/N), it will repeat the
test battery 3 times, run the tests on a basis of 16M elements of
the processor’s native bit size without reiterating the single tests,
and give a rating in relative MHz compared to a reference desktop P4
Northwood;
- answering no (n/N), the user can:
choose to repeat the test battery 1-4 times;
choose to reiterate each single test 1-16 times, which may be useful in
the future if the system is far too fast, or needed if the target
platform has problems with the timers used (see the warning in 2.1.4);
choose the reference platform for the rating: to simplify customization
of the benchmark, the reference platform can be changed by selecting
option 0; the program will then try to load a file named system.txt in
the executable’s path that contains, one per line, the reference
system’s name (string) and the thirteen parameters used to normalize
the speed results (decimal base, integer-sized numbers); speed result *
parameter gives the number of megahertz (or other measure) of the
reference system.
Using Bench in non-automatic mode gives verbose output with detailed
results for each test.
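As a sketch of the reference file described above, the following Python
reads the layout as understood here: the system’s name on the first
line, followed by thirteen integer parameters, one per line. The
function names parse_reference and normalize are hypothetical, and
skipping blank lines is an assumption; Bench’s own loader may behave
differently.

```python
def parse_reference(text):
    """Parse the contents of a system.txt-style reference file.

    Assumed layout: line 1 is the reference system's name, lines 2-14
    are the thirteen decimal integer parameters, one per test.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 14:
        raise ValueError("expected a name followed by 13 parameters")
    name = lines[0]
    params = [int(value) for value in lines[1:]]
    return name, params


def normalize(speeds, params):
    """Apply 'speed result * parameter' to each of the thirteen tests."""
    if len(speeds) != len(params):
        raise ValueError("need one speed result per parameter")
    return [speed * param for speed, param in zip(speeds, params)]


if __name__ == "__main__":
    # Hypothetical file contents; real parameters depend on the reference system.
    sample = "My reference box\n" + "\n".join(["1000"] * 13) + "\n"
    name, params = parse_reference(sample)
    print(name, normalize([2.4] * 13, params))
```

Each parameter is chosen so that the reference system’s own speed
result, multiplied by it, gives that system’s megahertz (or whatever
other measure is preferred); results from other machines are then
expressed on the same scale.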
Remember that the true benchmark, whenever it’s possible, is simply to
run what you really need!