Bench
author: Giorgio Tani, 2004 11
license: GPL2


1)    What’s a benchmark
2)    What’s Bench
    2.1) Beware!
    2.2) Tests
    2.3) Results


1) What’s a benchmark
A benchmark is simply a measure of a computer’s performance while executing a set of tasks.
It follows that a benchmark is only as significant for a user as the tasks it performs are similar to the ones he/she will actually run, so the best benchmark would be to try the everyday operations on different platforms in order to choose the one that best fits the user’s needs.
Since doing so would not be easy and sometimes not even possible, and since those results would hardly be useful to other users, benchmark programs usually execute a wide range of generic tasks, often more or less targeted to one of the possible uses of the machine (server, multimedia desktop, office desktop, graphics, 3D, video editing…). Benchmarks can focus on the CPU, on RAM, or on other single components (e.g. for gaming and multimedia the video card is usually more important than the rest of the system), or can be a weighted test of the overall system.
The tasks performed may be basic arithmetic/logic operations (after all, those are the basis of any more complex program), but more advanced benchmarks will also run engines similar to, or taken from, commercial applications.
The “advanced” approach brings the benchmark results closer to the performance that a user running those applications will actually experience, but it makes the benchmark harder to port to other software and hardware platforms, where equivalents of those engines may be optimized differently or may not exist for technical or commercial reasons.
The “basic” approach is easier to write and port, but the results are more theoretical and harder to translate into a description of the machine’s performance from a user’s point of view.
Moreover, many CPUs now have special features for demanding tasks (AltiVec, MMX, SSE, SSE2, SSE3…) that can be tested only by writing specific code, so those tests cannot be ported “as is” from one platform to another; this makes it hard to obtain a meaningful comparison between different platforms even for a simple, theoretical test. The possibility of offloading some very demanding tasks to special hardware components adds further difficulties.
Most existing applications can be divided into those for which current hardware is more than powerful enough (e.g. for text writing, CPUs have been idle most of the time for decades…) and those for which the power is never enough. A user who mainly deals with the first kind of application can gladly ignore benchmarks and focus on other aspects of the machine (price, ergonomics, security, stability, look ’n’ feel, availability…); a user who deals with the second kind should treat benchmarks only as a very general source of information and focus on directly benchmarking what he/she will really use.



2) What’s Bench
Bench is a small and simple CPU and memory benchmark suite, single-task and single-threaded, free and open source (released under the GPL), written in FreePascal, that can be compiled to run on different operating systems and hardware architectures.
It submits a number of simple tasks to the machine and evaluates performance in different situations, in order to profile some of the weak and strong points of the system and to give arbitrary ratings to the machine’s CPU, CPU cache and RAM.
It does not explicitly call AltiVec, MMX, SSE, SSE2, SSE3, Hyper-Threading or other platform-specific features, does not use ASM routines, and uses only the standard cross-platform fpc libraries, so it is fully portable across the supported platforms (see www.freepascal.org).
For the same reason the results can be considered significant for general, non-optimized, cross-platform applications, but less realistic for specific applications or routines written to take advantage of hardware-specific features and optimizations, especially for arithmetic tasks, as is the case for most games and encoding programs.

2.1) Beware!
2.1.1 - Tests last only a few minutes on most machines but, like almost all benchmarks, Bench will try to use 100% of your system’s CPU and RAM. This is not uncommon: you may have other software, such as games and encoders, that uses them at 100% for even longer. Remember, however, that this tends to heat up and stress the system, and it is dangerous especially if you have overclocked or faulty components (which you may not know about).
The mean time between failures of those components is usually hundreds of thousands or millions of hours; however, on an overclocked or subtly faulty system (even one with no visible symptoms) intense usage may result in hardware damage immediately or within a much shorter time, especially if demanding tasks are repeated for many hours.
This is a standard recommendation that applies to any benchmark or otherwise demanding software; in any case, as stated in the GPL license that applies to Bench, the author will not be responsible for, among other things, any kind of system damage resulting from the use of Bench, which is provided “as is”.
2.1.2 - Bench needs about 150MB of free physical memory to perform at its best (300MB on 64 bit systems). If your machine has less free RAM it will rank poorly in the memory-related tests, since the tasks will be performed via virtual memory and will be influenced by the speed of the disk, which is several orders of magnitude lower than the speed of RAM.
2.1.3 - The process should be set to real-time priority if your operating system and user profile allow it. This gives a good improvement if your system is running many other processes or applications; otherwise the improvement is marginal.
It is good practice not to run anything else while running benchmarks, so as not to slow them down. On the other hand, depending on your operating system, such a demanding task will slow down any other operation, especially if it is set to real-time priority, so the system may respond slowly, or not at all, until the tests are finished.
2.1.4 - The timer resolution may differ between operating systems, and may even be modified by the user.
Moreover, some target systems may have problems handling the timer calls used by Bench, especially under such high CPU usage, because a generic timer call is used for portability.
If the timer tick is too long, or if the generic timer call is not well supported, some of Bench’s tests may be measured with poor accuracy and may even be timed as 0; this condition generates an appropriate error message.
You can avoid those problems by modifying the system timer resolution, or by running Bench in non-automatic mode and specifying a number of reiterations for each test (this leads to longer test times). Under Win32 the timer works fine without reiteration; under some Linux distributions with 2.4 and 2.6 kernels, 4 or 8 reiterations should be used.
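
As an illustration of why reiteration helps with a coarse timer, here is a minimal sketch in Python (it is not Bench’s FreePascal code). The function name time_with_reiteration and the replaceable clock parameter are assumptions made for this example only: the task is repeated several times between two timer reads, so that the total duration is more likely to exceed one timer tick, and a measurement of 0 is reported as an error.

```python
import time


def time_with_reiteration(task, reiterations=1, clock=time.perf_counter):
    """Return the average seconds per run of task().

    The timer is read only twice, around all the reiterations, so a
    coarse timer sees a longer interval than it would for a single run.
    """
    start = clock()
    for _ in range(reiterations):
        task()
    elapsed = clock() - start
    if elapsed <= 0:
        # Same situation as a test "timed 0": the timer did not advance.
        raise RuntimeError("test timed 0: use more reiterations")
    return elapsed / reiterations


def sample_task():
    # Hypothetical workload, used only to have something to time.
    return sum(range(100_000))


if __name__ == "__main__":
    print(time_with_reiteration(sample_task, reiterations=8))
```

The price is the one noted above: with 8 reiterations the test takes roughly 8 times longer.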

2.2) Tests
1)    Small random structure test: a classic ARCFOUR implementation that performs integer addition, logical AND and accesses to elements at random positions in a randomly evolving 256-element table; elements have values 0..255 and are sized to match the processor’s native bit width. This should evaluate the efficiency of the processor in managing a very small data set that should fit in the fastest cache level of the CPU without cache misses.
2)    Conditional test: performs an if..else biased toward the then branch (as programmers usually write it for performance), a nested if..else biased toward the last then branch, and a big case..of table. It works on the random output of the first test, which is why it is executed as the second test.
3)    Medium random structure test: an ARCFOUR/16 implementation; it uses a 64K-element table (256KB on 32 bit processors). This should benefit both from the size and speed of the lower-level caches and from efficiency in avoiding cache misses.
4)    Large random structure test: an ARCFOUR modification using a 1M-element table (4MB on 32 bit processors), with elements valued 0..1M. This should evaluate how the processor manages a large set of elements that will probably not fit into the processor caches and whose accesses cannot reasonably be predicted, causing frequent cache misses.
5)    Integer arithmetic/logic test: performs highly repetitive fast operations on integers (taken from the result of the previous test); the operations are +, -, * (arithmetic) and not, and, or, xor, shl, shr (logical).
6)    Integer slow test: performs integer div and mod operations, which are usually very slow. Since most programmers avoid them whenever an alternative or a faster workaround can be found, the weight of this test is very low.
7)    Floating point arithmetic test: performs highly repetitive fast operations (+, -, *) on floating point variables.
8)    Floating point slow test: like the integer slow test, it performs operations that are not implemented as fast on many processors and are therefore usually avoided, so the weight of this test is very low. The operations performed are /, trunc, int (usually the same thing as trunc, but resulting in different ASM code on most compilers) and round.
9)    Move 1KB chunks test: repeats the move instruction on several 1KB chunks taken from the random output of the 1M-element ARCFOUR; such chunks usually fit in the fastest CPU cache of most processors.
10)    Move 32KB chunks test: repeats the move instruction on several 32KB chunks, which may not fit in the fastest cache of some processors but in most cases fit easily in a lower-level cache. It also benefits greatly if the second or third level cache is not too slow compared to L1, as it was on older processors where the additional cache was clocked at ½ of the CPU clock or less.
11)    Move 1MB chunks test: repeats the move instruction on some 1MB chunks; it benefits greatly from a large, fast and efficient lower-level cache or, alternatively, from fast access to RAM.
12)    Move 16MB chunks test: repeats the move instruction on 16MB chunks; it mainly evaluates RAM speed, but the huge cache of some high-end server processors can be of some help.
13)    Move 64MB chunks test: moves 64MB chunks, mainly evaluating RAM speed.
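
The random structure tests are built on ARCFOUR. The following is a minimal Python sketch of the standard algorithm over a 256-element table, showing the integer additions, the masking with a logical AND and the data-dependent table accesses described for test 1. It is not Bench’s own code: the function name arcfour_keystream is hypothetical, and how Bench seeds the table, sizes its elements or extends the scheme to the 64K and 1M tables is not shown here.

```python
def arcfour_keystream(key: bytes, n: int) -> bytes:
    """Return the first n bytes of the standard ARCFOUR keystream."""
    if not key:
        raise ValueError("key must not be empty")

    # Key scheduling: start from the identity table and shuffle it.
    s = list(range(256))
    j = 0
    for i in range(256):
        j = (j + s[i] + key[i % len(key)]) & 0xFF  # integer + and logical and
        s[i], s[j] = s[j], s[i]

    # Output generation: the table keeps evolving while it is read,
    # so every access lands on an unpredictable position.
    i = j = 0
    out = bytearray()
    for _ in range(n):
        i = (i + 1) & 0xFF
        j = (j + s[i]) & 0xFF
        s[i], s[j] = s[j], s[i]
        out.append(s[(s[i] + s[j]) & 0xFF])
    return bytes(out)


if __name__ == "__main__":
    # Widely published example: key "Key" gives a keystream starting eb9f7781...
    print(arcfour_keystream(b"Key", 10).hex())
```

The whole working set is one 256-element table, which is why this kind of loop is expected to stay inside the fastest cache level.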

2.3) Results
Bench provides 3 ratings for the system:
1) Tests 1-11 are used to calculate the CPU rating; the results of tests 6 and 8 are weighted 1/3 as much as the other tests, since they perform operations that are quite slow on most systems and are therefore avoided whenever possible.
2) Tests 1, 3, 4, 9, 10, 11 are used to calculate the cache rating.
3) Tests 4, 11, 12, 13 are used to calculate the memory rating.
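
The grouping of tests into ratings can be sketched as follows in Python. This is only an illustration under a stated assumption: the text does not say how the single results are combined, so the sketch assumes a weighted arithmetic mean of per-test results that are already normalized (for example to relative MHz). The names RATING_TESTS, weight and rating are hypothetical.

```python
# Test numbers contributing to each rating, as listed above.
RATING_TESTS = {
    "cpu": list(range(1, 12)),       # tests 1-11
    "cache": [1, 3, 4, 9, 10, 11],
    "memory": [4, 11, 12, 13],
}

# Tests whose results count 1/3 as much as the others.
LOW_WEIGHT_TESTS = {6, 8}


def weight(test):
    """Weight of a single test result inside a rating."""
    return 1.0 / 3.0 if test in LOW_WEIGHT_TESTS else 1.0


def rating(kind, results):
    """Weighted mean (an assumption) of the results used by one rating.

    results maps a test number (1-13) to its normalized result.
    """
    tests = RATING_TESTS[kind]
    total_weight = sum(weight(t) for t in tests)
    return sum(weight(t) * results[t] for t in tests) / total_weight


if __name__ == "__main__":
    # Hypothetical results: every test scores 2400 except the slow ones.
    results = {t: 2400.0 for t in range(1, 14)}
    results[6] = results[8] = 1200.0
    for kind in RATING_TESTS:
        print(kind, rating(kind, results))
```

With this weighting, a poor result in tests 6 or 8 lowers the CPU rating only a third as much as the same result in any other test would.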
When Bench starts, it asks whether to run in automatic mode:
- answering yes (y or any other letter except n/N), it will repeat the test battery 3 times, run each test on 16M elements of the processor’s native bit size without reiterating the single tests, and give a rating in relative MHz compared to a reference desktop P4 Northwood;
- answering no (n/N), the user can:
    choose to repeat the test battery 1-4 times;
    choose to reiterate each single test 1-16 times, which may be useful in the future if the system is far too fast, or may be needed if the target platform has problems with the timers used (see the warning in 2.1.4);
    choose the reference platform for the rating. To simplify customization of the benchmark, the reference platform can be changed by selecting option 0: the program will try to load a file named system.txt from the executable’s path, containing, one per line, the reference system’s name (string) and the thirteen parameters used to normalize the speed results (integers in decimal notation); speed result * parameter gives the number of megahertz (or other measure) of the reference system.
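
To make the layout of system.txt more concrete, here is a minimal Python sketch of how such a file could be read and applied. It reflects one reading of the description (a name on the first line, then thirteen integers, one per line); ignoring blank lines is an assumption, and the function names parse_reference and normalize are hypothetical. Bench’s actual parser may behave differently.

```python
def parse_reference(text):
    """Split the content of a system.txt-like file into (name, parameters)."""
    # Assumption: blank lines are ignored.
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) != 14:
        raise ValueError("expected a name followed by 13 parameters")
    name = lines[0]
    params = [int(value) for value in lines[1:]]  # decimal integers
    return name, params


def normalize(speeds, params):
    """Apply 'speed result * parameter' to each of the 13 tests."""
    if len(speeds) != len(params):
        raise ValueError("one speed result per parameter is required")
    return [speed * param for speed, param in zip(speeds, params)]


if __name__ == "__main__":
    # Made-up content, for illustration only.
    sample = "My reference system\n" + "\n".join(["100"] * 13)
    name, params = parse_reference(sample)
    print(name, normalize([24.0] * 13, params)[:3])
```
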
Using Bench in non-automatic mode gives verbose output with detailed results for each test.
Remember that the true benchmark, whenever possible, is simply to run what you really need!