There's more to benchmarking than slapping your code inside
Benchmark.bm { |bm| bm.report("foo") { ... } }
The first thing we instinctively do is add TIMES.times with a large TIMES
constant. Surely "errors" cancel each other out for large enough values of TIMES, right?
But how large is large enough?
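The naive pattern looks something like this (TIMES and foo are placeholders for a guessed repetition count and the code under test; the dummy foo is only there to make the snippet runnable):

require 'benchmark'

def foo; 1000.times { |i| i * i }; end   # dummy stand-in for the code under test

TIMES = 10_000   # picked by gut feeling; is it large enough?
Benchmark.bm do |bm|
  bm.report("foo") { TIMES.times { foo } }
end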
I wrote a simple AdaptativeBenchmark, which works more or less like the stdlib's
Benchmark but also decides how many times to repeat execution in order to
approach the average time with the desired precision for a given confidence
level:
AdaptativeBenchmark.bm do |bm|
  # by default, approach the average with a 10% confidence interval for a 95%
  # confidence level
  bm.report("bar") { bar }
  bm.report("foo", :precision => 0.05, :confidence => 0.9) { foo }
end
The AdaptativeBenchmark first estimates the sample variance, population
variance and population average by running the given block min_runs times (10 by
default), and then uses those initial estimates to compute how many iterations
are needed for the desired confidence interval and level. The initial number
of runs should probably be adjusted the same way the extra ones are, but I'm
not sure it's worth the effort.
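To make the stopping rule concrete, here is a minimal sketch of the idea under a normal approximation for the mean; the adaptive_time helper, its option names and the hard-coded z-scores are my own illustrative assumptions, not AdaptativeBenchmark's actual internals:

# Rough two-sided z-scores for the supported confidence levels (assumed values)
Z_SCORES = { 0.90 => 1.645, 0.95 => 1.96, 0.99 => 2.576 }

# Run the block min_runs times to get initial estimates, then keep sampling
# until the confidence interval half-width (z * s / sqrt(n)) falls within
# precision * mean, and return the estimated average time.
def adaptive_time(options = {}, &block)
  min_runs   = options[:min_runs]   || 10
  precision  = options[:precision]  || 0.1
  confidence = options[:confidence] || 0.95
  z = Z_SCORES[confidence] or raise ArgumentError, "unsupported confidence level"
  times = (1..min_runs).map { t = Time.now; block.call; Time.now - t }
  loop do
    n    = times.size
    mean = times.inject(0.0) { |a, x| a + x } / n
    var  = times.inject(0.0) { |a, x| a + (x - mean) ** 2 } / (n - 1)
    # stop once the mean is known to the requested relative precision
    return mean if z * Math.sqrt(var / n) <= precision * mean
    t = Time.now
    block.call
    times << Time.now - t
  end
end

# usage, mirroring the report options above (foo is the code under test):
#   avg = adaptive_time(:precision => 0.05, :confidence => 0.9) { foo }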