Empirical Scaling Analyzer (ESA)
Introduction
Empirical Scaling Analyzer (ESA) is a tool that takes a file
of solver running times (sample) as input, automatically fits
and evaluates parametric scaling models, and generates a PDF
report summarizing the analysis results (sample). The models
are fitted to the fraction of the dataset containing the
smallest problem instances and are then challenged by
extrapolation to the larger instances, using a bootstrap
approach to quantify the uncertainty of the extrapolated
predictions. For detailed information about the methodology,
please refer to the papers below.
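The core fit-and-challenge idea can be illustrated with a short
Python sketch (this is only an illustration of the methodology,
not ESA's implementation; the polynomial model form, the 40%
support/challenge split, the number of bootstrap resamples and
the two-column "size running-time" file layout are assumptions
made here for concreteness):

    # Illustrative sketch of ESA's fit-then-challenge-by-extrapolation idea.
    # NOT ESA's code: the model form, split, resample count and file layout
    # are assumptions chosen for this example.
    import numpy as np
    from scipy.optimize import curve_fit

    def poly_model(n, a, b):
        # polynomial scaling model: a * n^b
        return a * n ** b

    data = np.loadtxt("runtimes.txt")   # assumed columns: instance size, running time
    sizes, times = data[:, 0], data[:, 1]

    order = np.argsort(sizes)
    cutoff = int(0.4 * len(sizes))      # fit on the smallest 40% of instances (illustrative)
    support, challenge = order[:cutoff], order[cutoff:]

    rng = np.random.default_rng(0)
    predictions = []
    for _ in range(1000):               # bootstrap resamples of the support set
        sample = rng.choice(support, size=len(support), replace=True)
        params, _ = curve_fit(poly_model, sizes[sample], times[sample],
                              p0=(1e-6, 2.0), maxfev=10000)
        predictions.append(poly_model(sizes[challenge], *params))

    # Bootstrap percentile interval for the extrapolated running times; the
    # model is "challenged" by checking how many observed large-instance
    # running times it covers.
    lo, hi = np.percentile(predictions, [2.5, 97.5], axis=0)
    covered = np.mean((times[challenge] >= lo) & (times[challenge] <= hi))
    print(f"challenge running times inside the 95% bootstrap interval: {covered:.2f}")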
People
- Yasha Pushak, PhD student & Vanier Scholar (The University of British Columbia)
- Zongxu Mu, MSc (The University of British Columbia)
- Holger H. Hoos, Professor (Universiteit Leiden and The University of British Columbia)
Running ESA
What Version Should I Use?
ESA v1 [Pushak et al., 2020] requires that your running time
dataset be grouped into bins, where each bin contains problem
instances of exactly the same size. If you cannot obtain
instances grouped in this way, you should always prefer
ESA v2 [Pushak & Hoos, 2020].
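A quick way to check whether a dataset satisfies this binning
requirement is to count how many instances share each distinct
size (a rough heuristic only; the two-column file layout and
the threshold of 25 instances per bin are assumptions made for
this example):

    # Rough check of whether running times are grouped into bins of
    # identical instance size. Assumes a whitespace-separated file of
    # "size running-time" pairs; the threshold below is arbitrary.
    import numpy as np

    sizes = np.loadtxt("runtimes.txt")[:, 0]
    unique_sizes, counts = np.unique(sizes, return_counts=True)
    print(f"{len(sizes)} instances spread over {len(unique_sizes)} distinct sizes")

    if counts.min() >= 25:
        print("instances appear to be binned by size: ESA v1 or ESA v2 can be used")
    else:
        print("many sizes occur only a few times: prefer ESA v2")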
If your dataset does have instances grouped into bins of the
same size, then you may choose either version -- both tend to
produce qualitatively similar results. However, the method
that each procedure uses to fit models to your dataset
differs. While both methods typically produce models with
quite similar training error, ESA v1 implicitly weights the
larger running times more heavily than ESA v2 does. If you
prefer to weight the larger instance sizes more heavily
(e.g., because you believe the smaller instance sizes are
contaminated by lower-order terms or floor effects), we
believe it is more principled to use a custom weighting
scheme with ESA v2.
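One simple way to express such a custom weighting scheme is via
per-observation weights in a nonlinear least-squares fit (a
sketch of the general idea only, not ESA v2's interface; the
choice of weights proportional to instance size is an
assumption made for illustration):

    # Sketch of weighting larger instance sizes more heavily when fitting
    # a scaling model. Not ESA v2's actual interface; the weighting rule
    # below is purely illustrative.
    import numpy as np
    from scipy.optimize import curve_fit

    def poly_model(n, a, b):
        # polynomial scaling model: a * n^b
        return a * n ** b

    data = np.loadtxt("runtimes.txt")
    sizes, times = data[:, 0], data[:, 1]

    # curve_fit treats sigma as a per-point uncertainty, so smaller sigma
    # means larger weight; giving larger instances smaller sigma makes
    # them dominate the fit.
    sigma = 1.0 / sizes
    params, _ = curve_fit(poly_model, sizes, times, p0=(1e-6, 2.0),
                          sigma=sigma, maxfev=10000)
    print(f"fitted model: {params[0]:.3g} * n^{params[1]:.3f}")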
By far the biggest practical difference between the two
methods is the running time they require to fit empirical
performance scaling models to your running time dataset.
The time needed to fit a model grows cubically with the
number of data points used for fitting. ESA v2 fits directly
to the full training set, whereas ESA v1 fits only to a small
number of statistics computed from the training set, so
ESA v2 requires substantially more time for large datasets.
For 11 000 instances, ESA v2 requires a little over a full
day to perform its analysis, whereas ESA v1 runs in a few
minutes. For this reason, we only provide ESA v1 as an online
service -- ESA v2 must be run from the command line. However,
for datasets with hundreds of problem instances, ESA v2
should still complete its analysis within seconds or minutes.
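Under this cubic scaling, rough fitting-time estimates for
other dataset sizes can be extrapolated from the figure quoted
above (a back-of-the-envelope calculation, not a measurement;
the reference point of one day for 11 000 instances is rounded
from the text):

    # Back-of-the-envelope estimate of ESA v2 model-fitting time, assuming
    # cubic scaling and anchoring to roughly one day for 11 000 instances
    # (rounded from the figure quoted above).
    REFERENCE_N = 11_000
    REFERENCE_SECONDS = 24 * 3600

    def estimated_fit_seconds(n):
        return REFERENCE_SECONDS * (n / REFERENCE_N) ** 3

    for n in (500, 2_000, 11_000):
        print(f"{n:>6} instances: ~{estimated_fit_seconds(n):.0f} s")
    # A few hundred instances come out at well under a minute, consistent
    # with the statement above.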
Papers
- [PusHoo20] Y. Pushak and H. H. Hoos. Advanced Statistical Analysis of Empirical Performance Scaling. In Proceedings of GECCO, 2020. (PDF File, Video)
- [PusEtal20] Y. Pushak, Z. Mu and H. H. Hoos. Empirical scaling analyzer: An automated system for empirical analysis of performance scaling. AI Communications, 2020 (to appear). (PDF File Preprint, Supplementary Material)
- [Mu15] Z. Mu. Analysing the empirical time complexity of high-performance algorithms for SAT and TSP. M.Sc. Thesis, University of British Columbia, 2015. (Electronic version, Slides)
- [MuHoo15b] Z. Mu and H. H. Hoos. Empirical Scaling Analyser: an automated system for empirical analysis of performance scaling. In Companion Publication of GECCO, 2015. (PDF File, Poster)
- [MuHoo15a] Z. Mu and H. H. Hoos. On the empirical time complexity of random 3-SAT at the phase transition. In Proceedings of IJCAI, 2015. (PDF File, Slides, Poster)
- [HooStu14] H. H. Hoos and T. Stützle. On the empirical scaling of run-time for finding optimal solutions to the travelling salesman problem. European Journal of Operational Research, 238(1):87-94, 2014. (PDF File)
- [Hoo09] H. H. Hoos. A bootstrap approach to analysing the scaling of empirical run-time data with problem size. Technical Report TR-2009-16, Department of Computer Science, University of British Columbia, 2009. (PDF File)
Applications
- Propositional Satisfiability Problem (SAT)
  - Phase-transition random 3-SAT
  - Phase-transition random 4-SAT
  - Less-constrained random 4-SAT
  - Bounded model checking
- Travelling Salesperson Problem (TSP)
  - Empirical scaling of the state-of-the-art exact TSP solver Concorde
  - Empirical scaling of the state-of-the-art inexact TSP solvers EAX and LKH
  - Impact of automated algorithm configuration on the empirical scaling of inexact TSP solvers
- Machine Learning (ML) training times on the MNIST digits dataset
  - Support Vector Machines
  - Random Forests
  - k-Nearest Neighbours (using KD-Trees)
- Quicksort (sorting) with various pivot rules
  - Random
  - Median of three
  - Ninther