Empirical Scaling Analyzer (ESA)
Introduction
Empirical Scaling Analyzer (ESA) is a tool that takes a file
of solver running times (sample) as input, automatically fits
and evaluates parametric scaling models, and generates a PDF
report summarizing the analysis results (sample). The models
are fitted to the fraction of the dataset containing the
smallest problem instances and are then challenged by
extrapolation to the larger instances, using a bootstrap
approach to quantify the uncertainty of the extrapolated
predictions. For detailed information about the methodology,
please refer to the papers below.
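The core fit-and-challenge idea can be illustrated with a short
Python sketch (this is only an illustration of the methodology,
not ESA's implementation; the polynomial model form, the 40%
support/challenge split, the number of bootstrap resamples and
the two-column "size running-time" file layout are assumptions
made here for concreteness):

    # Illustrative sketch of ESA's fit-then-challenge-by-extrapolation idea.
    # NOT ESA's code: the model form, split, resample count and file layout
    # are assumptions chosen for this example.
    import numpy as np
    from scipy.optimize import curve_fit

    def poly_model(n, a, b):
        # polynomial scaling model: a * n^b
        return a * n ** b

    data = np.loadtxt("runtimes.txt")   # assumed columns: instance size, running time
    sizes, times = data[:, 0], data[:, 1]

    order = np.argsort(sizes)
    cutoff = int(0.4 * len(sizes))      # fit on the smallest 40% of instances (illustrative)
    support, challenge = order[:cutoff], order[cutoff:]

    rng = np.random.default_rng(0)
    predictions = []
    for _ in range(1000):               # bootstrap resamples of the support set
        sample = rng.choice(support, size=len(support), replace=True)
        params, _ = curve_fit(poly_model, sizes[sample], times[sample],
                              p0=(1e-6, 2.0), maxfev=10000)
        predictions.append(poly_model(sizes[challenge], *params))

    # Bootstrap percentile interval for the extrapolated running times; the
    # model is "challenged" by checking how many observed large-instance
    # running times it covers.
    lo, hi = np.percentile(predictions, [2.5, 97.5], axis=0)
    covered = np.mean((times[challenge] >= lo) & (times[challenge] <= hi))
    print(f"challenge running times inside the 95% bootstrap interval: {covered:.2f}")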
People
- Yasha Pushak, PhD student & Vanier Scholar (The University of British Columbia)
- Zongxu Mu, MSc (The University of British Columbia)
- Holger H. Hoos, Professor (Universiteit Leiden and The University of British Columbia)
Running ESA
What Version Should I Use?
ESA v1 [Pushak et al., 2020] requires that your running time
dataset be grouped into bins, where each bin contains problem
instances of exactly the same size. If you cannot obtain
instances grouped in this way, you should always prefer
ESA v2 [Pushak & Hoos, 2020].
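A quick way to check whether a dataset satisfies this binning
requirement is to count how many instances share each distinct
size (a rough heuristic only; the two-column file layout and
the threshold of 25 instances per bin are assumptions made for
this example):

    # Rough check of whether running times are grouped into bins of
    # identical instance size. Assumes a whitespace-separated file of
    # "size running-time" pairs; the threshold below is arbitrary.
    import numpy as np

    sizes = np.loadtxt("runtimes.txt")[:, 0]
    unique_sizes, counts = np.unique(sizes, return_counts=True)
    print(f"{len(sizes)} instances spread over {len(unique_sizes)} distinct sizes")

    if counts.min() >= 25:
        print("instances appear to be binned by size: ESA v1 or ESA v2 can be used")
    else:
        print("many sizes occur only a few times: prefer ESA v2")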
If your dataset does have instances grouped into bins of the
same size, then you may choose either version -- both tend to
produce qualitatively similar results. However, the method
that each procedure uses to fit models to your dataset
differs. While both methods typically produce models with
quite similar training error, ESA v1 implicitly weights the
larger running times more heavily than ESA v2 does. If you
prefer to weight the larger instance sizes more heavily
(e.g., because you believe the smaller instance sizes are
contaminated by lower-order terms or floor effects), we
believe it is more principled to use a custom weighting
scheme with ESA v2.
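One simple way to express such a custom weighting scheme is via
per-observation weights in a nonlinear least-squares fit (a
sketch of the general idea only, not ESA v2's interface; the
choice of weights proportional to instance size is an
assumption made for illustration):

    # Sketch of weighting larger instance sizes more heavily when fitting
    # a scaling model. Not ESA v2's actual interface; the weighting rule
    # below is purely illustrative.
    import numpy as np
    from scipy.optimize import curve_fit

    def poly_model(n, a, b):
        # polynomial scaling model: a * n^b
        return a * n ** b

    data = np.loadtxt("runtimes.txt")
    sizes, times = data[:, 0], data[:, 1]

    # curve_fit treats sigma as a per-point uncertainty, so smaller sigma
    # means larger weight; giving larger instances smaller sigma makes
    # them dominate the fit.
    sigma = 1.0 / sizes
    params, _ = curve_fit(poly_model, sizes, times, p0=(1e-6, 2.0),
                          sigma=sigma, maxfev=10000)
    print(f"fitted model: {params[0]:.3g} * n^{params[1]:.3f}")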
By far the biggest practical difference between the two
methods is the running time they require to fit empirical
performance scaling models to your running time dataset.
The time needed to fit a model grows cubically with the
number of data points used for fitting. ESA v2 fits directly
to the full training set, whereas ESA v1 fits only to a small
number of statistics computed from the training set, so
ESA v2 requires substantially more time for large datasets.
For 11 000 instances, ESA v2 requires a little over a full
day to perform its analysis, whereas ESA v1 runs in a few
minutes. For this reason, we only provide ESA v1 as an online
service -- ESA v2 must be run from the command line. However,
for datasets with hundreds of problem instances, ESA v2
should still complete its analysis within seconds or minutes.
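Under this cubic scaling, rough fitting-time estimates for
other dataset sizes can be extrapolated from the figure quoted
above (a back-of-the-envelope calculation, not a measurement;
the reference point of one day for 11 000 instances is rounded
from the text):

    # Back-of-the-envelope estimate of ESA v2 model-fitting time, assuming
    # cubic scaling and anchoring to roughly one day for 11 000 instances
    # (rounded from the figure quoted above).
    REFERENCE_N = 11_000
    REFERENCE_SECONDS = 24 * 3600

    def estimated_fit_seconds(n):
        return REFERENCE_SECONDS * (n / REFERENCE_N) ** 3

    for n in (500, 2_000, 11_000):
        print(f"{n:>6} instances: ~{estimated_fit_seconds(n):.0f} s")
    # A few hundred instances come out at well under a minute, consistent
    # with the statement above.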
Papers
- [PusHoo20] Y. Pushak and H. H. Hoos. Advanced Statistical Analysis of Empirical Performance Scaling. In Proceedings of GECCO, 2020. (PDF File, Video)
- [PusEtal20] Y. Pushak, Z. Mu and H. H. Hoos. Empirical scaling analyzer: An automated system for empirical analysis of performance scaling. AI Communications, 2020 (to appear). (PDF File Preprint, Supplementary Material)
- [Mu15] Z. Mu. Analysing the empirical time complexity of high-performance algorithms for SAT and TSP. M.Sc. Thesis, University of British Columbia, 2015. (Electronic version, Slides)
- [MuHoo15b] Z. Mu and H. H. Hoos. Empirical Scaling Analyser: an automated system for empirical analysis of performance scaling. In Companion Publication of GECCO, 2015. (PDF File, Poster)
- [MuHoo15a] Z. Mu and H. H. Hoos. On the empirical time complexity of random 3-SAT at the phase transition. In Proceedings of IJCAI, 2015. (PDF File, Slides, Poster)
- [HooStu14] H. H. Hoos and T. Stützle. On the empirical scaling of run-time for finding optimal solutions to the travelling salesman problem. European Journal of Operational Research, 238(1):87-94, 2014. (PDF File)
- [Hoo09] H. H. Hoos. A bootstrap approach to analysing the scaling of empirical run-time data with problem size. Technical Report TR-2009-16, Department of Computer Science, University of British Columbia, 2009. (PDF File)
Applications
- Propositional Satisfiability Problem (SAT)
  - Phase-transition random 3-SAT
  - Phase-transition random 4-SAT
  - Less-constrained random 4-SAT
  - Bounded model checking
- Travelling Salesperson Problem (TSP)
  - Empirical scaling of the state-of-the-art exact TSP solver Concorde
  - Empirical scaling of the state-of-the-art inexact TSP solvers EAX and LKH
  - Impact of automated algorithm configuration on the empirical scaling of inexact TSP solvers
- Machine Learning (ML) training times on the MNIST digits dataset
  - Support Vector Machines
  - Random Forests
  - k-Nearest Neighbours (using KD-Trees)
- Quicksort (sorting) with various pivot rules
  - Random
  - Median of three
  - Ninther