Feature Milestones

HAL 1.0

target: September, 2010

Web UI Features

Page to add new external target algorithms
Page to add new parameter spaces for a given target algorithm (modified from existing spaces)
Page to add new problem instances/distributions (in the form of lists of files)
Page to specify new execution environments (e.g. cluster config details)
Pages to specify & launch included meta-algorithms
Ability to view algorithms/instances by problem (instance compatibility) during above specification
Page to view summary of all queued, running, and completed jobs
Page to browse, view details of, and delete runs/problems/instances/algorithms/environments
Dynamic run monitoring analysis pages, including:
Plots: overlaid SCDs for (fixed #) multi-alg, multi-inst meta-algs (RTDs for single-inst), SQT for meta-algs where possible, scatter plot for 2-target multi-instance meta-algs, incumbent SCD/RTD for design meta-algs. done but being reworked
Descriptive statistics: (mean/sd, quantiles/IQRs) for assessing a single algorithm on an instance distribution done
Statistical tests: Wilcoxon signed rank, Spearman correlation for comparing 2 algs on an instance dist done
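
As an illustration of the descriptive statistics listed above (mean/sd, quantiles/IQRs), here is a minimal Java sketch. It is not HAL's implementation; the class and method names are hypothetical, and the quantile rule (linear interpolation between order statistics, the same rule as R's default quantile type) is an assumption about which definition is wanted.

```java
import java.util.Arrays;

/** Hypothetical helper, not part of HAL: summary statistics over per-run performance values. */
final class RunStatsSketch {

    static double mean(double[] xs) {
        if (xs.length == 0) throw new IllegalArgumentException("need at least one value");
        double sum = 0.0;
        for (double x : xs) sum += x;
        return sum / xs.length;
    }

    /** Sample standard deviation (n - 1 denominator). */
    static double sampleSd(double[] xs) {
        if (xs.length < 2) throw new IllegalArgumentException("need at least two values");
        double m = mean(xs);
        double ss = 0.0;
        for (double x : xs) ss += (x - m) * (x - m);
        return Math.sqrt(ss / (xs.length - 1));
    }

    /** p-quantile for p in [0, 1], interpolating linearly between sorted values. */
    static double quantile(double[] xs, double p) {
        if (xs.length == 0 || p < 0.0 || p > 1.0) {
            throw new IllegalArgumentException("need data and 0 <= p <= 1");
        }
        double[] s = xs.clone();           // leave the caller's array untouched
        Arrays.sort(s);
        double h = (s.length - 1) * p;     // fractional index into the sorted array
        int lo = (int) Math.floor(h);
        int hi = Math.min(lo + 1, s.length - 1);
        return s[lo] + (h - lo) * (s[hi] - s[lo]);
    }

    /** Interquartile range: 75th percentile minus 25th percentile. */
    static double iqr(double[] xs) {
        return quantile(xs, 0.75) - quantile(xs, 0.25);
    }

    public static void main(String[] args) {
        double[] runtimes = {1.0, 2.0, 3.0, 4.0, 5.0};  // made-up values for illustration
        System.out.println("mean=" + mean(runtimes) + " sd=" + sampleSd(runtimes)
                + " median=" + quantile(runtimes, 0.5) + " iqr=" + iqr(runtimes));
    }
}
```

Other quantile definitions give slightly different values on small samples, so results should only be compared against tools using the same rule.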

Functionality for meta-algorithm developers

Ability to interact with the parameter space of an algorithm (examine domains, conditionalities, etc.) done
Ability to transform algorithm parameter spaces: log transforms, discretization done
Ability to run arbitrary algorithms, including other meta-algorithms, in identical fashion done
Ability to monitor the trajectories of all output variables of an executed algorithm run, in real time done
Ability to query database of previous runs directly done
Ability to access instance features done
Pre-defined metrics for aggregating performance across runs done
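
The parameter-space transforms mentioned above (log transforms, discretization) can be sketched as follows. This is only an illustration under stated assumptions: the class and method names are hypothetical and do not reflect HAL's ParameterSpace or Domain API, and the discretization shown is one common choice (values spaced evenly in log space).

```java
/** Hypothetical helper, not part of HAL: log transform and discretization of a numeric domain. */
final class ParamTransformSketch {

    /** Maps a strictly positive value into natural-log space. */
    static double toLog(double x) {
        if (x <= 0.0) throw new IllegalArgumentException("log transform needs x > 0");
        return Math.log(x);
    }

    /** Inverse of toLog. */
    static double fromLog(double y) {
        return Math.exp(y);
    }

    /** Discretizes [lo, hi] into n values spaced evenly in log space (requires 0 < lo < hi, n >= 2). */
    static double[] discretizeLog(double lo, double hi, int n) {
        if (n < 2 || lo <= 0.0 || hi <= lo) {
            throw new IllegalArgumentException("need n >= 2 and 0 < lo < hi");
        }
        double a = Math.log(lo);
        double b = Math.log(hi);
        double[] grid = new double[n];
        for (int i = 0; i < n; i++) {
            grid[i] = Math.exp(a + (b - a) * i / (n - 1));
        }
        grid[0] = lo;       // pin the endpoints so rounding in exp/log cannot move them
        grid[n - 1] = hi;
        return grid;
    }

    /** Snaps a positive value to the nearest grid value, with distance measured in log space. */
    static double snap(double x, double[] grid) {
        if (x <= 0.0 || grid.length == 0) {
            throw new IllegalArgumentException("need x > 0 and a non-empty grid");
        }
        double lx = Math.log(x);
        double best = grid[0];
        for (double g : grid) {
            if (Math.abs(Math.log(g) - lx) < Math.abs(Math.log(best) - lx)) best = g;
        }
        return best;
    }

    public static void main(String[] args) {
        double[] grid = discretizeLog(0.01, 100.0, 5);
        System.out.println(java.util.Arrays.toString(grid));
        System.out.println(snap(4.0, grid));
    }
}
```

Measuring distance in log space matters for the snapping step: a value is assigned to the grid point it is closest to by ratio, not by absolute difference.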

Backend functionality exposed in above

Ability to execute algorithms locally done
Ability to execute algorithms on a remote host via SSH needs update re: API changes
Ability to execute algorithms on an SGE cluster needs update re: object API changes
Ability to actively monitor remotely running algorithms via RPC needs update re: object API changes
MySQL database storing records of all algorithms, instances, runs, etc. done
SQLite database fallback if MySQL unavailable done
R interface for performing statistical tests, etc. done

Meta-Algorithms Included

Configuration procedure: ParamILS (external) in progress
Configuration procedure: ROAR (internal) done; will need minor updates to work with backend redesign
Analysis procedure: Paired algorithm comparison in progress
Analysis procedure: Single-algorithm analysis in progress

Distribution Issues

Documentation
Detection/configuration of external dependencies (cf. UI/execution environment specification)
Double-click-to-run universal JAR distribution

HAL 1.1

target: December, 2010

Web UI Features

Ability to export complete experiment packages (including algorithms, instances, run instructions)
Ability to load and execute an experiment package
Ability to "chain" experiments (e.g. design procedures followed by an analysis procedure comparing incumbents)

Functionality for meta-algorithm developers

Random Forest classification + regression models, incl. interface accepting AlgorithmRun objects for training and inference
support for feature extraction procedures

Backend functionality

Support for TORQUE clusters
Support for "bag-of-machines" execution manager

Meta-Algorithms Included

Configuration procedure: ActiveConfigurator (internal)
Multi-algorithm comparison
SATzilla-like portfolio builder
Parallelized AC
ParamILS (internal)

HAL 1.x

target: 2011

libraries of:
- search/optimization procedures
- machine learning tools
multi-algorithm comparisons
scaling analyses
bootstrapped analyses
robustness analyses
parameter response analyses
Parallel portfolios in HAL
Iterated F-Race in HAL
support for optimization/Monte-Carlo experiments
support for instance generators
support for instance format converters
Support text-file inputs and outputs for external algorithms (currently only the command line and stdin/err are supported)
array jobs in SGE
Wider support for working directory requirements of individual algorithm runs, e.g. Concorde's creation of 20 files with fixed names.

Unprioritized Features

new feature requests should be initially added here; notify a HAL developer and come to a HAL meeting if you feel your feature must move up the stack quickly

(FH) Support for complete configuration experiment, front to back: run configurator N times on a training set, report the N training and test set performances CN: can hopefully be implemented as a chained experiment
(FH) Developers of configurators should be able to swap in new versions of a configurator
(FH) Configuration scenarios, specifying a complete configuration task including the test set; only missing part being the configurator
(FH) Saveable sets of configuration scenarios to perform (use case: I change the configurator and want to evaluate it)
(FH) Taking this a step further: support for optimizing a parameterized configurator (configurator is an algorithm, and the above set of experiments is the set of "instances") CN: this is what is being implemented in the ongoing backend redesign
(FH) Submitting runs from a machine that is itself a cluster submit host should not need to go through SSH
(CF) Memory usage / CPU time monitoring in HAL of target algorithm runs, in order to report warnings on potential problems (like excessive swapping for example).
(HH) Significance-gated analysis / sequential hypothesis testing (see email from HH).
(CF) Continued testing to support LAMA-ish difficulties in HAL:
* Wallclock vs. CPU cutoff options
* Warnings in the dashboard if target runs or experiments are behaving "strangely"
* Email notifications sent to users when various events happen
(CF) Restricted data/execution/targetalgs for the demo server
(CF) Selection of performance metric before selecting the configurator to use. What is the exact problem specification for configuration?
(CN) convenience methods in MetaAlgorithm hiding next(), hasNext(), report() from the 3rd-party developer; instead providing an interface like AlgorithmRun fetchRun(Algorithm a), with no InterruptedException; implies an AlgorithmRun class that can adaptively switch between a "queued" and a "running" implementation for before and after the true environment fetchRun(...) call is made/returns.
(HH) Service-oriented volunteer computing. See, e.g., "Service-Oriented Volunteer Computing for Massively Parallel Constraint Solving Using Portfolios", Zeynep Kiziltan and Jacopo Mauro, in CPAIOR-2010 proceedings.
(KLB) Handle network issues (e.g. loss of connection to datamanager, etc.) robustly. Restart runs, etc., as required to ensure that the originally-requested job ultimately completes correctly with as little babysitting by the user as possible.
(FH) Normalization transform, in addition to existing log transform

Active work items

Frontend

Release-critical

CF algorithm specification screen: implement (includes initial design space specification) (CF): In Progress
CF left side of landing page: task selection/presentation according to pattern concept (CF): In Progress
CF experiment specification and monitor screens from a pattern template, and procedure-specific requirements, including experiment and incumbent naming
CF instance specification screen: implement (CF): In Progress
CF Execution environment specification (incl. R, Gnuplot, java locations) (CF): In Progress
RTDs/per-target-algorithm-run monitoring and navigation
design space specification by revision of existing spaces
Merge with backend refactor (when done)

Important

Data management interface:
- deleting runs/expts/etc.
- data export
Error logging/handling/browsing
Plotting ex-gnuplot
Documentation as a header on most of the experiment pages, paragraph explaining the intention etc.
Hiding "advanced" settings, such as configurator-specific settings or other tools, with appropriate defaults.

Backend

Release-critical

CN Split algorithms and configuration spaces, allowing run reuse for common-binary configuration spaces. Both DB and Java object model; requires Algorithm refactor below. done
CN Explicit representation of problems/encodings, compatibility of algs and instances via problem (encodings). done
CN Refactor code to align class hierarchy with terminology of paper (CN: done for all but meta-algorithm implementations, which are in progress)
CN Refactor Algorithm/ParameterSpace/Parameter/Domain structure to allow above done
CN Database schema -- speed-related refactor done (may want further tuning)
CN Refactor SSH & RPC execution managers to work under refactor

Important

CN Connection pooling done
Caching analysis results (CN: in progress as part of meta-alg changes above)
CN Query optimization done (may want more depending on real-world observations)
Selective limitation of run-level archiving (dynamic based on runtime?)
add incumbentname semantic input to (design) procedures
instance features

Nice-to-have

CN DataManager API refinement (in progress as part of DataManager refactor)
CF N-way performance comparison
Stale connection issue; incl. robustness to general network issues
CN Read-only DataManager connection for use by individual MA procedures done
Allowing relationships (incl. possible run-reuse) between different-binary "builds" of algorithms, including due to bugfixes, additional exposed parameters, etc. Also for different "versions" (without reuse) corresponding to added functionality.
Ability to quantify membership of configurations to different design spaces done

Application: ActiveConfigurator

Release Critical

VC ROAR in Java in testing
VC Calling Matlab from Java in testing
CN parameter transformations (log, discretization, etc.) done
VC SMBO, calling Matlab for model building/evaluation (VC: implemented, in testing)
Adapt Weka RF implementation for regression
Pure-Java SMBO implementation
Merge Java AC with refactored HAL codebase once refactor is completed
Adapt standalone Java AC to work as "internal" HAL meta-algorithm

Support/QA/Misc.

Release Critical

unit testing: parameters (domains) OK
unit testing: parameter spaces OK
unit testing: algorithms
unit testing: execution managers (local, SSH, cluster)
unit testing: data managers (SQLite, MySQL)
unit testing: meta-algorithms
functional testing: full pipeline
Licensing issues (GPL'd components...)

Important

CN Git, not CVS done
CN Order+configure new DB server (CN: waiting for Dave B to make final changeover)
user-facing documentation (help)
CN Better logging/error-reporting (to console/within HAL). done (for most cases; exceptions are auto-logged)
CN JX VC Basic Windows support done, in testing
Better handling of overhead runtime vs. target algorithm runtime

Nice-to-have

developer-facing documentation (javadocs) (in progress in parallel with other work)

Bug Reports

(CN) JSC test reliability issue (compared to R)
(CN) end-of-experiment hanging bug (GGA, multinode cluster runs)
(LX) missing current-time point in solution quality trace, so don't see the final "flat line"
(CN) accuracy of mid-run overhead accounting for PILS/GGA
(CF) Configuration file callstrings with weird spaces, i.e. "... -param '$val$ blah' ..." where '$val$ blah' needs to be passed to the target as a single argument. (CN) does this work with double-quotes instead of single-quotes?
(JS) FixedConfigurationExperiment UI is outdated, unusable.
(JS) HAL is not usable on WestGrid. We need a TorqueClusterExecutionManager.
(JS) Algorithms with a requirement of a new directory for each run.
(JS) one of the ExecutionManagers produces unstarted AlgorithmRuns
(FH) If a HAL slave process fails to start, the associated expt. status stays on "queued" forever
(FH) Database table contention causes locking and high query latency. Likely to be fixed by database changes and use of InnoDB, but I'm reporting it anyway.
(CN) DataManager-decorated ExecutionManager still requires explicit commit to save results. Also run results cannot be saved unless explicitly associated with an experiment id.
(CN) Parameter values (eg Instance files) with spaces are split during command string construction; need to enquote them as necessary.
(CN) Form input is not validated (moved from feature requests)
(MC) After error: java.io.IOException: Cannot run program "gnuplot" (in directory "gnuplotData"): java.io.IOException: error=2, No such file or directory, experiment cannot be aborted.
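
Two of the reports above concern arguments containing spaces being split during command string construction. One way to avoid this, sketched below, is to tokenize the callstring template first and substitute parameter values afterwards, so that a value with spaces can never introduce a new argument boundary. This is a hedged illustration, not HAL's code: the class and method names are hypothetical, and the `$name$` placeholder syntax is taken from the example in the report.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper, not part of HAL: builds an argument list from a callstring template. */
final class CallstringSketch {

    /** Splits a template on whitespace, treating '...' and "..." as single tokens. */
    static List<String> tokenize(String template) {
        List<String> tokens = new ArrayList<>();
        StringBuilder cur = new StringBuilder();
        char quote = 0;            // 0 means "not inside quotes"
        boolean inToken = false;
        for (char c : template.toCharArray()) {
            if (quote != 0) {
                if (c == quote) quote = 0; else cur.append(c);
            } else if (c == '\'' || c == '"') {
                quote = c;
                inToken = true;    // an empty quoted string is still an argument
            } else if (Character.isWhitespace(c)) {
                if (inToken) {
                    tokens.add(cur.toString());
                    cur.setLength(0);
                    inToken = false;
                }
            } else {
                cur.append(c);
                inToken = true;
            }
        }
        if (quote != 0) throw new IllegalArgumentException("unterminated quote in template");
        if (inToken) tokens.add(cur.toString());
        return tokens;
    }

    /** Replaces $name$ placeholders after tokenizing, so each value stays inside one argument. */
    static List<String> buildArgs(String template, Map<String, String> values) {
        List<String> args = new ArrayList<>();
        for (String token : tokenize(template)) {
            for (Map.Entry<String, String> e : values.entrySet()) {
                token = token.replace("$" + e.getKey() + "$", e.getValue());
            }
            args.add(token);
        }
        return args;
    }

    public static void main(String[] argv) {
        List<String> args = buildArgs("solver -param '$val$ blah' -inst $inst$",
                Map.of("val", "3", "inst", "/tmp/my file.cnf"));
        System.out.println(args);
    }
}
```

For local execution the resulting list can be handed to `new ProcessBuilder(args)`, which passes each element as one argument without involving a shell. For SSH or cluster submission the command is typically re-parsed by a remote shell, so each argument would additionally need shell quoting there. Note also that a value which itself contains a `$name$` pattern could be substituted again by this simple loop.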

Topic revision: r45 - 2011-01-05 - mavc