Feature Milestones
target: September, 2010
Web UI Features
- Page to add new external target algorithms
- Page to add new parameter spaces for a given target algorithm (modified from existing spaces)
- Page to add new problem instances/distributions (in the form of lists of files)
- Page to specify new execution environments (e.g. cluster config details)
- Pages to specify & launch included meta-algorithms
- Ability to view algorithms/instances by problem (instance compatibility) during above specification
- Page to view summary of all queued, running, and completed jobs
- Page to browse, view details of, and delete runs/problems/instances/algorithms/environments
- Dynamic run monitoring analysis pages, including:
- Plots: Overlaid SCDs for (fixed #) multi-alg, multi-inst meta-algs (RTDs for single-inst), SQT for meta-algs where possible, scatter plot for 2-target multi-instance meta-algs, incumbent SCD/RTD for design meta-algs. done but being reworked
- Descriptive statistics: (mean/sd, quantiles/iqrs) for assessing single-algorithm on an instance dist done
- Statistical tests: Wilcoxon signed rank, Spearman correlation for comparing 2 algs on an instance dist done
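As a rough illustration of the Spearman correlation item above, the following is a minimal Java sketch that compares two algorithms' per-instance performance values. It is not HAL code (the roadmap says HAL goes through an R interface for statistical tests); the class and method names are hypothetical. Ties receive average ranks, and the coefficient is computed as the Pearson correlation of the ranks.

```java
import java.util.Arrays;
import java.util.Comparator;

/** Hypothetical helper for illustration only; not part of the HAL API. */
public final class SpearmanSketch {
    private SpearmanSketch() {}

    /** Returns 1-based ranks; tied values share the mean of their positions. */
    public static double[] ranks(double[] x) {
        int n = x.length;
        Integer[] idx = new Integer[n];
        for (int p = 0; p < n; p++) idx[p] = p;
        // Sort indices by the value they point at.
        Arrays.sort(idx, Comparator.comparingDouble(a -> x[a]));
        double[] r = new double[n];
        int i = 0;
        while (i < n) {
            int j = i;
            // Extend j over the run of values tied with position i.
            while (j + 1 < n && x[idx[j + 1]] == x[idx[i]]) j++;
            double avg = (i + j) / 2.0 + 1.0;
            for (int k = i; k <= j; k++) r[idx[k]] = avg;
            i = j + 1;
        }
        return r;
    }

    /**
     * Spearman's rho for paired observations, e.g. x[i] and y[i] are the
     * performance of two algorithms on instance i.
     * Returns NaN if either input is constant (zero rank variance).
     */
    public static double rho(double[] x, double[] y) {
        if (x.length != y.length || x.length < 2) {
            throw new IllegalArgumentException("need two equal-length samples, n >= 2");
        }
        double[] rx = ranks(x);
        double[] ry = ranks(y);
        double mean = (x.length + 1) / 2.0; // ranks always sum to n(n+1)/2
        double sxy = 0, sxx = 0, syy = 0;
        for (int i = 0; i < x.length; i++) {
            double dx = rx[i] - mean;
            double dy = ry[i] - mean;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        return sxy / Math.sqrt(sxx * syy);
    }
}
```

A significance test for rho is deliberately left out of this sketch; that part is better delegated to a statistics package.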
Functionality for meta-algorithm developers
- Ability to interact with the parameter space of an algorithm (examine domains, conditionalities, etc.) done
- Ability to transform algorithm parameter spaces: log transforms, discretization done
- Ability to run arbitrary algorithms, including other meta-algorithms, in identical fashion done
- Ability to monitor the trajectories of all output variables of an executed algorithm run, in real time done
- Ability to query database of previous runs directly done
- Ability to access instance features done
- Pre-defined metrics for aggregating performance across runs done
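The parameter-space transforms listed above (log transforms, discretization) can be sketched as follows. This is a minimal illustration under assumed semantics, not the HAL implementation: the class and method names are hypothetical, and the sketch assumes a numeric parameter with a strictly positive range [lo, hi].

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration only; not part of the HAL API. */
public final class ParameterTransforms {
    private ParameterTransforms() {}

    private static void checkRange(double lo, double hi) {
        if (lo <= 0 || hi <= lo) {
            throw new IllegalArgumentException("require 0 < lo < hi");
        }
    }

    /** Maps value in [lo, hi] to [0, 1] on a logarithmic scale. */
    public static double toLogUnit(double value, double lo, double hi) {
        checkRange(lo, hi);
        if (value < lo || value > hi) {
            throw new IllegalArgumentException("value outside [lo, hi]");
        }
        return (Math.log(value) - Math.log(lo)) / (Math.log(hi) - Math.log(lo));
    }

    /** Inverse of toLogUnit: maps unit in [0, 1] back to [lo, hi]. */
    public static double fromLogUnit(double unit, double lo, double hi) {
        checkRange(lo, hi);
        return Math.exp(Math.log(lo) + unit * (Math.log(hi) - Math.log(lo)));
    }

    /** Discretizes [lo, hi] into n log-spaced values, endpoints included. */
    public static List<Double> discretizeLog(double lo, double hi, int n) {
        if (n < 2) {
            throw new IllegalArgumentException("need at least 2 grid points");
        }
        List<Double> grid = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            grid.add(fromLogUnit((double) i / (n - 1), lo, hi));
        }
        return grid;
    }
}
```

For example, `discretizeLog(1, 100, 3)` yields values close to 1, 10, and 100 (up to floating-point rounding), which is the kind of grid a configurator limited to categorical domains could search over.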
Backend functionality exposed in above
- Ability to execute algorithms locally done
- Ability to execute algorithms on a remote host via SSH needs update re: API changes
- Ability to execute algorithms on a SGE cluster needs update re: object API changes
- Ability to actively monitor remotely running algorithms via RPC needs update re: object API changes
- MySQL database storing records of all algorithms, instances, runs, etc. done
- SQLite database fallback if MySQL unavailable done
- R interface for performing statistical tests, etc. done
Meta-Algorithms Included
- Configuration procedure: ParamILS (external) in progress
- Configuration procedure: ROAR (internal) done; will need minor updates to work with backend redesign
- Analysis procedure: Paired algorithm comparison in progress
- Analysis procedure: Single-algorithm analysis in progress
Distribution Issues
- Documentation
- Detection/configuration of external dependencies (cf. UI/execution environment specification)
- Double-click-to-run universal JAR distribution
target: December, 2010
Web UI Features
- Ability to export complete experiment packages (including algorithms, instances, run instructions)
- Ability to load and execute an experiment package
- Ability to "chain" experiments (e.g. design procs. followed by analysis proc comparing incumbents)
Functionality for meta-algorithm developers
- Random Forest classification + regression models, incl. interface accepting AlgorithmRun objects for training and inference
- support for feature extraction procedures
Backend functionality
- Support for TORQUE clusters
- Support for "bag-of-machines" execution manager
Meta-Algorithms Included
- Configuration procedure: ActiveConfigurator (internal)
- Multi-algorithm comparison
- SATzilla-like portfolio builder
- Parallelized AC
- ParamILS (internal)
target: 2011
- libraries of:
- search/optimization procedures
- machine learning tools
- multi-algorithm comparisons
- scaling analyses
- bootstrapped analyses
- robustness analyses
- parameter response analyses
- Parallel portfolios in HAL
- Iterated F-Race in HAL
- support for optimization/Monte-Carlo experiments
- support for instance generators
- support for instance format converters
- Support text-file inputs and outputs for external algorithms (currently only the command line and stdin/err are supported)
- array jobs in SGE
- Wider support for working directory requirements of individual algorithm runs, e.g. Concorde's creation of 20 files with fixed names.
Unprioritized Features
new feature requests should be initially added here; notify a HAL developer and come to a HAL meeting if you feel your feature must move up the stack quickly
- (FH) Support for complete configuration experiment, front to back: run configurator N times on a training set, report the N training and test set performances (CN: can hopefully be implemented as a chained experiment)
- (FH) Developers of configurators should be able to swap in new versions of a configurator CN:
- (FH) Configuration scenarios, specifying a complete configuration task including the test set; only missing part being the configurator
- (FH) Saveable sets of configuration scenarios to perform (use case: I change the configurator and want to evaluate it)
- (FH) Taking this a step further: support for optimizing a parameterized configurator (configurator is an algorithm, and the above set of experiments is the set of "instances") (CN: this is what is being implemented in the ongoing backend redesign)
- (FH) Submitting runs from a machine that is itself a cluster submit host should not need to go through SSH
- (CF) Memory usage / CPU time monitoring in HAL of target algorithm runs, in order to report warnings on potential problems (like excessive swapping for example).
- (HH) Significance-gated analysis / sequential hypothesis testing (see email from HH).
- (CF) Continued testing to support LAMA-ish difficulties in HAL:
- * Wallclock vs. CPU cutoff options
- * Warnings in the dashboard if target runs or experiments are behaving "strangely"
- * Email notifications sent to users when various events happen
- (CF) Restricted data/execution/targetalgs for the demo server
- (CF) Selection of performance metric before selecting the configurator to use. What is the exact problem specification for configuration?
- (CN) convenience methods in MetaAlgorithm hiding next(), hasNext(), report() from the 3rd-party developer; instead providing an interface like AlgorithmRun fetchRun(Algorithm a), with no InterruptedException; implies an AlgorithmRun class that can adaptively switch between a "queued" and a "running" implementation for before and after the true environment fetchRun(...) call is made/returns.
- (HH) Service-oriented volunteer computing. See, e.g., "Service-Oriented Volunteer Computing for Massively Parallel Constraint Solving Using Portfolios", Zeynep Kiziltan and Jacopo Mauro, in CPAIOR-2010 proceedings.
- (KLB) Handle network issues (e.g. loss of connection to datamanager, etc.) robustly. Restart runs, etc., as required to ensure that the originally-requested job ultimately completes correctly with as little babysitting by the user as possible.
- (FH) Normalization transform, in addition to existing log transform
Active work items
Frontend
Release-critical
- CF algorithm specification screen: implement (includes initial design space specification) (CF): In Progress
- CF left side of landing page: task selection/presentation according to pattern concept (CF): In Progress
- CF experiment specification and monitor screens from a pattern template, and procedure-specific requirements, including experiment and incumbent naming
- CF instance specification screen: implement (CF): In Progress
- CF Execution environment specification (incl. R, Gnuplot, java locations) (CF): In Progress
- RTDs/per-target-algorithm-run monitoring and navigation
- design space specification by revision of existing spaces
- Merge with backend refactor (when done)
Important
- Data management interface:
- deleting runs/expts/etc.
- data export
- Error logging/handling/browsing
- Plotting ex-gnuplot
- Documentation as a header on most of the experiment pages, paragraph explaining the intention etc.
- Hiding "advanced" settings, such as configurator-specific settings or other tools, with appropriate defaults.
Backend
Release-critical
- CN Split algorithms and configuration spaces, allowing run reuse for common-binary configuration spaces. Both DB and Java object model; requires Algorithm refactor below. done
- CN Explicit representation of problems/encodings, compatibility of algs and instances via problem (encodings). done
- CN Refactor code to align class hierarchy with terminology of paper (CN: done for all but meta-algorithm implementations, which are in progress)
- CN Refactor Algorithm/ParameterSpace/Parameter/Domain structure to allow above done
- CN Database schema -- speed-related refactor done (may want further tuning)
- CN Refactor SSH & RPC execution managers to work under refactor
Important
- CN Connection pooling done
- Caching analysis results (CN: in progress as part of meta-alg changes above)
- CN Query optimization done (may want more depending on real-world observations)
- Selective limitation of run-level archiving (dynamic based on runtime?)
- add incumbentname semantic input to (design) procedures
- instance features
Nice-to-have
- CN DataManager API refinement (in progress as part of DataManager refactor)
- CF N-way performance comparison
- Stale connection issue; incl. robustness to general network issues
- CN Read-only DataManager connection for use by individual MA procedures done
- Allowing relationships (incl. possible run-reuse) between different-binary "builds" of algorithms, including due to bugfixes, additional exposed parameters, etc. Also for different "versions" (without reuse) corresponding to added functionality.
- Ability to quantify membership of configurations to different design spaces done
Release Critical
- VC ROAR in Java in testing
- VC Calling Matlab from Java in testing
- CN parameter transformations (log, discretization, etc.) done
- VC SMBO, calling Matlab for model building/evaluation (VC: implemented, in testing)
- Adapt Weka RF implementation for regression
- Pure-Java SMBO implementation
- Merge Java AC with refactored HAL codebase once refactor is completed
- Adapt standalone Java AC to work as "internal" HAL meta-algorithm
Support/QA/Misc.
Release Critical
- unit testing: parameters (domains) OK
- unit testing: parameter spaces OK
- unit testing: algorithms
- unit testing: execution managers (local, SSH, cluster)
- unit testing: data managers (SQLite, MySQL)
- unit testing: meta-algorithms
- functional testing: full pipeline
- Licensing issues (GPL'd components...)
Important
- CN Git, not CVS done
- CN Order+configure new DB server (CN: waiting for Dave B to make final changeover)
- user-facing documentation (help)
- CN Better logging/error-reporting (to console/within HAL). done (for most cases; exceptions are auto-logged)
- CN JX VC Basic Windows support done, in testing
- Better handling of overhead runtime vs. target algorithm runtime
Nice-to-have
- developer-facing documentation (javadocs) (in progress in parallel with other work)
Bug Reports
- (CN) JSC test reliability issue (compared to R)
- (CN) end-of-experiment hanging bug (GGA, multinode cluster runs)
- (LX) missing current-time point in solution quality trace, so don't see the final "flat line"
- (CN) accuracy of mid-run overhead accounting for PILS/GGA
- (CF) Configuration file callstrings with weird spaces, i.e. "... -param '$val$ blah' ..." where '$val$ blah' needs to be passed to the target as a single argument. (CN) Does this work with double-quotes instead of single-quotes?
- (JS) FixedConfigurationExperiment UI is outdated, unusable.
- (JS) HAL is not usable on WestGrid. We need a TorqueClusterExecutionManager.
- (JS) Algorithms with a requirement of a new directory for each run.
- (JS) one of the ExecutionManagers produces unstarted AlgorithmRuns
- (FH) If a HAL slave process fails to start, the associated expt. status stays on "queued" forever
- (FH) Database table contention causes locking and high query latency. Likely to be fixed by database changes and use of InnoDB, but I'm reporting it anyway.
- (CN) DataManager-decorated ExecutionManager still requires explicit commit to save results. Also run results cannot be saved unless explicitly associated with an experiment id.
- (CN) Parameter values (e.g. instance files) with spaces are split during command string construction; need to quote them as necessary.
- (CN) Form input is not validated (moved from feature requests)
- (MC) After error: java.io.IOException: Cannot run program "gnuplot" (in directory "gnuplotData"): java.io.IOException: error=2, No such file or directory, experiment cannot be aborted.
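One possible way to address the reports above about callstrings and parameter values containing spaces is to avoid building a single command string at all: keep the callstring as a list of tokens, substitute placeholders inside each token, and hand the resulting list to `ProcessBuilder`, which passes every list element to the target as exactly one argument. The sketch below assumes the `$name$` placeholder syntax seen in the report; the class and method names are hypothetical and this is not how HAL currently works.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper for illustration only; not part of the HAL API. */
public final class CallstringArgs {
    private CallstringArgs() {}

    /**
     * Replaces each $name$ placeholder inside every template token.
     * Tokens are never re-split, so a value containing spaces stays
     * inside the single argument it was substituted into.
     */
    public static List<String> substitute(List<String> templateTokens,
                                          Map<String, String> values) {
        List<String> args = new ArrayList<>();
        for (String token : templateTokens) {
            String arg = token;
            for (Map.Entry<String, String> e : values.entrySet()) {
                // String.replace performs literal (non-regex) replacement.
                arg = arg.replace("$" + e.getKey() + "$", e.getValue());
            }
            args.add(arg);
        }
        return args;
    }

    /** Wraps the argument list; no shell is involved, so no quoting is needed. */
    public static ProcessBuilder toProcessBuilder(List<String> args) {
        return new ProcessBuilder(args);
    }
}
```

With template tokens `solver`, `-inst`, `$inst$`, `-param`, `$val$ blah`, an instance path such as `my file.cnf` reaches the target as one argument, and `$val$ blah` stays a single argument after substitution. Remote execution through SSH or a cluster submit script still goes through a shell, so quoting would remain necessary on that path.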
Topic revision: r45 - 2011-01-05 - mavc