Open-Source Software Projects
organized by date of last major code update
- STEER: Systematic and Tuneable Evaluation of Economic Rationality in LLMs. A benchmark distribution that quantitatively scores an LLM's performance on fine-grained “elements of rationality” specified in a custom taxonomy. Combined with a user-provided rubric, these scores can be aggregated into a “rationality report card”.
- Publication: N. Raman, T. Lundy, S. Amouyal, Y. Levine, K. Leyton-Brown, M. Tennenholtz; 2024.
- Repository: Github
- Clock Auction Testbed for Multi-Agent Reinforcement Learning. A highly configurable clock auction environment which serves as a testbed for research on multi-agent reinforcement learning for auctions. Our code exposes many parameters representing possible auction rule changes. It also generates clock-auction game instances and associated bidder values, leveraging a realistic value model from the literature.
- Publication: G. d'Eon, N. Newman, K. Leyton-Brown; 2024.
- Repository: Github
- Agora: Motivating and Measuring Engagement in Large-Class Discussions. Agora facilitates calling on students in class in a more equitable way, with the added benefit that it automatically produces an assessment of each student’s engagement. The key ideas are to give students control over whether their hand is raised or lowered; to choose randomly among students with raised hands, prioritizing those who have not yet spoken; and to give participation credit to all students who were considered each time a speaker is chosen (the selection rule is sketched below). The system has various other features to facilitate deployment in large classes, including multiple queues to support concurrent questions on different topics; a message board that allows students to communicate discreetly with the instructor; and polling.
- Publication: H. Zarkoob, S. Nand, K. Leyton-Brown, G. Toti; 2024.
- Repository: Github
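A minimal sketch of the speaker-selection rule described above; the data structures and the exact credit rule are illustrative assumptions, not Agora's actual implementation.

```python
import random

def choose_speaker(raised_hands, has_spoken, credit):
    """raised_hands: set of student ids with hands up; has_spoken: ids who have already spoken;
    credit: dict mapping id -> participation credit earned so far."""
    if not raised_hands:
        return None
    # Prioritize students who have not yet spoken; fall back to everyone with a hand up.
    pool = [s for s in raised_hands if s not in has_spoken] or list(raised_hands)
    # Every student considered in this draw earns participation credit
    # (one possible reading of "considered"; the real system may credit differently).
    for s in pool:
        credit[s] = credit.get(s, 0) + 1
    speaker = random.choice(pool)
    has_spoken.add(speaker)
    raised_hands.discard(speaker)
    return speaker

hands, spoken, credit = {"ada", "bea", "cam"}, {"bea"}, {}
print(choose_speaker(hands, spoken, credit), credit)
```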
- Knuth Synthesis: Monte Carlo Forest Search for SAT. Knuth Synthesis learns DPLL branching policies for solving the Boolean satisfiability (SAT) problem, with the objective of achieving good average-case performance on a given distribution of unsatisfiable problem instances. Knuth Synthesis leverages two key ideas to avoid the prohibitive costs of policy evaluations in an exponentially-sized tree. First, it estimates tree size by randomly sampling paths and measuring their lengths, drawing on an unbiased approximation due to Knuth (1975); a minimal version of this estimator is sketched below. Second, it queries a strong solver at a user-defined depth rather than learning a policy across the whole tree, to focus policy search on early decisions that offer the greatest potential for reducing tree size.
- Publication: C. Cameron, J. Hartford, T. Lundy, T. Truong, A. Milligan, R. Chen, K. Leyton-Brown; 2024.
- Repository: Github
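The tree-size estimator referenced above is simple enough to sketch. Below is a minimal version of Knuth's (1975) unbiased estimator; `children(node)` is a hypothetical callback returning a node's children, and this is not the project's actual code.

```python
import random

# Minimal sketch of Knuth's (1975) estimator; `children(node)` is a hypothetical callback.
def knuth_probe(root, children):
    """One random root-to-leaf probe; returns an unbiased estimate of the tree's node count."""
    estimate, weight, node = 1.0, 1.0, root
    while True:
        kids = children(node)
        if not kids:
            return estimate
        weight *= len(kids)   # the probability of reaching any fixed child shrinks by 1/len(kids)
        estimate += weight    # so the running product is an unbiased estimate of that level's node count
        node = random.choice(kids)

def estimate_tree_size(root, children, probes=100):
    # Averaging independent probes preserves unbiasedness and reduces variance.
    return sum(knuth_probe(root, children) for _ in range(probes)) / probes

# Example: a complete binary tree with leaves at depth 10 has 2**11 - 1 = 2047 nodes.
children = lambda depth: [depth + 1, depth + 1] if depth < 10 else []
print(estimate_tree_size(0, children))
```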
- FACTOR: Factual Assessment via Corpus TransfORmation. A scalable approach for evaluating LM factuality. FACTOR automatically transforms a factual corpus of interest into a benchmark evaluating an LM's propensity to generate true facts from the corpus vs. similar but incorrect statements. We have used our framework to create three benchmarks: Wiki-FACTOR, News-FACTOR and Expert-FACTOR.
- Utilitarian Algorithm Configuration. The first nontrivial procedure for configuring heuristic algorithms to maximize the utility provided to their end users while also offering theoretical guarantees about performance. Our code implements our Utilitarian Procrastination algorithm and reproduces all plots in our paper.
- Publication: D. Graham, K. Leyton-Brown, T. Roughgarden; 2023.
- Repository: Github
- Parallel Context Windows Improve In-Context Learning of Large Language Models. The code for reproducing the classification experiments on GPT2 models from our paper.
- Utility Functions for Runtime Distributions. The code for reproducing all figures from our paper Formalizing Preferences Over Runtime Distributions.
- Publication: D. Graham, K. Leyton-Brown, T. Roughgarden; 2023.
- Repository: Github
- In-Context Retrieval-Augmented Language Models. Code and data for reproducing the experiments from our paper In-Context Retrieval-Augmented Language Models, which shows that off-the-shelf general-purpose retrievers provide surprisingly large gains for language models across model sizes and diverse corpora, and that document retrieval and ranking mechanisms can be specialized to this setting to further boost performance.
- Publication: O. Ram, Y. Levine, I. Dalmedigos, D. Muhlgay, A. Shashua, K. Leyton-Brown, Y. Shoham; 2023.
- Repository: Github
- Bayesian Inference for Peer Grading. This software uses Bayesian inference to compute posterior distributions over true assignment grades and grader reliabilities in a multi-week peer-grading setup. It can optionally leverage trusted TAs, model the possibility that students will submit grades without making an effort, and correctly handle both input and output as discrete grades from a rubric (a toy version of the grade-inference step is sketched below).
- Publication: H. Zarkoob, G. d'Eon, L. Podina, K. Leyton-Brown; 2023.
- Repository: Github
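A toy version of the grade-inference step, assuming for simplicity that each grader's reliability is already known; the released software jointly infers reliabilities across weeks and handles effort and TA grades, so this sketch is illustrative only.

```python
# Toy model: reliabilities are assumed known; the released software infers them jointly.
def grade_posterior(reports, reliabilities, levels):
    """reports: grades reported by the peer graders; reliabilities: matching probabilities that
    each grader reports the true grade; levels: the discrete rubric levels."""
    posterior = {}
    for g in levels:                      # uniform prior over rubric levels
        p = 1.0
        for report, r in zip(reports, reliabilities):
            # With probability r the grader reports the true grade; otherwise a uniform other level.
            p *= r if report == g else (1 - r) / (len(levels) - 1)
        posterior[g] = p
    total = sum(posterior.values())
    return {g: p / total for g, p in posterior.items()}

# Example: three graders report 3, 3, 4 on a five-level rubric.
print(grade_posterior([3, 3, 4], [0.8, 0.7, 0.6], levels=[1, 2, 3, 4, 5]))
```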
- Mechanical TA: Partially automated high-stakes peer grading. A web-based application enabling students to grade work from randomly selected peers in a class, and also to gain experience with the peer review process by performing "calibration" reviews. The key difference from other peer-grading systems is that it uses TAs to perform selective spot-checks and thus incentivizes high-quality grading. The system uses Bayesian inference to estimate posterior distributions over both assignment grades and grader dependabilities.
- Publication: H. Zarkoob, K. Leyton-Brown; 2024.
- Publication: H. Zarkoob, G. d'Eon, L. Podina, K. Leyton-Brown; 2023.
- Publication: H. Zarkoob, F. Abdolhosseini, K. Leyton-Brown; 2021.
- Publication: J. Wright, C. Thornton, K. Leyton-Brown; 2015.
- Web page (version 2.1: Python): Mechanical TA 2 Project Page
- Web page (version 1.0: PHP; many fewer features): Mechanical TA 1 Project Page
- The Spotlight: Auditing Deep Learning Models to Identify Systematic Biases. This software uses a deep model's final-layer embedding to identify contiguous regions of poor performance on a dataset (a simplified version of the idea is sketched below).
- Publication: G. d'Eon, J. d'Eon, J. Wright, K. Leyton-Brown; 2022.
- Repository: Github
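A simplified version of the idea: scan for the k-nearest-neighbor ball in embedding space with the highest mean loss. The released code instead optimizes a smooth kernel over the embeddings; this is an illustrative approximation with placeholder data.

```python
import numpy as np

# Illustrative approximation: the released code optimizes a smooth kernel instead of this k-NN scan.
def worst_neighborhood(embeddings, losses, k=50):
    """embeddings: (n, d) array of final-layer embeddings; losses: (n,) per-example losses."""
    worst_mean, worst_idx = -np.inf, None
    for center in embeddings:
        dists = np.linalg.norm(embeddings - center, axis=1)
        neighborhood = np.argsort(dists)[:k]          # the k points closest to this center
        mean_loss = losses[neighborhood].mean()
        if mean_loss > worst_mean:
            worst_mean, worst_idx = mean_loss, neighborhood
    return worst_idx, worst_mean

rng = np.random.default_rng(0)
emb = rng.normal(size=(200, 8))
loss = np.abs(emb[:, 0])                              # toy losses correlated with one embedding direction
idx, m = worst_neighborhood(emb, loss, k=20)
print(f"worst neighborhood mean loss: {m:.2f} vs. overall mean loss: {loss.mean():.2f}")
```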
- Predict-then-Optimize vs End-to-End Learning. Formulating real-world optimization problems often begins with making predictions from historical data; typically, learning the prediction model used to generate the optimization problem and solving that problem are performed in two separate stages. Such prediction models can also be learned end-to-end by differentiating through the optimization task. This package contains code for experiments contrasting these two approaches.
- A Modular, Neuro-Symbolic Architecture that Combines Large Language Models, External Knowledge Sources and Discrete Reasoning. All synthetic data used for the experiments described in the paper.
- System for Matching Reviewers and Papers at Large Conferences. This system leverages DBLP and Semantic Scholar profiles to infer conflicts of interest; uses TPMS, ACL, and keywords to score reviewer-paper matchings; defines a mixed-integer program to identify good matchings; and solves it using row and column generation.
- Publication: K. Leyton-Brown, Mausam, Y. Nandwani, H. Zarkoob, C. Cameron, N. Newman, D. Raghu; 2022.
- Repository: Github
- ModeIV: Valid Causal Inference with (Some) Invalid Instruments. ModeIV extends DeepIV to perform valid causal inference in the presence of multiple instruments, only some of which are valid, leveraging the assumption that the modal instrument is valid.
- Publication: J. Hartford, V. Veitch, D. Sridhar, K. Leyton-Brown; 2021.
- Repository: Github
- PMI-Masking. The masking vocabulary used in our PMI-Masking paper, which introduced a new technique for training masked language models like BERT by jointly masking correlated n-grams identified via pointwise mutual information (a toy PMI computation is sketched below).
- Publication: Y. Levine, B. Lenz, O. Lieber, O. Abend, K. Leyton-Brown, M. Tennenholtz, Y. Shoham; 2021.
- Repository: Github
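A toy illustration of the collocation scoring behind PMI-Masking: bigrams with high pointwise mutual information are good candidates to mask as a unit. The paper uses a refined PMI measure over longer n-grams and a large corpus; this bigram version on a toy sentence is illustrative only.

```python
import math
from collections import Counter

# Illustrative bigram-only PMI; the paper's measure covers longer n-grams over a large corpus.
def bigram_pmi(tokens):
    unigrams, bigrams = Counter(tokens), Counter(zip(tokens, tokens[1:]))
    n_uni, n_bi = sum(unigrams.values()), sum(bigrams.values())
    pmi = {}
    for (w1, w2), count in bigrams.items():
        p_joint = count / n_bi
        p1, p2 = unigrams[w1] / n_uni, unigrams[w2] / n_uni
        pmi[(w1, w2)] = math.log(p_joint / (p1 * p2))   # log p(w1, w2) / (p(w1) p(w2))
    return pmi

tokens = "new york is colder than new jersey but new york is larger".split()
print(sorted(bigram_pmi(tokens).items(), key=lambda kv: -kv[1])[:3])
```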
- Predicting Satisfiability End to End. This software uses permutation-equivariant deep models to predict the satisfiability status of SAT formulas in raw CNF format.
- FCC Incentive Auction Simulator. This simulator of the FCC's 2016–17 Incentive Auction is the most detailed of which we are aware. It permits whole auctions to be run at national scale; includes VHF licenses and ladder constraints; and includes two valuation models, one of which is a novel contribution fit to historical data from the real auction.
- Publication: N. Newman, K. Leyton-Brown, P. Milgrom, I. Segal; 2020.
- Repository: Github
- AClib: a benchmark library for algorithm configuration. AClib defines a set of standard benchmarks for algorithm configuration in order to provide a solid foundation for empirical science in the field.
- Publication: F. Hutter, M. Lopez-Ibanez, C. Fawcett, M. Lindauer, H. Hoos, K. Leyton-Brown, T. Stützle; 2014.
- Repository: Github
- Problem Generator for 5G Network Resource Assignment. Rather than dedicating hardware to specific network functions, 5G networks will dynamically allocate virtualized functions to servers from a generic pool based on network traffic. This software package generates realistic instances of such 5G network resource allocation problems. Its main contributions are an encoding-agnostic problem-specification language, realistic models of network topologies and network functions, and realistic distributions of network traffic.
- Repository: Github
- Kudu: an electronic market for agricultural trade in Uganda. The original market, which operated from 2011 to 2017, matched farmers selling crops with traders interested in purchasing them based on SMS messages from both parties, and performed daily market clearing. The platform was substantially rewritten and relaunched in 2017–18, refocusing the market around the operations of a call center that iteratively confirmed matches between buyers and sellers in the context of globally optimal allocations; the relaunch also included numerous other improvements to the market's operation.
- Publication: N. Newman, N. Immorlica, K. Leyton-Brown, B. Lucier, J. Quinn, R. Ssekibuule; 2022.
- Publication: N. Newman, K. Leyton-Brown, N. Immorlica, L. Bergquist, B. Lucier, J. Quinn, C. McIntosh, R. Ssekibuule; 2018.
- Publication: R. Ssekibuule, J. Quinn, K. Leyton-Brown; 2013.
- Code and Trading Platform: kudu.ug
- Kudu Mobile App. When the funding for our pilot ended, we could no longer operate a call center. We hence built an Android app that can be used by traders directly. It supports making and browsing active trades; viewing them on a map; filtering them by geographic area, crop, price, quantity, and recency; and authenticating phone numbers and calling counterparties.
- Publication: N. Newman, N. Immorlica, K. Leyton-Brown, B. Lucier, J. Quinn, R. Ssekibuule; 2022.
- Code and App: Google Play Store
- DeepIV: Deep Learning for Counterfactual Prediction. DeepIV augments deep learning methods to accurately characterize causal relationships in the presence of instrumental variables (IVs): sources of treatment randomization that are conditionally independent of the outcome (a toy two-stage example is sketched below).
- Publication: J. Hartford, G. Lewis, K. Leyton-Brown, M. Taddy; 2017.
- Repository: Github
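A toy two-stage example in this spirit, with linear models standing in for the paper's deep networks (so it collapses to two-stage least squares); all data below are synthetic and for illustration only.

```python
import numpy as np

# Toy example: linear stand-ins for DeepIV's deep networks (reduces to two-stage least squares).
rng = np.random.default_rng(0)
n = 5000
z = rng.normal(size=n)                        # instrument
u = rng.normal(size=n)                        # unobserved confounder
t = 0.8 * z + u + 0.1 * rng.normal(size=n)    # treatment depends on the instrument and the confounder
y = 2.0 * t + 3.0 * u + rng.normal(size=n)    # outcome; the true causal effect of t is 2.0

# Stage 1: model the treatment given the instrument (here, just its conditional mean).
slope, intercept = np.polyfit(z, t, 1)
t_given_z = slope * z + intercept

# Stage 2: fit the outcome model against the expected treatment under stage 1. DeepIV instead
# samples treatments from a fitted conditional density, because its outcome model is a deep network.
two_stage = np.polyfit(t_given_z, y, 1)[0]
naive = np.polyfit(t, y, 1)[0]                # ordinary regression is biased by the confounder
print(f"two-stage estimate: {two_stage:.2f}   naive estimate: {naive:.2f}")
```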
- The Positronic Economist: A Computational System for Analyzing Economic Mechanisms. Computational mechanism analysis is a recent approach to economic analysis in which a mechanism design setting is analyzed entirely by a computer. For games with non-trivial numbers of players and actions, the approach is only feasible when these games can be encoded compactly. The Positronic Economist is a software system with two parts: (1) a Python-based language for succinctly describing mechanisms; and (2) a system that takes such descriptions as input, automatically identifies computationally useful structure, and produces a compact Action-Graph Game.
- Publication: D. Thompson, N. Newman, K. Leyton-Brown; 2017.
- Repository: Github
- Auto-WEKA: Combined Selection and Hyperparameter Optimization for Machine Learning. Auto-WEKA simultaneously selects a learning algorithm and sets its hyperparameters, going beyond previous methods that address these issues in isolation. It does so using a fully automated approach that leverages Bayesian optimization. The latest version is fully integrated into the WEKA GUI.
- Publication: L. Kotthoff, C. Thornton, H. Hoos, F. Hutter, K. Leyton-Brown; 2019.
- Publication: L. Kotthoff, C. Thornton, F. Hutter, H. Hoos, K. Leyton-Brown; 2017.
- Publication: C. Thornton, F. Hutter, H. Hoos, K. Leyton-Brown; 2013.
- Web page: AutoWEKA Project Page
- Repository: Github
- GameNet: A Deep Learning Framework for Predicting Human Strategic Behavior. GameNet is an architecture for learning predictive models of boundedly rational human strategic behavior. It significantly improved upon the previous state of the art without needing hand-tuned features developed by domain experts.
- Publication: J. Hartford, J. Wright, K. Leyton-Brown; 2016.
- Repository: Github
- ASLib: a Benchmark Library for Algorithm Selection. This software provides a standardized format for representing algorithm selection scenarios and a repository that contains a growing number of data sets from the literature.
- SATFC: a SAT-based feasibility checker for spectrum repacking. SATFC solves radio-spectrum repacking feasibility problems arising in the reverse auction of the FCC's broadcast incentive auction held in 2016. It leverages a SAT formulation, domain-specific heuristics, a parallel portfolio of SAT solvers tuned for the types of instances observed in auction simulations, and a novel caching strategy.
- Publication: N. Newman, A. Fréchette, K. Leyton-Brown; 2018.
- Publication: A. Fréchette, N. Newman, K. Leyton-Brown; 2017.
- Publication: A. Fréchette, N. Newman, K. Leyton-Brown; 2016.
- Web page: SATFC Project Page
- Repository: Github
- SMAC: Sequential Model-based Algorithm Configuration. Model-based active learning methods for the automatic, black-box configuration of algorithm parameters.
- Publication: Hutter, Hoos & Leyton-Brown, 2011.
- Web page: Automated Algorithm Configuration Project Pages
- SATenstein: an automatically configurable local search SAT solver. A generalized, highly parameterized solver framework that can be configured to instantiate a broad range of existing high-performance SLS-based SAT solvers.
- Publication: KhudaBukhsh, Xu, Hoos & Leyton-Brown, 2016.
- Publication: KhudaBukhsh, Xu, Hoos & Leyton-Brown, 2009.
- Web page: SATenstein Project Page
- Repository: Github
- Empirical Performance Models. Empirical performance models can be used to predict the performance of algorithms on previously unseen inputs, including previously unseen problem instances, previously untested parameter settings, or both (a minimal regression sketch appears below).
- Publication: F. Hutter, L. Xu, H. Hoos, K. Leyton-Brown; 2014.
- Web page: EPMs Project Page
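A minimal regression sketch of the idea: train a model to predict log runtime from instance features. The data below are random placeholders, and the random forest is just one reasonable model choice, not necessarily the one studied in the papers above.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Placeholder data; the random forest is one reasonable model choice, not necessarily the papers'.
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 20))                        # one row of instance features per run
runtimes = np.exp(features[:, 0] + rng.normal(size=500))     # synthetic runtimes, in seconds

model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(features[:400], np.log10(runtimes[:400]))          # log-transform: runtimes span orders of magnitude

predicted = 10 ** model.predict(features[400:])              # back-transform to seconds
print(f"mean predicted runtime on held-out instances: {predicted.mean():.2f}s")
```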
- Features for Runtime Prediction of Domain-Independent Planners. A new, extensive set of instance features that facilitate building empirical performance models for the planning domain.
- Publication: C. Fawcett, M. Vallati, F. Hutter, J. Hoffmann, H. Hoos, K. Leyton-Brown; 2014.
- Repository: Github
- The Configurable SAT Solver Challenge (CSSC). CSSC was a pair of competitive events that assessed the peak performance of solvers for the Boolean satisfiability (SAT) problem that accept parameters, recognizing that the value of a SAT solver often comes from its customizability rather than just its performance in a default configuration. The pages below include results, log files, solvers, incumbents and instance sets.
- Publication: F. Hutter, M. Lindauer, A. Balint, S. Bayless, H. Hoos, K. Leyton-Brown; 2017.
- Web page: CSSC 2014 Project Page
- Web page: CSSC 2013 Project Page
- ParamILS: automated algorithm tuning. This iterated local search procedure can be used as an entirely automated approach for tuning an algorithm's parameters to optimize its runtime (a schematic version is sketched below). We've applied it to state-of-the-art SAT solvers and to CPLEX.
- Publication: Hutter, Hoos, Leyton-Brown & Stützle, 2009.
- Web page: ParamILS Project Page
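A schematic sketch of iterated local search over parameter configurations in this spirit. Here `cost(config)` is a hypothetical callback that runs the target algorithm and returns the value to minimize; the real ParamILS additionally handles stochastic costs, adaptive capping, and more.

```python
import random

# Schematic sketch; `cost` is a hypothetical callback. ParamILS itself also handles
# stochastic costs, adaptive capping, and other refinements omitted here.
def iterated_local_search(param_space, cost, iterations=100, perturb_strength=3):
    """param_space: dict mapping parameter name -> list of allowed values."""
    def neighbors(config):
        for p, values in param_space.items():            # one-exchange neighborhood
            for v in values:
                if v != config[p]:
                    yield {**config, p: v}

    def local_search(config):
        best, best_cost, improved = config, cost(config), True
        while improved:
            improved = False
            for n in neighbors(best):
                c = cost(n)
                if c < best_cost:                         # first-improvement hill climbing
                    best, best_cost, improved = n, c, True
                    break
        return best, best_cost

    incumbent, incumbent_cost = local_search({p: random.choice(v) for p, v in param_space.items()})
    for _ in range(iterations):
        perturbed = dict(incumbent)
        for p in random.sample(list(param_space), min(perturb_strength, len(param_space))):
            perturbed[p] = random.choice(param_space[p])  # perturbation step
        candidate, candidate_cost = local_search(perturbed)
        if candidate_cost < incumbent_cost:               # accept only improving restarts
            incumbent, incumbent_cost = candidate, candidate_cost
    return incumbent

space = {"p1": [0, 1, 2], "p2": [0, 1, 2], "p3": [0, 1, 2]}
print(iterated_local_search(space, lambda c: (c["p1"] - 2) ** 2 + (c["p2"] - 1) ** 2 + c["p3"]))
```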
- Action-Graph Games: code and generators. Action-Graph Games (AGGs) are a compact representation of games. This C++ code can be used to compute expected utility in action-graph games, find their Nash equilibria through GAMUT solvers, and generate AGGs for computational experiments.
- Publication: Jiang & Leyton-Brown, 2008.
- Web page: AGG Project Page
- GAMUT: A test suite for game theoretic algorithms. An extensible Java package that generates games from one or more classes described in the game theoretic literature. It is intended to be used as input for the empirical testing of game theoretic algorithms, e.g., multiagent reinforcement learning or the computation of Nash equilibria.
- Publication: Nudelman, Wortman, Shoham & Leyton-Brown, 2004.
- Web page: GAMUT with AGG extensions (2013)
- Web page: Original GAMUT Project Page (2004)
- HAL: a framework for the automated design and analysis of high-performance algorithms. An experimental management and analysis platform for empirical algorithmics, particularly targeting meta-algorithmics. HAL has been designed to facilitate the easy and correct use of a broad range of standardized, advanced empirical analysis and design methods; to design and perform computational experiments, including large-scale analysis and design tasks involving substantial amounts of computation on potentially large clusters of machines; and to support the development and critical assessment of novel empirical analysis and design techniques.
- Publication: Nell, Fawcett, Hoos & Leyton-Brown, 2011.
- Web page: HAL Project Page
- SATzilla: A portfolio-based solver for the satisfiability problem. The latest (2012) version uses a custom implementation of cost-sensitive decision forests; predictions of feature computation time; novel instance features; and a wide variety of component solvers. It won three tracks in the 2012 SAT Challenge and placed second in the remaining track. Previously: SATzilla-2011 was the sole entry in the 2011 SAT competition Data Analysis Track; SATzilla-2009 won 3 gold and 2 silver in the 2009 SAT competition; SATzilla-2007 won 3 gold, 1 silver, and 1 bronze in the 2007 SAT competition; SATzilla-2004 won 2 bronze in the 2004 SAT competition; and SATzilla-2003 won 2 silver and 1 bronze in the 2003 SAT competition. (The portfolio-selection idea at SATzilla's core is sketched below.)
- Publication (SATzilla-09): Xu, Hutter, Hoos & Leyton-Brown, 2009.
- Publication (Journal paper): Xu, Hutter, Hoos & Leyton-Brown, 2008.
- Publication (SATzilla-07): Xu, Hutter, Hoos & Leyton-Brown, 2007.
- Publication (SATzilla-04): Nudelman, Devkar, Shoham, Leyton-Brown & Hoos, 2004.
- Web page: SATzilla Project Page
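A minimal sketch of the regression-based selection idea behind the earlier SATzilla versions: train one runtime model per component solver and run whichever is predicted to be fastest on a new instance. The 2012 version instead uses cost-sensitive pairwise classification, and the training data here are random placeholders.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Sketch of the older regression-based selection; features and runtimes are placeholders.
rng = np.random.default_rng(0)
features = rng.normal(size=(1000, 30))                       # instance features
runtimes = {s: np.exp(rng.normal(size=1000)) for s in ["solverA", "solverB", "solverC"]}

models = {}
for solver, times in runtimes.items():                       # one log-runtime model per solver
    models[solver] = RandomForestRegressor(n_estimators=50, random_state=0).fit(features, np.log10(times))

def select_solver(instance_features):
    predictions = {s: m.predict(instance_features.reshape(1, -1))[0] for s, m in models.items()}
    return min(predictions, key=predictions.get)             # run the predicted-fastest solver

print(select_solver(rng.normal(size=30)))
```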
- Benchmark Generator for Security Games. This Java-based generator allows researchers to generate synthetic security game problems from the computationally hardest region where the deployment-to-saturation ratio is 0.5, for a variety of different problem domains. It also provides tools for computing Strong Stackelberg Equilibria of these games using both GLPK and CPLEX.
- Publication: Jain, Leyton-Brown & Tambe, 2012.
- Web page: Security Games Benchmark Generator page
- Bayesian Action-Graph Games. An extension of the AGG representation to Bayesian games. The software enables the efficient computation of expected utility and of Bayes–Nash equilibria.
- Publication: Jiang & Leyton-Brown, 2010.
- Web page: AGG Project Page
- Hydra: automatically configuring algorithms for portfolio-based selection. The software automatically builds an algorithm portfolio out of a single, highly parameterized algorithm, targeting a given instance distribution and performance metric (the greedy construction loop is sketched below).
- Publication: Xu, Hoos & Leyton-Brown, 2010.
- Web page: Hydra Project Page
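A schematic sketch of the greedy construction loop: repeatedly ask a configurator for a configuration that most improves the portfolio's per-instance best performance. Here `configure` and `performance` are hypothetical callbacks (e.g., an off-the-shelf configurator and a runtime measurement), not Hydra's actual interfaces.

```python
import random

# Schematic sketch; `configure` and `performance` are hypothetical callbacks, not Hydra's interfaces.
def build_portfolio(instances, configure, performance, rounds=4):
    portfolio = []
    for _ in range(rounds):
        def marginal_cost(config):
            # Score a candidate by the portfolio's per-instance best performance *including* it,
            # so the configurator is rewarded only on instances the current portfolio handles poorly.
            total = 0.0
            for inst in instances:
                existing = min((performance(c, inst) for c in portfolio), default=float("inf"))
                total += min(performance(config, inst), existing)
            return total / len(instances)
        portfolio.append(configure(marginal_cost))
    return portfolio

# Toy usage: configurations are numbers, "runtime" is distance to an instance, and the
# configurator simply picks the best of 50 random candidates under the given metric.
instances = [random.uniform(0, 10) for _ in range(20)]
performance = lambda config, inst: abs(config - inst)
configure = lambda metric: min((random.uniform(0, 10) for _ in range(50)), key=metric)
print(build_portfolio(instances, configure, performance))
```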
- Empirical Hardness Models. Code for building models that predict an algorithm's runtime based on cheaply computable features describing a given instance. The papers below describe the construction of these models for various classes of algorithms, problem domains, etc.
- Publication: K. Leyton-Brown, H. Hoos, F. Hutter, L. Xu; 2014.
- Publication: Leyton-Brown, Nudelman & Shoham, 2009.
- Publication: Xu, Hoos & Leyton-Brown, 2007.
- Publication: Hutter, Hamdi, Hoos & Leyton-Brown, 2006.
- Publication: Leyton-Brown, Nudelman, & Shoham, 2005.
- Publication: Nudelman, Leyton-Brown, Hoos, Devkar & Shoham, 2004.
- Publication: Leyton-Brown, Nudelman & Shoham, 2002.
- Web page: Empirical Hardness Models Project Page
- Computational Analysis of Position Auctions. Software for representing position auctions as AGGs and finding their (possibly mixed-strategy) equilibria, as well as data from our published experiments.
- Publication: Thompson & Leyton-Brown, 2009.
- Web page: Position Auctions Project Page
- AGGUI: AGG Graphical User Interface. AGGUI allows users to create and edit AGGs, read in existing AGGs, and visualize strategy profiles (e.g. Nash equilibria) as a density map on the action graph.
- Web page: AGGUI Project Page
- Combinatorial Auction Code and Data. Instances generated by CATS 2.0 for every distribution (including "legacy" distributions), along with optimal revenue for each instance and runtimes for CPLEX 7.1/8.0 (depending on the dataset), CASS, and Gonen-Lehmann. Also included are feature values for the runtime prediction models described in Leyton-Brown, Nudelman & Shoham, 2008. Warning: it appears that CPLEX occasionally reported that it had run to optimality but in fact computed a suboptimal value; as far as we know, the CASS and GL numbers are reliable.
- Fixed problem size:
- Archive: TAR.GZ 256 goods, 1000 non-dominated bids, 10 distributions, 500 instances per distribution. 550 MHz Xeon, CPLEX 7.1.
- Archive: TAR.GZ 144 goods, 1000 non-dominated bids, 10 distributions, 500 instances per distribution. 550 MHz Xeon, CPLEX 7.1.
- Archive: TAR.GZ 64 goods, 2000 non-dominated bids, 10 distributions, 500 instances per distribution. 550 MHz Xeon, CPLEX 7.1.
- Variable problem size:
- Archive: TAR.GZ 40-400 goods, 50-2000 non-dominated bids, 10 distributions, 800 instances per distribution. 2.4 GHz Xeon, CPLEX 8.0.
- Archive: ZIP Objective function values, feature values, and CPLEX, CASS, and GL runtimes for all the instances listed above.
- Code for generating our features: TAR.GZ (only needed if you want to generate features for instances that aren't already in the above archives).
- Matlab pipeline used to generate empirical hardness models: ZIP
- MALT: A Platform for the Empirical Testing of Multiagent Reinforcement Learning Algorithms. A Matlab program that allows large-scale experimentation on multiagent reinforcement learning algorithms. The platform also supports visualization of experimental results and is easily extensible.
- Publication: Lipson & Leyton-Brown, 2005.
- Web page: MALT Project Page
- Local Effect Game solver. A Java program that allows B-LEGs to be entered graphically and that uses myopic best-response dynamics to find a pure-strategy Nash equilibrium of the game or, when enough cycles are detected, falls back to exhaustive enumeration to prove that no PSNE exists (the dynamics are sketched below).
- Publication: Leyton-Brown & Tennenholtz, 2003.
- Java JAR file: JAR. To run it, either double-click it (works on many operating systems, including Windows) or type "java -jar LEG.jar" in a console. JAR files are actually ZIP files, so to read or modify the source code, just open the file with any unzip program.
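A schematic sketch of myopic best-response dynamics, written for a two-player normal-form game rather than the solver's LEG input format; the solver's cycle-detection and exhaustive-enumeration steps are omitted.

```python
import numpy as np

# Illustrative: a normal-form game stands in for the solver's LEG format.
def best_response_dynamics(u1, u2, max_steps=1000):
    """u1, u2: payoff matrices (rows are player 1's actions, columns are player 2's actions)."""
    a1, a2 = 0, 0                                   # arbitrary starting profile
    for _ in range(max_steps):
        b1 = int(np.argmax(u1[:, a2]))              # player 1's best response to a2
        b2 = int(np.argmax(u2[a1, :]))              # player 2's best response to a1
        if b1 == a1 and b2 == a2:
            return a1, a2                           # nobody wants to deviate: a pure-strategy Nash equilibrium
        a1, a2 = b1, b2
    return None                                     # cycled; a PSNE may or may not exist

# Example: a coordination game with pure equilibria at (0, 0) and (1, 1).
u = np.array([[2, 0], [0, 1]])
print(best_response_dynamics(u, u))
```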
- CATS (Combinatorial Auction Test Suite). A generator of benchmark instances for combinatorial auction winner determination algorithms.
- Publication: Leyton-Brown & Shoham, 2006. This more recent work describes version 2.0, which contains many substantial enhancements and was released in August 2003.
- Publication: Leyton-Brown, Pearson & Shoham, 2000.
- Repository: Github (updated 2019; use this one!)
- Web page: CATS Project Page. Contains documentation, other downloads, etc.
- CAMUS (Combinatorial Auction Multi-Unit Search). An algorithm for solving the winner determination problem for multi-unit combinatorial auctions.
- Publication: Leyton-Brown; 2003. (my thesis; the fullest description of CAMUS)
- Publication: Leyton-Brown, Shoham & Tennenholtz, 2000.
- Archive: ZIP. Source code (ANSI C++).
- CASS (Combinatorial Auctions Structured Search). An algorithm for solving the winner determination problem for single-unit combinatorial auctions (a brute-force version of this problem is sketched below).
- Publication: Leyton-Brown; 2003. (my thesis; the fullest description of CASS)
- Publication: Fujishima, Leyton-Brown & Shoham, 1999.
- Archive: ZIP. Source code (ANSI C++).
- Archive: ZIP. Windows executable.
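To make the problem CASS solves concrete, here is a brute-force depth-first search for single-unit winner determination: choose a set of non-overlapping bids maximizing total revenue. CASS adds structured search, pruning, and caching; this exhaustive version is illustrative only.

```python
# Brute force for illustration; CASS's actual search adds structure, pruning, and caching.
def winner_determination(bids):
    """bids: list of (set_of_goods, price) pairs. Returns (best_revenue, indices_of_chosen_bids)."""
    best = [0.0, []]

    def search(i, allocated, revenue, chosen):
        if revenue > best[0]:
            best[0], best[1] = revenue, list(chosen)
        for j in range(i, len(bids)):
            goods, price = bids[j]
            if not (goods & allocated):                     # bid j is compatible with the bids chosen so far
                chosen.append(j)
                search(j + 1, allocated | goods, revenue + price, chosen)
                chosen.pop()

    search(0, frozenset(), 0.0, [])
    return tuple(best)

# Example over goods {a, b, c}: the optimum takes the {a} and {b, c} bids, for revenue 12.
print(winner_determination([({"a", "b"}, 8), ({"a"}, 5), ({"b", "c"}, 7), ({"c"}, 2)]))
```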