LLM Algorithmic Thinking Benchmark

About

I was curious how well LLMs can do "algorithmic thinking", i.e. how well they can reason about code. One way to test this is to give an LLM a program and an input, then ask it to predict the output.

This repo provides a set of basic algorithms, data structures, and test inputs to run LLMs on.

The project also provides a simple, modular way to run LLMs on the test cases and compare their predictions against what the code actually outputs.
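To make the idea concrete, here is a minimal sketch of output-prediction scoring. It is not the repo's actual pipeline: `binary_search`, `is_correct`, `accuracy`, and the stub `always_two` are hypothetical names chosen for illustration, and a real run would replace the stub with a call to an LLM API.

```python
import ast


def binary_search(items, target):
    """Example task: return the index of target in sorted items, or -1."""
    lo, hi = 0, len(items) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if items[mid] == target:
            return mid
        if items[mid] < target:
            lo = mid + 1
        else:
            hi = mid - 1
    return -1


def is_correct(predicted_text, actual):
    """Parse the model's text answer as a Python literal and compare it."""
    try:
        predicted = ast.literal_eval(predicted_text.strip())
    except (ValueError, SyntaxError):
        return False  # unparseable answers count as wrong
    return predicted == actual


def accuracy(func, cases, predict):
    """Run func on every case and score the predictions against real output."""
    correct = 0
    for args in cases:
        actual = func(*args)  # ground truth comes from executing the code
        if is_correct(predict(func.__name__, args), actual):
            correct += 1
    return correct / len(cases)


def always_two(task_name, args):
    """Stub standing in for an LLM call (assumption: models reply with text)."""
    return "2"


cases = [([1, 3, 5, 7], 5), ([1, 3, 5, 7], 4), ([2, 4], 2)]
score = accuracy(binary_search, cases, always_two)
```

The ground truth is always produced by executing the code, so the expected outputs never have to be maintained by hand.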

The /web directory contains the code for visualizing the result of running the default pipeline.

Run locally

Install the python requirements with pip install -r requirements.txt.

To run the default pipeline, run python run_default_pipeline.py.

To run the web server, run cd web && pnpm dev.

To validate the test inputs, run python validate_all_inputs.py.

Environment Variables

OPENAI_API_KEY
COHERE_API_KEY
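A small pre-flight check can report unset keys before a long run starts. This is a sketch under the assumption that both variables are needed for the default pipeline; `REQUIRED_KEYS` and `missing_keys` are hypothetical names and are not part of the repo.

```python
import os

# Assumption: both keys listed above are required for the default pipeline.
REQUIRED_KEYS = ("OPENAI_API_KEY", "COHERE_API_KEY")


def missing_keys(environ=None):
    """Return the names of required keys that are unset or empty."""
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not env.get(name)]


if __name__ == "__main__":
    missing = missing_keys()
    if missing:
        print("Missing environment variables: " + ", ".join(missing))
```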
