
LLM Agents for Automated Scientific Research

Autonomous agents that think, plan, and compute like experts.

The search for next-generation batteries, catalysts, and alloys depends on Density Functional Theory (DFT), one of the most widely used first-principles methods for predicting material properties. But DFT is notoriously demanding: a single calculation can take hours, and even experts spend years learning to set parameters, test convergence, and fix subtle errors.

DREAMS is a hierarchical, multi-agent framework that couples large language model (LLM) reasoning with scientific tools, enabling autonomous, high-fidelity simulation with minimal human input.

Figure: DREAMS agent team illustration

From Workflows to Thinking Agents

Traditional computational workflows are static: they are scripts that execute predefined tasks. They cannot adapt, reason, or recover from unforeseen failures. DREAMS replaces them with a coordinated team of specialized agents, each handling one core aspect of the research cycle: planning, simulation, resource allocation, convergence handling, and data exchange.

  • Supervisor Agent: Plans and dynamically updates tasks based on progress and results.
  • DFT Agent: Generates atomic structures, optimizes convergence parameters, and parses outputs.
  • HPC Agent: Allocates cluster resources, submits jobs, and monitors runs.
  • Convergence Agent: Diagnoses and resolves failed or unconverged simulations.
  • Canvas: A shared memory system linking all agents, tools, and users. It preserves structured context, which helps curb hallucination.

Figure: DREAMS agent pipeline. The DFT-based Research Engine for Agentic Materials Screening (DREAMS) coordinates supervisor, DFT, HPC, and convergence agents through a shared Canvas.
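
The coordination pattern described above can be illustrated with a small sketch. This is not the DREAMS implementation: the class and function names, the Canvas interface, the cutoff threshold of 50, and the step of 10 are all hypothetical, and the DFT "run" is a stand-in that involves no real simulation. The HPC agent is omitted for brevity. The sketch only shows the idea that agents communicate through a shared record and that the supervisor re-plans from what that record currently says.

```python
from dataclasses import dataclass, field


@dataclass
class Canvas:
    """Hypothetical shared record that every agent reads from and writes to."""
    entries: list = field(default_factory=list)

    def post(self, agent: str, key: str, value) -> None:
        self.entries.append({"agent": agent, "key": key, "value": value})

    def latest(self, key: str):
        # Return the most recently posted value for this key, or None.
        for entry in reversed(self.entries):
            if entry["key"] == key:
                return entry["value"]
        return None


def dft_agent(canvas: Canvas) -> None:
    # Stand-in for a DFT run: it "converges" only once the cutoff reaches 50
    # (an arbitrary toy threshold, not a physical recommendation).
    canvas.post("dft", "converged", canvas.latest("ecutwfc") >= 50)


def convergence_agent(canvas: Canvas) -> None:
    # Toy repair policy: raise the cutoff by a fixed step so the run can repeat.
    canvas.post("convergence", "ecutwfc", canvas.latest("ecutwfc") + 10)


def supervisor(canvas: Canvas, max_rounds: int = 5) -> bool:
    """Dispatch agents and re-plan after each step from the Canvas state."""
    for _ in range(max_rounds):
        dft_agent(canvas)
        if canvas.latest("converged"):
            return True
        convergence_agent(canvas)
    return False


canvas = Canvas()
canvas.post("user", "ecutwfc", 30)  # the user seeds the task with a low cutoff
succeeded = supervisor(canvas)
print(succeeded, canvas.latest("ecutwfc"))  # True 50
```

Because every decision is read back from the Canvas rather than from an agent's private state, the full history of what was tried, and by which agent, stays inspectable after the run.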

Autonomous Benchmarks at Expert Fidelity

We evaluated DREAMS on three canonical challenges in computational materials science, demonstrating expert-level fidelity without human-in-the-loop steering.

  1. Sol27LC Benchmark – Equilibrium Lattice Constants: DREAMS autonomously executed full DFT workflows across 27 elemental crystals. The mean absolute percentage error (MAPE) in lattice constants, relative to human-expert DFT calculations, stayed at or below 1.00% in every structure class.
    Table: Correct structures generated by the agent, MAPE relative to results obtained by a human DFT expert, and the k-point and ecutwfc parameters chosen by the agent across the Sol27LC systems.
    Structure  Systems         Correct structures  MAPE   k-point range  ecutwfc
    BCC        Li, Na, K, ...  11/11               0.36%  8–16           40–70
    FCC        Rh, Ir, ...     12/12               0.51%  8–18           40–70
    DIA        C, Si, Ge, ...  4/4                 1.00%  6–8            40–70
  2. CO/Pt(111) Adsorption Puzzle: A long-standing benchmark in catalysis. DREAMS reproduced literature-level adsorption-energy differences between FCC and atop sites, dynamically fixing failed jobs and refining scripts mid-execution.
    Table: ΔBE = E_ads,atop − E_ads,fcc, in eV. Calculations are performed on a 2×2 supercell at 1/4 monolayer (θ = 1/4 ML) coverage, showing close agreement with human experts and literature.
    Supercell  θ (ML)  XC   Agent Team  Human Expert  Literature
    2 × 2      1/4     PBE  0.104       0.108         0.10–0.24
    2 × 2      1/4     LDA  0.318       0.320         0.32–0.45
  3. Functional-Driven Uncertainty Quantification: Using ensemble sampling with the Bayesian error estimation functional (BEEF), DREAMS analyzed exchange–correlation functional uncertainties and confirmed FCC-site preference at the GGA level.
    Figure: Distribution of ΔBE from BEEF ensemble analysis. Human-expert calculations yield a mean of −0.13 eV (σ = 0.01 eV), while DREAMS produces −0.12 eV (σ = 0.01 eV).
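
The summary numbers in the benchmarks above can be cross-checked with a few lines of arithmetic. The sketch below defines MAPE as it is conventionally computed, pools the three per-class MAPE values from the Sol27LC table into one figure, and measures the agent-versus-expert gap for CO/Pt(111). Two assumptions are made here that the page does not state: each per-class MAPE is taken to be an unweighted mean over that class's systems (so the pooled value is a count-weighted mean), and the ΔBE table values are taken to be in eV, as in the BEEF caption. The pooled figure is derived, not reported by the authors.

```python
def mape(predicted, reference):
    """Mean absolute percentage error, in percent."""
    errors = [abs(p - r) / abs(r) for p, r in zip(predicted, reference)]
    return 100.0 * sum(errors) / len(errors)


# (structure class, number of systems, per-class MAPE in %) from the Sol27LC table.
sol27lc = [("BCC", 11, 0.36), ("FCC", 12, 0.51), ("DIA", 4, 1.00)]

n_systems = sum(n for _, n, _ in sol27lc)
# Assumption: count-weighted mean of the per-class values gives the pooled MAPE.
pooled_mape = sum(n * m for _, n, m in sol27lc) / n_systems

# (functional, agent ΔBE, human-expert ΔBE) from the CO/Pt(111) table, assumed eV.
co_pt111 = [("PBE", 0.104, 0.108), ("LDA", 0.318, 0.320)]
gaps_ev = {xc: round(abs(agent - human), 3) for xc, agent, human in co_pt111}

print(n_systems, round(pooled_mape, 2), gaps_ev)
# 27 0.52 {'PBE': 0.004, 'LDA': 0.002}
```

Under these assumptions the pooled lattice-constant error is about 0.52% across all 27 systems, and the agent and human-expert ΔBE values differ by only a few meV for both functionals.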

Together, these results showcase autonomous exploration of material design spaces, representing a major step toward self-driving computational science.

A Platform for Scientific Reasoning

Beyond DFT, DREAMS serves as a template for agentic science. Its architecture generalizes to molecular dynamics, Monte Carlo, and generative materials discovery. By coordinating reasoning, computation, and error recovery through a shared Canvas, DREAMS moves beyond static automation toward adaptive scientific reasoning.

Figure: Large language models as scientific foundation models, orchestrating code, workflows, validation, and domain-specific agents for computational science pipelines.

Perspective

DREAMS bridges three worlds:

  1. Physics: rigorous DFT fidelity.
  2. AI: LLM reasoning and hierarchical planning.
  3. HPC: scalable execution across clusters.

It transforms how scientific computation is performed — from manual scripts to intelligent, adaptive agents that learn and reason with physics in mind.