LLM Agents

LLM Agents for Automated Scientific Research

Autonomous agents that think, plan, and compute like experts.

The search for next-generation batteries, catalysts, and alloys depends on Density Functional Theory (DFT) — one of the most accurate ways to predict material properties. But DFT is notoriously demanding: every calculation can take hours, and even experts spend years learning to set parameters, test convergence, and fix subtle errors.

DREAMS is a hierarchical, multi-agent framework that couples large language model (LLM) reasoning with scientific tools, enabling autonomous, high-fidelity simulation with minimal human input.

Traditional computational workflows are static: scripts that execute predefined tasks. They cannot adapt, reason, or recover from unforeseen failures. DREAMS replaces such scripts with a coordinated team of specialized agents, each handling a core aspect of the research cycle: planning, simulation, resource allocation, convergence handling, and data exchange.

Supervisor Agent: Plans and dynamically updates tasks based on progress and results.
DFT Agent: Generates atomic structures, optimizes convergence parameters, and parses outputs.
HPC Agent: Allocates cluster resources, submits jobs, and monitors runs.
Convergence Agent: Diagnoses and resolves failed or unconverged simulations.
Canvas: A shared memory system linking all agents, tools, and users, preserving structured context and reducing the risk of hallucination.

Figure: DREAMS agent pipeline. The DFT-based Research Engine for Agentic Materials Screening (DREAMS) coordinates supervisor, DFT, HPC, and convergence agents through a shared Canvas.
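The division of labor above can be illustrated with a toy control loop. This is a minimal sketch under stated assumptions, not the DREAMS implementation: the `Canvas` class, the agent functions, the record keys, and the `mixing_beta` adjustment are all hypothetical stand-ins, and the "HPC job" is hard-coded to fail once so that the recovery path is exercised.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Canvas:
    """Hypothetical shared memory: structured records that every agent reads and writes."""
    records: Dict[str, object] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)

    def write(self, author: str, key: str, value: object) -> None:
        self.records[key] = value
        self.log.append(f"{author} wrote {key}")


def dft_agent(canvas: Canvas) -> None:
    # Stand-in for structure generation and input preparation (values are illustrative).
    canvas.write("dft", "input", {"element": "Si", "ecutwfc": 40})


def hpc_agent(canvas: Canvas) -> None:
    # Stand-in for job submission; the first attempt is made to fail on purpose.
    attempts = canvas.records.get("attempts", 0) + 1
    canvas.write("hpc", "attempts", attempts)
    canvas.write("hpc", "status", "converged" if attempts > 1 else "unconverged")


def convergence_agent(canvas: Canvas) -> None:
    # Stand-in for diagnosis: change one (hypothetical) parameter before resubmission.
    params = dict(canvas.records["input"])
    params["mixing_beta"] = 0.3
    canvas.write("convergence", "input", params)


def supervisor(canvas: Canvas, max_steps: int = 10) -> str:
    """Choose the next agent from the Canvas state until the run converges."""
    for _ in range(max_steps):
        status = canvas.records.get("status")
        if status == "converged":
            return "done"
        if "input" not in canvas.records:
            dft_agent(canvas)
        elif status == "unconverged":
            convergence_agent(canvas)
            canvas.write("supervisor", "status", "resubmit")
        else:
            hpc_agent(canvas)
    return "gave up"


canvas = Canvas()
print(supervisor(canvas))
print(canvas.log)
```

The point of the sketch is that agents never call each other directly: every decision the supervisor makes is derived from what is recorded on the Canvas, which is also what keeps a readable trace of who changed what.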

We evaluated DREAMS on three canonical challenges in computational materials science, demonstrating expert-level fidelity without human-in-the-loop steering.

Sol27LC Benchmark – Equilibrium Lattice Constants: DREAMS autonomously executed full DFT workflows across 27 elemental crystals. The mean absolute percentage error in lattice constants, relative to human-expert DFT calculations, was 1.00% or lower for every structure class.

Summary of correct structures generated by the agent team, mean absolute percentage error (MAPE) relative to results obtained by a human DFT expert, and the k-point and ecutwfc parameters chosen by the agent across the Sol27LC systems.
Structure	Systems	# of correct structures	MAPE	k-point range	ecutwfc range (Ry)
BCC	Li, Na, K, ...	11/11	0.36%	8–16	40–70
FCC	Rh, Ir, ...	12/12	0.51%	8–18	40–70
DIA	C, Si, Ge, ...	4/4	1.00%	6–8	40–70
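For readers unfamiliar with the metric, MAPE can be computed as in the short sketch below. The function name and the lattice constants are made up for illustration; they are not values from the benchmark.

```python
from typing import Sequence


def mape(predicted: Sequence[float], reference: Sequence[float]) -> float:
    """Mean absolute percentage error of predicted values against a reference, in percent."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    total = sum(abs(p - r) / abs(r) for p, r in zip(predicted, reference))
    return 100.0 * total / len(reference)


# Made-up lattice constants in angstrom, for illustration only (not benchmark data).
agent_values = [4.04, 3.52, 5.50]
expert_values = [4.00, 3.50, 5.50]
print(f"MAPE = {mape(agent_values, expert_values):.2f}%")
```

Because each error is normalized by its own reference value, systems with small and large lattice constants contribute on an equal footing.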

CO/Pt(111) Adsorption Puzzle: A long-standing benchmark in catalysis. DREAMS reproduced literature-level adsorption-energy differences between FCC and atop sites, dynamically fixing failed jobs and refining scripts mid-execution.

ΔBE = E_ads,ontop − E_ads,fcc, the difference in CO adsorption energy between the atop (on-top) and FCC sites, in eV. Calculations are performed on a 2×2 supercell at 1/4 monolayer coverage (θ = 1/4 ML), showing close agreement with human experts and literature.
Supercell	θ (ML)	XC	Agent Team	Human Expert	Literature
2 × 2	1/4	PBE	0.104	0.108	0.10–0.24
2 × 2	1/4	LDA	0.318	0.320	0.32–0.45
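The quantity in the table can be evaluated as sketched below. The sketch assumes the conventional definition E_ads = E(slab+CO) − E(slab) − E(CO), with more negative values meaning stronger binding; the source only defines ΔBE itself. The function names and the total energies are placeholders, not computed results.

```python
def adsorption_energy(e_slab_co: float, e_slab: float, e_co: float) -> float:
    """Assumed convention: E_ads = E(slab+CO) - E(slab) - E(CO), in eV."""
    return e_slab_co - e_slab - e_co


def delta_be(e_ads_ontop: float, e_ads_fcc: float) -> float:
    """Site-preference metric from the text: dBE = E_ads,ontop - E_ads,fcc."""
    return e_ads_ontop - e_ads_fcc


# Placeholder total energies in eV, chosen only to make the arithmetic easy to follow.
e_slab = -1000.0
e_co = -20.0
e_ads_ontop = adsorption_energy(-1021.5, e_slab, e_co)
e_ads_fcc = adsorption_energy(-1021.75, e_slab, e_co)

dbe = delta_be(e_ads_ontop, e_ads_fcc)
# Under the assumed sign convention, a positive dBE means the FCC site binds more strongly.
preferred = "fcc" if dbe > 0 else "ontop"
print(dbe, preferred)
```

Note that the clean-slab and gas-phase CO energies cancel in the difference, so ΔBE depends only on the two slab+CO total energies, provided both are computed with identical settings.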

Functional-Driven Uncertainty Quantification: Using ensemble sampling with the Bayesian error estimation functional (BEEF), DREAMS quantified the uncertainty arising from the choice of exchange–correlation functional and confirmed FCC-site preference at the GGA level.

Distribution of ΔBE from BEEF ensemble analysis: human-expert calculations yield a mean of −0.13 eV (σ = 0.01 eV), while DREAMS produces −0.12 eV (σ = 0.01 eV).
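The mean and σ quoted for such a distribution are plain summary statistics over the per-member ΔBE values. A minimal sketch, assuming each ensemble member yields one ΔBE value; the function name and the sample values are invented for illustration and are not the data behind the figures above:

```python
from statistics import mean, pstdev
from typing import Sequence, Tuple


def summarize_ensemble(delta_be_samples: Sequence[float]) -> Tuple[float, float]:
    """Return (mean, population standard deviation) of per-member dBE values in eV."""
    return mean(delta_be_samples), pstdev(delta_be_samples)


# Invented dBE values in eV, one per hypothetical ensemble member.
samples = [-0.14, -0.13, -0.12, -0.13, -0.13]
mu, sigma = summarize_ensemble(samples)
print(f"mean = {mu:.2f} eV, sigma = {sigma:.3f} eV")
```

A real ensemble contains far more members than this toy list; the spread across members is what serves as the uncertainty estimate for the functional.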

Together, these results showcase autonomous exploration of material design spaces, representing a major step toward self-driving computational science.

Beyond DFT, DREAMS serves as a template for agentic science. Its architecture is designed to generalize to molecular dynamics, Monte Carlo, and generative materials discovery. By coordinating reasoning, computation, and error recovery through a shared Canvas, DREAMS moves beyond static automation toward adaptive scientific reasoning.

Figure: Large language models as scientific foundation models, orchestrating code, workflows, validation, and domain-specific agents for computational science pipelines.

DREAMS bridges three worlds:

Physics: rigorous DFT fidelity.
AI: LLM reasoning and hierarchical planning.
HPC: scalable execution across clusters.

It transforms how scientific computation is performed — from manual scripts to intelligent, adaptive agents that learn and reason with physics in mind.

