Fusing LLM & Classical Planning

In the previous part (Why are LLMs required for Planning?), the question we asked was: Can we improve LLM planning to have some formal guarantees?

To answer that, Let’s first recap.

	LLM Planning	Classical Planning
Open-world Planning	✅	❌
Handling Abstract Tasks	✅	❌
Handling Partial Observability	✅	❌
Feasibility	❌	✅
Optimality	❌	✅

We can observe that the formal guarantees come from classical planning. Can we somehow combine the best of both? Any guesses? Here’s a hint to jog your memories.

Ring any bells now? You guessed it right.

Considering LLM Planning as System 1 and Classical Planning as System 2, we look at two different ways of combining the best of both.

Equip LLMs with verifiers, dynamics, and heuristic search.
Use LLms as knowledge bases for open-world planning

LLMs + verifiers, dynamics, and heuristic search.

Let’s first start with some motivation. [1] shows that:

LLMs can generate good solutions if called multiple times.
Verifiers help discard bad candidate actions/plans.

1. LLMs + external verifiers

SayCan has a critical limitation: it selects actions based solely on feasibility, not their relevance to the goal. Consider this analogy: if you’re traveling from San Francisco to New York, would it make sense to fly via New Delhi simply because it’s feasible? To address this, SayCanPay[3] further adds a Pay model to estimate the payoff of an action with respect to the goal. That is, actions which are more optimal wrt the goal are more likely to be selected.

2. LLMs + external verifiers + heuristic search

SayCanPay[3] also proposes heuristic search using an aggregated score of the Say (LLM), Can (feasibility), and Pay (optimality) models. As shown in the Figure, the Beam-Action search performs a beam search over the action. This mirrors the search in heuristic planners. They show that the overall score for each action is a sum of the aggregated score and heuristic score, akin to A* planning.

3. LLMs + dynamics models

LLMs as Knowledge Base for Planners

Perhaps, a more apt conlusion would be:

References

[1] Lightman, H. (2023). Let’s Verify Step by Step.

[2] Ahn, M. (2022). Do As I Can, Not As I Say: Grounding Language in Robotic Affordances. 6th Annual Conference on Robot Learning

[3] Hazra, R. (2023). SayCanPay: Heuristic Planning with Large Language Models Using Learnable Domain Knowledge. Proceedings of the AAAI Conference on Artificial Intelligence, 20123–20133.

[4] Hao, S. (2023). Reasoning with Language Models is Planning with World Models. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 8154–8173.

[5] Du, Y. (2024). Video Language Planning. The Twelfth International Conference on Learning Representations.

[6] Liu, B. (2023). LLM+P: Empowering Large Language Models with Optimal Planning Proficiency.

[7] Wong, L. (2024). Learning Grounded Action Abstraction from Language. The Twelfth International Conference on Learning Representations