The table below shows how often each LLM satisfies all constraints: GPT-4 succeeds on an astonishing 0.6% of queries. The main failure modes: (1) planning strategies such as LLM agents struggle to convert their reasoning into the correct actions and to keep track of global constraints (such as the total budget); (2) language agents get stuck in dead loops, hallucinate, or make errors in tool use; (3) invalid actions and repetitive action loops account for 37.3% and 6.0% of errors, respectively; (4) agents struggle to align their actions with their reasoning.
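Two of the failure modes above, repetitive action loops and violations of a global budget constraint, are mechanical enough that an agent harness can detect them at runtime. The sketch below is a hypothetical guard (the class name, window size, and return codes are illustrative assumptions, not part of any benchmark's implementation): it flags when the last few actions are identical and when a proposed step would exceed the remaining budget.

```python
from collections import deque


class AgentGuard:
    """Hypothetical runtime guard for an LLM agent loop.

    Catches two failure modes: (a) a repetitive action loop, where the
    same action fills a sliding window of recent steps, and (b) a step
    whose cost would violate the global budget constraint.
    """

    def __init__(self, budget: float, loop_window: int = 3):
        self.remaining = budget
        self.recent = deque(maxlen=loop_window)  # last N actions taken

    def check(self, action: str, cost: float) -> str:
        # (a) Loop detection: window full and every entry identical.
        self.recent.append(action)
        if len(self.recent) == self.recent.maxlen and len(set(self.recent)) == 1:
            return "loop"
        # (b) Global budget tracking across the whole trajectory.
        if cost > self.remaining:
            return "over_budget"
        self.remaining -= cost
        return "ok"
```

A harness would call `check` before committing each tool action and, on `"loop"` or `"over_budget"`, inject feedback into the prompt or abort, rather than letting the agent burn steps in a dead loop.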