Latency refers to the delay that can occur when LLMs communicate with the cloud to retrieve information used to generate answers to users prompts. In some instances, high-quality answers are worth waiting for while in other scenarios speed is more important to user satisfaction.