Latency is a significant issue for most LLM-related use cases. For scenarios like code suggestions and editing long documents, latency can noticeably degrade the overall user experience. Imagine a user who wants to rewrite only the last paragraph of a two-page document: ideally the rewritten document would appear almost instantly, since the change involves a single paragraph. However, current LLM APIs require the entire document to be regenerated, causing significant latency for users. OpenAI is now addressing this with Predicted Outputs, a feature that can greatly reduce latency for the gpt-4o and gpt-4o-mini models by letting developers supply a reference string containing the parts of the output that are expected to stay the same.
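As a minimal sketch of how this works with the OpenAI Python SDK's documented `prediction` parameter (the file name and prompt below are illustrative, not from the announcement), a rewrite request passes the existing document both as input and as the prediction:

```python
from openai import OpenAI

client = OpenAI()

# The existing two-page document; since most of it will be unchanged,
# it doubles as the prediction for the model's output.
with open("report.txt") as f:  # hypothetical file for illustration
    document = f.read()

completion = client.chat.completions.create(
    model="gpt-4o",
    messages=[
        {
            "role": "user",
            "content": "Rewrite only the last paragraph to be more "
                       "concise. Return the full document.",
        },
        {"role": "user", "content": document},
    ],
    # Predicted Outputs: output tokens that match the prediction can be
    # accepted quickly instead of generated one by one, cutting latency.
    prediction={"type": "content", "content": document},
)

print(completion.choices[0].message.content)
```

The speedup comes from the model validating predicted tokens rather than generating every token from scratch, so the gains are largest when only a small portion of a long output actually changes.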

Read the full article at Neowin