The Neural Processing Unit (NPU) on Copilot+ PCs offers a highly efficient engine for model inferencing, unlocking a paradigm where generative AI can run not only when invoked but also as a semi-continuously running service. This empowers developers to tap into powerful reasoning engines to build proactive and sustained experiences. With our work on Phi Silica, we were able to harness highly efficient inferencing, delivering very competitive time to first token and throughput rates, while...