CPU, GPU, or NPU?
Testing the sample Prompt API playground on a Copilot+ PC shows that, for now at least, Edge is not using Windows' NPU support. Instead, the Windows Task Manager performance counters show Edge's Phi model running on the device's GPU. At this early stage of development, a GPU-only approach makes sense, as far more PCs have a capable GPU than an NPU, especially the PCs used by the target developer audience.
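The playground itself sits on top of the experimental Prompt API. A minimal feature-detection sketch, assuming the `LanguageModel` global described in the Prompt API explainer (the names and availability states may change as the specification evolves, and the helper function here is illustrative, not part of the API):

```javascript
// Hedged sketch: feature-detect the experimental Prompt API before using it.
// `globalObj` is whatever global scope the page runs in (e.g. `window`);
// passing it in keeps the helper testable outside a browser.
async function getPromptSession(globalObj) {
  const LM = globalObj.LanguageModel;
  if (!LM) {
    return null; // Prompt API not exposed in this browser
  }
  // availability() reports states such as "unavailable", "downloadable",
  // "downloading", or "available" per the explainer.
  const availability = await LM.availability();
  if (availability === "unavailable") {
    return null; // device or browser cannot run the model
  }
  // create() may trigger a one-time model download before resolving
  // to a session object whose prompt() method runs inference locally.
  return LM.create();
}
```

In a page, a call such as `getPromptSession(window)` would return `null` where the API is absent and a session otherwise; inference then happens on-device, on whatever hardware Edge selects.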
It’s likely that Microsoft will move to supporting both GPU and NPU inference as more PCs gain inference accelerators and once the Windows ML APIs are finished. Windows ML’s common ONNX APIs for CPU, GPU, and NPU are a logical target for Edge’s APIs, especially if Microsoft prepares its models for all the target environments, including Arm, Intel, and AMD NPUs.
Windows ML gives Edge’s developers tools to first probe for suitable inference hardware and then download models optimized for it. Because this process can be automated, it is well suited to web-based AI applications, whose developers have no visibility into the underlying hardware.