Today, most AI agents run every task, no matter how simple, through massive LLMs like GPT-4 or Claude.
NVIDIA researchers say that's wasteful, unnecessary, and about to change.
Small Language Models (SLMs) are language models compact enough to fit on consumer hardware and run inference with latency low enough to be practical.
They're fast, cheap, and, for most agentic tasks, just as effective as their larger counterparts.