Harmondale
Definition

Inference

Inference is the moment an AI model applies what it has learned to new input and produces an output. In production, inference is where latency, cost, safety checks, retrieval, logging, and user experience come together.

Last updated: 25 June 2026

Why it matters

It is where AI stops being a demo and becomes an operating cost and reliability problem.

Signals to watch

  • A live request reaches the model
  • The model returns an output
  • Latency and cost are measured

Related definitions