Most people can identify AI-generated images on sight. The models keep improving, and yet the tell persists. Understanding what is being detected is useful both for evaluating AI output and for prompting more effectively.
The most common identifier is a kind of hyper-coherence: every element of the image rendered with equal sharpness, equal detail, equal attention. Real photographs have a focal plane, lens aberrations, depth-of-field fall-off, motion blur, atmospheric haze, and a dozen other optical properties that create differentiation across the frame. AI images frequently exhibit a uniform rendering that no camera and lens combination would actually produce. Everything is in equal focus. Everything is equally textured. It reads as synthetic because synthetic is exactly what it is.
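That uniformity can be quantified, at least roughly. Below is a minimal sketch of one way to do it, assuming Pillow and NumPy are available; the tile count and the coefficient-of-variation score are arbitrary illustrative choices, not an established detector. Photographs with a genuine focal plane tend to show a wide spread of per-tile sharpness, while hyper-coherent renders tend toward a narrow one.

```python
import numpy as np
from PIL import Image

def sharpness_map(path, tiles=8):
    """Variance of a 3x3 Laplacian per tile: a rough local-sharpness score."""
    gray = np.asarray(Image.open(path).convert("L"), dtype=np.float64)
    # 3x3 Laplacian via shifted differences (kernel [[0,1,0],[1,-4,1],[0,1,0]]).
    lap = (-4 * gray[1:-1, 1:-1]
           + gray[:-2, 1:-1] + gray[2:, 1:-1]
           + gray[1:-1, :-2] + gray[1:-1, 2:])
    h, w = lap.shape
    th, tw = h // tiles, w // tiles
    scores = np.empty((tiles, tiles))
    for i in range(tiles):
        for j in range(tiles):
            scores[i, j] = lap[i*th:(i+1)*th, j*tw:(j+1)*tw].var()
    return scores

def uniformity(scores):
    """Coefficient of variation across tiles: lower means more uniform focus."""
    return scores.std() / (scores.mean() + 1e-9)
```

Treat the number as a hint for closer inspection, not a verdict; plenty of legitimate photographs (flat lighting, deep focus, scanned documents) are uniformly sharp too.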
Texture is the second tell. Skin, fabric, bark, stone — AI renders these with a consistency and smoothness that looks more like a texture map than a physical surface viewed in real light. The variation that makes surfaces look real — the irregular specular highlights on skin, the way cloth bunches and stretches unevenly, the non-repeating variation in a stone face — is statistically approximated rather than physically derived.
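The texture-map quality can also be probed numerically. The sketch below is illustrative only: it assumes NumPy, a grayscale patch passed as a float array, and an arbitrary 7x7 exclusion window around the zero-lag peak. It asks how strongly a patch's autocorrelation peaks away from zero lag, which is one crude signal of the repeating, map-like structure a physical surface would not show.

```python
import numpy as np

def repeat_score(patch):
    """Strength of the largest off-center peak in a patch's circular
    autocorrelation, relative to the zero-lag peak. Values near 1.0 suggest
    the texture repeats almost exactly; rough physical surfaces score lower."""
    p = patch - patch.mean()
    spectrum = np.fft.rfft2(p)
    acorr = np.fft.irfft2(spectrum * np.conj(spectrum), s=p.shape)
    acorr = np.fft.fftshift(acorr)          # move zero lag to the center
    cy, cx = np.unravel_index(np.argmax(acorr), acorr.shape)
    zero_lag = acorr[cy, cx]
    masked = acorr.copy()
    masked[cy - 3:cy + 4, cx - 3:cx + 4] = -np.inf   # ignore the central peak
    return float(masked.max() / (zero_lag + 1e-9))
```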
Backgrounds are where the generation falls apart most visibly to a trained eye. Architectural elements that do not resolve into coherent three-dimensional space. Text that blurs into plausible-looking letterforms without spelling anything. Crowd scenes where faces in the distance devolve into face-shaped noise. These are not random failures; they reflect the model’s training on 2D images without a coherent 3D world model.
The practical prompt interventions that help: specify optical characteristics explicitly (focal length, aperture, focus point), ask for imperfection (dust, motion blur, lens flare), and reference specific photographic contexts rather than generic descriptors. The model will pattern-match to real photography more closely when the prompt speaks the language of real photography.
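As a concrete illustration of that advice, here is a small helper that assembles a prompt in optical terms. Everything in it is hypothetical: the function name, parameters, and default phrasings are one plausible encoding of the advice above, not a required format for any particular model.

```python
def photographic_prompt(subject,
                        focal_length="85mm",
                        aperture="f/1.8",
                        focus="subject's eyes in sharp focus, background falling away",
                        imperfections=("slight motion blur in the hands",
                                       "faint dust in the backlight")):
    """Build an image prompt that speaks in optical terms rather than generic
    adjectives. Names and defaults are illustrative, not a fixed schema."""
    parts = [
        subject,
        f"shot at {focal_length}, {aperture}",   # explicit optics
        focus,                                   # where the focal plane sits
        ", ".join(imperfections),                # asked-for flaws
        "natural window light, visible film grain",
    ]
    return ", ".join(p for p in parts if p)

# Example:
# photographic_prompt("portrait of a carpenter in her workshop")
# -> "portrait of a carpenter in her workshop, shot at 85mm, f/1.8, ..."
```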