AI Capability Gap Analysis

What AI actually can/can't do. Documented, not theorized.

The problem

AI capability discussions are dominated by hype and fear. What's missing: systematic documentation of what models actually do well vs. where they fail, based on real usage rather than benchmarks or speculation.

Data source

353K conversation messages (2023-2025) across:

All conversations preserved with metadata. Searchable via semantic embeddings.
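
The notes above don't spell out the retrieval pipeline, so the following is a minimal sketch of what "searchable via semantic embeddings" could look like, assuming a sentence-transformers model and a JSONL export of the archive; the file name, field names, and model choice are illustrative, not the project's actual setup.

```python
# Minimal sketch of embedding-based search over an exported conversation
# archive. The library (sentence-transformers), the JSONL layout, and the
# file path are assumptions, not the project's documented pipeline.
import json
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")

def load_messages(path):
    """Each line is assumed to hold {"date": ..., "role": ..., "text": ...}."""
    with open(path) as f:
        return [json.loads(line) for line in f]

def build_index(messages):
    texts = [m["text"] for m in messages]
    # Normalized embeddings let a plain dot product act as cosine similarity.
    return model.encode(texts, normalize_embeddings=True)

def search(query, messages, index, k=10):
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = index @ q
    top = np.argsort(-scores)[:k]
    return [(float(scores[i]), messages[i]) for i in top]

messages = load_messages("conversations.jsonl")  # hypothetical export path
index = build_index(messages)
for score, msg in search("model hallucinated an API", messages, index):
    print(f"{score:.3f}  {msg['date']}  {msg['text'][:80]}")
```

Keeping the embeddings normalized means the whole index is one matrix and a query is one dot product, which is typically sufficient at a few hundred thousand messages before a dedicated vector store is needed.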

Findings: What works

Code generation

Research

Writing

Findings: What fails

Code generation

Research

Reasoning

Patterns observed

Capability is context-dependent

The same task succeeds or fails depending on how it's framed, what context is provided, how similar it is to training data, and whether the output can be verified.
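
One way to make that observation testable is to log each attempt together with its context features and compare success rates across them. The record schema and field names below are a hypothetical sketch; the project does not specify how attempts are recorded.

```python
# Sketch of a per-attempt record for documenting context-dependent outcomes.
# Field names are illustrative only.
from dataclasses import dataclass
from collections import Counter

@dataclass
class Attempt:
    task: str            # e.g. "code generation", "research", "writing"
    framing: str         # how the request was phrased
    context_given: bool  # whether relevant context was supplied
    verifiable: bool     # whether the output could be independently checked
    succeeded: bool

def success_rate_by(attempts, field):
    """Tally success rates grouped by a single context field."""
    totals, wins = Counter(), Counter()
    for a in attempts:
        key = getattr(a, field)
        totals[key] += 1
        wins[key] += a.succeeded
    return {k: wins[k] / totals[k] for k in totals}

attempts = [
    Attempt("code generation", "spec with examples", True, True, True),
    Attempt("code generation", "vague one-liner", False, True, False),
]
print(success_rate_by(attempts, "context_given"))
```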

Confidence doesn't correlate with accuracy

The model expresses the same confidence on correct and incorrect outputs. Users must verify independently; model self-assessment is unreliable.
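
A rough way to check this claim against logged data is a calibration table: bin attempts by expressed confidence and compute accuracy within each bin. The sketch below assumes (confidence, was_correct) pairs have already been extracted from the conversations, a step the notes do not describe.

```python
# Sketch: testing whether expressed confidence tracks accuracy by binning
# logged (confidence, was_correct) pairs. The logging itself is assumed.
from collections import defaultdict

def calibration_table(records, bins=5):
    """records: iterable of (confidence in [0, 1], was_correct: bool)."""
    buckets = defaultdict(list)
    for conf, correct in records:
        b = min(int(conf * bins), bins - 1)
        buckets[b].append(correct)
    table = {}
    for b in sorted(buckets):
        lo, hi = b / bins, (b + 1) / bins
        outcomes = buckets[b]
        table[f"{lo:.1f} to {hi:.1f}"] = sum(outcomes) / len(outcomes)
    return table

# If confidence were calibrated, accuracy would rise with the confidence bin;
# the observed pattern is that it stays roughly flat.
print(calibration_table([(0.9, False), (0.9, True), (0.3, True), (0.3, False)]))
```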

Iteration beats single-shot

Multi-turn refinement outperforms trying to get a perfect output on the first attempt. Correction feedback improves subsequent outputs.
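
As a sketch of what that looks like in practice, the loop below regenerates with the concrete failure fed back into the conversation. `generate` and `verify` are placeholders for a model call and an independent check (tests, a linter, a fact source); neither is specified in these notes.

```python
# Sketch of multi-turn refinement: generate, verify, feed the failure back.
def refine(prompt, generate, verify, max_turns=4):
    history = [{"role": "user", "content": prompt}]
    for _ in range(max_turns):
        output = generate(history)
        ok, feedback = verify(output)
        history.append({"role": "assistant", "content": output})
        if ok:
            return output
        # Feed the concrete failure back rather than re-asking from scratch.
        history.append({"role": "user",
                        "content": f"That failed: {feedback}. Please fix it."})
    return output
```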

Open questions

Status

Active. Data collected. Initial patterns documented. Structured analysis in progress.