Entity-level code review for Git. Graph-based risk scoring identifies which functions need careful review. No LLM, no API key. 83.5% recall on the Greptile benchmark, beating every LLM tool. A single commit analyzed in 5–67 ms.
$ inspect diff HEAD~1

inspect   12 entities changed
          1 critical, 4 high, 3 medium, 4 low

groups    3 logical groups:
          [0] src/merge/  (5 entities)
          [1] src/driver/ (4 entities)
          [2] validate    (3 entities)

entities (by risk):

~ CRITICAL  function merge_entities (src/merge/core.rs)
            classification: functional  score: 0.82  blast: 171  deps: 3/12  public API
            >>> 12 dependents may be affected

- HIGH      function old_validate (src/validate.rs)
            classification: functional  score: 0.65  blast: 8  deps: 0/3  public API

+ MEDIUM    function parse_config (src/config.rs)
            classification: functional  score: 0.32  blast: 0  deps: 2/0

~ LOW       function format_output (src/display.rs)
            classification: text  score: 0.02  blast: 0  deps: 0/0
            cosmetic only (no structural change)
git diff says 12 files changed. But which changes actually matter? A renamed variable, a reformatted function, and a deleted public API method all look the same in a line-level diff.
This gets worse with AI-generated code. DORA 2025 found that AI adoption led to +154% PR size, +91% review time, and +9% more bugs shipped. Reviewers are drowning in noise. inspect works at the entity level: functions, structs, traits, classes. It uses the dependency graph to identify which changes have real impact.
Four phases. No LLM, no network calls, all local.
tree-sitter extracts entities from all tracked source files; call/reference analysis then builds a full-repo dependency graph.
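The graph can be pictured as adjacency maps in both directions: which entities each entity calls, and which entities call it. A minimal sketch in Rust (the types and names here are hypothetical illustrations, not inspect's internals):

```rust
use std::collections::{HashMap, HashSet};

// Hypothetical sketch: a dependency graph as adjacency maps.
// Edges point from a caller to the entity it references.
#[derive(Default)]
struct DepGraph {
    deps: HashMap<String, HashSet<String>>,       // entity -> callees
    dependents: HashMap<String, HashSet<String>>, // entity -> callers
}

impl DepGraph {
    fn add_edge(&mut self, from: &str, to: &str) {
        self.deps.entry(from.into()).or_default().insert(to.into());
        self.dependents.entry(to.into()).or_default().insert(from.into());
    }

    // Direct dependencies of an entity (the "deps" column in the demo).
    fn direct_deps(&self, entity: &str) -> usize {
        self.deps.get(entity).map_or(0, |s| s.len())
    }

    // Blast radius: all transitive dependents of a changed entity.
    fn blast_radius(&self, entity: &str) -> usize {
        let mut seen = HashSet::new();
        let mut stack = vec![entity.to_string()];
        while let Some(e) = stack.pop() {
            if let Some(callers) = self.dependents.get(&e) {
                for c in callers {
                    if seen.insert(c.clone()) {
                        stack.push(c.clone());
                    }
                }
            }
        }
        seen.len()
    }
}

fn main() {
    let mut g = DepGraph::default();
    g.add_edge("handler", "merge_entities");
    g.add_edge("cli_main", "handler");
    g.add_edge("merge_entities", "hash_entity");
    // cli_main and handler both transitively depend on merge_entities.
    assert_eq!(g.blast_radius("merge_entities"), 2);
    assert_eq!(g.direct_deps("merge_entities"), 1);
    println!("blast radius: {}", g.blast_radius("merge_entities"));
}
```

Keeping a reverse (dependents) map alongside the forward map makes blast-radius queries a simple graph walk instead of a full scan.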
Compare before/after. Classify each change as text (comments), syntax (signatures), functional (logic), or a combination.
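One way to picture the classification step: normalize the entity at different levels and see which normalized forms changed. A deliberately crude string-based sketch (inspect's real comparison is AST-based via tree-sitter; the helper names here are invented):

```rust
// Crude sketch of change classification by comparing normalized forms.
// A real implementation compares syntax trees, not strings.
fn strip_comments(src: &str) -> String {
    src.lines()
        .map(|l| l.split("//").next().unwrap_or("").trim_end())
        .collect::<Vec<_>>()
        .join("\n")
}

fn classify(before: &str, after: &str, sig_before: &str, sig_after: &str) -> &'static str {
    let body_changed = strip_comments(before) != strip_comments(after);
    let sig_changed = sig_before != sig_after;
    match (body_changed, sig_changed) {
        (false, false) => "text",       // only comments/whitespace differ
        (false, true) => "syntax",      // signature changed, logic intact
        (true, false) => "functional",  // logic changed
        (true, true) => "functional+syntax",
    }
}

fn main() {
    // Editing only a comment is a "text" change.
    assert_eq!(
        classify("x + 1 // old", "x + 1 // new", "fn f(x: i32)", "fn f(x: i32)"),
        "text"
    );
    // Editing the logic is a "functional" change.
    assert_eq!(
        classify("x + 1", "x + 2", "fn f(x: i32)", "fn f(x: i32)"),
        "functional"
    );
    println!("ok");
}
```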
Graph-centric scoring. Dependents and blast radius are the primary signals. Public API, classification, and change type set the baseline.
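To make the "baseline plus graph signals" idea concrete, here is a hypothetical scoring function. The weights and formula are invented for illustration only; they are not inspect's actual model:

```rust
// Illustrative risk score: a baseline set by the change kind and API
// visibility, scaled up by graph impact. All weights are made up.
struct Change {
    classification_weight: f64, // e.g. text=0.05, syntax=0.3, functional=0.6
    public_api: bool,
    direct_dependents: usize,
    blast_radius: usize,
}

fn risk_score(c: &Change) -> f64 {
    let baseline = c.classification_weight + if c.public_api { 0.1 } else { 0.0 };
    // Graph signals dominate: a saturating boost from dependents + blast radius.
    let impact = (c.direct_dependents + c.blast_radius) as f64;
    let graph = 1.0 - 1.0 / (1.0 + 0.1 * impact);
    (baseline + (1.0 - baseline) * graph).min(1.0)
}

fn main() {
    let cosmetic = Change {
        classification_weight: 0.05, public_api: false,
        direct_dependents: 0, blast_radius: 0,
    };
    let core = Change {
        classification_weight: 0.6, public_api: true,
        direct_dependents: 12, blast_radius: 171,
    };
    // A widely-depended-on functional change outranks a cosmetic one.
    assert!(risk_score(&core) > risk_score(&cosmetic));
    println!("cosmetic: {:.2}, core: {:.2}", risk_score(&cosmetic), risk_score(&core));
}
```

The saturating form keeps scores in [0, 1] while letting graph impact dominate, matching the demo where a zero-dependent text change scores near zero and a heavily-depended-on functional change scores near one.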
Union-Find on dependency edges between changed entities. Separates independent logical changes within tangled commits.
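The grouping step is classic disjoint-set union: changed entities connected by a dependency edge collapse into one group, and whatever components remain are the independent logical changes. A minimal sketch (the example entities are hypothetical):

```rust
// Union-Find over changed entities; entities linked by a dependency
// edge end up in the same logical group.
struct UnionFind { parent: Vec<usize> }

impl UnionFind {
    fn new(n: usize) -> Self { UnionFind { parent: (0..n).collect() } }

    fn find(&mut self, x: usize) -> usize {
        if self.parent[x] != x {
            let root = self.find(self.parent[x]);
            self.parent[x] = root; // path compression
        }
        self.parent[x]
    }

    fn union(&mut self, a: usize, b: usize) {
        let (ra, rb) = (self.find(a), self.find(b));
        if ra != rb { self.parent[ra] = rb; }
    }
}

fn main() {
    // 5 changed entities; dependency edges exist only within {0,1,2} and {3,4}.
    let mut uf = UnionFind::new(5);
    for &(a, b) in &[(0, 1), (1, 2), (3, 4)] {
        uf.union(a, b);
    }
    let roots: std::collections::HashSet<_> = (0..5).map(|i| uf.find(i)).collect();
    assert_eq!(roots.len(), 2); // two independent logical groups
    println!("groups: {}", roots.len());
}
```

Because only edges between *changed* entities are unioned, a tangled commit splits cleanly: entities with no dependency path between them land in separate groups, like the three groups in the demo output above.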
Evaluated on AACR-Bench: 158 PRs, 50 repos, 10 languages, 1,169 ground-truth issues from human reviewers.
83.5% High/Critical recall on the Greptile benchmark (50 PRs, 5 repos, 97 golden comments), beating every LLM-based tool at zero cost. 100% recall at the Medium threshold. Full benchmark results →
Three tools, same foundation: sem-core's entity extraction and structural hashing.