AI-agent PRs need human receipts

AI-assisted pull requests are trustworthy when context, verification, state coverage, migration notes, and reviewer focus are visible.

AI-assisted pull requests need human receipts.

I do not mean a confession that AI was used. I mean evidence that a human understood the change, shaped the work, checked the risk, and can explain why the diff is safe. A PR can be generated quickly and still be reviewed seriously. The problem is not AI involvement. The problem is a PR that arrives with no receipts: no context, no risk map, no route checks, no state coverage, no migration note, no evidence that the author knows what changed.

For engineering teams, that is the difference between acceleration and outsourcing judgment.

The more AI becomes part of everyday development, the more important the receipts become. Speed is not enough. The PR needs to show that the work fits the product, respects the codebase, protects users, and leaves the next developer with enough context to trust the change.

Context: What was asked?

Problem, scope, constraints, files touched, and assumptions the agent worked under.

Evidence: What was checked?

Tests, browser routes, state matrix, migration output, screenshots, logs, and generated assets.

Judgment: What remains?

Tradeoffs, risks, follow-ups, rejected paths, and what the reviewer should inspect.

Figure 1: AI-assisted PRs become trustworthy when context, evidence, and judgment are visible.

The PR should not hide the thinking

A weak AI-assisted PR says "implemented feature" and leaves the reviewer to reconstruct the entire process from the diff.

A strong PR says what changed, why the approach fits, how the author verified it, what the risky areas are, and what they intentionally left out. That does not need to be long, but it needs to be specific.

For example:

Added five journal posts with fallback content, Supabase migration, and generated OG cards.
Kept article bodies in the existing longform module pattern.
Verified every post clears the word and visual threshold.
Build falls back to bundled content with placeholder Supabase env vars; that warning is expected.
Review migration dollar quoting and generated routes.

That description gives the reviewer a map. It also shows the author understands the system beyond the generated text.

The PR body is not paperwork. It is the bridge between machine speed and team trust.

Receipts start before implementation

The best AI PR receipt is created before the diff.

Before asking an agent to change the code, I want the task framed clearly:

goal
constraints
files likely to change
rules not to violate
quality bar
verification commands
known risky areas
expected artifacts

That context becomes part of the receipt because it explains why the final shape exists. If the task says "use the existing journal longform pattern and Supabase migration flow," the reviewer can check whether the diff respected that. If the task says "do not add a downloadable resource unless the file exists," the reviewer knows what boundary mattered.

This is also a good guardrail for the agent. Vague requests produce vague changes. Specific context creates a narrower, more reviewable diff.

Figure 2: A useful AI PR receipt connects the original task to the diff and verification evidence.
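The framing list above can be captured as a small structure that flags what is still blank before the agent starts. This is a minimal sketch, not part of any tool I have described: the `TaskBrief` class and `missing_fields` helper are hypothetical names, and the field names simply mirror the list.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TaskBrief:
    """Hypothetical task brief; each field mirrors one item in the framing list."""

    goal: str = ""
    constraints: list[str] = field(default_factory=list)
    files_likely_to_change: list[str] = field(default_factory=list)
    rules_not_to_violate: list[str] = field(default_factory=list)
    quality_bar: str = ""
    verification_commands: list[str] = field(default_factory=list)
    known_risky_areas: list[str] = field(default_factory=list)
    expected_artifacts: list[str] = field(default_factory=list)


def missing_fields(brief: TaskBrief) -> list[str]:
    """Return the names of fields that are still empty, in declaration order."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name)]


# Example values taken from the task wording quoted in this article.
brief = TaskBrief(
    goal="Add five journal posts",
    constraints=["Use the existing journal longform pattern and Supabase migration flow"],
    rules_not_to_violate=["Do not add a downloadable resource unless the file exists"],
)
print(missing_fields(brief))
```

Anything the helper reports is a gap the reviewer would otherwise have to guess at later.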

The diff should show local taste

One risk with generated PRs is that they look like they came from a generic codebase.

The code may compile, but the shape ignores local patterns. It creates a helper where the repo already has one. It adds a new CSS style where design tokens already exist. It writes content in a voice that does not match the site. It invents migration filenames. It adds a dependency for something the platform already solves. It creates a route with a naming convention the app does not use.

Human receipts should show that local taste was applied:

followed existing module boundaries
reused existing helpers
matched component and style conventions
kept generated artifacts in expected paths
used established verification scripts
preserved existing metadata patterns
avoided unrelated refactors

This is where a candidate can show judgment. Anyone can ask an agent to produce code. The valuable person can make the output feel native to the product.

Verification needs to match the risk

Not every PR needs the same verification, but every PR needs verification that matches its blast radius.

For a content batch, verification might include word counts, route generation, metadata, OG images, sitemap, SEO assertions, and database migration rows. For a checkout change, it might include failed payment, discount rejection, shipping methods, mobile, analytics, and support copy. For a component API change, it might include state examples, accessibility, consumers, and migration search.

The receipt should explain why the checks are enough. "Ran tests" is weaker than naming the checks. "Browser checked the route at mobile and desktop" is stronger. "Verified all five OG images are 1200x630 PNGs" is stronger. "Supabase migration list includes the new local migration" is stronger.

The stronger the PR evidence, the less the reviewer has to guess.
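A claim like "all five OG images are 1200x630 PNGs" is cheap to back with a command result. The sketch below reads the dimensions straight from the PNG header using only the standard library. The function names are my own for illustration, and the expected size is the 1200x630 figure mentioned above.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data: bytes) -> tuple[int, int]:
    """Read (width, height) from the IHDR chunk at the start of a PNG file."""
    # Layout: 8-byte signature, 4-byte chunk length, 4-byte chunk type "IHDR",
    # then width and height as big-endian unsigned 32-bit integers.
    if len(data) < 24 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", data[16:24])
    return width, height


def check_og_card(data: bytes, expected: tuple[int, int] = (1200, 630)) -> str:
    """Return a one-line receipt suitable for pasting into a PR body."""
    width, height = png_dimensions(data)
    status = "ok" if (width, height) == expected else "WRONG SIZE"
    return f"{width}x{height} PNG: {status}"
```

In practice the bytes would come from something like `pathlib.Path(card_path).read_bytes()` for each generated card, where `card_path` is whatever output location the project actually uses.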

Screenshots are evidence, not decoration

Screenshots are useful when they answer a question.

They are not useful when they only show that the screen exists. A screenshot should prove a state, a layout, a failure mode, a responsive behavior, or a visual asset. For AI-assisted UI work, screenshots should include the places where generated code often breaks:

mobile text wrapping
long content
empty state
loading state
error state
keyboard focus
modal containment
image rendering
dark mode if the product supports it

For content work, the equivalent might be a rendered article section with visual assets, a page source snippet showing meta tags, or a file check proving OG image dimensions.

The point is to make the review faster. A reviewer should not have to pull the branch to confirm basic output when a screenshot or command result can answer the question.

State matrices make AI work safer

AI-generated UI often overweights the happy path. A state matrix is the receipt that corrects that bias.

For a feature, the matrix might list:

default
loading
empty
no results
validation error
network error
permission denied
optimistic pending
success
rollback
disabled with reason
long content

Happy: Ideal state

The first generated pass usually covers this, but it is rarely enough.

Messy: Real data states

Empty, stale, partial, long, failed, delayed, and permission-limited cases.

Recovery: User path forward

Retry, edit, undo, contact support, change filters, or safely leave.

Figure 3: A state matrix turns a plausible AI draft into a reviewable product surface.

The matrix does not need to be overbuilt. It needs to cover the states that can hurt users or support. For a billing page, failed payment and pending invoice matter. For a dashboard, stale data and missing data matter. For a command palette, empty results and keyboard escape matter.

The receipt is the proof that the author did not stop at the attractive path.
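One way to keep the matrix honest is to record the evidence for each state and list whatever has none. This is a rough sketch under my own assumptions: the state names come from the list above, while `uncovered_states` and the evidence strings are illustrative.

```python
REQUIRED_STATES = [
    "default",
    "loading",
    "empty",
    "no results",
    "validation error",
    "network error",
    "permission denied",
    "optimistic pending",
    "success",
    "rollback",
    "disabled with reason",
    "long content",
]


def uncovered_states(evidence: dict[str, str], required: list[str] = REQUIRED_STATES) -> list[str]:
    """Return required states that have no evidence note (test, screenshot, manual check)."""
    return [state for state in required if not evidence.get(state, "").strip()]


# Illustrative evidence notes; a real matrix would point at tests or screenshots.
evidence = {
    "default": "screenshot, desktop and mobile",
    "loading": "manual browser check",
    "success": "unit test",
}
for state in uncovered_states(evidence):
    print(f"no evidence yet: {state}")
```

A team would normally trim `REQUIRED_STATES` to the states that can hurt users or support for that feature, rather than demanding all twelve every time.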

Migrations deserve special receipts

Any PR touching Supabase content, schema, or data should make the database story explicit.

For content upserts, the receipt should show:

migration was created with Supabase CLI
row count matches expected content count
conflict behavior is safe
metadata is preserved or intentionally changed
generated content matches fallback content
no schema change is hidden in the data migration
local migration list sees the new file

For schema work, the bar is higher: advisors, RLS, grants, policies, rollback thinking, and test queries. But even content migrations deserve clarity, because production will not update just because local fallback content changed.

This is a lesson from the site itself. The journal can build locally from fallback content, but the live site depends on Supabase rows after deployment. If the PR adds articles without a migration, local success can become production confusion.

The reviewer should know where to look

A good PR does not only say what passed. It tells the reviewer where to focus.

For example:

Review the migration dollar quoting because bodies include HTML figures.
Review the new article metadata because related resources affect UI cards.
Review mobile layout because the new CTA can crowd the header.
Review the state model because the change touches optimistic rollback.
Review the copy because this changes a customer-facing payment promise.

This is especially important with AI-assisted work because the diff can be large. Generated or semi-generated content can create a lot of surface area. A focused review note respects the reviewer and reduces the chance that the important risk gets buried.

The author should not pretend the PR has no risk. The author should name the risk and show what they did about it.

Receipts should include rejected paths

Sometimes the most useful evidence is what I did not do.

I might say:

Did not add a new resource card because no downloadable file exists yet.
Did not refactor shared layout while adding article content.
Did not apply the migration to production from this branch.
Did not change schema because metadata already supports related resources.
Did not add a dependency because Sharp already generates OG cards.
Did not rewrite the component because a wrapper solved the contained use case.

Rejected paths show judgment. They prevent reviewers from asking the same question, and they make the scope feel deliberate.

This is also where AI work benefits from human ownership. The model may suggest broad improvements. The author decides what belongs in the PR.

The PR body template I like

For AI-assisted work, I like a PR body with five blocks:

Summary.
Approach.
Verification.
Review focus.
Follow-ups.

The summary says what changed. The approach says why this shape fits the codebase. The verification lists exact checks. The review focus names risks. The follow-ups prevent unfinished ideas from hiding in the diff.

This template is not only for AI work, but AI makes it more valuable because the implementation can move faster than the team's shared understanding. The PR body slows down just enough to restore that understanding.
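The five blocks are simple enough to render from structured notes, which also makes an empty block hard to miss. A minimal sketch, assuming plain-text headings and dash-prefixed lines. `render_pr_body` is a hypothetical helper, not an existing tool.

```python
SECTIONS = ["Summary", "Approach", "Verification", "Review focus", "Follow-ups"]


def render_pr_body(blocks: dict[str, list[str]]) -> str:
    """Render the five-block PR body, refusing to render if any block is empty."""
    missing = [name for name in SECTIONS if not blocks.get(name)]
    if missing:
        raise ValueError(f"empty PR sections: {', '.join(missing)}")
    parts: list[str] = []
    for name in SECTIONS:
        parts.append(name)
        parts.extend(f"- {line}" for line in blocks[name])
        parts.append("")  # blank line between blocks
    return "\n".join(parts).rstrip() + "\n"


# Example lines reuse the sample PR description from earlier in this article.
print(render_pr_body({
    "Summary": ["Added five journal posts with fallback content, Supabase migration, and generated OG cards."],
    "Approach": ["Kept article bodies in the existing longform module pattern."],
    "Verification": ["Verified every post clears the word and visual threshold."],
    "Review focus": ["Review migration dollar quoting and generated routes."],
    "Follow-ups": ["None for this batch."],
}))
```

Raising on an empty block is a deliberate choice in this sketch: writing "None for this batch" is itself a small receipt, while a silently missing section is not.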

Comments should be sparse and useful

AI-generated PRs sometimes contain too many code comments. The comments explain obvious assignments, narrate functions, or repeat names. That is not a receipt. It is noise.

Useful comments explain decisions a future reader would not infer quickly:

why a migration uses a specific conflict target
why a fallback exists
why a browser behavior needs a workaround
why an accessibility pattern uses a specific element
why a temporary compatibility layer remains

The PR body can carry the process. Code comments should carry durable context that belongs next to the code.

AI receipts are also portfolio proof

This topic belongs on my site because companies are trying to understand what AI-assisted engineering looks like when it is done carefully.

A portfolio can show:

prompt context
diff scope
verification checklist
screenshots
migration note
reviewer focus
follow-up list

That set of artifacts proves more than "I use AI." It proves I can use AI inside a professional workflow. It shows that I do not confuse output with done. It shows I can review, test, scope, and explain.

For an engineering role, that is a stronger signal than speed alone.

Speed: Draft faster

Use agents to inspect, scaffold, and propose changes without hiding uncertainty.

Care: Review harder

Use state, browser, migration, and product checks to turn output into evidence.

Trust: Explain better

Leave reviewers with context, risks, proof, and follow-up instead of a mysterious diff.

Figure 4: The useful AI workflow is speed plus care plus trust, not speed alone.

CI is necessary but not enough

Passing CI is a receipt, but it is not the whole receipt.

CI can tell the team that tests passed, the build completed, formatting is stable, and maybe accessibility checks or type checks ran. That matters. But many product failures live outside automated checks: misleading copy, a missing mobile menu, a bad empty state, a wrong business assumption, a stale data label, a broken support path, or a generated visual that looks generic.

The PR should distinguish between automated proof and human proof.

Automated proof might include:

build
unit tests
lint
type check
SEO assertion
image generation
migration list
route generation

Human proof might include:

browser inspection
mobile layout review
copy review
state matrix review
support promise review
comparison against existing design language
risk assessment

When a PR says only "CI passed," it hides which parts of the work still needed human judgment. AI-assisted work especially needs that distinction because the generated output can satisfy machines while still missing product taste.

Evidence quality matters

Not every verification line has equal value.

"Ran build" is useful. "Build generated 70 journal routes and 103 OG cards" is more useful. "Checked article page" is useful. "Checked five new article routes for BlogPosting schema, related resources, four figures, and per-article OG image" is stronger. "Tested checkout" is useful. "Tested failed payment, discount rejection, shipping estimate, and confirmation email copy" is stronger.

The receipt should be as specific as the risk.

This does not mean the PR body has to become a log dump. It means the author should include the evidence that helps a reviewer trust the change. If a command produced a meaningful count, include the count. If a browser check targeted a risky viewport, name the viewport. If a migration changed rows, name the expected row count.

Specific evidence also helps after merge. If production behaves differently, the team can compare the live result to the verified local result. Vague verification gives no baseline.

Live-site checks close the loop

For content and static sites, merging is not the end. The production deploy has to show the content.

An AI-assisted content PR can pass locally and still fail to appear live if:

Supabase rows were not inserted
the deploy used stale environment variables
the build fetched remote content instead of fallback content
the migration was not applied
cached pages were not invalidated
the sitemap was generated before content existed
a route was generated locally but excluded remotely

That is why live-site checks are a separate receipt. After deploy, I want to check the public URLs, not only the local build.

For journal work, the live receipt might be:

article appears in journal index
article route returns 200
article body is the expanded version
related resources render
OG image URL returns 1200x630 PNG
page source contains BlogPosting schema
sitemap includes the slug

This matters because the whole point of the work is public credibility. If the post exists only in a branch or only in local fallback content, it is not doing the job.
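Two items on that live receipt, the BlogPosting schema and the sitemap entry, can be checked from the fetched page source and sitemap text. The sketch below is deliberately crude: it uses regular expressions rather than a real HTML or XML parser, assumes the schema sits in a JSON-LD script tag with a plain string `@type`, and both function names are my own.

```python
import json
import re

# Matches the contents of <script type="application/ld+json"> blocks.
LD_JSON = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)


def has_blogposting_schema(html: str) -> bool:
    """True if any JSON-LD block in the page declares @type BlogPosting."""
    for raw in LD_JSON.findall(html):
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing the whole check
        items = data if isinstance(data, list) else [data]
        if any(isinstance(item, dict) and item.get("@type") == "BlogPosting" for item in items):
            return True
    return False


def sitemap_has_slug(sitemap_xml: str, slug: str) -> bool:
    """True if any <loc> entry in the sitemap ends with /<slug>."""
    locs = re.findall(r"<loc>(.*?)</loc>", sitemap_xml)
    return any(loc.strip().rstrip("/").endswith("/" + slug) for loc in locs)
```

The page source and sitemap would be fetched from the public URLs after deploy, for example with `urllib.request.urlopen`, so the check runs against production rather than the local build.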

The migration receipt should preserve content trust

For content migrations, the receipt should prove that the database copy and fallback copy are aligned.

I like generating the migration from the same local source when possible, then checking:

same slug list
same body source
same metadata
same read time
same publish date
expected conflict behavior
no missing resource links

If the migration is hand-written separately from fallback content, drift becomes easy. A paragraph changes locally but not in SQL. A related resource gets added to the metadata but the migration has the old JSON. The route works in local preview, but production after migration shows a stale version.

That is a bad receipt because the evidence does not match the deployed system.

The reviewer should not have to trust that the author remembered to keep two copies in sync. The PR should make the sync obvious.
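The comparison in that checklist is mechanical once both copies are loaded as dictionaries keyed by slug. A minimal sketch, assuming each record exposes `body`, `metadata`, `read_time`, and `publish_date` keys. Those key names and the `find_drift` helper are illustrative, not the site's actual schema.

```python
def find_drift(
    fallback: dict[str, dict],
    migration: dict[str, dict],
    compared: tuple[str, ...] = ("body", "metadata", "read_time", "publish_date"),
) -> list[str]:
    """List every mismatch between fallback content and migration rows, keyed by slug."""
    problems: list[str] = []
    for slug in sorted(fallback.keys() - migration.keys()):
        problems.append(f"{slug}: missing from migration")
    for slug in sorted(migration.keys() - fallback.keys()):
        problems.append(f"{slug}: missing from fallback")
    for slug in sorted(fallback.keys() & migration.keys()):
        for name in compared:
            if fallback[slug].get(name) != migration[slug].get(name):
                problems.append(f"{slug}: {name} differs")
    return problems
```

An empty list is the receipt. Anything else names the exact slug and field that drifted, which is far more useful to a reviewer than "migration looks right."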

AI should make review narrower, not wider

A common failure mode is using AI to generate a large diff that makes review wider.

The agent touches formatting, unrelated helpers, styles, tests, metadata, copy, and structure in one pass. Some of the changes are useful. Some are incidental. The reviewer cannot easily tell which is which. The PR feels fast for the author and slow for the team.

The better AI workflow narrows review:

ask the agent to inspect first
name the files likely to change
keep generated work in expected modules
avoid unrelated cleanup
separate generated assets from logic when possible
commit cohesive changes
tell the reviewer where to focus

This is a discipline issue, not a tooling issue. AI can be used to create focused work or broad churn. The author chooses.

For candidate proof, focused AI PRs are more impressive than giant AI PRs. They show that I can use leverage without losing respect for review.
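Naming the files likely to change up front also gives a cheap scope check on the final diff. The sketch below compares changed paths against glob patterns from the task framing. The paths and patterns are made up for illustration, and `out_of_scope` is a hypothetical helper.

```python
from fnmatch import fnmatch


def out_of_scope(changed_files: list[str], expected_patterns: list[str]) -> list[str]:
    """Return changed paths that match none of the expected glob patterns."""
    return [
        path
        for path in changed_files
        if not any(fnmatch(path, pattern) for pattern in expected_patterns)
    ]


# Hypothetical paths; in practice the list could come from `git diff --name-only`.
changed = [
    "src/content/journal/ai-receipts.ts",
    "supabase/migrations/20240101000000_journal_posts.sql",
    "src/styles/global.css",
]
expected = ["src/content/journal/*", "supabase/migrations/*.sql"]
print(out_of_scope(changed, expected))
```

A non-empty result is not automatically wrong. It is a prompt to either justify the extra file in the PR body or move it to a separate change.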

The receipt for generated content

Generated or AI-assisted content needs a different kind of receipt.

For articles, I want to show:

topic fits the site's positioning
voice matches existing work
word count clears the target
visuals are authored artifacts, not stock filler
related resources exist
metadata is role-forward
OG card is generated
route builds
SEO schema passes
database migration exists

The article should also have enough specificity to avoid feeling like generic AI writing. That means concrete product situations, real tradeoffs, local patterns, and a point of view. A long article can still feel thin if it only repeats general advice.

The receipt cannot prove taste completely, but it can prove the mechanical and structural parts. Then the reviewer can spend more attention on whether the writing sounds like the person.

The receipt for generated code

Generated code receipts should focus on assumptions and behavior.

I want to know:

What assumptions did the agent make?
Which assumptions were replaced with codebase truth?
Which existing helpers or components were reused?
Which states were added beyond the happy path?
Which tests prove behavior?
Which browser interactions were checked?
Which edge cases remain?

For generated code, "looks right" is not enough. The receipt should show that the author interrogated the output.

This is especially important for accessibility. AI-generated components often include plausible ARIA but weak behavior. The receipt should name keyboard path, focus return, labels, error association, or whatever interaction matters for the component.

The receipt for design changes

AI can generate polished UI fast. That makes design receipts important.

The PR should show:

which existing design language it follows
which new visual choice is intentional
how mobile behaves
how text fits
how dark mode behaves if relevant
which state visuals were checked
whether any new pattern should enter the design system

If the change introduces a new card style, button density, visual tone, or layout pattern, the receipt should explain why. Otherwise the product accumulates generated taste fragments.

For my own site, this matters because the design should feel authored. The visuals should look like thinking artifacts: matrices, state maps, annotated frames, decision tables. If a post gets a generic illustration, the page may technically have a visual while still feeling less credible.

The author still owns the result

The most important receipt is ownership.

If a reviewer asks why a file changed, the author should be able to answer. If a migration fails, the author should understand the rollback. If a state is missing, the author should know whether it was intentionally out of scope. If the generated output uses a local pattern incorrectly, the author should fix it, not blame the model.

AI can draft. AI can inspect. AI can suggest tests. AI can generate assets. But the author owns the PR.

This is the standard I want to model. The tool can be visible in the workflow, but responsibility stays human.

Receipts after review comments

The receipt does not stop when the first PR description is written.

If review comments arrive, the follow-up should keep the same standard. The author should say what changed in response, which checks were rerun, and whether the risk moved. A small reply like "fixed" is sometimes enough for a typo. It is not enough for a behavior change.

For example:

Updated the drawer focus return based on review and reran the mobile keyboard path.
Changed the metadata resource list and reran the related resource existence check.
Split the migration body generation to preserve dollar quoting and reran local migration list.
Kept the proposed component API out of this PR and added a follow-up because the current change only needs a local wrapper.

This follow-up evidence helps reviewers close the loop. It also makes the PR history useful later. If someone reads the review six months from now, they can understand how the risk was resolved.

AI-assisted work benefits from this because review comments often reveal where the generated output was too generic. The fix should not be another blind generation pass. It should be a smaller, more deliberate change with its own receipt.

What failure teaches

When an AI-assisted PR breaks something, the response should not be only "AI made a mistake." That is too easy and not useful.

The better question is: what receipt was missing?

Maybe the prompt lacked a boundary. Maybe the state matrix ignored a failure mode. Maybe the reviewer focus did not name the risky file. Maybe the browser check covered desktop but not mobile. Maybe the migration was not compared to fallback content. Maybe the author trusted a generated helper without reading the existing one.

Turning the failure into a receipt improvement makes the workflow better. Add the missing check. Update the agent context. Improve the PR template. Add the route to the browser checklist. Tighten the migration generator.

That is how AI-assisted development gets safer over time. Not by pretending failures will disappear, but by making each failure improve the workflow.

The public signal is discipline. A hiring team should see that I can use faster tools while keeping the slow parts that matter: reading, checking, explaining, and owning the result after review.

That balance is the whole point of the workflow.

The hiring signal

AI-assisted PRs are becoming common. Careful AI-assisted PRs are still a signal.

The signal is not that I can make an agent produce a large diff. The signal is that I can turn that diff into a product-quality change: scoped, verified, explainable, and reviewable.

That is the standard I want my own work to show. If a hiring manager reads my PRs or my case studies, they should see that I can move quickly without asking the team to accept mystery. I can bring AI into the workflow and still leave human receipts.

