AI-agent PRs need human receipts
AI-assisted pull requests are trustworthy when context, verification, state coverage, migration notes, and reviewer focus are visible.
AI-assisted pull requests need human receipts.
I do not mean a confession that AI was used. I mean evidence that a human understood the change, shaped the work, checked the risk, and can explain why the diff is safe. A PR can be generated quickly and still be reviewed seriously. The problem is not AI involvement. The problem is when the PR arrives with no receipts: no context, no risk map, no route checks, no state coverage, no migration note, no evidence that the author knows what changed.
For engineering teams, that is the difference between acceleration and outsourcing judgment.
The more AI becomes part of everyday development, the more important the receipts become. Speed is not enough. The PR needs to show that the work fits the product, respects the codebase, protects users, and leaves the next developer with enough context to trust the change.
A complete receipt covers three things:
- The problem, scope, constraints, files touched, and assumptions the agent worked under.
- The tests, browser routes, state matrix, migration output, screenshots, logs, and generated assets.
- The tradeoffs, risks, follow-ups, rejected paths, and what the reviewer should inspect.
The PR should not hide the thinking
A weak AI-assisted PR says "implemented feature" and leaves the reviewer to reconstruct the entire process from the diff.
A strong PR says what changed, why the approach fits, how the author verified it, what the risky areas are, and what they intentionally left out. That does not need to be long, but it needs to be specific.
For example:
- Added five journal posts with fallback content, Supabase migration, and generated OG cards.
- Kept article bodies in the existing longform module pattern.
- Verified every post clears the word and visual threshold.
- Build falls back to bundled content when the Supabase env vars are placeholders; the resulting warning is expected.
- Review migration dollar quoting and generated routes.
That description gives the reviewer a map. It also shows the author understands the system beyond the generated text.
The PR body is not paperwork. It is the bridge between machine speed and team trust.
Receipts start before implementation
The best AI PR receipt is created before the diff.
Before asking an agent to change the code, I want the task framed clearly:
- goal
- constraints
- files likely to change
- rules not to violate
- quality bar
- verification commands
- known risky areas
- expected artifacts
That context becomes part of the receipt because it explains why the final shape exists. If the task says "use the existing journal longform pattern and Supabase migration flow," the reviewer can check whether the diff respected that. If the task says "do not add a downloadable resource unless the file exists," the reviewer knows what boundary mattered.
This is also a good guardrail for the agent. Vague requests produce vague changes. Specific context creates a narrower, more reviewable diff.
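One way to make that framing checkable is to keep it as a small structure and flag whatever is still blank before the agent starts. This is a minimal sketch, not a tool from my workflow: the `TaskBrief` class, its field names, and `missing_fields` are hypothetical names that mirror the list above.

```python
from dataclasses import dataclass, field, fields


@dataclass
class TaskBrief:
    """Framing for an agent task. Field names are illustrative, not a standard."""
    goal: str = ""
    constraints: list[str] = field(default_factory=list)
    files_likely_to_change: list[str] = field(default_factory=list)
    rules_not_to_violate: list[str] = field(default_factory=list)
    quality_bar: str = ""
    verification_commands: list[str] = field(default_factory=list)
    known_risky_areas: list[str] = field(default_factory=list)
    expected_artifacts: list[str] = field(default_factory=list)


def missing_fields(brief: TaskBrief) -> list[str]:
    """Return the names of brief fields that are still empty, in declaration order."""
    return [f.name for f in fields(brief) if not getattr(brief, f.name)]


# Example: a brief that names the goal and two boundaries but nothing else yet.
brief = TaskBrief(
    goal="Add five journal posts",
    constraints=["use the existing journal longform pattern and Supabase migration flow"],
    rules_not_to_violate=["do not add a downloadable resource unless the file exists"],
)
print("Still unframed:", ", ".join(missing_fields(brief)))
```

The same structure can be pasted into the PR body later, which keeps the original boundaries visible to the reviewer.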
The diff should show local taste
One risk with generated PRs is that they look like they came from a generic codebase.
The code may compile, but the shape ignores local patterns. It creates a helper where the repo already has one. It adds a new CSS style where tokens exist. It writes content in a voice that does not match the site. It invents migration filenames. It adds a dependency for something the platform already solves. It creates a route with a naming convention the app does not use.
Human receipts should show that local taste was applied:
- followed existing module boundaries
- reused existing helpers
- matched component and style conventions
- kept generated artifacts in expected paths
- used established verification scripts
- preserved existing metadata patterns
- avoided unrelated refactors
This is where a candidate can show judgment. Anyone can ask an agent to produce code. The valuable person can make the output feel native to the product.
Verification needs to match the risk
Not every PR needs the same verification, but every PR needs verification that matches its blast radius.
For a content batch, verification might include word counts, route generation, metadata, OG images, sitemap, SEO assertions, and database migration rows. For a checkout change, it might include failed payment, discount rejection, shipping methods, mobile, analytics, and support copy. For a component API change, it might include state examples, accessibility, consumers, and migration search.
The receipt should explain why the checks are enough. "Ran tests" is weaker than naming the checks. "Browser checked the route at mobile and desktop" is stronger. "Verified all five OG images are 1200x630 PNGs" is stronger. "Supabase migration list includes the new local migration" is stronger.
The stronger the PR evidence, the less the reviewer has to guess.
Screenshots are evidence, not decoration
Screenshots are useful when they answer a question.
They are not useful when they only show that the screen exists. A screenshot should prove a state, a layout, a failure mode, a responsive behavior, or a visual asset. For AI-assisted UI work, screenshots should include the places where generated code often breaks:
- mobile text wrapping
- long content
- empty state
- loading state
- error state
- keyboard focus
- modal containment
- image rendering
- dark mode if the product supports it
For content work, the equivalent might be a rendered article section with visual assets, a page source snippet showing meta tags, or a file check proving OG image dimensions.
The point is to make the review faster. A reviewer should not have to pull the branch to confirm basic output when a screenshot or command result can answer the question.
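A file check for OG image dimensions can be a few lines. The sketch below reads the width and height straight from the PNG header using only the standard library. The `og/` directory in the usage note and the function names are assumptions for illustration; the 1200x630 target is the one used throughout this article.

```python
import struct
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(header: bytes) -> tuple[int, int]:
    """Read (width, height) from the first 24 bytes of a PNG file.

    A PNG starts with an 8-byte signature, then the IHDR chunk:
    4-byte length, 4-byte type, then width and height as big-endian uint32.
    """
    if len(header) < 24 or header[:8] != PNG_SIGNATURE or header[12:16] != b"IHDR":
        raise ValueError("not a PNG file")
    width, height = struct.unpack(">II", header[16:24])
    return width, height


def is_og_card(header: bytes, expected: tuple[int, int] = (1200, 630)) -> bool:
    """True when the PNG header reports the expected OG card size."""
    return png_dimensions(header) == expected


def check_og_cards(paths: list[Path]) -> dict[str, bool]:
    """Map each file path to whether it is a correctly sized OG card."""
    results = {}
    for path in paths:
        with path.open("rb") as handle:
            results[str(path)] = is_og_card(handle.read(24))
    return results
```

Calling `check_og_cards(sorted(Path("og").glob("*.png")))` and pasting the result into the PR gives the reviewer a count and a pass or fail per card instead of a claim.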
State matrices make AI work safer
AI-generated UI often overweights the happy path. A state matrix is the receipt that corrects that bias.
For a feature, the matrix might list:
- default
- loading
- empty
- no results
- validation error
- network error
- permission denied
- optimistic pending
- success
- rollback
- disabled with reason
- long content
The first generated pass usually covers the happy path, but that is rarely enough. The matrix should also cover the empty, stale, partial, long, failed, delayed, and permission-limited cases, along with the recovery actions available to the user: retry, edit, undo, contact support, change filters, or safely leave.
The matrix does not need to be overbuilt. It needs to cover the states that can hurt users or support. For a billing page, failed payment and pending invoice matter. For a dashboard, stale data and missing data matter. For a command palette, empty results and keyboard escape matter.
The receipt is the proof that the author did not stop at the attractive path.
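A state matrix can live as plain data next to the PR. The sketch below is one possible shape, assuming a mapping from state name to a short evidence note. The function names are hypothetical, and the default state list is simply the one from this section; a billing page or command palette would pass its own narrower list.

```python
STATES = (
    "default", "loading", "empty", "no results", "validation error",
    "network error", "permission denied", "optimistic pending",
    "success", "rollback", "disabled with reason", "long content",
)


def uncovered_states(evidence: dict[str, str], required=STATES) -> list[str]:
    """Return required states that have no evidence note yet."""
    return [state for state in required if not evidence.get(state, "").strip()]


def render_matrix(evidence: dict[str, str], required=STATES) -> str:
    """Render the matrix as a plain list suitable for a PR body."""
    lines = []
    for state in required:
        proof = evidence.get(state, "").strip() or "NOT COVERED"
        lines.append(f"- {state}: {proof}")
    return "\n".join(lines)


# Example: a billing page only needs the states that can hurt users or support.
billing_states = ("default", "failed payment", "pending invoice")
billing_evidence = {
    "default": "screenshot at mobile and desktop",
    "failed payment": "screenshot of inline error with retry",
}
print(render_matrix(billing_evidence, billing_states))
print("Missing:", uncovered_states(billing_evidence, billing_states))
```

Keeping the required list explicit is the design choice that matters here: a missing state shows up as a named gap rather than as silence.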
Migrations deserve special receipts
Any PR touching Supabase content, schema, or data should make the database story explicit.
For content upserts, the receipt should show:
- migration was created with Supabase CLI
- row count matches expected content count
- conflict behavior is safe
- metadata is preserved or intentionally changed
- generated content matches fallback content
- no schema change is hidden in the data migration
- local migration list sees the new file
For schema work, the bar is higher: advisors, RLS, grants, policies, rollback thinking, and test queries. But even content migrations deserve clarity: production does not update just because local fallback content changed.
This is a lesson from the site itself. The journal can build locally from fallback content, but the live site depends on Supabase rows after deployment. If the PR adds articles without a migration, local success can become production confusion.
The reviewer should know where to look
A good PR does not only say what passed. It tells the reviewer where to focus.
For example:
- Review the migration dollar quoting because bodies include HTML figures.
- Review the new article metadata because related resources affect UI cards.
- Review mobile layout because the new CTA can crowd the header.
- Review the state model because the change touches optimistic rollback.
- Review the copy because this changes a customer-facing payment promise.
This is especially important with AI-assisted work because the diff can be large. Generated or semi-generated content can create a lot of surface area. A focused review note respects the reviewer and reduces the chance that the important risk gets buried.
The author should not pretend the PR has no risk. The author should name the risk and show what they did about it.
Receipts should include rejected paths
Sometimes the most useful evidence is what I did not do.
I might say:
- Did not add a new resource card because no downloadable file exists yet.
- Did not refactor shared layout while adding article content.
- Did not apply the migration to production from this branch.
- Did not change schema because metadata already supports related resources.
- Did not add a dependency because Sharp already generates OG cards.
- Did not rewrite the component because a wrapper solved the contained use case.
Rejected paths show judgment. They prevent reviewers from asking the same question, and they make the scope feel deliberate.
This is also where AI work benefits from human ownership. The model may suggest broad improvements. The author decides what belongs in the PR.
The PR body template I like
For AI-assisted work, I like a PR body with five blocks:
- Summary.
- Approach.
- Verification.
- Review focus.
- Follow-ups.
The summary says what changed. The approach says why this shape fits the codebase. The verification lists exact checks. The review focus names risks. The follow-ups prevent unfinished ideas from hiding in the diff.
This template is not only for AI work, but AI makes it more valuable because the implementation can move faster than the team's shared understanding. The PR body slows down just enough to restore that understanding.
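The five blocks are easy to enforce mechanically. This is a small sketch, assuming the PR body is assembled from a dictionary of bullet lists; `render_pr_body` is a hypothetical helper, and it renders plain headings and "- " bullets rather than any particular host's markup.

```python
SECTIONS = ("Summary", "Approach", "Verification", "Review focus", "Follow-ups")


def render_pr_body(blocks: dict[str, list[str]]) -> str:
    """Render the five-block PR body, refusing to render if any block is empty."""
    missing = [name for name in SECTIONS if not blocks.get(name)]
    if missing:
        raise ValueError(f"missing PR body blocks: {', '.join(missing)}")
    parts = []
    for name in SECTIONS:
        parts.append(name)
        parts.extend(f"- {item}" for item in blocks[name])
        parts.append("")  # blank line between blocks
    return "\n".join(parts).rstrip() + "\n"


example = render_pr_body({
    "Summary": ["Added five journal posts with fallback content and OG cards."],
    "Approach": ["Kept article bodies in the existing longform module pattern."],
    "Verification": ["Verified all five OG images are 1200x630 PNGs."],
    "Review focus": ["Review migration dollar quoting and generated routes."],
    "Follow-ups": ["None in this PR."],
})
print(example)
```

Failing on an empty block is deliberate. Writing "None in this PR" under Follow-ups is itself a receipt; leaving the block out is not.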
Comments should be sparse and useful
AI-generated PRs sometimes contain too many code comments. The comments explain obvious assignments, narrate functions, or repeat names. That is not a receipt. It is noise.
Useful comments explain decisions a future reader would not infer quickly:
- why a migration uses a specific conflict target
- why a fallback exists
- why a browser behavior needs a workaround
- why an accessibility pattern uses a specific element
- why a temporary compatibility layer remains
The PR body can carry the process. Code comments should carry durable context that belongs next to the code.
AI receipts are also portfolio proof
This topic belongs on my site because companies are trying to understand what AI-assisted engineering looks like when it is done carefully.
A portfolio can show:
- prompt context
- diff scope
- verification checklist
- screenshots
- migration note
- reviewer focus
- follow-up list
That set of artifacts proves more than "I use AI." It proves I can use AI inside a professional workflow. It shows that I do not confuse output with done. It shows I can review, test, scope, and explain.
For an engineering role, that is a stronger signal than speed alone.
The workflow behind those artifacts has three parts:
- Use agents to inspect, scaffold, and propose changes without hiding uncertainty.
- Use state, browser, migration, and product checks to turn output into evidence.
- Leave reviewers with context, risks, proof, and follow-up instead of a mysterious diff.
CI is necessary but not enough
Passing CI is a receipt, but it is not the whole receipt.
CI can tell the team that tests passed, the build completed, formatting is stable, and maybe that accessibility or type checks ran. That matters. But many product failures live outside automated checks: misleading copy, a missing mobile menu, a bad empty state, a wrong business assumption, a stale data label, a broken support path, or a generated visual that looks generic.
The PR should distinguish between automated proof and human proof.
Automated proof might include:
- build
- unit tests
- lint
- type check
- SEO assertion
- image generation
- migration list
- route generation
Human proof might include:
- browser inspection
- mobile layout review
- copy review
- state matrix review
- support promise review
- comparison against existing design language
- risk assessment
When a PR says only "CI passed," it hides which parts of the work still needed human judgment. AI-assisted work especially needs that distinction because the generated output can satisfy machines while still missing product taste.
Evidence quality matters
Not every verification line has equal value.
"Ran build" is useful. "Build generated 70 journal routes and 103 OG cards" is more useful. "Checked article page" is useful. "Checked five new article routes for BlogPosting schema, related resources, four figures, and per-article OG image" is stronger. "Tested checkout" is useful. "Tested failed payment, discount rejection, shipping estimate, and confirmation email copy" is stronger.
The receipt should be as specific as the risk.
This does not mean the PR body has to become a log dump. It means the author should include the evidence that helps a reviewer trust the change. If a command produced a meaningful count, include the count. If a browser check targeted a risky viewport, name the viewport. If a migration changed rows, name the expected row count.
Specific evidence also helps after merge. If production behaves differently, the team can compare the live result to the verified local result. Vague verification gives no baseline.
Live-site checks close the loop
For content and static sites, merging is not the end. The production deploy has to show the content.
An AI-assisted content PR can pass locally and still fail to appear live if:
- Supabase rows were not inserted
- the deploy used stale environment variables
- the build fetched remote content instead of fallback content
- the migration was not applied
- cached pages were not invalidated
- the sitemap was generated before content existed
- a route was generated locally but excluded remotely
That is why live-site checks are a separate receipt. After deploy, I want to check the public URLs, not only the local build.
For journal work, the live receipt might be:
- article appears in journal index
- article route returns 200
- article body is the expanded version
- related resources render
- OG image URL returns 1200x630 PNG
- page source contains BlogPosting schema
- sitemap includes the slug
This matters because the whole point of the work is public credibility. If the post exists only in a branch or only in local fallback content, it is not doing the job.
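Part of that live receipt can be scripted. The sketch below checks a few of the items above with the standard library only. The URL shapes (`/journal/<slug>`, `/journal`, `/sitemap.xml`) are assumptions about the site's routing, and the schema check is a plain substring match on the page source, not a structured-data validator.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch(url: str) -> tuple[int, str]:
    """Fetch a URL and return (status code, body text)."""
    try:
        with urlopen(url, timeout=10) as response:
            return response.status, response.read().decode("utf-8", errors="replace")
    except HTTPError as error:
        # urlopen raises on 4xx/5xx; report the status instead of crashing.
        return error.code, ""


def live_receipt(base_url: str, slug: str, fetcher=fetch) -> dict[str, bool]:
    """Check a published article against a few live-site expectations."""
    base = base_url.rstrip("/")
    status, page = fetcher(f"{base}/journal/{slug}")  # assumed route shape
    _, index = fetcher(f"{base}/journal")             # assumed index route
    _, sitemap = fetcher(f"{base}/sitemap.xml")       # assumed sitemap path
    return {
        "article route returns 200": status == 200,
        "page source contains BlogPosting": "BlogPosting" in page,
        "article appears in journal index": slug in index,
        "sitemap includes the slug": slug in sitemap,
    }
```

The `fetcher` parameter exists so the same check can run against a stub in tests and against the real site after deploy. Body length, related resources, and OG image dimensions still need their own checks or a browser pass.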
The migration receipt should preserve content trust
For content migrations, the receipt should prove that the database copy and fallback copy are aligned.
I like generating the migration from the same local source when possible, then checking:
- same slug list
- same body source
- same metadata
- same read time
- same publish date
- expected conflict behavior
- no missing resource links
If the migration is hand-written separately from fallback content, drift becomes easy. A paragraph changes locally but not in SQL. A related resource gets added to the metadata but the migration has the old JSON. The route works in local preview, but production after migration shows a stale version.
That is a bad receipt because the evidence does not match the deployed system.
The reviewer should not have to trust that the author remembered to keep two copies in sync. The PR should make the sync obvious.
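A drift check between the two copies can be small. This is a sketch under stated assumptions: both sources are already loaded as lists of dictionaries keyed by `slug`, and the compared field names (`body`, `metadata`, `read_time`, `published_at`) are hypothetical stand-ins for whatever the real rows contain.

```python
COMPARED_FIELDS = ("body", "metadata", "read_time", "published_at")


def sync_drift(fallback: list[dict], migration: list[dict],
               compared=COMPARED_FIELDS) -> dict[str, list]:
    """Report slugs and fields where fallback content and migration rows differ."""
    local = {post["slug"]: post for post in fallback}
    remote = {row["slug"]: row for row in migration}
    return {
        "missing_in_migration": sorted(local.keys() - remote.keys()),
        "missing_in_fallback": sorted(remote.keys() - local.keys()),
        "field_mismatches": sorted(
            (slug, name)
            for slug in local.keys() & remote.keys()
            for name in compared
            if local[slug].get(name) != remote[slug].get(name)
        ),
    }


def in_sync(report: dict[str, list]) -> bool:
    """True when the drift report has no entries at all."""
    return not any(report.values())
```

A PR line such as "sync check: 5 slugs, 0 mismatches" then makes the alignment visible without asking the reviewer to diff SQL against source files.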
AI should make review narrower, not wider
A common failure mode is using AI to generate a large diff that makes review wider.
The agent touches formatting, unrelated helpers, styles, tests, metadata, copy, and structure in one pass. Some of the changes are useful. Some are incidental. The reviewer cannot easily tell which is which. The PR feels fast for the author and slow for the team.
The better AI workflow narrows review:
- ask the agent to inspect first
- name the files likely to change
- keep generated work in expected modules
- avoid unrelated cleanup
- separate generated assets from logic when possible
- commit cohesive changes
- tell the reviewer where to focus
This is a discipline issue, not a tooling issue. AI can be used to create focused work or broad churn. The author chooses.
For candidate proof, focused AI PRs are more impressive than giant AI PRs. They show that I can use leverage without losing respect for review.
The receipt for generated content
Generated or AI-assisted content needs a different kind of receipt.
For articles, I want to show:
- topic fits the site's positioning
- voice matches existing work
- word count clears the target
- visuals are authored artifacts, not stock filler
- related resources exist
- metadata is role-forward
- OG card is generated
- route builds
- SEO schema passes
- database migration exists
The article should also have enough specificity to avoid feeling like generic AI writing. That means concrete product situations, real tradeoffs, local patterns, and a point of view. A long article can still feel thin if it only repeats general advice.
The receipt cannot prove taste completely, but it can prove the mechanical and structural parts. Then the reviewer can spend more attention on whether the writing sounds like the person.
The receipt for generated code
Generated code receipts should focus on assumptions and behavior.
I want to know:
- What assumptions did the agent make?
- Which assumptions were replaced with codebase truth?
- Which existing helpers or components were reused?
- Which states were added beyond the happy path?
- Which tests prove behavior?
- Which browser interactions were checked?
- Which edge cases remain?
For generated code, "looks right" is not enough. The receipt should show that the author interrogated the output.
This is especially important for accessibility. AI-generated components often include plausible ARIA but weak behavior. The receipt should name keyboard path, focus return, labels, error association, or whatever interaction matters for the component.
The receipt for design changes
AI can generate polished UI fast. That makes design receipts important.
The PR should show:
- which existing design language it follows
- which new visual choice is intentional
- how mobile behaves
- how text fits
- how dark mode behaves if relevant
- which state visuals were checked
- whether any new pattern should enter the design system
If the change introduces a new card style, button density, visual tone, or layout pattern, the receipt should explain why. Otherwise the product accumulates generated taste fragments.
For my own site, this matters because the design should feel authored. The visuals should look like thinking artifacts: matrices, state maps, annotated frames, decision tables. If a post gets a generic illustration, the page may technically have a visual while still feeling less credible.
The author still owns the result
The most important receipt is ownership.
If a reviewer asks why a file changed, the author should be able to answer. If a migration fails, the author should understand the rollback. If a state is missing, the author should know whether it was intentionally out of scope. If the generated output uses a local pattern incorrectly, the author should fix it, not blame the model.
AI can draft. AI can inspect. AI can suggest tests. AI can generate assets. But the author owns the PR.
This is the standard I want to model. The tool can be visible in the workflow, but responsibility stays human.
Receipts after review comments
The receipt does not stop when the first PR description is written.
If review comments arrive, the follow-up should keep the same standard. The author should say what changed in response, which checks were rerun, and whether the risk moved. A small reply like "fixed" is sometimes enough for a typo. It is not enough for a behavior change.
For example:
- Updated the drawer focus return based on review and reran the mobile keyboard path.
- Changed the metadata resource list and reran the related resource existence check.
- Split the migration body generation to preserve dollar quoting and reran local migration list.
- Kept the proposed component API out of this PR and added a follow-up because the current change only needs a local wrapper.
This follow-up evidence helps reviewers close the loop. It also makes the PR history useful later. If someone reads the review six months from now, they can understand how the risk was resolved.
AI-assisted work benefits from this because review comments often reveal where the generated output was too generic. The fix should not be another blind generation pass. It should be a smaller, more deliberate change with its own receipt.
What failure teaches
When an AI-assisted PR breaks something, the response should not be only "AI made a mistake." That is too easy and not useful.
The better question is: what receipt was missing?
Maybe the prompt lacked a boundary. Maybe the state matrix ignored a failure mode. Maybe the reviewer focus did not name the risky file. Maybe the browser check covered desktop but not mobile. Maybe the migration was not compared to fallback content. Maybe the author trusted a generated helper without reading the existing one.
Turning the failure into a receipt improvement makes the workflow better. Add the missing check. Update the agent context. Improve the PR template. Add the route to the browser checklist. Tighten the migration generator.
That is how AI-assisted development gets safer over time. Not by pretending failures will disappear, but by making each failure update the operating system.
The public signal is discipline. A hiring team should see that I can use faster tools while keeping the slow parts that matter: reading, checking, explaining, and owning the result after review.
That balance is the whole point of the workflow for teams shipping real products under review pressure.
The hiring signal
AI-assisted PRs are becoming common. Careful AI-assisted PRs are still a signal.
The signal is not that I can make an agent produce a large diff. The signal is that I can turn that diff into a product-quality change: scoped, verified, explainable, and reviewable.
That is the standard I want my own work to show. If a hiring manager reads my PRs or my case studies, they should see that I can move quickly without asking the team to accept mystery. I can bring AI into the workflow and still leave human receipts.