A practical QA pass for AI-generated UI

AI can draft screens quickly, but the review still has to catch states, data assumptions, accessibility, and product intent.

AI-generated UI often looks done before it is product-ready. The screen has spacing, color, copy, buttons, maybe even realistic data. That can make the review harder because the work appears complete enough to lower your guard.

I use a separate QA pass for AI-generated UI because the failure modes are consistent.

Check the product job first

Before reviewing pixels, ask what the screen is supposed to help someone do. AI output can be plausible without being purposeful. It may include a beautiful chart that does not answer the operating question, or a CTA that points to the generic next step instead of the user's next step.

If the job is unclear, fix that before tuning the interface.

Hunt for missing states

AI-generated UI tends to show the happy path. The missing states are where the product breaks:

Loading.
Empty.
Error.
Permission denied.
Long text.
No image.
Partial data.
Slow network.
Mobile keyboard.

I ask the model for states, but I still inspect them manually. A generated error message can sound confident while giving the user no recovery path.
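
One way to keep these states from quietly disappearing is to model them as a closed set in code, so an unhandled state fails type-checking instead of shipping. This is a minimal TypeScript sketch, not a prescribed architecture. The `ViewState` type, its field names, and the message copy are all illustrative assumptions. Note that the error state cannot be constructed without naming a recovery path.

```typescript
// Hypothetical view-state model for one screen region.
// All names and copy below are illustrative assumptions.
type ViewState =
  | { kind: "loading" }
  | { kind: "empty" }
  | { kind: "error"; message: string; recovery: "retry" | "contactSupport" }
  | { kind: "permissionDenied"; requiredRole: string }
  | { kind: "partial"; rows: string[]; missingFields: string[] }
  | { kind: "ready"; rows: string[] };

// Returns the text a user would see for each state.
// The `never` assignment in the default branch makes the compiler reject
// any state that is added to ViewState but not handled here.
function describeState(state: ViewState): string {
  switch (state.kind) {
    case "loading":
      return "Loading";
    case "empty":
      return "Nothing here yet";
    case "error":
      return state.recovery === "retry"
        ? `${state.message} Try again.`
        : `${state.message} Contact support.`;
    case "permissionDenied":
      return `You need the ${state.requiredRole} role to view this.`;
    case "partial":
      return `Showing partial data. Missing: ${state.missingFields.join(", ")}`;
    case "ready":
      return `${state.rows.length} rows`;
    default: {
      const unhandled: never = state;
      return unhandled;
    }
  }
}
```

Long text, missing images, slow networks, and the mobile keyboard are layout and environment conditions rather than data states, so they still need manual inspection.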

Replace decorative data

Mock data should stress the layout. Short names, perfect values, and clean percentages do not prove much.

Use long customer names, zero values, missing fields, international prices, unusual dates, and dense lists. If the layout only works for pretty data, it is a comp, not a product.

Review semantics and keyboard behavior

AI can produce div soup with convincing visual styling. Buttons should be buttons. Links should be links. Inputs should have labels. Modals should manage focus. Menus should have keyboard behavior. Custom controls need a reason to exist.

I check keyboard paths early because they reveal whether the UI is built as an interface or as an image.

Remove invented product behavior

Generated UI often invents features: export buttons, filters, statuses, settings, and workflows that sound reasonable but are not in scope.

Every invented affordance becomes product debt if it ships. The review should ask which controls are real, which are placeholders, and which should be deleted.

Verify copy against consequence

AI copy can be too cheerful for serious moments. Billing failures, destructive actions, support escalations, and compliance warnings need plain language.

The tone should match the user's risk. A delightful empty state is fine. A playful payment failure is not.

My QA checklist

Before accepting AI-generated UI, I check:

Does the screen support the actual product job?
Are loading, empty, error, and permission states designed?
Does real data fit?
Are controls semantic and keyboard reachable?
Did the model invent features?
Does copy match consequence?
Is the implementation smaller than the generated surface?

AI can accelerate the draft. It does not remove the need for product judgment. The QA pass is where the draft becomes software.

What AI UI gets wrong most often

AI-generated UI is usually weakest in the places that do not show up in a static screenshot. It can create a dashboard that looks credible, but the filters may not map to real questions. It can draft a settings page, but the destructive action may have no confirmation. It can produce a modal, but the focus behavior is missing. It can write polished empty-state copy that does not match the actual reason the state is empty.

That is why my QA pass is not a pixel pass. It is a product truth pass.

I treat the generated screen as a confident intern: useful, fast, often directionally right, but not allowed to ship without inspection. The review has to separate what is real from what only looks convincing.

Reality: Which controls are real? Delete invented filters, exports, statuses, and actions.

Resilience: Which states are missing? Loading, empty, error, permission, long data, and mobile.

Semantics: Can people operate it? Buttons, links, labels, focus, keyboard, and assistive tech.

Figure 1: My AI UI QA pass checks product reality, resilient states, and semantic operation before visual polish.

Pass one: remove hallucinated scope

The first pass is deletion. I go through every control and ask whether it belongs to the product.

Common hallucinations:

export buttons with no export story
filters that do not exist in the data model
status labels that are not part of the domain
settings toggles nobody requested
charts with no operating question
bulk actions without permission or recovery logic
AI summaries with no source trail

Deleting these can make the screen feel less impressive. That is fine. A smaller honest interface is better than a larger fictional one.

I mark each control as real, candidate, or remove. Real controls have known behavior. Candidate controls may become useful but need a product decision. Controls marked "remove" are decorative guesses.
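
The three labels can be recorded as data so the review leaves a list rather than a memory. This is a small sketch under assumptions: the `ControlReview` shape, the `triage` helper, and the sample controls are hypothetical and only illustrate the grouping.

```typescript
// The three review labels from the deletion pass.
type ControlStatus = "real" | "candidate" | "remove";

// Hypothetical record for one control on the generated screen.
interface ControlReview {
  label: string;
  status: ControlStatus;
  note: string;
}

// Groups control labels by review status so each bucket can be acted on.
function triage(controls: ControlReview[]): Record<ControlStatus, string[]> {
  const grouped: Record<ControlStatus, string[]> = {
    real: [],
    candidate: [],
    remove: [],
  };
  for (const control of controls) {
    grouped[control.status].push(control.label);
  }
  return grouped;
}

// Illustrative input only; these controls are invented for the example.
const sampleReview: ControlReview[] = [
  { label: "Save", status: "real", note: "Existing behavior" },
  { label: "Export CSV", status: "remove", note: "No export story" },
  { label: "Filter by region", status: "candidate", note: "Needs a product decision" },
];

const sampleGroups = triage(sampleReview);
```

Anything left in the candidate bucket is an open product question, which is a useful thing to see written down before merge.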

Pass two: replace demo data with rude data

AI screens often use polite data. Names are short. Numbers are tidy. Dates are recent. Every row has an avatar. Every customer has a plan. Every percentage looks plausible.

Real products are rude.

I test with:

very long names
duplicate names
missing owners
zero values
old dates
failed payments
unsupported currencies
long translated strings
partial sync states
permission gaps

This pass quickly shows whether the interface is flexible or staged. It also shows whether the visual hierarchy is doing real work. If the design only holds with perfect data, the design is not ready.
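
A rude fixture set is easy to write once and reuse. The sketch below assumes a hypothetical `CustomerRow` shape and adds a `stressCoverage` helper that reports which stress cases a fixture set actually contains. The field names, the 40-character and 2015 thresholds, and the supported-currency list are all assumptions chosen for illustration. Translated strings, partial sync, and permission gaps are left out to keep it short.

```typescript
// Hypothetical row shape for a customer table; field names are assumptions.
interface CustomerRow {
  name: string;
  owner: string | null;
  balanceCents: number;
  currency: string;
  lastPaymentOk: boolean;
  lastSeen: string; // ISO date, YYYY-MM-DD
}

// Assumed for illustration; a real product would read this from config.
const SUPPORTED_CURRENCIES = ["USD", "EUR"];

const rudeCustomers: CustomerRow[] = [
  {
    name: "Maximiliana Alexandrovna Featherstonehaugh-Wolfeschlegelsteinhausen",
    owner: null,
    balanceCents: 0,
    currency: "USD",
    lastPaymentOk: true,
    lastSeen: "2009-02-13",
  },
  // XTS is the ISO 4217 code reserved for testing.
  { name: "Sam Lee", owner: "Priya", balanceCents: 129900, currency: "XTS", lastPaymentOk: false, lastSeen: "2024-11-02" },
  { name: "Sam Lee", owner: "Jon", balanceCents: 5, currency: "EUR", lastPaymentOk: true, lastSeen: "2024-11-03" },
];

interface StressCoverage {
  longName: boolean;
  duplicateName: boolean;
  missingOwner: boolean;
  zeroValue: boolean;
  oldDate: boolean;
  failedPayment: boolean;
  unsupportedCurrency: boolean;
}

// Reports which rude cases are present in a fixture set.
// The thresholds are arbitrary illustrative choices.
function stressCoverage(rows: CustomerRow[]): StressCoverage {
  const names = rows.map((row) => row.name);
  return {
    longName: rows.some((row) => row.name.length > 40),
    duplicateName: names.some((name, i) => names.indexOf(name) !== i),
    missingOwner: rows.some((row) => row.owner === null),
    zeroValue: rows.some((row) => row.balanceCents === 0),
    // ISO dates in YYYY-MM-DD form compare correctly as strings.
    oldDate: rows.some((row) => row.lastSeen < "2015-01-01"),
    failedPayment: rows.some((row) => !row.lastPaymentOk),
    unsupportedCurrency: rows.some(
      (row) => SUPPORTED_CURRENCIES.indexOf(row.currency) === -1
    ),
  };
}
```

Running `stressCoverage` against the model's own mock data is a quick way to show how polite it is.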

Pass three: inspect state paths

For each major region, I ask:

What is loading?
What is empty?
What can fail?
What can be partial?
Who might not have permission?
What happens after the action?
What happens if the user leaves halfway through?

For AI-generated UI, I write these states down even if I do not design them fully in the first pass. The act of naming them exposes product gaps. A generated billing screen might look complete until we ask what happens when payment fails but the subscription is still active for seven days.
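
The billing example can be made concrete. The sketch below derives what the screen should show from the time of the failed payment, assuming a seven-day grace period as in the example above. The `BillingView` type and `billingView` function are hypothetical, and the sketch assumes `now` is never earlier than the failure time.

```typescript
// Hypothetical billing states a screen would need to render.
type BillingView =
  | { kind: "active" }
  | { kind: "pastDueInGrace"; daysLeft: number }
  | { kind: "suspended" };

// Seven days comes from the example in the text; a real product would
// read the grace period from billing configuration.
const GRACE_DAYS = 7;
const DAY_MS = 24 * 60 * 60 * 1000;

// Derives the billing state from when the last payment failed, if it did.
// Assumes `now` is not earlier than `paymentFailedAt`.
function billingView(paymentFailedAt: Date | null, now: Date): BillingView {
  if (paymentFailedAt === null) {
    return { kind: "active" };
  }
  const elapsedDays = Math.floor(
    (now.getTime() - paymentFailedAt.getTime()) / DAY_MS
  );
  const daysLeft = GRACE_DAYS - elapsedDays;
  return daysLeft > 0
    ? { kind: "pastDueInGrace", daysLeft }
    : { kind: "suspended" };
}
```

The middle state is the one a generated billing screen rarely draws: the payment has failed, the subscription still works, and the user needs plain copy that says how long they have.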

Figure 2: Generated UI usually shows the happy path. QA has to force the empty, error, and limited states into view.

Pass four: check the DOM, not the screenshot

A screenshot cannot tell me whether the interface is usable. I need to inspect behavior.

I check:

Are interactive elements semantic?
Does tab order make sense?
Is focus visible?
Do menus and dialogs close predictably?
Are form fields labeled?
Are errors associated with fields?
Is color the only thing carrying meaning?
Do decorative transitions stop when the user prefers reduced motion?

This is where AI output can be especially misleading. It can write the visual shape of an interface while missing the operational contract.
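
A few of these checks can be expressed as a tree walk. The sketch below runs over a simplified stand-in for rendered elements rather than the real DOM. The `UiNode` shape and the three problem labels are assumptions made for illustration. It covers only a slice of the list above and does not replace an accessibility testing tool or a manual keyboard pass.

```typescript
// Simplified stand-in for a rendered element. A real audit would inspect
// the actual DOM; this shape is hypothetical.
interface UiNode {
  tag: string;
  role: string | null;
  focusable: boolean;
  accessibleName: string | null;
  hasClickHandler: boolean;
  children: UiNode[];
}

type Problem = "noRole" | "notFocusable" | "noLabel";

interface Finding {
  tag: string;
  problem: Problem;
}

const NATIVE_CONTROLS = ["a", "button", "input", "select", "textarea"];
const FORM_FIELDS = ["input", "select", "textarea"];

// Walks the tree depth-first and collects semantic problems:
// clickable non-native elements without a role, clickable elements that
// cannot receive focus, and form fields without an accessible name.
function auditSemantics(node: UiNode, findings: Finding[] = []): Finding[] {
  const isNative = NATIVE_CONTROLS.indexOf(node.tag) !== -1;
  if (node.hasClickHandler && !isNative && node.role === null) {
    findings.push({ tag: node.tag, problem: "noRole" });
  }
  if (node.hasClickHandler && !node.focusable) {
    findings.push({ tag: node.tag, problem: "notFocusable" });
  }
  if (FORM_FIELDS.indexOf(node.tag) !== -1 && !node.accessibleName) {
    findings.push({ tag: node.tag, problem: "noLabel" });
  }
  for (const child of node.children) {
    auditSemantics(child, findings);
  }
  return findings;
}
```

A clickable `div` with no role and no focus is the typical div-soup finding. The fix is usually to replace it with a native `button` rather than to add attributes.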

Pass five: write a merge note

Before accepting the AI-generated surface, I want a short note:

AI UI QA
- Removed invented behavior:
- States reviewed:
- Real data stress case:
- Accessibility path:
- Remaining risk:

This note is not bureaucracy. It creates a record that the work was reviewed as product, not just accepted because it looked complete.
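
If the note should be hard to skip, it can be generated from a typed record that refuses blank fields. This is a sketch only: the `MergeNote` shape and `renderMergeNote` function are hypothetical, and the choice to reject blank fields is an assumption. A reviewer who removed nothing would write "none" explicitly.

```typescript
// Hypothetical structured form of the merge note template.
interface MergeNote {
  removedInventedBehavior: string[];
  statesReviewed: string[];
  realDataStressCase: string;
  accessibilityPath: string;
  remainingRisk: string;
}

// Renders the note in the template's layout.
// Throws if any field is blank so an empty note cannot pass silently.
function renderMergeNote(note: MergeNote): string {
  const rows: [string, string][] = [
    ["Removed invented behavior", note.removedInventedBehavior.join(", ")],
    ["States reviewed", note.statesReviewed.join(", ")],
    ["Real data stress case", note.realDataStressCase],
    ["Accessibility path", note.accessibilityPath],
    ["Remaining risk", note.remainingRisk],
  ];
  const blank = rows
    .filter(([, value]) => value.trim() === "")
    .map(([label]) => label);
  if (blank.length > 0) {
    throw new Error(`Merge note incomplete: ${blank.join(", ")}`);
  }
  const lines = rows.map(([label, value]) => `- ${label}: ${value}`);
  return ["AI UI QA"].concat(lines).join("\n");
}
```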

The standard

I am not against AI-generated UI. I use it. But I hold it to the same standard as any other product work: it has to explain itself, survive real data, respect interaction behavior, and avoid promising features that do not exist.

The model can accelerate the first draft. The QA pass is where taste turns back into responsibility.

Prompts I use after the first draft

I do not ask an AI agent, "Is this good?" That question invites a confident yes. I ask narrower questions that force critique.

Prompts I use:

List every control in this screen and mark it as real, implied, or invented.
For invented controls, explain what product decision is missing.

Create a state table for this screen: loading, empty, error, permission, partial data, long content, and mobile.
Do not redesign yet. Only identify missing states and questions.

Stress-test this layout with ugly data: long names, zero values, missing avatars, old dates, duplicate records, and translated labels.
Return the layout risks only.

Review this markup for semantic interaction risk.
Focus on buttons, links, labels, focus management, dialogs, menus, and keyboard behavior.

These prompts make the model more useful because they narrow the job. The agent can help find gaps, but I still decide what matters.

I also ask for a deletion pass. That is one of the best uses of AI in review: "Which parts of this screen are not justified by the product brief?" The answer is not always right, but it surfaces controls worth questioning.

The final review is human. I want to know whether the screen matches the product's promise, not whether it resembles a plausible SaaS interface. AI is good at plausibility. Product work needs truth.

My acceptance bar

I accept AI-generated UI only after it has been made boring in the right ways. The controls are real. The data is rude. The states are named. The markup is semantic. The copy matches consequence. The implementation follows the local codebase instead of carrying the model's invented architecture.

That last point matters. AI output often brings its own style of abstraction. It may create generic card systems, animation helpers, local state machines, and utility functions that do not match the repo. Even when the screen looks good, the code can feel foreign. I treat that as a review issue, not a matter of taste.

The interface should also survive deletion. If I remove the decorative chart, the screen should still have a product job. If I remove the invented export button, the flow should still make sense. If the only thing holding the screen together is the visual density of generated UI, the product idea is probably thin.

I want AI to accelerate my first draft, not lower my standards. The QA pass is the boundary between those two outcomes.

The product-owner pass

Before merge, I want the product owner or designer to answer three questions in plain language. What user decision does this screen support? Which generated controls did we remove or confirm? What state still carries the most risk?

Those questions keep the review grounded. If the team cannot answer them, the screen may be visually convincing but product-thin. AI makes it easy to create complete-looking surfaces. The product-owner pass makes sure the surface is still attached to a real workflow, a real data model, and a real consequence.

That is the standard I want met before calling a generated screen ready for real users.