Reviewing AI-built product screens

The useful review is not whether the screen looks impressive. It is whether the product job, states, data, and code can survive real use.

AI can produce a screen that looks finished long before it is ready to become product. That is the trap. The surface has spacing, labels, a chart, maybe a clean table and a confident button. It looks like work has moved forward. Sometimes it has. Sometimes the model only created a convincing picture of a product decision nobody has made yet.

When I review AI-built screens, I try not to start with taste. Taste is the easy part to argue about and the easiest part for a model to imitate. I start with the product job. What is the user trying to understand or do here? What did the system decide on their behalf? Which parts are real behavior and which parts are decorative guesses?

The review is not a test of whether AI is good. It is a test of whether the draft can survive contact with real product constraints.

Intent: What job is this screen doing? Name the decision or action before tuning the pixels.

States: What happens off the happy path? Loading, empty, error, permission, long content, and stale data.

Code: Can this be maintained? Semantic controls, real data contracts, smaller diff, fewer invented features.

Figure 1: I review AI-built UI through three lenses: intent, states, and code. Visual polish comes after those hold.

Start by deleting the pretend product

AI-generated screens often invent product behavior. Export buttons appear because dashboards usually have export buttons. Filters appear because filters look useful. Status pills appear because a table feels more convincing with color. Settings appear because a settings page has room for switches.

Some of those invented things may be good ideas. Most are scope leaks.

I make a pass where I mark every control as real, placeholder, or invented. Real means we know the behavior and can build it. Placeholder means it is part of the direction but not ready. Invented means it slipped in because the pattern looked plausible.

That pass is uncomfortable because it removes the things that make the draft look complete. It is also where the screen starts becoming honest.

For example, if a customer health dashboard shows "Export CSV," "Auto-prioritize," and "Share report," I ask whether those workflows exist. If not, they come out. A screen with fewer real controls is stronger than a screen full of fake promises.
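A minimal sketch of that marking pass in TypeScript. The type names, the `auditControls` helper, the statuses assigned to each control, and the "Filter by owner" control are all hypothetical and exist only to illustrate the idea.

```typescript
// Hypothetical review types; none of these names come from a real library.
type ControlStatus = "real" | "placeholder" | "invented";

interface ControlReview {
  label: string;
  status: ControlStatus;
  note?: string;
}

// Sort controls into what stays, what waits, and what comes out of the draft.
function auditControls(controls: ControlReview[]) {
  const labelsWith = (status: ControlStatus) =>
    controls.filter((c) => c.status === status).map((c) => c.label);
  return {
    keep: labelsWith("real"),
    defer: labelsWith("placeholder"),
    remove: labelsWith("invented"),
  };
}

// Assumed outcome of one review; the statuses here are made up for the example.
const dashboardControls: ControlReview[] = [
  { label: "Export CSV", status: "invented", note: "no export workflow exists" },
  { label: "Auto-prioritize", status: "invented", note: "no ranking logic defined" },
  { label: "Share report", status: "placeholder", note: "planned, not specified" },
  { label: "Filter by owner", status: "real" },
];

const audit = auditControls(dashboardControls);
console.log(audit);
```

The value of writing it down this way is less the code than the forced decision: every control has to land in exactly one bucket.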

Check the state map before the layout

The model usually gives me the ready state. The product needs more than that.

For every meaningful surface, I want to know:

What shows while data loads?
What shows when there is no data yet?
What shows when a filter removes every result?
What shows when the user lacks permission?
What happens when the action fails?
What happens when text gets long?
What happens on a narrow phone?
What data can be stale or partial?

I do not need every state designed with the same fidelity on the first pass. But I need to know the screen has a place for those states. If there is no place, the layout is probably lying.
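One way to check that a screen has a place for each state is to model the states as a discriminated union and let the compiler complain when one is unhandled. This is a sketch under assumptions: the state names, the `placeFor` function, and the placeholder descriptions are hypothetical, and narrow-screen and long-text behavior are layout concerns this model does not cover.

```typescript
// Hypothetical state model for a single data surface.
type SurfaceState =
  | { kind: "loading" }
  | { kind: "empty" } // no data yet
  | { kind: "noResults"; filter: string } // a filter removed every result
  | { kind: "forbidden" } // the user lacks permission
  | { kind: "error"; message: string; retryable: boolean }
  | { kind: "ready"; rows: string[]; stale: boolean };

// Returns a short description of what the layout shows for each state.
function placeFor(state: SurfaceState): string {
  switch (state.kind) {
    case "loading":
      return "skeleton rows that keep the table height";
    case "empty":
      return "first-run message with one next step";
    case "noResults":
      return 'no matches for "' + state.filter + '", with a clear-filter action';
    case "forbidden":
      return "permission notice";
    case "error":
      return state.retryable
        ? "inline error with retry: " + state.message
        : "inline error: " + state.message;
    case "ready":
      return state.rows.length + " rows" + (state.stale ? " with a stale-data notice" : "");
    default: {
      // Adding a new state without handling it above becomes a compile error here.
      const unhandled: never = state;
      return unhandled;
    }
  }
}

const states: SurfaceState[] = [
  { kind: "loading" },
  { kind: "noResults", filter: "owner: none" },
  { kind: "ready", rows: ["row 1", "row 2"], stale: true },
];
states.forEach((s) => console.log(s.kind + " -> " + placeFor(s)));
```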

This is especially true for AI-generated dashboard and admin work. A happy table with six perfect rows proves almost nothing. Give it a long customer name, a missing owner, a zero value, a delayed sync, and an error row. Then the useful problems appear.

Replace polished dummy data with stressful data

Dummy data is not harmless. It teaches the layout what to expect. If all mock names are short and all numbers are clean, the interface gets good at looking good in a demo.

I like to use data that feels annoying:

customer names with legal suffixes
names in two languages
0 values beside large values
dates across time zones
missing avatars
long plan names
partial sync states
odd currency formatting
repeated names

The goal is not to make the screen ugly. The goal is to make the screen tell the truth early.

AI output often uses data that feels like a startup landing page. Real product data feels messier. The faster the draft meets messy data, the faster I can judge whether the design is real.
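A sketch of what a stressful fixture might look like, with a small check that reports which stress cases the fixture still lacks. Everything here is an assumption made for illustration: the `CustomerRow` shape, the field names, the 30-character threshold, and the sample companies are invented, not taken from a real product.

```typescript
// Hypothetical row shape for a customer table.
interface CustomerRow {
  name: string;
  owner: string | null; // null when nobody is assigned
  plan: string;
  amountMinor: number; // amount in the currency's minor unit
  currency: string;
  lastSyncedAt: string | null; // ISO 8601 timestamp, null while sync is partial
  avatarUrl: string | null;
}

// Invented fixture rows chosen to be annoying, not realistic.
const stressRows: CustomerRow[] = [
  { name: "Nordlicht Maschinenbau GmbH & Co. KG", owner: null,
    plan: "Enterprise Annual (Legacy, Grandfathered Pricing)", amountMinor: 0,
    currency: "EUR", lastSyncedAt: null, avatarUrl: null },
  { name: "東京データ株式会社 (Tokyo Data Co., Ltd.)", owner: "M. Sato",
    plan: "Team", amountMinor: 128450000, currency: "JPY",
    lastSyncedAt: "2024-03-31T23:30:00-08:00", avatarUrl: null },
  { name: "Acme Inc.", owner: "J. Doe", plan: "Starter", amountMinor: 4900,
    currency: "USD", lastSyncedAt: "2024-04-01T07:30:00+09:00", avatarUrl: null },
  { name: "Acme Inc.", owner: "R. Roe", plan: "Starter", amountMinor: 4900,
    currency: "USD", lastSyncedAt: "2024-04-01T07:30:00+09:00", avatarUrl: null },
];

// Lists the stress cases a fixture does not yet exercise.
function missingStressCases(rows: CustomerRow[]): string[] {
  const names = rows.map((r) => r.name);
  const missing: string[] = [];
  if (!rows.some((r) => r.name.length > 30)) missing.push("long name");
  if (!rows.some((r) => r.owner === null)) missing.push("missing owner");
  if (!rows.some((r) => r.amountMinor === 0)) missing.push("zero value");
  if (!rows.some((r) => r.lastSyncedAt === null)) missing.push("partial sync");
  if (!rows.some((r) => r.avatarUrl === null)) missing.push("missing avatar");
  if (!names.some((n, i) => names.indexOf(n) !== i)) missing.push("repeated name");
  return missing;
}

console.log(missingStressCases(stressRows));
```

Running the same check against the model's original mock data is a quick way to see how much of the demo depended on clean inputs.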

Review semantics like a product feature

The visual layer can hide weak implementation. A model may produce a button-looking div, a menu without keyboard behavior, an input without a label, or a modal that does not manage focus. Those are not small technical details. They change who can use the product and how reliable the interface feels.

Before I accept the direction, I check the basics:

Buttons are buttons.
Links are links.
Inputs have labels.
Error messages are connected to fields.
Focus is visible.
Dialogs trap and restore focus.
Menus can be used from the keyboard.
Loading states preserve layout.

I do this early because it changes the design. Once you care about keyboard flow, focus order, and error recovery, the surface gets more honest. The design has to support behavior, not only composition.
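The first three checks can be sketched as a small function over a simplified element description. This is an illustration only: the `ElementSpec` shape and `semanticIssues` helper are hypothetical, and a real review would inspect the rendered DOM and test with a keyboard and a screen reader rather than rely on a descriptor like this. Focus management, keyboard menus, and layout stability are not covered.

```typescript
// Hypothetical, simplified description of one rendered element.
interface ElementSpec {
  tag: string;
  clickable?: boolean; // has a click handler attached
  href?: string;
  label?: string; // visible label or accessible name
}

// Returns the semantic problems found for a single element.
function semanticIssues(el: ElementSpec): string[] {
  const issues: string[] = [];
  const isFormField = ["input", "select", "textarea"].indexOf(el.tag) !== -1;

  // Buttons are buttons: a clickable div or span is a button in disguise.
  if (el.clickable && el.tag !== "button" && el.tag !== "a") {
    issues.push("clickable <" + el.tag + "> should be a <button>");
  }
  // Links are links: an anchor without a destination is not navigation.
  if (el.tag === "a" && !el.href) {
    issues.push("<a> without href is not a real link");
  }
  // Inputs have labels.
  if (isFormField && !el.label) {
    issues.push("<" + el.tag + "> has no label");
  }
  return issues;
}

const draft: ElementSpec[] = [
  { tag: "div", clickable: true },
  { tag: "a", clickable: true },
  { tag: "input" },
  { tag: "button", clickable: true, label: "Save" },
];
draft.forEach((el) => console.log(el.tag, semanticIssues(el)));
```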

Ask whether the screen got smaller

My favorite sign of a good review is that the screen gets smaller. Fewer controls. Fewer cards. Fewer invented states. Fewer explanations. Fewer dependencies.

AI drafts often start wide because the model tries to be helpful. Product work usually improves when the team narrows. What is the one useful loop? What does the user need to do next? Which data deserves space? Which control can wait?

When the review is working, the screen becomes less impressive and more useful.

Keep the parts that are genuinely helpful

I do not want to turn every AI draft into a scolding exercise. Sometimes the model finds a layout rhythm I would keep. Sometimes it names a state I forgot. Sometimes it offers a small grouping that makes the flow easier to scan.

The point is not to reject generated work. The point is to make the review serious enough that the generated parts have to earn their place.

My pass before build

Before an AI-built screen moves into implementation, I want a short note that answers:

What is the product job?
Which controls are real?
Which states are covered?
Which data assumptions are being made?
Which accessibility behaviors are required?
Which parts were removed from the generated draft?
What is the smallest version worth shipping?

That note is not bureaucracy. It is how I keep the screen from becoming a polished guess.
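If the note lives next to the code, it can be a typed object with a check for unanswered questions. A sketch, assuming a hypothetical `PreBuildNote` shape whose fields mirror the questions above. An empty `removedFromDraft` list is treated as a valid answer here, which is a judgment call.

```typescript
// Hypothetical note format; the field names mirror the review questions.
interface PreBuildNote {
  productJob: string;
  realControls: string[];
  coveredStates: string[];
  dataAssumptions: string[];
  accessibilityBehaviors: string[];
  removedFromDraft: string[]; // may be empty if nothing was removed
  smallestShippableVersion: string;
}

// Returns the questions the note has not answered yet.
function unansweredQuestions(note: PreBuildNote): string[] {
  const open: string[] = [];
  if (note.productJob.trim() === "") open.push("What is the product job?");
  if (note.realControls.length === 0) open.push("Which controls are real?");
  if (note.coveredStates.length === 0) open.push("Which states are covered?");
  if (note.dataAssumptions.length === 0) open.push("Which data assumptions are being made?");
  if (note.accessibilityBehaviors.length === 0) {
    open.push("Which accessibility behaviors are required?");
  }
  if (note.smallestShippableVersion.trim() === "") {
    open.push("What is the smallest version worth shipping?");
  }
  return open;
}

// Invented example of a partially filled note.
const draftNote: PreBuildNote = {
  productJob: "Show which customers need attention this week",
  realControls: ["Filter by owner"],
  coveredStates: ["loading", "empty", "error", "ready"],
  dataAssumptions: [],
  accessibilityBehaviors: ["keyboard-accessible filter menu"],
  removedFromDraft: ["Export CSV", "Auto-prioritize"],
  smallestShippableVersion: "",
};

console.log(unansweredQuestions(draftNote));
```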

AI can help me get to a draft faster. It cannot tell me whether the draft belongs in the product. That judgment still has to come from the person shipping the work.
