Last week I published an essay “AI Agents, Robots, and a world built for humans”, and argued that the winning products are the ones that are built for humans first. My friend Pantelis Vratsalis replied with this very interesting question on LinkedIn:
I decided to dedicate this week’s essay to answering this question, so let’s get started. As always with these kinds of questions, it is important to take a couple of steps back and reason about it from a more holistic perspective.
1. UI is a form of communication
A UI is just a language between a human and a machine. It is how you tell a computer what you want and how it shows you what it did. The job is simple. Minimize ambiguity. Minimize effort. Make outcomes obvious.
This matters because different “modes of communication” (reading, speaking, pointing, seeing) have very different bandwidths and error profiles. Choose the wrong channel for the job and you compromise speed, accuracy, or both.
2. Human bandwidth: reading, writing, speaking… and seeing
Here’s the quick reality check on our I/O limits:
Reading (silent): Average adult rates cluster around ~238 words/min for nonfiction (faster for fiction). (source)
Listening/Speaking: Comfortable speech and listening sit around ~140–160 wpm in everyday use; presentations and audiobooks target that range. Conversational speech can go faster, but comprehension can drop. (source)
Typing: Keyboard input for typical users is often ~20–40 wpm (power users go much higher). On phones, a massive 37k-person study found an average of ~36 wpm. Under ideal conditions, voice dictation can be ~3× faster than typing. (source)
Seeing (vision): This is where humans are exceptional. The retina pushes ~10 million bits/second to the brain, and we can extract the gist of an image in ~13 ms. Vision is parallel and insanely high-throughput. (source)
Language information limit: Across languages, human speech carries ~39 bits/second of information on average. That’s a universal bottleneck for verbal channels. (source)
Translation to UX: If you want to show lots of options, comparisons, and state at once, a screen is far superior to voice. If you want to express a short intent or capture a thought hands-free, voice can beat a keyboard.
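The gap between these channels is worth making concrete. A back-of-envelope sketch in Python, deriving a rough bits-per-word figure from the ~39 bits/s speech rate at a ~150 wpm speaking pace (all inputs are the approximate averages cited above, not precise measurements):

```python
# Back-of-envelope comparison of human I/O channels.
# Assumption: bits-per-word is derived from ~39 bits/s speech at ~150 wpm;
# every input here is a rough average from the figures cited above.

SPEECH_BITS_PER_SEC = 39    # universal speech information rate
SPEECH_WPM = 150            # comfortable speaking/listening rate
BITS_PER_WORD = SPEECH_BITS_PER_SEC / (SPEECH_WPM / 60)  # ~15.6 bits/word

reading_bps = 238 / 60 * BITS_PER_WORD  # silent reading, ~238 wpm
typing_bps = 36 / 60 * BITS_PER_WORD    # phone typing, ~36 wpm
vision_bps = 10_000_000                 # retina-to-brain throughput

print(f"reading ~ {reading_bps:.0f} bits/s")   # ~62 bits/s
print(f"typing  ~ {typing_bps:.1f} bits/s")    # ~9.4 bits/s
print(f"vision is ~{vision_bps / reading_bps:,.0f}x reading")
```

Even if the constants are off by a factor of two, the ordering is stark: reading beats speech, speech beats typing, and vision outruns all verbal channels by roughly five orders of magnitude.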
3. Two levers to make communication efficient
You can improve the “dialog” in two ways:
A) Increase channel bandwidth.
Replace typing with voice, complement voice with visuals, add structure to chat (chips, forms), or use richer layouts for scanning, etc.
B) Reduce the information that needs to be said at all.
Let the system do more by default:
Built-in business logic & automation: Move the repetitive and predictable work behind the scenes and surface only decisions that truly need the human.
Great defaults: Users rarely change them, so pick them carefully to remove decisions and clicks.
One-click actions: One click can carry identity, context, preferences, and history, so the system can execute the entire workflow with no extra fields, steps, or questions.
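What “one click carries identity, context, preferences, and history” means is easy to show in code. A minimal sketch; every name here is hypothetical, invented purely to illustrate the shape of the payload:

```python
# Sketch: everything a single click can bundle so the system needs no
# follow-up questions. All names are hypothetical/illustrative.
from dataclasses import dataclass, field

@dataclass
class OneClickAction:
    user_id: str                                     # identity: who clicked
    action: str                                      # intent: what it means
    context: dict = field(default_factory=dict)      # where/when it happened
    preferences: dict = field(default_factory=dict)  # stored defaults

def execute(click: OneClickAction) -> str:
    """Run the whole workflow from one click; no extra fields or steps."""
    shipping = click.preferences.get("shipping", "standard")
    return f"{click.action} for {click.user_id} ({shipping} shipping)"

click = OneClickAction(
    user_id="u_42",
    action="reorder_last_purchase",
    context={"page": "order_history"},
    preferences={"shipping": "express"},
)
print(execute(click))  # reorder_last_purchase for u_42 (express shipping)
```

The point is that the user communicated one bit (click / no click), and the system supplied the rest from state it already held. That is lever B in action: the bandwidth of the channel barely matters when almost nothing needs to travel over it.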
4. Point-and-click isn’t just due to lack of alternatives
I find it strange when people say that conversational interfaces (voice/chat) will make traditional UIs obsolete, as if the only reason we needed point-and-click interfaces was the absence of a good voice/chat alternative. This is simply not true. There are many reasons why we have needed, and will continue to need, point-and-click interfaces:
Faster for single actions: Clicking Mute is quicker than saying a sentence and waiting for recognition.
Precision: Point-and-click interfaces allow a level of precision that is much harder to achieve with voice or natural language.
Language is non-deterministic: Natural language is ambiguous. Buttons, menus, and forms constrain inputs and make outcomes predictable. This is why we invented programming languages and protocols: to remove ambiguity from language and achieve universal agreement and reproducibility.
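The determinism argument can be made concrete: a free-text command admits many interpretations, while an enumerated input admits exactly one. A minimal sketch (the action names are illustrative):

```python
# Free text is ambiguous; a constrained input set is deterministic.
from enum import Enum

class MeetingAction(Enum):
    # The "buttons": these are the only inputs that exist.
    MUTE = "mute"
    UNMUTE = "unmute"
    LEAVE = "leave"

def handle(action: MeetingAction) -> str:
    # Every valid input maps to exactly one outcome; no interpretation needed.
    return f"executed: {action.value}"

print(handle(MeetingAction.MUTE))  # executed: mute

# Compare the natural-language version: "can you turn my sound off?"
# could mean mute the mic, mute the speakers, or lower the volume;
# the system has to guess before it can act.
```

A button is, in effect, an enum with a label: the set of possible inputs is closed, so the set of possible outcomes is too.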
5. Humans are visual creatures
We choose clothes, furniture, and food by looking at them. We don’t book a hotel room without seeing photos. We browse destinations on Instagram and travel sites before we commit. Even with a perfect voice interface, people will ask to see options before they buy or book.
6. What this means for UIs in the world of AI
Human to Human
Collaboration is visual. When people plan a trip, review a funnel, or choose a design, they point at maps, charts, and screens. These types of interfaces are here to stay, and while AI will be able to produce many of these UIs on the fly, there is still an argument to be made for the importance of familiarity and pattern recognition in building great user experiences. I won’t even get into the dimension of human taste, because the topic of good taste in the world of AI probably needs an essay of its own.
Human to Agent
Here, the human states intent in plain language. I believe that a combination of “rich” conversational interfaces and really easy-to-use UIs is going to win. First and foremost, your focus should be on “making the machine do more”: defaults, automation, one-click actions, and so on. Second, find the parts of the experience where switching to voice or chat increases the information bandwidth and reduces ambiguity instead of increasing it.
Agent to Agent
A few years ago at Google IO, Google Assistant made a phone call to book an appointment. One person on Twitter, who I can’t find anymore, said that soon enough we’ll have an AI assistant on the other end answering the phone and taking the appointment. But if that comes true, and we end up with two computers on the two ends trying to communicate with each other, is human language the most efficient way for them to do it? At a speed of ~39 bits/second? Of course not. Agent-to-agent communication will look more like API calls or digital signal transmissions than natural language, and I believe something like MCP (Model Context Protocol) was the first step along that path.
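What the structured alternative looks like is worth sketching. MCP is built on JSON-RPC 2.0, so an agent-to-agent exchange would resemble the message below; the method name and parameter fields here are hypothetical, just to show the shape:

```python
# Sketch: an appointment "phone call" as one structured message instead of
# a spoken conversation. JSON-RPC 2.0 framing (the transport MCP uses);
# the method name and params are hypothetical.
import json

request = {
    "jsonrpc": "2.0",
    "id": 1,
    "method": "appointments/book",  # hypothetical method
    "params": {
        "service": "haircut",
        "preferred_times": ["2025-06-03T10:00", "2025-06-03T14:00"],
        "customer_ref": "c_981",
    },
}

payload = json.dumps(request)
# The entire "conversation" is one unambiguous message of a couple hundred
# bytes, transmitted in milliseconds: no 39 bits/s ceiling, no turn-taking,
# no speech recognition errors.
print(f"{len(payload.encode())} bytes")
```

At ~39 bits/second, speaking this request aloud would take on the order of half a minute; as a payload it crosses the wire in well under a millisecond.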
Closing thought
Traditional UIs are not going away. They will sit alongside language interfaces and automation. Use language to capture intent quickly. Use screens to compare, verify, and fine-tune. Let agents do the heavy lifting behind a clear, visual surface.
Build for humans first. Then let AI remove steps the human should not have to take.
I’d love to hear from you, so if you have any thoughts you would like to share, please do so in the comments, or find me on Twitter.