AI Agents, Robots, and a world built for humans
Reflections on the best ways to build autonomous products
Dear readers and friends, I am back after a long break from writing, and a long break from working. Some of you might know that we sold Hotjar to Contentsquare in 2021, and I stayed on as CEO of Hotjar, and later as CSO of Contentsquare. What some of you might not know is that I resigned earlier this year to work on building a new company. I have spent the last few months traveling, ideating, prototyping, and that journey inevitably took me very deep into the world of AI. I learned a ton and I have a lot of thoughts on what is happening in the world of AI and how things are going to play out, so this essay is perhaps a first in a series that I will be sharing with you.
In this essay I will try to answer one fundamental question:
How do we build the most effective autonomous agents in a world built for humans?
“Agents” here is used loosely to mean both software agents and hardware agents (robots). Let me start with the immediate thing that comes to people’s minds:
1. Robots
When people think about autonomous agents in a human world, they immediately think of humanoid robots. I’m a software products guy, so I promise I’ll eventually get to software agents in this essay, but trust me: thinking about robots is very helpful for reasoning about software agents.
Many have argued that humanoid robots are the superior form of robotics for the future, because we live in a world built for humans and a humanoid robot would allow the robot to be as “general purpose” as possible, thus justifying the cost. Instead of a thousand specialized robots, you just have one humanoid that can do a thousand tasks.
So are humanoid robots really the most effective form? Well, for most things, they are not. Do we really need general-purpose robots? Also probably not, at least not in the traditional sense. Let me explain both points.
1.1 The human form is inefficient
Look at the photo above. In agriculture, automated machinery looks nothing like human farmers and packs 1,000x the power and productivity. These machines not only replaced thousands of farmers, but they have also enabled innovation and possibilities that were simply unimaginable if we relied solely on human beings, because they broke free from the limitations of the human form.
We don’t see this only in agriculture. You see this across the board in manufacturing, supply chain, etc. So with this in mind, you have to imagine how dumb I think this promo video from Figure Robotics looks:
I mean, compare the idea of a humanoid package-sorting robot to this 3-year-old Amazon sorting system sorting packages in ways a human couldn’t.
So the conclusion I want to draw at this point is that specialized robots are more effective than general-purpose robots when the following three conditions are met:
1. We gain efficiency by going beyond the limitations of the human form.
2. The efficiency gains outweigh the cost of specialized robots.
3. The environment can be reshaped in a way that is less optimum for humans but more optimum for the specialized robot.
While #1 probably holds true for most tasks, one cannot say the same about points 2 and 3. Very good examples of this are tasks around the house like cooking, cleaning, doing the laundry, etc. So let’s dig into these to understand why they are different and what that means for robots.
1.2 General-purpose home robots are still specialized robots
And their speciality is operating within your personal environment. Let’s quickly take a look at how conditions 2 and 3 from the previous section apply here:
“The efficiency gains outweigh the cost of specialized robots.”
The efficiency of any home robot is capped at the needs of that particular home. A family of five would only have as much laundry as a family of five would. Having a highly efficient robot will not meaningfully affect the consumption habits of any household. The difference in convenience between a robot that finishes the task at human or near-human speed versus one that does it at superhuman speed is non-existent.
“The environment can be reshaped in a way that is less optimum for humans but more optimum for the specialized robot.”
The limited square footage in your home is, and should always be, optimized for the comfort of the human beings living in it. It doesn’t make much sense to redesign your home in a way that is less convenient for you but better for the robots, with no significant gain. Even if the gains were there, it is highly unlikely that they would justify such drastic measures.
The same reasoning likely applies to other areas of life, and perhaps even to some domains of service labor; the main arguments and conditions probably hold across most of them.
2. Software Agents
2.1 Specialized & general-purpose software agents
Software, you will find, is a very peculiar thing, and software tasks are wildly different from real-life physical tasks. Let’s look at the 3 conditions for specialized agents:
We gain efficiency by going beyond the limitations of the human form: This is inherent in software, because with theoretically unlimited compute, you can run a theoretically infinite number of tasks. In addition to this, computers are many orders of magnitude faster than human beings at most things a computer can do.
The efficiency gains outweigh the cost of specialized robots: Software is really cheap to run, which is why SaaS was such an incredible business model with ridiculously high gross margins. Even the GPU-heavy cost of intelligence, or in more practical terms the cost of AI inference, is expensive compared to traditional CPU computation but still cheap, and getting cheaper every day. I am aware that while the cost of each model drops by 10X every year, state-of-the-art (SOTA) models remain relatively expensive, but even the cost of SOTA models is going down. Given the low cost of software, the efficiency gains are there even at a small or even personal scale.
The environment can be reshaped in a way that is less optimum for humans but more optimum for the specialized robot: Software exists in infinite virtual space. The environment where software runs is malleable and adaptive. It can not only be reshaped in every way we want, it can also evolve and change in real time.
But with LLMs, we’re not just talking about automation. We are talking about intelligence on tap. And intelligence, I believe, is one of the areas where the Jevons paradox applies: unlike the case made earlier about home tasks, when intelligence is the product, people will consume more of it as it gets cheaper.
2.2 General-purpose LLMs and specialized agents
To better understand the ability of any piece of “AI” software (i.e. software built on top of LLMs) to execute or reason about a variety of general-purpose tasks, I need to explain a few concepts.
The first one is the general level of intelligence of a model, which is directly correlated with the pre-training and post-training of that model. As models get bigger, they become smarter and better across more and more domains. For example, at the time of GPT-3 a lot of companies found that they needed to fine-tune the model on domain-specific datasets to get the level of intelligence they wanted, but the vast majority of these fine-tuned models became obsolete once GPT-4 came around. I think we can expect this trend to continue.
The second concept is function or tool calling. Most models now have the ability to use tools provided to them, either directly or via a standardized protocol like MCP. This pushes the specialization down to the level of the tool rather than the level of the model. You can build tools to get the LLM to do pretty much anything you want it to, or to retrieve any information the LLM might need from virtually any source. A good example of tool calling is web search.
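To make the idea concrete, here is a minimal sketch of the application side of tool calling. It assumes the model emits a tool call as a small JSON object with a name and arguments; that shape, the `TOOLS` registry, the `tool` decorator, and the `web_search` and `add` functions are all hypothetical names made up for illustration, not any vendor's API or the MCP wire format.

```python
import json
from typing import Callable, Dict

# Hypothetical registry: maps a tool name to a plain Python function.
TOOLS: Dict[str, Callable[..., str]] = {}


def tool(fn):
    """Register a function so the agent loop can call it by name."""
    TOOLS[fn.__name__] = fn
    return fn


@tool
def web_search(query: str) -> str:
    # Stand-in for a real search backend; returns a canned string.
    return f"(stub) top results for: {query}"


@tool
def add(a: float, b: float) -> str:
    # Tools return text, because the result is fed back to the model as text.
    return str(a + b)


def dispatch(tool_call_json: str) -> str:
    """Run one tool call of the assumed form {"name": ..., "arguments": {...}}."""
    call = json.loads(tool_call_json)
    name = call["name"]
    if name not in TOOLS:
        return f"error: unknown tool {name!r}"
    return TOOLS[name](**call["arguments"])


if __name__ == "__main__":
    print(dispatch('{"name": "web_search", "arguments": {"query": "autonomous agents"}}'))
```

The model itself stays general-purpose. All of the specialization lives in the functions you register, which is why adding a new capability usually means writing a new tool, not training a new model.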
The third concept is Reinforcement Learning (RL). RL is a machine learning technique where an "agent" learns optimal behaviors by interacting with an environment through trial and error. This type of technique is compute heavy, and requires a very large number of environments for the agent to train on for each different type of task or system. For example, an environment for learning how to play chess will be very different from the environment needed to learn how to play Go. This technique, while expensive, has proven to be far more effective than trying to teach the agents the “rules”. If you’re interested in learning more, read Rich Sutton’s “The Bitter Lesson”.
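If "trial and error" sounds abstract, the sketch below shows it at toy scale. It is an epsilon-greedy multi-armed bandit, about the simplest trial-and-error learner there is, and nothing like the large-scale RL used to train LLMs. The function name `train_bandit`, the payout probabilities, and the parameter values are assumptions chosen for illustration.

```python
import random


def train_bandit(win_probs, steps=5000, epsilon=0.1, seed=0):
    """Learn which slot-machine arm pays best purely from observed rewards.

    The agent is never told win_probs; it only sees the reward of each pull.
    """
    rng = random.Random(seed)
    estimates = [0.0] * len(win_probs)  # running average reward per arm
    pulls = [0] * len(win_probs)        # how often each arm was tried

    for _ in range(steps):
        if rng.random() < epsilon:
            # Explore: try a random arm.
            arm = rng.randrange(len(win_probs))
        else:
            # Exploit: pick the arm that currently looks best.
            arm = max(range(len(win_probs)), key=lambda i: estimates[i])

        # The environment responds with a reward of 1 or 0.
        reward = 1.0 if rng.random() < win_probs[arm] else 0.0

        # Incremental update of the running average for that arm.
        pulls[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / pulls[arm]

    return estimates, pulls


if __name__ == "__main__":
    estimates, pulls = train_bandit([0.2, 0.5, 0.8])
    print("estimated payouts:", [round(e, 2) for e in estimates])
    print("pulls per arm:", pulls)
```

Nobody hands the agent a rule like "arm three is best"; it works that out from rewards alone. The environment here is a single function call, and a different task would need a different environment, which is where the cost of RL comes from.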
Most of the AI products available today, especially agentic AI products, try to leverage the base intelligence and capabilities of the model, as well as tool calling. And this is, in my opinion, the correct path for most startups to follow. Most startups can’t and shouldn’t pursue paths that require buying massive GPU clusters to train their own specialized models.
But in a world where everyone has access to the same models, and has equal ability to build tools, where does one find differentiation? How do we build the most effective autonomous agents in a world built for humans?
3. User Experience in a world of autonomous agents
I spent the last 6 years of my life helping customers improve the user experience of their websites and apps, and what was true when I started is still true today: the best way to differentiate your product is User Experience and Ease of Use.
There are two lessons I want to share with you from the best product person that has ever lived: Steve Jobs.
“Apple is the world’s premier company at building high technology products that are easy to learn and use by mere mortals” - Steve Jobs
Steve understood that Apple’s edge comes from making technology easy and accessible. His relentless focus on simplicity, and his understanding that innovation often means subtraction instead of addition, are the main drivers behind Apple’s huge success.
The best way to launch a successful product in the world of AI is the same as it has always been: relentless focus on the user experience.
A “risk-taking creative environment on the product side,” he said, required a “fiscally conservative environment” on the business side. - Make Something Wonderful
The second lesson I want you to take from Steve Jobs is that it is not enough to have a great product; you need to have a sound business too. Margins matter, costs matter, growth matters. The gravy train of subsidized inference is going to stop at some point, so you need to have a good grip on your costs. You need to understand and forecast the cost of compute for a particular task, and understand what the ROI of that task is for the user. You need to understand what users are willing to pay for and how much they are willing to pay, and work within these constraints.
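As a rough illustration of what forecasting the cost of a task can look like, here is a back-of-the-envelope sketch. Every number in it is a made-up assumption: the token counts, the per-million-token prices, and the subscription price are placeholders, not quotes from any provider, and `TaskEstimate`, `monthly_inference_cost`, and `gross_margin` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class TaskEstimate:
    input_tokens: int    # tokens sent to the model per run (assumed)
    output_tokens: int   # tokens generated per run (assumed)
    runs_per_month: int  # how often one user triggers the task (assumed)


def monthly_inference_cost(task: TaskEstimate,
                           usd_per_m_input: float,
                           usd_per_m_output: float) -> float:
    """Inference cost per user per month, given prices per million tokens."""
    per_run = (task.input_tokens * usd_per_m_input
               + task.output_tokens * usd_per_m_output) / 1_000_000
    return per_run * task.runs_per_month


def gross_margin(price: float, cost: float) -> float:
    """Share of the price left after paying for inference."""
    return (price - cost) / price


if __name__ == "__main__":
    # Hypothetical agent task: 20k tokens in, 2k tokens out, 300 runs a month.
    task = TaskEstimate(input_tokens=20_000, output_tokens=2_000, runs_per_month=300)
    cost = monthly_inference_cost(task, usd_per_m_input=3.0, usd_per_m_output=15.0)
    print(f"inference cost per user: ${cost:.2f}/month")
    print(f"gross margin at $49/month: {gross_margin(49.0, cost):.0%}")
```

With these placeholder numbers, inference alone comes to about $27 per user per month, which leaves a margin well below what traditional SaaS is used to. The point of the exercise is less the exact figure and more knowing which input (tokens per run, runs per month, or price) you can actually move.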
In the end, the answer to the question: “How do we build the most effective autonomous agents in a world built for humans?” is very simple: build for humans, and keep an eye on your costs.
I hope you enjoyed this essay, and if you haven’t subscribed yet to my newsletter please do to receive the upcoming essays directly in your inbox.