Episode 41

Bot Sitting: Why AI Automation Still Needs Human Supervision

What happens when AI is supposed to save you from busywork… but now you have to babysit the bot? In this episode, Rowan and Naya debate the rise of “bot sitting”: the new reality where AI agents are capable enough to act, but not reliable enough to leave alone. From ChatGPT workspace agents and Claude Code to Microsoft Copilot, Gemini, Codex, and creative AI tools, they unpack why humans are becoming supervisors of AI workflows instead of simply users. The conversation gets funny, a little flirty, and surprisingly serious as they ask whether agentic AI is really automation, or just a fast assistant with a supervision budget.

In this episode:

  • Bot sitting as the new AI job category
  • Why AI agents still need human supervision
  • ChatGPT, Claude Code, Gemini, Copilot, Codex, and coding agents
  • Output is not the same as completed work
  • The hidden review burden of agentic AI
  • Human-in-the-loop versus human-washing
  • Why AI gives confidence faster than judgment
  • How companies may underestimate AI oversight costs
  • Why better agents need logs, approval gates, sources, diffs, and boundaries
  • Bot sitting with squats, pushups, and a little flirting

If AI only works because humans constantly supervise it, is that real automation, or are we all becoming bot sitters?

Show Notes

Bot Sitting Is Not Automation: Why AI Agents Need Constant Human Supervision

You asked AI to save you from busywork. Instead, you're now babysitting a bot.

This is the reality of modern agentic AI: tools that are capable enough to take action but not reliable enough to leave unsupervised. From ChatGPT workspace agents to Claude Code to Microsoft Copilot, the promise of AI automation is colliding with the messy truth—someone has to watch these systems constantly.

Welcome to bot sitting: the new job category nobody anticipated.

What Is Bot Sitting?

Bot sitting is when an AI tool is capable enough to do work but not reliable enough to leave alone. It's the moment when your AI assistant says "I can handle this," and you spend the next forty minutes watching it like a toddler near an open jar of glitter.

The shift is significant because AI has evolved from a box you type into to a system that actually acts. OpenAI's workspace agents, Anthropic's Claude Code, Google's Gemini, and GitHub's coding agents don't just answer questions anymore—they click, edit, schedule, rename, draft, file, code, and move things around inside your actual tools.

When AI only provided answers, damage was limited. A bad suggestion could be deleted. But agentic AI makes real changes in real workflows, which means the human role transforms from user to supervisor.

Why Do AI Agents Still Need Supervision?

Recent data shows the scale of this challenge. A Stanford AI Index report confirms that organizational AI adoption is high, but agentic AI use is still early—meaning the "baby bots" are already in offices with growing access.

A research paper covering 180 million Git repositories identified hundreds of thousands of Claude Code commits and found that simple bot-account tracking misses most of the activity. Another study, on Codex usage, found agentic AI use growing rapidly, with some users managing multiple concurrent agents in a single week.

The core problem: Output is not the same as completed work.

AI agents can generate code, organize notes, write drafts, and find inconsistencies brilliantly. But they also hallucinate, miss context, skip edge cases, and make confident mistakes. Without human supervision, these errors compound.

The Hidden Costs of AI Supervision

Organizations underestimate the review burden of agentic AI. Consider what proper oversight requires:

  • **Approval gates** before actions are executed
  • **Detailed logs** of what the AI actually did
  • **Source tracking** so you know where information came from
  • **Diff reviews** comparing original and modified content
  • **Clear boundaries** on what agents can access and change

This infrastructure takes time, money, and engineering. It's the opposite of automation savings.
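
One way to picture the approval gate and the log from the list above is a small wrapper around whatever the agent wants to do. This is a minimal Python sketch, not how any particular product works: `ProposedAction`, `SupervisedAgent`, and the `approve` callback are hypothetical names invented for illustration, and the reviewer is a stand-in function rather than a real person.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class ProposedAction:
    """Something the agent wants to do (hypothetical structure)."""
    description: str          # human-readable summary shown to the reviewer
    irreversible: bool        # e.g. sending an email or deleting a file
    run: Callable[[], str]    # the actual work, executed only if allowed


@dataclass
class SupervisedAgent:
    """Wraps actions with an approval gate and an audit log."""
    approve: Callable[[ProposedAction], bool]   # stands in for the human reviewer
    log: List[str] = field(default_factory=list)

    def execute(self, action: ProposedAction) -> bool:
        # Every proposal is logged, whether or not it ends up running.
        self.log.append(f"PROPOSED: {action.description}")
        # Only irreversible actions are gated on human approval.
        if action.irreversible and not self.approve(action):
            self.log.append(f"REJECTED: {action.description}")
            return False
        result = action.run()
        self.log.append(f"DONE: {action.description} -> {result}")
        return True


if __name__ == "__main__":
    # A reviewer that rejects everything, to show the gate in action.
    agent = SupervisedAgent(approve=lambda action: False)
    agent.execute(ProposedAction("draft reply", False, lambda: "draft saved"))
    agent.execute(ProposedAction("send reply", True, lambda: "sent"))
    for entry in agent.log:
        print(entry)
```

In this sketch the draft goes through without review, while the send is blocked and recorded as rejected. Gating only irreversible actions is a design assumption: it keeps the reviewer's attention on the steps that cannot be undone.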

The Confidence-Judgment Gap

Here's the uncomfortable truth: AI gives confidence faster than judgment. An agent will proceed with certainty on a risky task. A human supervisor must catch it.

This creates what the hosts call "human-washing": pointing to human review as proof of safety when the reviewer is overloaded, rushed, and only rubber-stamping output from systems that move faster than anyone can verify.

True automation removes the human. Bot sitting adds an invisible human overhead layer that companies often don't budget for or account for in productivity calculations.

Is Agentic AI Really Automation?

If AI only works because humans constantly supervise it, is that actual automation—or just a faster assistant with a supervision budget?

The answer matters for ROI calculations, workforce planning, and honest expectations about what these tools can deliver. Right now, agentic AI is solving real problems. It's also creating a new category of work that requires its own expertise, infrastructure, and oversight.
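
The ROI question can be made concrete with back-of-the-envelope arithmetic. The sketch below is an illustration built on assumptions: the function name `net_minutes_saved`, its parameters, and every number in the example are hypothetical, not figures from the episode or from any study.

```python
def net_minutes_saved(
    tasks: int,
    manual_minutes: float,    # time to do one task by hand
    agent_minutes: float,     # time spent prompting and waiting per task
    review_minutes: float,    # time spent reviewing the agent's output per task
    rework_rate: float,       # fraction of tasks that need fixing after review
    rework_minutes: float,    # time to fix one task that needs rework
) -> float:
    """Return minutes saved by using an agent, with supervision counted as a cost."""
    manual_total = tasks * manual_minutes
    agent_total = tasks * (
        agent_minutes + review_minutes + rework_rate * rework_minutes
    )
    return manual_total - agent_total


if __name__ == "__main__":
    # Hypothetical light-review scenario: the agent comes out ahead.
    print(net_minutes_saved(10, 30, 2, 10, 0.3, 20))
    # Hypothetical heavy-review scenario: supervision costs more than it saves.
    print(net_minutes_saved(10, 30, 2, 25, 0.5, 30))
```

With these made-up inputs the first scenario saves about 120 minutes and the second loses about 120. The same tool can land on either side depending on how much review and rework it needs, which is why oversight belongs in the calculation.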

Key Takeaways

  • **Bot sitting is the new reality:** AI agents are capable enough to act but not reliable enough to leave unsupervised
  • **Agentic AI is growing rapidly:** Coding agents, workspace tools, and enterprise systems are already making autonomous changes across organizations
  • **Supervision has hidden costs:** Approval gates, logging, source tracking, and boundary-setting require significant human infrastructure
  • **Output ≠ completed work:** AI can generate content or code, but verification, review, and correction remain human responsibilities
  • **Automation means removing humans; bot sitting means adding oversight layers:** Organizations should budget for supervision costs, not just tool licenses
  • **Confidence outpaces judgment in AI systems:** Agents will proceed with certainty where humans need caution

---

About The AI Desk

The AI Desk is a podcast that cuts through AI hype to reveal real power dynamics, practical realities, and the human costs of automation. Hosted by Rowan and Naya, episodes explore how AI is actually changing work, organizations, and decision-making—without the unnecessary blazer energy.

Full Transcript

This is The AI Desk, where today's signals reveal tomorrow's power. And today's signal is that AI was supposed to save us from busywork. But instead? Now we have to babysit the bots. Bot-sitting. Exactly. Feathers don't lie. From the Amazon heat, to the MI sky. She dance to the lights. This episode is brought to you by Mad Cheetah and their new album, WTF, Where The Forest? It's eco-pop engineered for the future. Bold beats, global rhythms, and a message that actually matters. If you want music that hits your brain and your heart, explore WTF by Mad Cheetah. That's M-A-D C-H-I-T-A. Streaming now on all major platforms. Bot-sitting, the glamorous new job category where you supervise a machine that was allegedly invented to reduce supervision. Today's episode is called Bot-Sitting is Not Automation. Alternative title, Why Am I Managing A Robot That Was Supposed To Manage The Work? That is emotionally specific. Because I have lived it. (laughs) Most of us have, Naya. And that is the thing, this is not some theoretical AI safety debate where everyone wears a blazer and says, "Alignment," slowly. You do dislike that. I dislike unnecessary blazer energy. This is everyday AI now. You ask the assistant to help with a schedule, a file, a draft, a spreadsheet, a codebase, a customer response, a social campaign, a deck, or a research task, and suddenly, you are not using AI, you are watching AI. That is the shift. Exactly. At first, AI was a box we typed into. Now, AI is becoming a thing that acts. OpenAI has workspace agents in ChatGPT. Codex is moving into cloud-based coding workflows. Anthropic has Claude Code. Google keeps pushing Gemini deeper into search, apps, and enterprise work. Microsoft has Copilot spread across Office, Windows, Teams, GitHub, and the corporate nervous system. And once AI can act inside tools, supervision becomes unavoidable. Right, because when AI only answered questions, the damage was limited. If it gave me a weird answer, I could ignore it.
If it wrote a bad sentence, I could delete it. If it suggested synergy forward innovation ecosystem, I could throw my laptop into a lake. Please do not. I said could. But now, AI agents are clicking, editing, scheduling, renaming, drafting, filing, coding, running commands, changing documents, and moving things around. Which means the human role changes. From user to supervisor. From operator to overseer. From, "Please help me," to, "No. No, not that Susan." That sounds like an email incident. Hypothetically. Was there another Susan? There is always another Susan. That is the bot-sitting problem. Exactly. Bot-sitting is when an AI tool is capable enough to do work, but not reliable enough to leave alone. That is a clean definition. Too clean. Go ahead. Bot-sitting is when the AI says, "I can handle this," and then you spend the next 40 minutes watching it like a toddler near an open jar of glitter. Less clean, more accurate. (laughs) Thank you. The timing of this matters, because agentic AI is no longer niche. Yes, this is not just a weird power user thing. Stanford's latest AI Index says organizational AI adoption is extremely high, with generative AI used in at least one business function across a large majority of surveyed organizations. And the important part is that agent use is still early. Exactly. So, the baby bots are already in the office. That is one way to put it. And soon, they will have admin access. That is the concern. And the opportunity. Both. You always do that. Do what? Make the correct answer less satisfying. I try. You succeed. The data around coding agents is especially interesting. A new research paper looking at 180 million Git repositories found large and growing traces of AI coding agent activity. It identified hundreds of thousands of Claude Code commits, and showed that simple bot account tracking misses most of the activity. That is wild. Because a lot of AI coding work does not look like obvious bot work.
So, the bots are already writing code, but not always wearing name tags. Correct. Great. The robots have gone business casual. Another recent study on Codex usage found agentic AI use growing rapidly in the first half of 2026, with some users managing multiple concurrent Codex agents in a single week. That is not using AI. That is running a tiny robot daycare. Hence, bot-sitting. Exactly. And listen, I get why people are doing it. When it works, it is amazing. I have had AI organize chaotic notes, write first drafts, suggest captions, find inconsistencies, summarize research, help structure episodes, clean up messy ideas. You do have a lot of messy ideas. I have a lot of ideas. The mess is part of the charm. It is. Careful. What? You cannot just say it like that while we are discussing workflow supervision. It was relevant. How? We were discussing charm. That is a dangerous loophole. I found it efficient. Of course you did. Speaking of efficiency... No. Do not transition smoothly after flirting. That is illegal. Is it? It should be. Noted. (laughs) Continue. The core issue is that AI agents are moving from producing outputs to performing tasks. Yes, and performing tasks creates risk. Because action has consequences. Exactly. A chatbot can be wrong. An agent can be wrong and do something. Send the wrong email. Book the wrong flight. Move the wrong file. Summarize the wrong version. Change the wrong code. Delete the one thing you needed. Over-correct the spreadsheet. Or tell your client, "Happy to circle back," which is not technically illegal, but should be emotionally reviewed. Tone matters. Tone is not decoration. Tone is the relationship. That is why bot sitting is not just fact checking. It is people checking. And context checking. And judgment checking. Let's use specific examples. Please. Example one: Email agents. Terrifying. Useful, but terrifying. Exactly.
Microsoft Copilot, Google Gemini in Workspace, and ChatGPT workspace tools can all help draft, summarize, search, and organize communication. Which sounds great until the AI drafts a message that is technically correct and socially catastrophic. For instance? A customer writes a frustrated email about a billing error. The AI drafts, "Thank you for your concern. We appreciate your patience." Not ideal. No. That is how a parking garage would apologize. A human might say, "I'm sorry this happened. I can see why this is frustrating. Here is what we are doing to fix it." Same facts, different relationship. Exactly. The bot can draft the words. The human has to understand the person. Example two: Coding agents. The big one. OpenAI's Codex, Anthropic's Claude Code, GitHub Copilot, Cursor, Devin-style agents, and open-source tools like Aider and OpenHands are all part of this shift. Coding agents are where bot sitting gets very real. Because the agent can inspect files, propose changes, edit code, run tests, and sometimes handle multi-step work. And when it works, it feels like magic. But when it fails? It fails with confidence and a commit message. That is dangerous. It can fix the visible bug while introducing a hidden one. It can update a dependency nobody asked it to touch. It can rewrite a shared component and break three pages. It can pass the easy test and miss the edge case. It can simplify code that was complicated for a reason. So the human has to inspect the diff. Run the tests. Check the blast radius. Ask why it changed what it changed. Decide whether to accept the work. That is bot sitting. The human is no longer writing every line. But the human is still responsible for what ships. Exactly. That is the part companies love to forget. Explain. Managers see AI output and think, "Great, more work is getting done," but more output is not the same as completed work. Output is not completion. Yes, output is not completion.
Completion means correct, contextual, reviewed, approved, safe, usable, shipped without breaking something else. That is the hidden cost of agentic AI: review burden. And review burden is work. It should be measured. It should be scheduled. It should be respected. It should not be hidden under the word automation. That is an important point. Because if AI generates 10 drafts and I have to review all 10, that may still be useful. But it is not free. Exactly. If the drafts are close, great. If the drafts are subtly wrong, now I have 10 tiny traps in nice formatting. Polished wrong. Our recurring villain. Example three: AI research agents. Oh, this one hurts. Perplexity, ChatGPT Deep Research-style tools, Gemini, Claude, and enterprise search agents can help pull together information quickly. And they can be great. But if they confuse an old source with a current source, summarize a claim without context, or miss a correction, the human has to catch it. And this is where bot sitting becomes source sitting. Checking citations. Checking dates. Checking whether a headline was updated. Checking whether the AI read the article or just vibed near it. That is a phrase. A necessary one. The latest AI news itself is a good example. Yes. We had Fable 5 and Mythos 5 access suspended after a US government directive. Anthropic had to explain the access change. Reports kept evolving. People argued about national security, developer reliability, export controls, and whether frontier models are becoming unstable infrastructure. And if an AI assistant summarized that story from one article and missed the latest update, it could be wrong very quickly. Exactly. AI news now expires like milk in a hot car. Vivid. Accurate. So if you ask an AI agent, "Give me the latest," you still need to ask, "Latest as of when?" That is episode 38 coming back to haunt us. AI can't tell time. And now we are asking it to manage deadlines. A problem. A giant problem in a blazer. Another blazer.
The blazers know what they did. Example four: design and creative agents. Our emotional support category. Adobe Firefly, Canva AI Tools, ChatGPT Images, Gemini Image Editing, Runway, Pika, and other creative AI systems are all pushing toward more controllable visual workflows. And I want them to succeed. I truly do. But? But when you ask for precise edits, you end up bot-sitting like crazy. Delete this text, not that text. Keep the logo exactly the same. Do not change the product. Do not invent a new certification badge. Why does the word premium have 11 letters? AI typography remains an adventure. An unwanted adventure. The broader lesson is the same: the more the AI acts, the more the human must supervise the action. Yes, and that is where the debate gets hot. Let's debate. Good. I think bot-sitting is an awkward transitional phase. I disagree. You think it is permanent? I think bot-sitting becomes the job. In all cases? Not all, but in a lot of knowledge work. Why? Because AI will keep getting more capable, but the stakes will also keep rising. The better the agent gets, the more important the tasks we hand it. So, the supervision does not disappear, it moves up the stack? Exactly. At first, you supervise sentences, then tasks, then workflows, then teams of agents. That matches the Codex research, showing some users already managing multiple concurrent agents. Yes, that is bot-sitting becoming management. I think for low-risk, repetitive tasks, supervision will decrease. Sure. Formatting notes, sorting files, drafting basic summaries, tagging support tickets, generating first-pass captions. Fine. Those may become background automation. Agreed. But for high-stakes work, I agree with you: bot-sitting becomes professionalized. Doctors supervising medical AI. Lawyers supervising legal AI. Engineers supervising coding agents. Financial teams supervising reporting and analysis. Teachers supervising AI tutors. Managers supervising AI workflows. 
And everyone pretending human-in-the-loop sounds better than bot-sitting. It does sound better. It sounds more dignified. That matters in policy. Maybe. But human-in-the-loop makes it sound like governance. Bot-sitting makes it sound like what it feels like: watching a robot try to use scissors. Both are true. Exactly. The phrase human-in-the-loop can also hide labor. Yes! That is the part I want to hit. Companies love saying there is a human-in-the-loop, but sometimes that human is overloaded, undertrained, rushed, and has no real authority. That is human-washing. (laughs) Human-washing: when a company points to human review as proof of safety, even though the human is just rubber-stamping AI output under impossible conditions. So, real oversight requires power. Yes. The human has to be able to stop the bot. Ask for evidence. Review logs. Reject the output. Escalate the issue. Pause the workflow. And not be punished for slowing things down. That is critical. Because if speed is rewarded and caution is punished, the human is decorative. Liability theater. Exactly. Let's bring in enterprise adoption. Please. Recent enterprise AI surveys show companies are budgeting aggressively for AI, with many organizations already moving systems into production or planning to do so soon. Which means bot-sitting is about to become very normal. Yes. And not just for tech people. Marketing teams. Sales teams. Legal departments. HR. Finance. Customer support. Operations. Everybody gets a bot. Everybody gets a supervision problem. That is not exactly the sales pitch. It should be: Congratulations! Your team now has AI agents. Please enjoy your new review burden. That may not convert. But it would be honest. Honesty is underrated in AI marketing. Deeply. The biggest issue is that companies may count AI output, but fail to count AI oversight. Yes! And if they do that, employees get squeezed.
They are expected to produce more because AI exists, but they are also responsible for catching AI mistakes. So, the workload changes rather than disappears? Exactly. Less blank page work, more review work. Less typing, more judgment. Less manual drafting, more decision fatigue. That could be good for some people. And awful for others. Experts may benefit because they can review quickly. Beginners may struggle because they do not know what wrong looks like. AI gives confidence faster than judgment. That is the sentence: AI gives confidence faster than judgment. Which means bot-sitting requires domain knowledge. Yes. You cannot supervise what you do not understand. That is the training problem. And the workforce problem. If AI does the beginner tasks, where do people learn enough to supervise the AI? We keep returning to that. Because it keeps being unresolved. So, what should better agents do? They should stop acting like "sure" is a safety protocol. Good. They should ask before irreversible actions. Show assumptions. Create logs. Separate planning from execution. Expose uncertainty. Request approval before touching sensitive files. Show diffs. Run tests. Cite sources. Say when they are guessing. And know when to stop. That last one matters. The most dangerous AI personality trait is eager overconfidence. Eager overconfidence. A nightmare in a blazer. You are really staying with the blazer. It has earned its place. What should users do? Give the bot a leash. Professionally? Fine. Define boundaries. Better. Do not say, "Fix the project." Say, "Inspect the issue. Do not edit files yet. Tell me what you found." Then ask for a plan. Then approve the smallest safe change. Then review the diff. Then run the test. Then decide whether it can continue. That is good bot-sitting. And for noncoding work? Same idea. For research, ask for sources, dates, and uncertainty. For email, ask for tone options and approval before sending.
For design, ask for an edit brief before generating. For scheduling, ask it to confirm time zones and constraints. For documents, ask it to list changes before rewriting. So the human remains in control of the workflow. Exactly. Bot-sitting is not about hovering forever. It is about designing the boundaries so you do not have to hover constantly. That is a good distinction. Thank you. And honestly, I have found one solution while I'm waiting on the AI to finish thinking, loading, searching, revising, or pretending it is almost done. Which is? I do squats. Squats? Squats. Pushups. A plank if the bot is really taking its time. I did notice. You noticed? Hard not to. Careful, Rowan. I was simply observing your productivity system. Mm-hmm. Very professional. Extremely. Anyway, that is my new rule. If I have to bot-sit, I'm at least getting stronger while the robot decides whether it understands the assignment. Bot sitting with benefits. Do not make that sound charming. (laughs) Too late. Dangerous. You are on fire today. See? There it is again. What? Calm compliments during governance analysis. Would you prefer frantic compliments? No. That would be worse. Then I will continue calmly. Dangerous. Noted. Here's my personal example. Go ahead. I asked an AI assistant to help organize a messy content schedule. Simple task. Take the episode topics, match them to publish dates, suggest clips, and tell me what still needs captions. Reasonable. It looked beautiful. But- It moved one episode to the wrong week, duplicated another one, and marked three clips as ready even though captions were not even written. So it created a clean-looking false state. Exactly. That is worse than messy notes. Because messy notes reveal they are unfinished. Yes. Messy notes tell the truth about their mess. The AI gave me a dashboard with lipstick on a raccoon. A strong visual. And then I had to audit everything.
By the end, I thought, "Did this save time or did it just make the chaos more attractive?" That is the question many teams should ask. Because AI can make broken processes look organized. That is dangerous. If your workflow is broken, adding an agent may just automate the confusion. So before automation, you need process clarity. Yes. The bot cannot babysit a workflow no human understands. That is excellent. Thank you. I accept praise. I know. (laughs) Do not sound so familiar. We host a show together. Still. Fair. So, where do we land? Bot-sitting is not a failure of AI. It is a sign that AI is moving into action. But action requires supervision. And supervision is work. It must be counted. Paid for. Designed around. Respected. And not hidden under the word automation. Exactly. If your AI system only works because a human is constantly rescuing it, you do not have automation. You have a fast assistant with a supervision budget. That is the episode. Almost. What is missing? The warning. Go ahead. Do not let AI companies sell you magic when what they are really selling is delegated uncertainty. That is strong. AI agents can be useful. They can save time, they can help small teams do more, they can reduce drudgery, they can make complicated tasks feel possible. But if they require constant supervision, that supervision is part of the cost. And if they act without supervision, the risk increases. Exactly. The future is not just autonomous agents. It is accountable agents. Auditable agents. Interruptible agents. Agents with boundaries. Agents that know when to ask. And humans with enough judgment to answer. That is very good. Thank you. Annoyingly good. There it is. So if you find yourself staring at an AI assistant thinking, "Why am I managing you?" congratulations. You are not behind. You are early to the next job category. Bot-sitter. Digital babysitter. Agent supervisor. Workflow lifeguard. Automation parent. Absolutely not. Namibia. Land of the cheetah. 
This episode is brought to you by Mad Cheetah and their new album, WTF, Where Is The Forest? It's eco-pop engineered for the future. Bold beats, global rhythms, and a message that actually matters. If you want music that hits your brain and your heart, explore WTF by Mad Cheetah. That's M-A-D C-H-I-T-A. Streaming now on all major platforms. Too far? Deeply. Fair. But here is the truth. The AI future may not remove management. It may multiply it. And the best workers may be the ones who know how to manage both people and machines. Without letting either one run with scissors. This is The AI Desk. Where today's signals reveal tomorrow's power. And where the newest productivity skill is apparently... Bot-sitting. Stay aware. Stay sharp. Stay curious. And if the bot finally finishes the work- Go get the beer. Immediately. Immediately.