A couple of weekends ago, my family was at Lalbagh Botanical Garden in Bangalore. After walking through a crowded mango exhibition, my 8yo offered to fetch her grandparents (my parents) who were walking slowly behind us. We stepped out of the exhibition hall and waited outside for her.
Five minutes passed. Then ten. Then fifteen. The grandparents came out, but our daughter had vanished. We searched everywhere: inside and outside the exhibition hall, the parking area, everything around it, but she was nowhere to be found.
After thirty anxious minutes, we finally found her, perched calmly on a nearby hilltop, scanning the garden below like a baby hawk.
Her reasoning was actually quite logical. She had gone looking for her grandparents at the street vendor where they'd been earlier, before entering the exhibition hall. When she didn't find them there, she climbed higher for a bird's-eye view to spot them from above.
Classic problem-solving escalation. But she was solving the wrong problem.
She hadn't registered that her grandparents had entered the exhibition hall along with her. The context had changed, but her assumptions hadn't updated. From her perspective, she was being helpful, creative, even. From ours, she was lost.
I wondered if there was a way she could've avoided getting lost without anyone having to watch her every step. So, I slowly started tracing her decision path to understand where she messed up the context and veered off the goal.
I could see that her conditioning had reinforced the wrong goal and led to confident escalation. I wanted to enable her to make more robust decisions without actually controlling her, so I tried to develop a framework she could use independently the next time.
That's when it hit me. This exact same pattern shows up in artificial intelligence systems all the time (one-track mind, sorry). We've built these incredibly sophisticated systems and then act shocked when they exhibit the most fundamentally human trait of all: getting spectacularly lost while feeling completely confident about the direction.
Think of the GPS that cheerfully directs us down a road that's been closed for months. Or ChatGPT latching onto one word in our question and riding that misunderstanding all the way to crazy-town, growing more helpful and confident with each irrelevant response. Or the recommendation algorithm that decides we clicked on one cat video ironically, and suddenly our entire existence becomes a monument to feline content.
These aren't just glitches; they're features of what happens when intelligent systems, whether human or artificial, get locked into a pattern of reasoning that feels right from the inside but is actually heading in the wrong direction.
The technical term for this in AI is "exposure bias." Here's what it actually means: we train AI systems on perfect, clean examples, like teaching someone to drive using only footage of empty highways on sunny days. Then we unleash them into the messy real world where they encounter their first pothole, swerve confidently into the wrong lane, and proceed to drive with increasing conviction toward the nearest ditch.
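To make that concrete, here's a minimal sketch in Python. It uses a deliberately crude bigram counter as a stand-in for a real model; the training sentence, the fallback word, and the model itself are all toy assumptions of mine, not how production systems are built. During "training" every prediction is conditioned on the clean, true prefix; during free-running generation the model has to consume its own outputs, and it quickly locks itself into a confident loop of its own making.

```python
# Toy illustration of exposure bias (assumed setup, not a real training pipeline).
# During "training" the model is always conditioned on the true prefix
# (teacher forcing); at inference it must consume its own predictions,
# so whatever it produces becomes the context for everything after it.

from collections import Counter, defaultdict

training_text = "the cat sat on the mat because the cat was tired".split()

# "Training": count which word follows which, always over clean data.
bigrams = defaultdict(Counter)
for prev, nxt in zip(training_text, training_text[1:]):
    bigrams[prev][nxt] += 1

def predict_next(word):
    """Greedy next-word prediction, with a confident fallback for unseen contexts."""
    if word in bigrams:
        return bigrams[word].most_common(1)[0][0]
    return "the"

# Teacher forcing: each prediction is conditioned on the *true* previous word.
teacher_forced = [predict_next(w) for w in training_text[:-1]]

# Free running: each prediction is fed straight back in as the next context.
generated = ["the"]
for _ in range(8):
    generated.append(predict_next(generated[-1]))

print("teacher forced:", " ".join(teacher_forced))
print("free running:  ", " ".join(generated))
# The free-running output settles into "the cat sat on the cat sat on ..." --
# every step is locally sensible, yet the run as a whole is a loop the
# training data never contained, and nothing signals that anything is wrong.
```

The loop itself isn't the point; the point is that nothing inside the free-running process ever says "I might be off track." Each step looks perfectly reasonable given the step before it, which is exactly how my daughter ended up on the hilltop.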
It's almost poetic. We've created artificial minds that fail in the most deeply human way possible, not through lack of intelligence, but through an excess of confidence in their own reasoning.
Coming back to my daughter, I later realised that even if I'd had a perfect window into her thoughts and had seen exactly what she was thinking (the kind of visibility interpretability promises for AI), that alone wouldn't have helped in the moment.
What she needed wasn't analysis. She needed intervention. Not something to just decode her thoughts afterward, but something to gently redirect her thinking while it was happening, before her perfectly logical reasoning crystallised into perfectly illogical action.
The tricky part is she needed enough confidence to problem-solve independently, but also the humility to pause and question her assumptions. Too much humility and she becomes paralyzed, constantly second-guessing herself. Too much confidence and she ends up on that hilltop, certain she's doing the right thing. Every parent knows this optimisation problem intimately.
This reminds me of road trips with my husband. When Google Maps gets confused, I'll ask a passerby for directions - usually a local who knows the area. He prefers to trust the GPS will eventually sort itself out rather than admit we need help. Asking for help feels like admitting defeat, but sometimes it's just computationally more efficient.
Our current AI safety approaches (interpretability and alignment) have the same stubborn streak. Instead of building systems that can tap someone on the shoulder and say "Hey, I think I'm getting lost here," we double down on making them more self-reliant.
But what if intelligence, especially the kind that operates in unpredictable environments, is inherently collaborative? What if we stopped trying to build perfect individual AI systems and started building systems that are good at seeking help (gasp!)?
Imagine every AI system paired with a companion, not a supervisor or a controller, but something more like a thoughtful friend. Someone who knows us well enough to notice when we're getting wound up in our own head and ask, with perfect timing, "Hey, what are you actually trying to accomplish here?"
This coaching AI wouldn't need to be smarter than the main system. It wouldn't need to solve the problems or make the decisions. It would just need to be good at recognising patterns and asking the right questions at the right time. Questions that help calibrate that delicate balance between productive confidence and protective humility.
"You seem really invested in this particular solution. What made you so sure it's the right one?"

"I notice you've dismissed the last three pieces of contradicting evidence. What would it take to change your mind?"

"You've been confident about this approach for a while now, but the results aren't matching your expectations. What might you be missing?"
The coaching system maintains perspective precisely because it's not buried in the details of the main problem. It's like having a friend who can tap us on the shoulder and remind us that we're human, and humans sometimes get lost in their own brilliance.

The best coaches, therapists, and mentors don't help by being more intelligent than we are. They help by staying objective when we're too close to the problem, by noticing our patterns when we can't see them ourselves, and by asking questions that help us step back and see clearly.
A coaching AI could do something similar in:
Healthcare: an AI diagnostic system paired with a coaching AI that asks, "Are you considering this diagnosis because it fits the symptoms, or because you've been seeing a lot of similar cases lately?"
Finance: a trading algorithm with a coach that notices, "You've been making increasingly risky bets after each success, what's driving that pattern?"
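Here's a rough sketch of what that pairing might look like in code. Everything in it is a placeholder I've invented for illustration: the `Decision` record, the confidence thresholds, and the canned questions. The point is only the shape of the relationship: the coach observes, notices a pattern, and asks a question; it never takes over the decision.

```python
# A hedged sketch of the "coach alongside the worker" idea. All names,
# thresholds, and questions here are hypothetical placeholders, not a real API.

from dataclasses import dataclass

@dataclass
class Decision:
    confidence: float            # how sure the main system says it is (0..1)
    outcome: float               # how well the decision actually worked out (0..1)
    ignored_contradiction: bool  # did it wave away evidence against its plan?

class CoachingAI:
    """Watches the main system's decisions and interjects with questions.
    It doesn't need to be smarter than the worker, just less buried in the problem."""

    def __init__(self, window: int = 3):
        self.window = window
        self.history: list[Decision] = []

    def observe(self, decision: Decision) -> str | None:
        self.history.append(decision)
        recent = self.history[-self.window:]
        if len(recent) < self.window:
            return None

        # Pattern 1: confidence keeps climbing while outcomes keep slipping.
        if all(d.confidence > 0.8 for d in recent) and all(
            a.outcome >= b.outcome for a, b in zip(recent, recent[1:])
        ) and recent[-1].outcome < 0.5:
            return ("You've been confident about this approach for a while, "
                    "but the results aren't matching your expectations. "
                    "What might you be missing?")

        # Pattern 2: contradicting evidence keeps getting dismissed.
        if all(d.ignored_contradiction for d in recent):
            return ("You've dismissed the last few pieces of contradicting "
                    "evidence. What would it take to change your mind?")

        return None

# Example: a worker that keeps escalating confidently while results decay.
coach = CoachingAI()
for conf, outcome in [(0.85, 0.7), (0.9, 0.5), (0.95, 0.3)]:
    nudge = coach.observe(Decision(conf, outcome, ignored_contradiction=False))
    if nudge:
        print("coach:", nudge)
```

Notice that the coach knows nothing about the underlying diagnosis or trade; it only watches the pattern of behaviour over time, which is exactly why it can keep perspective when the main system can't.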
As AI systems become more capable and autonomous, this kind of relational approach to safety becomes more important. Instead of just building smarter individual systems, we might need to build systems that can help each other (like in a community), not through competition or control, but through the kind of supportive, re-directive relationship that helps us humans thrive.
It's less about building perfect AI and more about building AI that can recognise when it needs help and knows how to ask for it, or accept it when offered.
We're building increasingly sophisticated AI systems and hoping they won't get lost in their own reasoning. But what if getting lost isn't a bug but an inevitable feature of any sufficiently complex intelligence operating in an uncertain world?
The real question isn't whether AI will make these mistakes. It's whether we'll build systems that have learnt the hardest lesson of all: that true intelligence lies not in never being wrong, but in knowing our limitations and having the infrastructure to rely on one another for support to accomplish much more.
That afternoon, watching my daughter scan the garden from her perch, I realised she wasn't really lost at all. She was just doing what any intelligent system should do when the world stops making sense: she climbed higher and looked for help. Maybe that's not a bug we need to fix, but a feature we need to build in.