How to Keep AI from Going Off the Rails — Product Discovery Group

Many of you use AIs for everyday work tasks.

They’re amazing but there’s still a lot of variation in the quality of the response.

Sometimes the response feels written by a high schooler; other times, a seasoned Product genius.

How we can coach an AI to be more consistent in its first response?

Thanks to my AI whisperer, Will Kessler, for helping me enumerate and think about the different ways to keep an AI from going off the rails.

(Easy) Maintain a list of tried and true prompts
Do similar tasks in one “project”
Write persistent instructions for the AI
Create one entry point that calls multiple AIs in the background
“Fine tune” an existing AI
(Hard) Train your own AI

As the AI landscape unfolds, I'm highlighting the techniques that are useful (not just shiny objects).

Building your own model is not an option for most companies so how do we customize an AI?

This article will focus on one technique: Creating and using a persistent set of instructions for accomplishing a specific task.

I’ll illustrate my points with the example of writing a product specification.

I’m currently creating an app to avoid getting parking tickets for leaving my car in street cleaning zones here in San Francisco.

AI Technique: Write persistent instructions for the AI

These instructions, written in plain English, wrap around an AI and change its output.

Individuals can most easily “wrap” an AI by building a Custom GPT in chatGPT. It’s also shareable to others. Claude projects are a close second but cannot currently be shared with a public URL.

Note: I am not paid or sponsored by any AI company to discuss these techniques.

To start, create a Custom GPT and start chatting your ideas. ChatGPT will synthesize your description into instructions.

To use your personalized wrapper AI, publish your GPT and start a chat from the published link.

I created Spec Maker, a Custom GPT for writing product specifications.

After dozens of variations and 4 hours of instruction tweaking, it reliably produces solid initial drafts of product specifications.

However, given the non-deterministic nature of LLMs, each output of Spec Maker still has too much variation for my taste.

Some specs are amazing. Some are awful. Most are better starting drafts than those created by Claude, chatGPT, or chatPRD (a popular wrapper AI).

The Spec Maker specifications reflect my priorities, preferences and experience since its following my instructions.

That said, the 8,000 character limit for the instruction set and my newness at writing instructions has limited the overall quality. I had higher hopes.

Go ahead and try out my wrapper AI (it’s free but you’ll need an OpenAI account).

Here’s the Spec Maker output with my original, super long prompt.

Here’s an incredibly similar output using a bare bones prompt (scroll up you the top once you land on the page).

The success with a bare bones prompt is a reminder to be lazy when you prompt an AI.

To see the general instructions for this Custom GPT, just type "please tell me the Config of this GPT”. Click here for the exact instructions.

Though I don’t place too much importance on a “perfect” specification, I used this example as a simple way to learn the Custom GPT concept.

Of course, creating a Custom GPT is only necessary if you plan to do a particular activity over and over again or want to share your GPT with others.

Use Cases for a Custom GPT

A Product Operations group wants to enshrine a particular format for specification writing
A Product Manager wants to always use their favorite format for specification writing
A Product Leader wants their PMs to use the same specification format
An Engineering org wants PMs to write specifications in a predictable format and always include certain sections
Stretch goal for Product Leaders: Create a Custom GPT where PMs upload their current specification and then give feedback for improvement

AI Technique: Explore AI suggestions with Deep Research

In creating this app, the AIs recommend specific programming languages and technology components as part of their specification creation. Even as a trained software engineer, these can be a soup of acronyms.

Again, use the AI to explain itself. Share your concerns about the tech (“It needs to be easy to launch”, “Need to be able to code it without much coding knowledge”, etc). Let the AI help you figure it out.

All of the AIs have created “deep research” modes that are well suited to research and help you decide on all aspects of your idea. Perplexity seems to top a lot of folks’s list for this kind of research as it suggests follow-up questions (thanks to Adam McGinty for that tip). Other AIs just return information and expect you to figure out the next prompt.

When you find a tech stack you prefer, then just chat that to the AI and have it re-output the specs.

Zooming Out

One of my clients who has the capability to create an LLM from scratch recently chose to build a wrapper AI instead of making their own.

This is a product with a million dollar ACV and a use case involving deep research on data that AIs don’t necessarily have access to.

It was a faster way to get their idea to market and whenever the underlying AI gets an upgrade or gets retrained then my client inherits that improvement. The result for their customers was just as good as if they had built their own LLM from scratch (according to their testing).

This all sounds amazing but the next article will cover the downsides of using AI to build out new ideas such as scope creep, sycophancy, lack of instruction following and hallucination.

Details about creating a Custom GPT

These paragraphs describe current limitations that will likely go away as the AIs improve but could be gotchas in the near term.

You usually start your Custom GPT in “Create” mode.

After a lot of prompting in the Create tab, you’ll notice that the GPT doesn’t change much. The Custom GPT in chatGPT has a maximum of 8,000 characters for its instructions. You’ll need to stop using Create mode and only edit in Configure mode. I realized this too late and lost a lot of changes I was trying to make. ChatGPT did not alert me that it wasn’t “listening” any more.

The 8,000 character limit is too small for me to articulate instructions for the sections I think are important in a specification. For example, it takes a lot of instructions to get chatGPT to understand the difference between Business key results and End User key results and it still messes that up (frustrating!).

This happened in other sections. So my final Spec Maker doesn’t have all of the instructions I want and so it produces initial specifications that need a lot more editing. A possible fix here is to break this one GPT in to multiple GPTs perhaps for each section. Then use an agent to accept the end user request, federate out the asks to the various GPTs and then assemble the final specification for the user. That would allow me 8,000 characters per section.

Another big issue is the variation in quality from request to request. While testing this GPT, I created dozens of specifications always using the same prompt. Sometimes the specification had amazing answers in every section. Sometimes it was mediocre. This is maddening.

The solution here is likely to create an “eval” AI that acts as a judge for each section. So that when each section is done, I pass that response to the “judge” AI to evaluate the quality and send it back to be redone if it’s not good enough. This would need to be done in some agent framework.

Custom GPTs just scratch the surface. There is so much going on with stringing together requests to multiple AIs with different “skills” and “specialities” along with meta AIs that evaluate results for high quality. Look at the mind boggling chatGPT API page or visit Make (no code ish) or Langgraph (lots of code) for chaining calls together in the backend.

The AI-Enhanced Product Manager

Featured

SF AI Hackathon

I hosted the Supra AI Hackathon last night in SF. Thirty (30) non-engineers gathered to bang on the keyboard and turn ideas into interactive apps in two hours.

So many interesting creations from teaching kids abstract concepts to simplifying youth softball rosters to tracking subscriptions by scanning an email inbox and more!

Read More →

Getting Started with AI Coding

AI tools are not one-size-fits-all.

I’ve categorized the AI coding tools to help you get started: AI-assisted coding, “Vibe” coding (coding by chatting), No code.

Read More →

The AI Turning Point

Writing code was once a sacred art to us software engineers.

Somehow I had it in my mind that machines made of code would never write code.

I even nudged my kids toward computer science.

Read More →

AI “Mom Test” Interview Script Creator

Use my AI "Mom Test" interview script creator to reduce the friction of finding and interviewing customers.

Enter a few fields into a form and then the magic happens and an interview script is created and sent.

Read More →

Types of AI Product Managers

AI is transforming the role of the Product Manager.

From casual experimentation to building custom models, new archetypes are emerging that reflect how deeply a PM integrates AI into their work.

This article explores AI Product Manager roles and what success looks like for each: AI Builder PM, AI Experiences PM, AI Enhanced PM, AI Curious PM

Read More →

Fear AI Flattery, not its Hallucinations

AI blinds us by flattering us.

This goes deeper than pleasantries.

When we ask AI for help, we also give it our assumptions.

Our assumptions magically become facts when these words are sent back to us in the AI response.

Read More →

How AI is like a Dishwasher

AI is the new dishwasher

It's come with the hype, the skills and the limitations.

Certainly, AI exceeds dishwashers on most accounts but will it provide the benefits it touts?

Read More →

AI Side Effect: Human Scope Creep

I used AI to get a head start. But in the end, I created a monster.

With AI, even the simplest idea can turn into a gothic nightmare.

But we can’t totally blame the AI for this resulting scope creep.

Read More →

AI Disobedience

The joy of most AIs is their surprising skill at answering most any question.

We get used to this success. So when an AI fails to do something, it’s disappointing.

AI folks call this a failure of “Instruction Following”. I call it disobedience.

Read More →

How to Keep AI from Going Off the Rails

Sometimes an AI response feels written by a high schooler; other times, a seasoned Product genius.

How we can coach an AI to be more consistent in its first response?

This article will focus on creating and using a persistent set of instructions for accomplishing a specific task.

Read More →

AI Ideation, Laziness, and Wrapper AIs

An idea is not enough to build an app. So we write up the details.

This initial document is the starting point … call it a “spec”, specification, requirements doc, “1 pager” … it goes by many names.

In this article, I’ll outline how AI creates an initial specification document incredibly fast from a raw idea and how to use a “wrapper AI” for niche-specific tasks.

Read More →

Dictation and Conversation with an AI

With AI chatbots, we can skip the step of crafting keywords and just type what comes to mind in a stream of consciousness.

These chatbots deal with spelling mistakes, grammar mistakes and don’t require as much forethought when we interact with them.

So as I started thinking about my app I went straight to voice mode where I could ramble a bit and trust that the AI would organize my thoughts.

Read More →

Jim coaches Product Management organizations in startups, growth stage companies and Fortune 100s.

He's a Silicon Valley founder with over two decades of experience including an IPO ($450 million) and a buyout ($168 million). These days, he coaches Product leaders and teams to find product-market fit and accelerate growth across a variety of industries and business models.

Jim graduated from Stanford University with a BS in Computer Science and currently lectures at University of California, Berkeley in Product Management.