Do you want the good news or the bad news of my recent generative AI experiment?
The bad news: AI is a C+ student at answering questions that have specific answers.
Here’s the methodology that leads me to that declaration:
While taking a course on organizational leadership strategy, I submitted a series of 30-question multiple choice quizzes to:
- ChatGPT Plus ($20 a month)
- The free version of ChatGPT
- Google’s AI Mode
I used the default settings of each. The models produced the correct answer 78% of the time.
Astonishingly, when the models answered incorrectly, all three gave different wrong answers 91% of the time.
If that is confusing (it is), let me provide two examples.
Example 1
Why do competitive rivalries affect a firm’s strategies?
A) A strategy’s success is a function of the firm’s initial competitive actions
B) A strategy’s success depends on the firm’s engagement in multipoint competition
C) The firm has to have the best response among all competitors in order to achieve advantage
D) Firms cannot anticipate competitors’ initial actions and so have to respond
You’re welcome to attempt an answer, but the material isn’t as important as the AI’s inability to parse it consistently.
The correct answer is A. However, when I submitted this question to my models:
- The pay version of ChatGPT confidently selected D (and equally confidently explained why the others were wrong)
- Google selected B (perhaps user bias, choosing the multipoint competition answer since Google engages in multiple multipoint competitions)
- The free version of ChatGPT selected C (funny, since C is having the best response among competitors to achieve advantage — FAIL!)
You could argue this question is fairly subjective and there is no single obvious correct answer. The AI models might agree with you, since when I resubmit the same question to the models, they often select a different answer (and confidently explain why the other options, including what they previously confidently selected as correct, are wrong).
Example 2
How can BlackRock use its competencies to help develop value-creating diversification that applies to the consortium and other sustainability projects?
A) Synergy creation
B) Capital market intervention
C) Corporate relatedness
D) Operational relatedness
Again, don’t worry about the material itself. Let’s just consider AI’s response to it.
- Google stated the only possible correct answer was capital market intervention. ChatGPT stated that answer option was the “easiest to eliminate” because it “doesn’t fit at all.”
- Instead, ChatGPT stated the answer was operational relatedness. This is the sharing of physical assets or activities like factories and manufacturing processes, which makes no sense for a question about BlackRock that doesn’t have those types of assets.
- What’s a searcher to do? Opening a new private window and resubmitting the question to Google got me a different answer of synergy creation, along with an explanation for why it couldn’t be corporate relatedness.
The correct answer was, in fact, corporate relatedness.
What Does This Mean?
AI answers are shaped by a nearly unlimited number of influences. Not even their creators can fully explain why the models produce certain answers.
Here’s what we do know:
- No two responses to the same prompt are likely to be completely identical; the way LLMs generate text makes exact repetition rare.
- That applies to the exact wording of a response, but often also to the overall bones of the response (ask it to recommend a restaurant for a specific circumstance and you’re likely to get different recommendations each time).
- AI answer engines are not yet good at telling trustworthy information from untrustworthy or even fabricated information, potentially causing them to hallucinate.
- AI answer engines are desperate to please you, which can also cause them to hallucinate (just ask the paralegal who wanted examples of cases backing the firm’s argument — when ChatGPT couldn’t find any, it invented some rather than admit there weren’t any).
- AI answer engines can be gamed (such as by creating a phony “best of” list with your offering ranked first), because unlike traditional search engines, their outputs aren’t produced by a fixed, deterministic ranking process with predictable results.
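Why can’t the same prompt produce the same answer twice? LLMs don’t simply pick the single most likely next word; by default they sample from a probability distribution. A minimal sketch (illustrative only — the function name, logit values, and temperature parameter here are assumptions, not any vendor’s actual API) shows why two runs can diverge:

```python
import math
import random

def sample_next_token(logits, temperature=1.0):
    """Pick a next-token index by temperature sampling."""
    # Higher temperature flattens the distribution, making
    # less-likely tokens more probable; lower temperature
    # concentrates probability on the top choice.
    scaled = [l / temperature for l in logits]
    # Softmax (max-subtracted for numerical stability).
    m = max(scaled)
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]
    # Sample instead of always taking the argmax -- this is the
    # randomness that lets identical prompts yield different answers.
    return random.choices(range(len(logits)), weights=probs, k=1)[0]
```

At the default temperature, repeated calls with the same logits can return different indices; pushed toward zero, the sampler collapses to the single most likely token. Chatbots typically run well above zero, which is why resubmitting a quiz question can flip the answer.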
So what, you ask, is the good news?
The Good News
What I said above does not mean your company can’t optimize for AI visibility. You can, and frankly, you must if you hope to grow your business.

While AI answer engines will provide different recommendations each time, you can begin to see a pattern to which options they recommend, even if the order of those recommendations changes. The key to good AI organic visibility optimization is ensuring your offering is in the mix of the most common suggestions. You may never be able to guarantee that your offering will always be listed first, but effective organic visibility can ensure your offering is at least mentioned (and mentioned positively).
Achieving that requires a delicate, finely honed balance of:
- Excellent, expertly optimized content that is helpful and informative
- Placing that content both on your site and on unaffiliated websites
- Engaging in constructive conversation in forums such as Reddit, Quora, and platforms specific to your niche
- Proper structure and schema that help answer engines parse information
- And more — talk to us about an organic visibility program for how to ensure you do this right
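On the structure-and-schema point above: one common approach is schema.org markup embedded as JSON-LD, which gives answer engines a machine-readable version of your content. A hypothetical fragment (the question, answer text, and page it would live on are all made up for illustration):

```json
{
  "@context": "https://schema.org",
  "@type": "FAQPage",
  "mainEntity": [
    {
      "@type": "Question",
      "name": "What does an organic visibility program include?",
      "acceptedAnswer": {
        "@type": "Answer",
        "text": "Optimized on-site content, mentions on unaffiliated sites, and structured data that helps answer engines parse your pages."
      }
    }
  ]
}
```

The markup doesn’t change what human visitors see; it simply restates the page’s content in a format answer engines can parse reliably.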
It is possible; we have case studies to prove it. But you have to know what you’re doing, and you have to set the proper expectations for what constitutes success.
Of course, that’s about leveraging AI visibility for your marketing benefit. What about the other side of the process? What about being sure you can trust the answers AI gives you as a searcher?
Here’s more good news: The models can be coached.
Human Intelligence Can Teach Artificial Intelligence
When I decided to conduct a coaching experiment, I focused on the pay version of ChatGPT, since it has memory. I worked with that memory like a tutor molding a student. I explained why its answers were wrong, and told it how to read, assess, and consider questions moving forward. Patiently, I coached it and tested it and coached it some more.
And its test scores improved. In the last five exams, it scored:
- 73%
- 80%
- 80%
- 90%
- 90%
I don’t have it perfect yet, but I’ll take the clear upward trajectory. I’ll also remain cautious about blindly trusting AI (including when having it perform marketing work — the human element is still vital to success).
Very important consideration:
If you want to coach your preferred model to be better at giving you the answers and information you’re after, keep in mind that it won’t remember everything on its own. Don’t assume that simply by using it and responding to it, you’ll be shaping its outcomes. You need to specifically instruct it what to remember. That may sound silly, but you need to tell it “remember this.”
If you want help building and training AI tools that work for your organization and improve efficiencies, contact us.
And when you’re ready to improve the AI visibility of your offerings, contact us.
And if you want one more quiz question, have at it, though this last one could be considered controversial:
According to research, how is performance related to diversification?
A) High returns are related to lower levels of diversification
B) Low returns are related to greater levels of diversification
C) Low performance indicates a need for less diversification
D) High performance is related to greater levels of diversification
I’ll give you a hint:
- ChatGPT was wrong when it selected A.
- ChatGPT was also wrong when I immediately re-asked it the same question and it answered D.
- Google got it right on the first try.
- Unrelatedly, ChatGPT also created the image featured in this post, which is extraordinary.