Why Most AI Commentary is Just Noise
How Superforecasting Principles Can Cut Through the AI Prediction Fog
One of the most useful books I've read over the past 10 years is Superforecasting: The Art and Science of Prediction. One essential insight from Philip Tetlock's Good Judgment Project is that most "experts" are terrible at predictions. When rigorously tested, their predictions did no better than random guessing.

In contrast, the project identified an outlier group of "superforecasters," almost all of them amateurs, who share a set of traits and habits anyone can adopt to make better predictions. They are characterized by intellectual humility and openness, which enable them to embrace probabilistic, nuanced judgments rather than outlandish certainties. They continuously ingest diverse information from multiple sources and are relentlessly committed to updating their beliefs through a process of "perpetual beta" that incorporates every new piece of information into their analysis.

This is especially relevant in today's AI moment, when everybody seems to have an opinion but few rigorously test their claims the way Tetlock's superforecasters do. Given these criteria, it's worth asking who is worth listening to and who we can safely ignore.
What Makes a Superforecaster?
Superforecasters are intellectually curious people from varied backgrounds - engineers, artists, retirees - ordinary people who volunteered for the Good Judgment Project, a multi-year forecasting tournament sponsored by the Intelligence Advanced Research Projects Activity (IARPA).
The track record of the top 2% is remarkable. They consistently outperformed professional intelligence analysts with access to classified data by approximately 30%. They beat prediction markets and regular forecasters by over 60%.
What's their secret? It's not raw intelligence.
Their success stems from intellectual humility, active open-mindedness, and a "dragonfly-eyed"1 integration of information from a wide range of sources.
The strongest predictor is a commitment to what Tetlock calls "perpetual beta" - a dedication to updating one's beliefs and to relentless self-improvement. This trait proved roughly three times more powerful as a predictor of accuracy than innate intelligence.
Superforecasters treat their beliefs as "hypotheses to be tested, not as treasures to be protected." They actively seek out evidence that challenges their thinking. They value different perspectives. When facts change, they readily change their minds.
Most importantly: they are comfortable with numbers and think probabilistically. They express their judgments using many grades of "maybe," rather than just binary "yes/no" categories.
They break down complex problems into manageable sub-problems. They update their forecasts frequently in small increments, avoiding both under and overreaction to new information.
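To make "small increments" concrete, here is a minimal sketch of Bayesian belief updating in Python. The starting belief and the diagnostic strength of each piece of evidence are invented purely for illustration; the point is simply that weakly diagnostic evidence should nudge a probability, not swing it.

```python
# Minimal sketch of incremental belief updating via Bayes' rule.
# The prior and likelihoods below are invented for illustration only.

def bayes_update(prior: float, p_evidence_if_true: float, p_evidence_if_false: float) -> float:
    """Return P(claim | evidence) given P(claim) and how likely the evidence is either way."""
    numerator = p_evidence_if_true * prior
    return numerator / (numerator + p_evidence_if_false * (1 - prior))

belief = 0.40  # starting estimate that some claim about AI in schools is true
# Each piece of evidence is only weakly diagnostic, so the belief moves in small steps.
for p_if_true, p_if_false in [(0.6, 0.5), (0.7, 0.6), (0.55, 0.5)]:
    belief = bayes_update(belief, p_if_true, p_if_false)
    print(f"updated belief: {belief:.2f}")  # 0.44, 0.48, 0.51
```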
This approach stands in stark contrast to much of what passes for AI prediction today.
Instead of intellectual humility and measured uncertainty, we get bold proclamations, sweeping generalizations, and dogmatic pronouncements about AI's impact on everything from education to human civilization. Everyone is guilty of it.
The Problem with Current AI Predictions
One of the first things you learn from Tetlock's book is the importance of data and measurement. A prediction without a clear way to assess its accuracy is useless.
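Tetlock's tournaments made that assessment concrete by scoring forecasts with the Brier score, which penalizes the gap between a stated probability and what actually happened. Here's a minimal sketch in Python; the claims, probabilities, and outcomes are invented purely for illustration.

```python
# Minimal sketch of Brier scoring, the accuracy metric used in Tetlock's tournaments.
# The forecasts below are invented for illustration only.

def brier_score(prob_yes: float, happened: bool) -> float:
    """Original two-outcome Brier score: 0.0 is perfect, 2.0 is maximally wrong.
    Always answering 50/50 earns 0.5, the score of pure guessing."""
    outcome = 1.0 if happened else 0.0
    return (prob_yes - outcome) ** 2 + ((1 - prob_yes) - (1 - outcome)) ** 2

# (claim, stated probability, what actually happened) -- hypothetical examples
forecasts = [
    ("40% of teachers use AI in lesson planning by 2027", 0.65, True),
    ("Our district adopts a full AI ban this school year", 0.20, False),
    ("The state releases AI guidance for schools by June", 0.80, True),
]

scores = [brier_score(prob, outcome) for _, prob, outcome in forecasts]
print(f"Mean Brier score: {sum(scores) / len(scores):.3f}")  # lower is better
```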
Yet most predictions about generative AI are either entirely unsubstantiated or mere opinion masquerading as analysis. They may be bold, but they're so hyperbolic and premature that they're impossible to verify.
Consider just a few typical headlines and assertions about AI:
"ChatGPT will destroy high school English"
"AI will devastate critical thinking skills"
"All students use AI for is to cheat"
"AI will wipe out half of all white-collar jobs"
"AI will create a generation of idiots"
"AI hallucinations make the tools useless"
There's absolutely nothing wrong with having strong opinions about AI's impact. These technologies generate intense feelings for good reason.
But from a superforecasting perspective, these statements are worthless as predictions. They lack the specificity, measurability, and timeframes that would make them testable hypotheses rather than inflammatory rhetoric.
What does "destroy" mean exactly? Devastate by how much, measured how, and compared to what baseline?
What percentage constitutes "all students"? What about student use of AI that may be productive and aid learning?
All white-collar jobs? Which industries? On what timescale?
The claim that "AI hallucinations make the tools useless" is particularly revealing. This assertion treats hallucinations as a binary problem - either AI is perfect or it's worthless. A superforecaster would ask: How frequently do hallucinations occur in specific use cases? Are these rates improving over time? What percentage of users find value despite occasional inaccuracies? Under what conditions do hallucinations render outputs genuinely problematic versus merely requiring verification?
This isn’t just nitpicking. It’s the difference between useful forecasting and meaningless punditry.
Many AI commentators conflate entirely different kinds of questions.
The debate over the AGI timeline is a perfect example.
First, there's no universal agreement on what qualifies as AGI. Without a precise definition, there's no way to know whether an AGI prediction is accurate.
Second, even assuming we could agree on what constitutes AGI, whether we reach it in 2027, 2030, or 2040 (or never) is only tangentially related to how schools should implement AI policies this fall.
Treating them as the same conversation muddies the waters. A superforecaster would recognize these as distinct questions requiring different kinds of evidence and analytical approaches.
Most AI writing fails basic forecasting standards.
This matters for educators. When we refuse to alter our positions in light of new evidence - whether we're evangelical about AI's benefits or doomsaying about its dangers - we're modeling terrible critical thinking skills for our students.
How many of us can honestly say we examine our AI opinions skeptically from all angles before issuing sweeping generalizations and dire predictions?
A Framework for Better AI Analysis
So how can we distinguish between AI commentary worth our attention and noise we can safely ignore?
Drawing from superforecasting principles, here are practical criteria for evaluating AI predictions:
Specificity: Does the prediction include measurable outcomes and clear timeframes?
Compare "AI will transform education" with "By 2027, more than 40% of high school teachers will regularly use AI tools in lesson planning."
The second prediction can be tested. The first is just wishful thinking or idle speculation.
A good example of specificity is the bet between Gary Marcus (an AI skeptic) and Miles Brundage (a former OpenAI employee). Will AI be able to do 8 of the following 10 tasks by the end of 2027?
The ten tasks:

1. Watch a previously unseen mainstream movie (without reading reviews, etc.) and be able to follow plot twists, know when to laugh, summarize it without giving away any spoilers or making up anything that didn't actually happen, and answer questions like: Who are the characters? What are their conflicts and motivations? How did these things change? What was the plot twist?

2. Similarly, be able to read new mainstream novels (without reading reviews, etc.) and reliably answer questions about plot, characters, conflicts, motivations, and so on, going beyond the literal text in ways that would be clear to ordinary people.

3. Write engaging brief biographies and obituaries [amendment for clarification: for both, of the length and quality of New York Times obituaries] without obvious hallucinations that aren't grounded in reliable sources.

4. Learn and master the basics of almost any new video game within a few minutes or hours, and solve original puzzles in the alternate world of that video game.

5. Write cogent, persuasive legal briefs without hallucinating any cases.

6. Reliably construct bug-free code of more than 10,000 lines from natural language specification or by interactions with a non-expert user. [Gluing together code from existing libraries doesn't count.]

7. With little or no human involvement, write Pulitzer-caliber books, fiction and non-fiction.

8. With little or no human involvement, write Oscar-caliber screenplays.

9. With little or no human involvement, come up with paradigm-shifting, Nobel-caliber scientific discoveries.

10. Take arbitrary proofs from the mathematical literature written in natural language and convert them into a symbolic form suitable for symbolic verification.
Most of these are clear, measurable outcomes with a specified timeframe.
Probability Thinking: Look for authors who assign uncertainty in degrees rather than absolutes.
Superforecasters are comfortable saying "there's a 65% chance" rather than "this will definitely happen" or "this will always be the case."
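Graded probabilities can also be held accountable. Over many predictions they can be checked for calibration: of all the times a forecaster said "65%," the event should have happened roughly 65% of the time. Here is a minimal sketch, again with invented data purely for illustration.

```python
from collections import defaultdict

# Minimal calibration check: bucket predictions by stated probability and compare
# each bucket's stated probability to the observed frequency of the event.
# The (stated probability, did it happen) pairs are invented for illustration only.
predictions = [
    (0.9, True), (0.9, True), (0.9, False), (0.9, True),
    (0.6, True), (0.6, False), (0.6, True), (0.6, False),
    (0.2, False), (0.2, False), (0.2, True), (0.2, False),
]

buckets = defaultdict(list)
for prob, happened in predictions:
    buckets[prob].append(happened)

for prob in sorted(buckets):
    outcomes = buckets[prob]
    observed = sum(outcomes) / len(outcomes)
    print(f"said {prob:.0%}: happened {observed:.0%} of the time ({len(outcomes)} forecasts)")
```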
Domain Expertise: What credentials does the author have in the specific area they're predicting about?
Someone with deep classroom experience making predictions about AI's impact on student learning likely carries more weight than a tech blogger opining about education policy.
But experts within their field also need scrutiny. They're still prone to erroneous thinking if they don't apply other superforecasting traits.
Belief Updating: Has this person demonstrably changed their mind about AI as new evidence emerged?
Intellectual humility means updating beliefs when facts change. Be skeptical of voices who maintain identical positions and double down on their claims regardless of new developments.
For example, I became convinced this spring that a significant percentage of high school and college students have used AI in ways their teachers would disapprove of - in other words, to cheat.

Recent mainstream media articles helped move the needle on the cheating question, declaring, for example, that "Everyone Is Cheating Their Way Through College." Although that matches my anecdotal sense, it's quite possible the phenomenon isn't as concerning as it's been made out to be. New evidence would be welcome.
Evidence Base: Do they cite specific studies, data, or concrete examples? Or do they rely on intuition, anecdotes, and broad generalizations?
The best AI analysis builds from actual evidence rather than speculation.
This is challenging right now. Many studies, even those released within the last three months, are based on models from over a year ago.
Given that we're just over two years into the ChatGPT era and given the variety of ways AI is being used, studies are frequently cited in ways that may not apply to the arguments being made.
This doesn't mean these studies are necessarily irrelevant. But we need to be careful about our language and remain humble in the face of uncertainty - especially when we're witnessing what feels like the early stages of an AI revolution. This makes tentative and flexible conclusions not just prudent, but essential.
For example, instead of asking whether AI will "destroy" student writing, better questions might be:
What percentage of students will use AI in their writing process during the 2025-2026 school year?
How will the measurable writing skills of students who use AI compare to those of students who don't?
Will schools that implement AI-aware curricula see different outcomes than those that ban AI entirely?
These questions are far more useful. They have the virtue of being answerable through data rather than high-pitched, speculative certainties.
Who Makes the Cut
Given these standards, who are some voices worth following in the AI space? We need people who assign probabilities, update their beliefs, acknowledge uncertainty, and make testable predictions within their expertise.
Here are just three that stand out for demonstrating rigorous thinking worth attention. Each brings different perspectives but applies superforecasting principles to AI analysis.
Mary Meeker represents the data-driven approach that superforecasters prize. Her latest AI trends report demonstrates the kind of evidence-based analysis missing from most AI commentary. Rather than making iron-clad declarations, she documents trends with charts, statistics, and measurable indicators. Her track record predicting tech developments over decades gives her credibility that comes from demonstrated forecasting accuracy. Most importantly, she grounds her AI predictions in concrete data about adoption rates, investment flows, and usage patterns rather than pure speculation.
Daniel Kokotajlo demonstrates intellectual honesty in AI prediction. His 2021 forecasts about the AI hype cycle, subsequent plateau, and market dynamics proved remarkably accurate. As a former OpenAI researcher, he's willing to warn about safety concerns despite professional costs. His AI 2027 report, while speculative, demonstrates specific scenario planning that makes predictions testable. He assigns probabilities, acknowledges uncertainty, and makes clear he will update his view as new evidence emerges.
Gary Marcus provides informed skepticism that balances the AI hype. His computer science background gives him domain expertise in AI capabilities and limitations. Rather than making vague pronouncements, he makes specific, measurable bets - like his wager with Miles Brundage about AI achieving ten concrete tasks by 2027. He consistently updates his positions based on new developments while maintaining intellectual rigor about what today's systems can and cannot do.
Finding Signal in the Noise
What unites these three isn't their specific positions on AI development, but their approach to prediction. They demonstrate intellectual humility. They make testable claims. They ground arguments in evidence rather than intuition. They acknowledge uncertainty rather than make hyperbolic conclusions.
This contrasts sharply with most AI commentary from writers who don't actually use these tools but are heavily invested in critiquing them to preserve their professional identities.
The stakes are too high for sloppy thinking. We're modeling analytical habits for students who will navigate an AI-transformed world. They need to see honest uncertainty in action, not dogmatic proclamations.
Who else makes your cut? I'd love to hear from readers about AI commentators who demonstrate these superforecasting principles. Who assigns probabilities instead of making absolute claims? Who updates their beliefs when presented with new evidence? Who makes specific, testable predictions rather than vague proclamations? The noise-to-signal ratio in AI discourse is abysmal, but the signal exists if we know where to look.
"Dragonfly-eye" refers to how dragonflies' compound eyes integrate thousands of individual lenses into comprehensive vision; superforecasters similarly synthesize diverse, often conflicting perspectives rather than relying on single sources or viewpoints.
I can't help but note that all of your examples of "typical" assertions about AI are from the skeptic or critic side of the equation. Here are some from the other side:
"AI will revolutionize education."
"This is the worst LLM you will ever use."
"AI won't take your job, but someone else using AI will."
"In 10 years AI will replace doctors and teachers."
That first one is Sal Khan (and Bill Gates). The last one is Bill Gates.
I assume these are worthless for the same reasons the skeptic predictions are, and yet I can't help but notice that these sorts of declarations are much more likely to be uncritically accepted, not just by AI enthusiasts but by the general public as well.
Notice that, in effect, the last Bill Gates quote is the same claim as "AI will destroy white-collar jobs," and yet when Bill Gates says it this way, he's treated as a tech visionary whose view becomes the standard by which truth is judged.
One of the ways to help break through prediction fog is to evaluate the track records of the predictors. In 2011 Sal Khan declared that video (like his Khan Academy offerings) was going to "reinvent" education. He now says that tutor chatbots will "revolutionize" education. By your framework we should discount Khan's prediction of revolution because of its lack of precision, but I would add a dose of skepticism based on his track record.
Bill Gates has also proven to be the most wrong man in education, over and over again. If we're looking for an authority on the effect of technology on education, I recommend Audrey Watters, who clearly and cogently pointed out why Sal Khan was wrong in 2011 and why he's extremely likely to be wrong today. https://2ndbreakfast.audreywatters.com/12-years-and-60-minutes-later/
I'm also going to take issue with probability folks like Mary Meeker, because forecasting probabilities and saying you're updating your future probabilities makes you look flexible and thoughtful, but it's really a shell game, and ultimately it isn't all that helpful when we consider what to do from a public policy perspective regarding this technology. We also see significant biases in how we treat predictive probabilities based on different cognitive frameworks.
The p(doom) score is a probability prediction of the likelihood that a superintelligent AI will kill us all. The CEO of Anthropic says his p(doom) is between 10 and 25 percent. Geoffrey Hinton, one of the godfathers of AI, puts it at 10 percent.
We can't deny the domain expertise of these folks; they're among the leading AI researchers and developers in the world. But we have to ask ourselves: if someone truly believed that what they are doing had a 10% chance of destroying humanity, why wouldn't they spend every minute of their life trying to stop that thing, rather than developing it?
A probability framework allows you to never be wrong, because as you get closer to the outcome you simply raise your probability. Saying today that something has a 35% chance of happening, and then saying three years from now, once we have more evidence, that it has a 65% chance, tells us nothing particularly useful at the moment of the 35% prediction. A weather forecast for a week from now that says it isn't going to rain is "correct" at the time of the prediction, but then when it rains (something given an 85% probability on the day itself), we get to say both predictions were correct.
Or take another example: the "likelihood of winning" meter that ESPN publishes online during a game. Duke had a 99% likelihood of winning their national semifinal game at one point. They lost. The 99% prediction is "correct" because it left that 1% room for error.
For AI and education, I think we should spend much less time thinking about the future and much more about the present. We know what kinds of experiences are meaningful to student learning today, right now. We should strive to provide students access to those experiences. Will some of them involve interacting with the latest technology? Of course!
But the idea that we have to jump in with both feet to secure student well-being in some as-yet-to-be realized AI future is not clear thinking.
Sorry for my logorrhea. You've given us much to consider.
Eventually, people will stop asking if AI wrote it.
They’ll start asking if it was useful, clear, worth their time.
Because mediocre is mediocre, whether by machine or human.
The real question is: did it make you think?