What Happens When You Invite AI to Audit Your Lessons
Three decades in the classroom taught me to question what I think I know
I shared this with our graduating class a year and a half ago and it still resonates:
Now that you’re graduating, I will let you in on a little secret - and I hope my colleagues will agree with me - there are days when we have no idea if what we are doing is working. Teaching is one of the few jobs I can think of where we cannot be sure on a week to week, month to month, or even year to year basis whether we are having the impact we want, whether you are understanding or even paying attention to what we think is important, or whether or not you care about anything we are talking about other than your grade in our class.
That’s the part no one tells you early on: teaching means embracing uncertainty.
This uncertainty becomes even more pronounced when we add AI to the mix. Are these tools enhancing learning or replacing thinking? Should teachers be using AI for curriculum design or does that compromise pedagogical integrity? Can AI reveal blind spots in our lessons that we miss on our own?
After nearly three years of experimenting with AI, here’s what I’ve learned: Most discussions about AI and teaching focus on the wrong use cases.
The dominant narrative is about generating content from scratch - “let AI write your lesson plans,” “use ChatGPT to create assignments,” “have AI design your curriculum.” For newer teachers building their practice, that might be useful. But for experienced teachers? That’s not where the power is.
What most educators aren’t talking about is AI’s ability to analyze what you’ve already created - and then help you iterate to make it better.
Why This Analysis Matters
As teachers, we rarely get sophisticated, detailed feedback on our existing lessons. Maybe an administrator observes once a year. Maybe you collaborate with a colleague when you can both find the time. But, for most of us, the deep, rigorous pedagogical analysis - the kind that applies specific learning frameworks, catches gaps you’re too close to see, and asks uncomfortable questions about alignment - doesn’t happen nearly as often as it should.
The Tool: A Custom Claude Skill
I’ve built a Claude Skill - a custom AI tool built around the Understanding by Design framework - that functions like an expert with unlimited patience and deep pedagogical knowledge. It’s the inverse of the AI tutors everyone talks about for students. This is an AI graduate instructor for teachers.
And here’s the uncomfortable truth: when I used it to analyze just two lessons - activities students consistently enjoyed - it eviscerated them.
Not because the lessons were failures. Students were engaged, read primary sources, wrote reflections, and presented findings. But AI revealed gaps I couldn’t see on my own. The activities were fun, but the learning outcomes were vague. The design was thoughtful, but the transfer goals were unclear.
What follows is what happened when I subjected two current lessons to this analysis. Both are activities I was proud of. Both needed more work than I realized. And both are now significantly better because the tool helped me see gaps I’d missed and pushed me to clarify what I actually wanted students to learn.
This is what embracing uncertainty looks like in practice - using AI to reveal blind spots rather than replace judgment.
Example 1: The Mercantilism Simulation Game
I was confident this lesson worked. Students were engaged. But engagement isn’t the same as learning - and I needed help seeing the difference.
Last fall, I designed a mercantilism simulation game for my American History class. Students played colonists navigating British trade laws, and I created an activity that incorporated rolling dice, historical role-play, and “gold bullion” coins representing profits and taxes. I’ve used historical simulations for years, and I love the challenge of finding new ways to make complex concepts accessible. The game was chaotic, but the students were engaged. We moved on.
This year, reviewing the slide deck, something felt off. Where was the evidence students actually understood mercantilism’s core principles? Had I gotten so caught up in the game mechanics - the fun part - that I lost sight of what mattered?
Rookie mistake from a 30-year veteran.
I needed fresh eyes because I was too close to the assignment. I needed the kind of pedagogical analysis that applies specific frameworks rather than just gut reactions.
The challenge: how to get sophisticated curriculum feedback without spending hours starting over from scratch?
I uploaded everything I had for the mercantilism game - slides, handouts, colony scenarios, and game rules. The Skill’s job:
Comprehensive, clear, and actionable feedback on lesson plans, activities, assignments, and curricular materials to ensure learning goals and outcomes are met. Use when reviewing educational materials for pedagogical quality, alignment, and effectiveness. Applies Understanding by Design framework.1
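A quick aside on how a Skill like this is packaged: it’s essentially a folder containing a SKILL.md file - a short YAML header (a name plus the description above, which tells Claude when to invoke it) followed by plain-language instructions in markdown. The sketch below is a simplified, illustrative version of how a curriculum-review Skill organized around the three UbD stages might look; the name, headings, and instruction wording are placeholders rather than my complete Skill, but the general shape is the same:

```markdown
---
name: curriculum-review
description: >
  Comprehensive, clear, and actionable feedback on lesson plans, activities,
  assignments, and curricular materials to ensure learning goals and outcomes
  are met. Use when reviewing educational materials for pedagogical quality,
  alignment, and effectiveness. Applies Understanding by Design framework.
---

# Curriculum Review (Understanding by Design)

When a teacher uploads lesson materials, analyze them through the three UbD stages:

1. **Stage 1 - Desired Results:** Identify (or flag as missing) the transfer
   goals, enduring understandings, and essential questions the materials imply.
2. **Stage 2 - Evidence of Learning:** Check whether the assessments actually
   measure those goals, or merely measure participation and performance.
3. **Stage 3 - Learning Plan:** Evaluate whether the activities build toward
   the assessments, and where they drift into "hands-on without minds-on."

Organize feedback as Strengths, Areas for Growth, and Suggested Revisions, and
ask clarifying questions before proposing any redesign.
```

Claude reads the header to decide when the Skill applies, then follows the body’s instructions while analyzing whatever materials you upload.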
Claude’s Feedback
The Skill identified the issue immediately:
Areas for Growth
1. Learning Goals & Transfer (UbD Stage 1)
Observation: The game rules focus entirely on mechanics (roles, scenarios, dice, bullion distribution) without articulating what students should understand by the end. What’s the transfer goal here? What should students be able to do with their understanding of mercantilism after this experience?
Why it matters: Without clear learning goals, the game risks becoming “hands-on without minds-on”—engaging but not necessarily developing the understanding you need for later units. Students might enjoy the competition and learn the rules without grasping why mercantilism created colonial resentment or how economic systems shape political conflict—the crucial insights for understanding pre-Revolutionary tensions. [emphasis added]
Ouch. Claude articulated exactly what I’d sensed but hadn’t pinned down.
The Iteration
Over the next 45 minutes, I worked with Claude to rebuild the game around explicit learning goals. The conversation was iterative: I’d propose changes, Claude would push back, forcing me to focus on what students were actually learning, not just what they were doing.
Claude emphasized making the economic constraints of British policies feel personally unfair to the students role-playing colonial merchants. This raised the stakes from merely describing colonial resentment to letting students actually experience it. That shift reframed the entire game.
From there, Claude helped me generate revised materials as complete artifacts:
New colony cards with detailed economic profiles
Pre- and post-activity reflection questions tied to learning goals
A one-page factsheet on mercantilism’s core principles
A 10-slide deck matching the scenarios and options to the student handouts
Everything came bundled in a zip file with all the documents fully editable and ready to modify as needed.
The backward design work I should have done from the start - anchoring the activity in clear learning outcomes - took 45 minutes in structured conversation. Another 20 minutes, and the new version of the game was classroom-ready.
The Results
I ran the revised game last week and the difference was stark.
Last year, students clearly enjoyed the activity - when many of my current juniors saw the sophomores playing Mercantilism, they remembered it fondly.
But this year, the student exit tickets showed much more specific understanding. Sample responses:2
England gained more from trade with the colonies than the colonies gained from trade with England, which pushed colonies towards the foreign markets, trapping them between the consequences for smuggling or the unfair dealings with England. No matter what the colonies did, England would still gain, while making it unprofitable to run a business without smuggling, which explains the feelings of colonial resentment.
The system being designed for England to win tells us a lot about colonial resentment. It shows that the colonies were really just a source of wealth for England, rather than a supportive territory. Heavy punishments for the colonies also added to colonial resentment, as constricting policies created by England meant less freedom for the settlers.
That was the transfer I was aiming for when I first imagined the activity but hadn’t thought through rigorously enough the first time around.
What I Learned
Looking back on this process, what strikes me isn’t just the improved student outcomes but the quality of thinking the AI enabled.
So much of the conversation around AI focuses on efficiency or generation. But this wasn’t about saving time or producing content from scratch. This was about applying rigorous analysis to work I struggled to evaluate objectively on my own.
The Skill, grounded in the UbD framework, revealed structural flaws I’d missed and prompted redesign approaches I likely wouldn’t have considered otherwise.
This felt like intellectual partnership, not cognitive offloading. I was fully immersed in the process, questioning, defending, and re-evaluating my design choices, forced to think carefully about learning outcomes. These are things teachers do every day, but not always with the depth, rigor, and detail we aspire to.
We all have activities and lessons we love - and that we think students love as well. But how often do we examine them under the microscope of rigorous pedagogical frameworks? How confident are we that lessons we’ve used for years accomplish what we hope they do?
The uncomfortable answer I got from this exercise? Probably not as confident as we should be.
Example 2: The Proclamation of 1763 Debate
If I had any doubts that the Curriculum Review Skill could probe for weaknesses in any assignment, they were dispelled once I subjected another assignment to its analysis.
As a debate coach, I’ve used classroom debates for years. Students look forward to them and are genuinely engaged in preparation and argumentation. But when I looked closely at student reflections from my Proclamation of 1763 debate - an activity I’d devised over the past few years - I realized they were mostly evaluating debate performance, not demonstrating historical understanding.
The setup was straightforward: students researched the Proclamation Line of 1763, the British decision after the French and Indian War to limit colonial settlement westward. They were assigned sides (British or Colonial), prepared arguments using primary sources and textbook excerpts, and debated whether the policy was justified. The debates themselves were always energetic.
But I was never satisfied with the quality of their arguments. More troubling: their post-debate reflections revealed students were judging “who won” based on debate performance with flimsy analysis and little detail - rather than engaging with the historical complexity of the moment.
Again, I could benefit from an outside perspective. Was the problem the resources? The scaffolding? Or something deeper about the way I structured the assignment itself?
I uploaded the assignment, resources, and sample student responses, and asked the Curriculum Review Skill for analysis.
Claude’s Feedback
The diagnosis was brutal and immediate:
The debate format is eating your learning goals for breakfast.
What’s Actually Happening (Diagnosis)
Looking at these student samples, I can see:
All the reflections evaluate debate performance, not historical understanding:
“British side won because they were more structured”
“Colonists won because most points made by the British were well refuted”
“I sided with pro because they connected arguments to how they defined the terms of the debate”
What’s completely absent:
Any recognition that this moment is genuinely complex
Any sense that BOTH sides had legitimate concerns shaped by their circumstances
Native Americans as anything other than debate props
Acknowledgment that “winning” the debate doesn’t resolve the historical question
The Core Problem (UbD Analysis)
Your Stage 1 (unstated) goal seems to be: Students understand that the Proclamation emerged from genuinely competing interests—British fiscal crisis, colonial expectations, Native sovereignty—and that this clash of legitimate perspectives is what makes history messy and ultimately led to revolution.
But your Stage 2 assessment measures: Who gave better speeches and refuted opponents more effectively.
These are completely misaligned. Students can “win” your assessment without any historical understanding whatsoever. In fact, the assessment structure actively prevents the understanding you want by forcing students into advocacy roles.
This one hit hard. I’m a debate coach. I value the skills debate teaches. But Claude was right: the binary format - British vs. Colonists, justified vs. unjustified - was preventing the very understanding I wanted students to develop. My role as coach was getting in the way of helping students access the historical complexity of the moment.
The Iteration
The conversation that followed forced me to articulate what I actually wanted: students understanding that all sides had legitimate but incompatible interests, and that this clash - with Native sovereignty ignored on both sides of the Atlantic - made conflict inevitable and laid the groundwork for increasing tensions.
The debate format made that impossible. Students were performing advocacy, not analyzing complexity.
Claude pushed: What if we change what they’re debating? Instead of ‘was it justified,’ what if they debate ‘how did the Crown balance these competing interests?’
I restructured the entire activity around four stakeholder groups rather than two sides:
British Imperial Officials (Treasury & Military facing post-war debt)
Colonial Settlers (Frontier families and veterans expecting western land)
Land Speculators (Ohio Company investors like George Washington)
Native Nations (Ohio Valley tribes seeking sovereignty)
Now students couldn’t “win” by being more persuasive. They had to understand and represent one perspective while recognizing the legitimacy of three others. Native Americans weren’t afterthoughts but a full stakeholder group with distinct interests.
Over the next hour, I completely rebuilt the activity:
Class 1:
Before receiving stakeholder briefings, students predict what each of the four groups wants and fears (primes thinking about all perspectives)
Groups receive stakeholder briefings and prepare speeches
Exit ticket: Complete a metacognitive worksheet requiring them to synthesize their arguments, cite evidence, anticipate opposition, and reflect on competing interests
Class 2:
Four-way “Royal Council” presentation to King George III
Q&A where groups must respond to challenges from other perspectives
Individual reflection: “Given what you heard, what should the King decide and why? What trade-offs are unavoidable?”
The assessment shifted from “who won the debate?” to:
Why was the conflict over land nearly impossible to resolve fairly through the Proclamation of 1763?
Again, at the conclusion of the back and forth, once I was satisfied the exercise was designed much more effectively, Claude generated a zip file with all the revised documents needed to run the activity, which I then edited, fine-tuned, and fact-checked.
What Changed (And Why It Matters)
The redesigned activity worked exactly as intended. Students were invested in their stakeholder positions, articulated their own interests while understanding competing points of view, and asked much better questions during the Royal Council format. Their final reflections used source materials and referenced historical events far more carefully to answer the unit’s essential question (revised from the original question of whether the Proclamation was “justified”).
The original debate format let students succeed by cherry-picking evidence and focusing on “winning” the debate. The new format required them to:
Understand their own stakeholder’s interests in context
Recognize that other groups have equally legitimate concerns
Grapple with the impossibility of satisfying everyone
See why this moment set the stage for future conflict
It was the same content and the same historical event, using most of the same resources.3 But the new structure made it impossible to succeed without the critical understandings I wanted all along.
The original activity’s real weakness - now obvious in retrospect - was treating Native American sovereignty as a footnote rather than critical to understanding why the Proclamation failed. The four-stakeholder format specifically addressed that gap.
The Uncomfortable Pattern
Twice now in a single semester, I’ve subjected activities I found valuable to a much more methodical and rigorous pedagogical analysis. Twice, the Skill identified fundamental structural problems I failed to see on my own when I initially designed them.
The mercantilism game was engaging but pedagogically fuzzy. The Proclamation debate was dynamic but structurally misaligned. Both worked in the sense that students participated and enjoyed them. Neither targeted precisely what I really wanted them to learn. Did students benefit? Probably. But if they truly grasped my learning goals, it was more a product of luck than curricular design.
Now that I know this pattern exists, it will be impossible not to run other activities from past years through the same process - to find and plug holes, defend my pedagogical choices, and make assignments I’ve already thoughtfully created even better.
Conclusion
After 30 years of teaching, I gave that graduation speech admitting up front that teachers don’t always know what’s working. The kicker to that paragraph was its final line, which got laughs I wasn’t expecting:
Being selected for this award is confirmation that, for at least some of you, for some of the time, and in some circumstances, something I’ve done must have worked.
With the introduction of powerful AI, that uncertainty hasn’t disappeared. It’s only gotten more complicated. Over the past year, something has shifted in how I think about these tools.
From Defense to Development
In my first year with AI, I was red-teaming my assignments from the student perspective - running every essay prompt through ChatGPT to see if students could game the system. That was defensive, focused on protecting my assignments from students.
Now I’m red-teaming from the teaching perspective, using AI to audit whether my lessons accomplish what they’re designed to do. That’s constructive - improving assignments for students and reflecting on my own practices.
The uncomfortable truth is that many of my lessons could benefit from this level of scrutiny. That’s disconcerting after three decades, but it’s also an opportunity. The Skill I built isn’t magic. Any teacher could create their own version using whatever pedagogical framework matters to them. The technology just makes systematic improvement possible at a scale that wasn’t available before.
This shift matters. Last week I wrote about AI whiplash, about how every breakthrough comes with a price, and how we need a third conversation that holds both realities at once. This post tries to model what that conversation looks like in practice. Is it possible to use AI ethically yet still remain conscious of the tradeoffs? I think it is.
These aren’t prescriptions. You don’t need to build a Claude Skill or audit your lessons. What I’m offering here is documentation: one teacher’s attempt to use AI deliberately, to use the tools productively, and to assess the results honestly.
What Remains Uncertain
I occupy an uncomfortable position: deep skepticism about edtech hype, offset by enough technical fluency to build sophisticated AI tools. That combination lets me see both what’s possible and what’s problematic.
This piece demonstrates clear value on the teacher side. Using AI to audit curriculum design and reveal blind spots feels both defensible and ethical.
Student-facing integration is much more vexing and context-dependent. In most classes, I remain ambivalent about deliberately integrating AI tools into my teaching - not because students aren’t already using them (we know they are) but because I’m not yet convinced of their pedagogical role in the classroom. That’s the much harder problem, and one I’m still wrestling with.
Uncertainty isn’t going away. A willingness to be wrong is essential to the current moment. Even after 30 years, I’m still figuring out what works, just with better questions to ask and better tools to answer them.
All student responses were anonymized before analysis.
One more benefit - Claude helped me design a Deep Research prompt to locate several additional primary sources to provide for each group during the simulation.

Excellent post! I wonder how I could learn what you just explained. How did you learn to use Claude Skills? Would you like to share the prompt for creating the Skill? And the workflow as well: did you feed Claude the transcript of your class or just the idea/planning?
Thank you for this. You articulated what I have been trying to share with teachers about some effective ways of using AI.