One of the most powerful current use cases for AI is, unsurprisingly, also one of the most compute-intensive.1 When I learned about the Deep Research models released by Gemini (with a paid account) and OpenAI (ditto), as well as Perplexity’s and Grok’s versions (both currently free but limited), I could see the direction we were headed.2
A number of AI articles that kicked off the New Year claimed that 2025 was going to be the year of “agentic” AI: AI agents completing multi-step tasks from simple prompts. Deep Research was the first obvious move in this direction. If you haven’t seen the results of a Deep Research query, I encourage you to try out Perplexity (just make sure Deep Research is selected before entering your prompt). Free users get up to three “enhanced queries” per day.
What differentiates a Deep Research response from an ordinary one is the depth, comprehensiveness, and accuracy of the report. The formats, styles, and overall quality of the outputs differ slightly from model to model, but the end result is similar - a final document that gives you a detailed overview of an issue, topic, question, or problem, drawn from a thorough search of the web, with citations. If your prompt is limited, the results will be as well.
The inclusion of citations is crucial. Now you can quickly verify whether a point, argument, or detail located by the model and included in the report is based on something real or made up.
You will rarely read a description of AI from those critical of the technology that does not include the fact that AI “hallucinates.” That is undeniable.
But in my use of AI over the past two years, the rates of hallucination have gone down considerably, especially in responses that draw on advanced reasoning models like the ones used to conduct Deep Research.3 That does not mean it never happens, and, like any vigilant AI user, you need to double-check information, especially if it feels off or strange in some way. I don’t know if AI will ever get to the point where “hallucination” rates approach 0%. We’re clearly not there yet.
For skeptics, this fact alone is enough to cry foul - if we can’t be certain of the information LLMs give us, then AI is unreliable, further proof that the technology is overhyped. It is often Exhibit A for the “resistance” crowd.
While I understand this perspective, I think it is misguided (at least as far as the “hallucination” argument goes - there are plenty of other, more compelling reasons to oppose AI in schools), and my experience with Deep Research bears out why.
In my reading about AI, particularly in posts by those who are most opposed, it often feels as if the goalposts and standards for AI are set significantly higher than they are for other platforms. Before using AI for a given research task, I often ask myself whether I could get the information I need by first checking Wikipedia (or the web in general - which, ironically, now returns an AI response, since Google has folded Gemini into its search function).
In other words, is there something special about the information I am looking for that requires a dedicated LLM, rather than just starting with the internet’s leading free encyclopedia and then doing more traditional research, as I would have prior to the availability of AI?4
It’s worth remembering that when Wikipedia was first introduced, it was shunned for many of the same reasons as AI - it could not be relied on for accurate information. Over time, it turned out that allowing human editors to maintain and update its entries in real time produced better and more accurate results than many online or for-profit encyclopedias.
Today, I think it’s fair to say that, while teachers and professors frown on Wikipedia citations in a formal research paper, it’s not necessarily because we doubt the accuracy of the information. Rather, we want students to push the boundaries of their knowledge beyond the basics. We don’t dislike Wikipedia because it’s “bad.” We dislike Wikipedia because it’s lazy. It does not represent the best scholarship on the topic.
But as a primer for background and context when someone is first getting oriented to their topic, there is nothing wrong with going to the Wikipedia page to read and take notes. We don’t fear that you will come across “hallucinated” information that will necessarily steer you in the wrong direction. Often, the citations on the Wikipedia page are more valuable than the overall entry itself.
Which brings me back to Deep Research. My current analogy is that a well-crafted Deep Research prompt produces a unique Wikipedia page for virtually any question you can think of, with citations custom-made for that question. Is there a risk that a piece of information in the report might be “hallucinated”? Just as with a Wikipedia entry, there is, but the possibility is simply not enough to tilt the scales against the general usefulness of the report, especially when you can quickly check any specific piece of information against the source provided.
What’s far more important is the quality of the prompt - if you ask Deep Research to prioritize certain kinds of sources and to generate the output in a particular style or format, you can significantly improve the quality of the information and shape the report to your needs. If you want an annotated bibliography, it can do that. If you want a “fact sheet” on some topic or issue with a thorough overview of both sides, it can do that as well. The possibilities are limited only by how well the user can describe what they are looking for.5
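To make that concrete, a prompt for a typical high school assignment might read something like this (a made-up illustration, not a template from any particular model): “Produce a report on the causes of the Dust Bowl. Prioritize primary sources and university or government websites over general reference sites, present both the environmental and the economic explanations, and end with an annotated bibliography of your ten most relevant sources.”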
A lot of the reviews and use cases of Deep Research reports are business-related. I haven’t seen nearly as many for academic subjects, but I have created many myself, and they make a great starting point for virtually any research question.6
There are also specialized scholarly sites that focus on academic journals and other more vetted source material; these will produce Deep Research reports that reference the most relevant papers and articles. Links are provided as well, though many of the citations may be paywalled or otherwise unavailable on the open web.
All told, a student who creates a thoughtful prompt on their topic can produce multiple reports (for free!) in about 15 minutes. In an ideal world, a student ends up with two or three documents serving as a primer on their topic and dozens of targeted, specific citations to pursue further avenues of research.
I have no idea how many students are aware of these tools or using them on a regular basis, but this is all possible now. A half hour of coming up with a good prompt and using Deep Research will yield a treasure trove of information in a fraction of the time it would take using more traditional methods.
However, I’m struggling to determine whether this is a good thing for students in the early phase of learning how to research. Is this cheating? And if not in the ordinary sense, is it cheating them of a learning opportunity?
I am torn on this question for several reasons.
On the one hand, I’m hard-pressed to think of reasons to tell students they are not “allowed” to do this when it is clearly the future of research. Anyone doing any kind of knowledge work today is taking advantage of these models. There is a reason they are so popular, and why demand for them is straining the servers and systems that undergird everything connected to AI. And they will get better and better. Just look at the use cases for tools like Manus AI, which represent the next wave on the AI frontier. Learning how to use AI agents effectively, of which Deep Research models are early prototypes, seems like a reasonable goal.
But there is also a strong argument that the traditional method - manually searching databases, vetting each source as you go, painstakingly reviewing each one, and building up your knowledge of the topic piece by piece - is worthwhile and important. As a bibliophile, I know the pleasure of finding just the right book, article, or journal and settling down to read deeply in a cozy corner. Allowing AI to short-circuit that initial phase feels wrong on some level.
And my very real fear is that most students will stop with the Deep Research report and claim to be “done” with their research. Many won’t open each link, evaluate its quality, and do the deep thinking needed to decide for themselves how useful it may be for their overall research paper.7 Instead of a launching pad, the Deep Research report may become the final destination - the end product rather than the beginning of genuine intellectual engagement.
But we as teachers can fix that. There is a way to do both. Where I have landed, at least for the moment, is on demonstrating how these Deep Research models work. I plan to show students examples of thoughtful prompts that can excavate information related to their topics, teach them how to craft requests that ask the AI to prioritize primary sources, archival databases, and other legitimate and well-regarded documents and websites, and show them how to evaluate the results. Of course, no student will be required or even expected to use these tools, but if they do, I want it to be fully out in the open.
Most importantly, I intend to explain my fears about Deep Research models and their potential for entirely outsourcing the process of research. I will stress the importance of reviewing each citation carefully, fully engaging with the best ones, and then using the information as a jumping-off point to pursue lines of inquiry generated by their own thinking. I will also assign a research log requiring them to track any instance in which they use AI to assist with the research process and to provide links to chats and any reports created with AI.
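(As a rough sketch of what I have in mind - the exact format is still a work in progress - each log entry might record the date, the tool and model used, the exact prompt, a link to the chat or report, and a sentence or two on what the student actually took from it and why.)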
I have no idea whether this will work. But if students are going to use Deep Research - and they will - I would rather it happen under my guidance and in the open than spend my time playing policeman, on the lookout for AI work. It may result in “better” and more thoroughly researched papers but less learning. I just don’t know.
What I do know is that AI tools continue to challenge our fundamental assessment practices. If the end product can be partially automated, we need to shift our focus to evaluating the process itself. Perhaps the research log becomes as important as the final paper, or maybe we need to emphasize in-class activities where students demonstrate their understanding through discussion and debate rather than solely through written work. There is still a place for a unique, polished, and deeply researched paper, but we need to think hard about how we want students to get there.
I know some teachers may take the opposite approach. Most are probably not even aware of what these models are capable of. But I can’t ignore their existence. This is the world in which we live, whether we like it or not. I see great potential, but only if we hew to our mission to teach to the best of our ability - which now includes learning about AI.
All of the Deep Research models limit the number of reports a single user can run per day. As of this writing, OpenAI’s and Gemini’s are limited to paid users only. Perplexity, Grok, Elicit, Consensus, and Semantic Scholar will let you try them out in typical freemium packages. I am most familiar with OpenAI, Gemini, Perplexity, and Elicit, though I’ve created some reports with the others as well. The strict limits, even on paid plans, suggest the staggering amount of compute necessary to produce these reports - another feature of Deep Research is that responses can take significantly longer than ordinary prompts. I’ve read that some can take an hour or longer, but I have not experienced anything close to that; 10-15 minutes is the longest I’ve waited for an excellent report. DeepSeek’s DeepThink (R1) likewise uses a reasoning model to produce similar output (one neat feature is the ability to watch, in real time, as the model walks itself through what it’s doing at any given moment), though I know a lot of folks are not comfortable with it.
I wrote about my astonishment with these models in a previous post. Today I want to dig in a little deeper. Others have written about Deep Research as well (Andrew Maynard has a terrific post about it when it debuted) but my focus is on 9-12 education.
One caveat - if you generally restrict using AI to areas in which you have some level of expertise, it’s much easier to sense whether AI is making something up. I need to create more reports on topics with which I am unfamiliar and do a very detailed analysis of the veracity of those citations. It may be the case that “hallucination” rates are still too high there.
My use of Deep Research reports has been generally positive. They vary considerably depending on what you ask for, but deliver solid material from which to start a research project.
I often use Claude Projects, where I have set up a Prompt Optimizer, to help build the exact specifications of the Deep Research queries I run. The quality of the initial prompt dramatically affects the quality of the resulting research.
Here are sample responses from each model to the following research question: "To what extent did new military technologies used in the Mexican-American War (1846-1848) influence tactics and outcomes in the early years of the Civil War? Consider both successes and failures in your analysis." OpenAI, Google Gemini, Perplexity, Grok.
Also, Deep Research models depend on what’s freely accessible on the web, not behind paywalls, so Wikipedia will often be listed among their citations - not necessarily a bad thing, but limiting. An advantage, however, is that the Deep Research report will extract the information from the page most relevant to your question. There is a wonderful Substack post by Faun Rice about using a Deep Research model for a literature review and finding it seriously lacking. Andrew Maynard’s Future of Learning has an equally interesting post demonstrating how he used Deep Research to write a PhD-level dissertation. I just feel like the writing is on the wall and we aren’t going back. It may not be able to do it yet, but in five years or less?
Hello Steve, I want to thank you for this thoughtful and expansive essay. I am a retired AP History teacher and administrator. It has been interesting reading the various essays on AI and seeing the many positions being taken on its use. I hope you have a productive rest of the year. All the best, Richard
I think the key here is what you encourage via the research log (though they could likely fake that too) and, crucially, including a live in-class examination/exercise to see if they actually did ANY research on their own. Otherwise, we won’t necessarily know and they won’t necessarily learn.