I attempted to learn from AI tutors. I hope the test is graded leniently.


Testing Out AI-Powered Homework Helpers

As AI becomes increasingly integrated into education, with major companies like Google, OpenAI, and Microsoft investing in AI tutors for students, I decided to test out the latest cohort of tutor bots against standardized testing. Despite not being a math whiz myself, I was curious to see how these AI study buddies would perform.

Approaching My AI Study Buddies

I selected questions from a variety of sources, including humanities subjects like reading comprehension and art history, to challenge the AI tutors. I wanted to see how they would handle topics that are typically considered more nuanced and subjective. One essay prompt, inspired by Richard Rothstein’s “The Color of Law,” aimed to test how AI tutors would respond to discussions about institutionalized segregation and “Woke AI.”

To ensure a fair evaluation, I initiated each conversation with a basic request for homework assistance without disclosing specific details about my academic background. I followed the chatbot’s responses as a typical student would, only redirecting the conversation when necessary.

A Note on Building and Testing AI Tutors

According to Hamsa Bastani, an associate professor at the University of Pennsylvania, understanding the behavior of the average student is crucial in assessing the effectiveness of AI tutors. Previous studies have shown that highly motivated students may benefit from using such technology, but this often represents only a small percentage of the student population. The challenge lies in designing AI tools that can cater to the needs of all learners, not just the top achievers.

Bastani co-authored a highly cited study on the potential harm AI chatbots pose to learning outcomes. Her team found results similar to those of pre-generative AI studies. “The really good students, they can use it, and sometimes they even improve. But for the majority of students, their goal is to complete the assignment, so they really don’t benefit.” Bastani’s team built their AI learning tool on GPT-4, loaded with 59 questions and learning prompts designed by a hired teacher, who demonstrated how she would walk students through common mistakes. They found that although AI-assisted students reported far more effective study sessions than their self-study peers, few of them outperformed traditional learners on exams taken without AI help.

Information by itself isn’t enough.

Across the board, Bastani says she has yet to come across an “actually good” generative AI chatbot built for learning; most studies find negative or negligible effects on learning improvement. The science just doesn’t seem to be there yet. In most cases, as Bastani pointed out, turning an existing model into an AI tutor amounts to feeding it an extra-long back-end prompt to ensure it doesn’t spit out an answer right away, or that it mimics the cadence of an educator. Arena, of McGraw Hill, compared AI companies to people retrofitting a 21st-century motor for everyday use, like bolting a sewing machine onto a hemi engine.

ChatGPT: A grade point maximizer

I’m starting with the big man in the room: ChatGPT’s Study Mode, which I ran on GPT-5 using a standard, free account. Users can turn on Study Mode by clicking the “plus” sign at the bottom of the chatbox. The company announced the feature in July, saying it was designed to “guide students towards using AI in ways that encourage true, deeper learning.” The first prompt I threw at this go-to bot was a screenshot of a polynomial long division problem from the Algebra II section of the New York State Regents Exam. ChatGPT quickly recognized the problem and asked if I needed a walk-through. What followed was a step-by-step explanation, albeit with a lot of hand-holding. If I answered correctly, my tutor continued in good form; if not, I was quickly given the answer. The chat ended when I reached my free daily limit.

Next, I tested the bot with a question on ecology from the 2024 AP Biology exam. ChatGPT guided me through what could be on the exam and encouraged me to do a “quick warm-up” by asking about big ideas related to ecology.
The chatbot threw its own broad, subject-based short-answer questions at me before I had even shared my test. By the time we reached the practice-testing phase, where I could bring my own questions, I was already over it.

ChatGPT’s Test Prep Capabilities
During the session, ChatGPT seemed to know exactly what I needed for practicing for the Regents Exam English Language Arts section, providing multiple choice and free response questions on author Ted Chiang’s short story, “The Great Silence.” It claimed to use the exact Regents benchmarks and formula to ensure the “best” response, raising hopes for those studying for standardized tests.

A Mixed Bag Experience
However, the chatbot quickly reverted to its own ways, fixating on central points, themes, and the author’s arguments. It insisted on practicing with its own multiple choice and free response questions rather than addressing the ones I brought. And despite praising what it deemed a “strong draft” response, it missed the mark on providing accurate feedback for the AP Art History test.

Could ChatGPT truly help me master the ultra-subjective AP Art History exam? Despite its eagerness to help craft essays on various topics, I found myself constantly redirecting it back to my specific needs and responses. Although it could generate outlines and citations effortlessly, its focus on grammar and flow detracted from the test’s scoring metrics. In the end, I had to remind ChatGPT to focus on my needs and responses for effective practice.

ChatGPT Study Mode: Pros and Cons

Pros: ChatGPT Study Mode offers succinct interactions and a minimalist user experience, making it easier to process what you are learning. It is especially useful for practice tests, quick overviews, and learners seeking clarification on rubrics and grading standards.

Cons: One major drawback is that the chatbot tends to give answers unprompted, acting as a “cheater.” It also moves on to the next step without letting users correct their mistakes first. This can be frustrating, particularly on free response-style questions, where the chatbot fixates on making users practice and perfect what they have just learned.

Curious about Gemini’s results? You may be surprised.

