What aspects of teaching should remain human?

ATLANTA — Science teacher Daniel Thompson circulated among his sixth graders at Ron Clark Academy on a recent spring morning, spot checking their work and leading them into discussions about the day’s lessons on weather and water. He had a helper: As Thompson paced around the class, peppering them with questions, he frequently turned to a voice-activated AI to summon apps and educational videos onto large-screen smartboards.

When a student asked, “Are there any animals that don’t need water?” Thompson put the question to the AI. Within seconds, an illustrated blurb about kangaroo rats appeared before the class.

Thompson’s voice-activated assistant is the brainchild of computer scientist Satya Nitta, who founded a company called Merlyn Mind after many years at IBM, where he had tried, and failed, to create an AI tool that could teach students directly. The foundation of that earlier, ill-fated project was IBM Watson, the AI that famously crushed several “Jeopardy!” champions. Despite Watson’s gameshow success, however, it wasn’t much good at teaching students. After plowing five years and $100 million into the effort, the IBM team admitted defeat in 2017.

“We realized the technology wasn’t there,” said Nitta, “and it’s still not there.”

Daniel Thompson teaches science to middle schoolers at Ron Clark Academy, in Atlanta. Credit: Chris Berdik for The Hechinger Report

Since the November 2022 launch of OpenAI’s ChatGPT, an expanding cast of AI tutors and helpers has entered the learning landscape. Most of these tools are chatbots that tap large language models — or LLMs — trained on troves of data to understand student inquiries and respond conversationally with a range of flexible and targeted learning assistance. These bots can generate quizzes, summarize key points in a complex reading, offer step-by-step graphing of algebraic equations, or provide feedback on the first draft of an essay, among other tasks. Some tools are subject-specific, such as Writable and Photomath, while others offer more all-purpose tutoring, such as Socratic (created by Google) and Khanmigo, a collaboration of OpenAI and Khan Academy, a nonprofit provider of online lessons covering an array of academic subjects.

As AI tools proliferate and their capabilities keep improving, relatively few observers believe education can remain AI-free. At the same time, even the staunchest techno-optimists hesitate to say that teaching is best left to the bots. The debate is about the best mix — what are AI’s most effective roles in helping students learn, and what aspects of teaching should remain indelibly human no matter how powerful AI becomes?

Skepticism about AI’s place in the classroom often centers on students using the technology to cut corners or on AI’s tendency to hallucinate, i.e., make stuff up in its eagerness to answer every query. The latter concern can be mitigated (albeit not eliminated) by programming bots to base responses on vetted curricular materials, among other steps. Less attention, however, is paid to an even thornier challenge for AI at the heart of effective teaching: engaging and motivating students.

Nitta said there’s something “deeply profound” about human communication that allows flesh-and-blood teachers to quickly spot and address things like confusion and flagging interest in real time.

He joins other experts in technology and education who believe AI’s best use is to augment and extend the reach of human teachers, a vision that takes different forms. For example, the goal of Merlyn Mind’s voice assistant is to make it easier for teachers to engage with students while also navigating apps and other digital teaching materials. Instead of being stationed by the computer, they can move around the class and interact with students, even the ones hoping to disappear in the back.

Others in education are trying to achieve this vision by using AI to help train human tutors to have more productive student interactions, or by multiplying the number of students a human instructor can engage with by delegating specific tasks to AI that play to the technology’s strengths. Ultimately, these experts envision a partnership in which AI is not called on to be a teacher but to supercharge the power of humans already doing the job.

Related: Become a lifelong learner. Subscribe to our free weekly newsletter to receive our comprehensive reporting directly in your inbox.

Merlyn Mind’s AI assistant, Origin, was piloted by thousands of teachers nationwide this past school year, including Thompson and three other teachers at the Ron Clark Academy. The South Atlanta private school, where tuition is heavily subsidized for a majority low-income student body, is in a brick warehouse renovated to look like a low-slung Hogwarts, replete with an elaborate clocktower and a winged dragon perched above the main entrance.

As Thompson moved among his students, he wielded a slim remote control with a button-activated microphone he uses to command the AI software. At first, Thompson told the AI to start a three-minute timer that popped up on the smartboard, then he began asking rapid-fire review questions from a previous lesson, such as what causes wind. When students couldn’t remember the details, Thompson asked the AI to display an illustration of airflow caused by uneven heating of the Earth’s surface.

The voice-activated AI assistant by Merlyn Mind is designed to help teachers navigate apps and materials on their computer while moving around the classroom, interacting with students. Credit: Chris Berdik for The Hechinger Report

At one point, he clambered up on a student worktable while discussing the stratosphere, claiming (inaccurately) that it was the atmospheric layer where most weather happens, just to see if any students caught his mistake (several students reminded him that weather happens in the troposphere). Then he conjured a new timer and launched into a lesson on water by asking the AI assistant to find a short educational movie about fresh and saltwater ecosystems. As Thompson moved through the class, he occasionally paused the video and quizzed students about the new content.

Study after study has shown the importance of student engagement for academic success. A strong connection between teachers and students is especially important when learners feel challenged or discouraged, according to Nitta. While AI has many strengths, he said, “it’s not very good at motivating you to keep doing something you’re not very interested in doing.”

“The elephant in the room with all these chatbots is how long will anyone engage with them?” he said.

The answer for Watson was not long at all, Nitta recalled. In trial runs, some students just ignored Watson’s attempts to probe their understanding of a topic, and the engagement level of those who initially did respond to the bot dropped off precipitously. Despite all Watson’s knowledge and facility with natural language, students just weren’t interested in chatting with it.

Related: PROOF POINTS: AI essay grading is ‘already as good as an overburdened’ teacher, but researchers say it needs more work

At a spring 2023 TED talk shortly after launching Khanmigo, Sal Khan, founder and CEO of Khan Academy, pointed out that tutoring has provided some of the biggest jolts to student performance among studied education interventions. But there aren’t enough human tutors available nor enough money to pay for them, especially in the wake of pandemic-induced learning loss.

Khan envisioned a world where AI tutors filled that gap. “We’re at the cusp of using AI for probably the biggest positive transformation that education has ever seen,” he declared. “And the way we’re going to do that is by giving every student on the planet an artificially intelligent but amazing personal tutor.”

One of Khanmigo’s architects, Khan Academy’s chief learning officer, Kristen DiCerbo, was the vice president of learning research and design for education publisher Pearson in 2016 when it partnered with IBM on the Watson tutor project.

“It was a different technology,” said DiCerbo, recalling the laborious task of scripting Watson’s responses to students.

The Ron Clark Academy, in Atlanta, piloted a voice-activated teaching assistant this school year. Credit: Chris Berdik for The Hechinger Report

Since Watson’s heyday, AI has become a lot more engaging. One of the breakthroughs of generative AI powered by LLMs is its ability to give unscripted, human-like responses to user prompts.

To spur engagement, Khanmigo doesn’t answer student questions directly, but starts with questions of its own, such as asking if the student has any ideas about how to find an answer. Then it guides them to a solution, step by step, with hints and encouragement (a positive tone is assured by its programmers). Another feature for stoking engagement allows students to ask the bot to assume the identity of historical or literary figures for chats about their life and times. Teachers, meanwhile, can tap the bot for help planning lessons and formulating assessments. 
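
Khan Academy hasn’t published Khanmigo’s internals, but the behavior described here (declining to give answers outright, leading with guiding questions, keeping an upbeat tone) is typically enforced with a system prompt layered over the base model. Below is a minimal sketch of that pattern, assuming OpenAI’s Python client; the prompt wording and model choice are illustrative, not Khanmigo’s actual configuration.

```python
# Minimal sketch of a Socratic tutoring bot driven by a system prompt.
# Assumes the openai Python package (>= 1.0). The prompt wording and
# model name are illustrative, not Khanmigo's actual configuration.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

SOCRATIC_SYSTEM_PROMPT = (
    "You are a patient, encouraging tutor. Never give the final answer "
    "directly. Ask one guiding question at a time, offer hints that "
    "build on the student's own ideas, and keep a warm, positive tone."
)

def tutor_reply(conversation: list[dict]) -> str:
    """Return the tutor's next turn given the chat history so far."""
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "system", "content": SOCRATIC_SYSTEM_PROMPT}]
        + conversation,
    )
    return response.choices[0].message.content

history = [{"role": "user", "content": "How do I solve 2x + 6 = 14?"}]
print(tutor_reply(history))  # e.g., "What could you do to both sides first?"
```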

Notwithstanding Khan’s expansive vision of “amazing” personal tutors for every student on the planet, DiCerbo assigns Khanmigo a more limited teaching role. When students are working independently on a skill or concept but get hung up or caught in a cognitive rut, she said, “we want to help students get unstuck.”

Some 100,000 students and teachers piloted Khanmigo this past academic year in schools nationwide, helping to flag any hallucinations the bot makes and providing tons of student-bot conversations for DiCerbo and her team to analyze.

“We look for things like summarizing, providing hints and encouraging,” she explained. “Does [Khanmigo] do the motivational things that human tutors do?”

The degree to which Khanmigo has closed AI’s engagement gap is not yet known. Khan Academy plans to release some summary data on student-bot interactions later this summer, according to DiCerbo. Plans for third-party researchers to assess the tutor’s impact on learning will take longer.

Nevertheless, many tutoring experts stress the importance of building a strong relationship between tutors and students to achieve significant learning boosts. “If a student is not motivated, or if they don’t see themselves as a math person, then they’re not going to have a deep conversation with an AI bot,” said Brent Milne, the vice president of product research and development at Saga Education, a nonprofit provider of in-person tutoring.

Since 2021, Saga has been a partner in the Personalized Learning Initiative (PLI), run by the University of Chicago’s Education Lab, to help scale high-dosage tutoring — generally defined as one-on-one or small group sessions for at least 30 minutes every day. The PLI team sees a big and growing role for AI in tutoring, one that augments but doesn’t replicate human efforts.

For instance, Saga has been experimenting with AI feedback to help tutors better engage and motivate students. Working with researchers from the University of Memphis and the University of Colorado, the Saga team fed transcripts of their math tutoring sessions into an AI model trained to recognize when the tutor was prompting students to explain their reasoning, refine their answers or initiate a deeper discussion. The AI analyzed how often each tutor took these steps.  
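
The researchers’ model itself isn’t public, but the task it performs (tagging each tutor utterance with a discourse move and tallying how often each move occurs) can be sketched with a generic supervised text classifier. The pipeline, labels and training utterances below are assumptions for illustration, not the Memphis and Colorado teams’ actual system.

```python
# Illustrative sketch of tagging tutor "talk moves" in session
# transcripts and counting them. A simple scikit-learn classifier
# stands in for the researchers' actual (unpublished) model, and the
# labels and training utterances are invented.
from collections import Counter

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

train_texts = [
    "Can you explain how you got that answer?",
    "Walk me through your reasoning.",
    "Would you like to revise your answer?",
    "Why do you think that rule works here?",
    "Open your workbook to page 12.",
]
train_labels = ["explain", "explain", "refine", "deepen", "none"]

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(train_texts, train_labels)

# Tag one tutor's session transcript and tally the engagement prompts.
session = ["Let's try problem three.", "Can you explain your first step?"]
counts = Counter(model.predict(session))
print(counts)  # e.g., Counter({'none': 1, 'explain': 1})
```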

When Saga piloted this AI tool in 2023, the nonprofit provided the feedback to their tutor coaches, who worked with four to eight tutors each. Tracking some 2,300 tutoring sessions over several weeks, they found that tutors whose coaches used the AI feedback peppered their sessions with significantly more of these prompts to encourage student engagement.

While Saga is looking into having AI deliver some feedback directly to tutors, it’s doing so cautiously, because, according to Milne, “having a human coach in the loop is really valuable to us.”

Related: How AI could transform the way schools test kids

In addition to using AI to help train tutors, the Saga team wondered if they could offload certain tutor tasks to a machine without compromising the strong relationship between tutors and students. Specifically, they understood that tutoring sessions were typically a mix of teaching concepts and practicing them, according to Milne. A tutor might spend some time explaining the why and how of factoring algebraic equations, for example, and then guide a student through practice problems. But what if the tutor could delegate the latter task to AI, which excels at providing precisely targeted adaptive practice problems and hints?

The Saga team tested the idea in their algebra tutoring sessions during the 2023-24 school year. They found that students who were tutored daily in a group of two had about the same gains in math scores as students who were tutored in a group of four with assistance from ALEKS, an AI-powered learning software by McGraw Hill. In the group of four, two students worked directly with the tutor and two with the AI, switching each day. In other words, the AI assistance effectively doubled the reach of the tutor.

Experts expect that AI’s role in education is bound to grow, and its interactions will continue to seem more and more human. Earlier this year, OpenAI and the startup Hume AI separately launched “emotionally intelligent” AI that analyzes tone of voice and facial expressions to infer a user’s mood and respond with calibrated “empathy.” Nevertheless, even emotionally intelligent AI will likely fall short on the student engagement front, according to Brown University computer science professor Michael Littman, who is also the National Science Foundation’s division director for information and intelligent systems.

No matter how human-like the conversation, he said, students understand at a fundamental level that AI doesn’t really care about them, what they have to say in their writing or whether they pass or fail algebra. In turn, students will never really care about the bot and what it thinks. A June study in the journal “Learning and Instruction” found that AI can already provide decent feedback on student essays. What is not clear is whether student writers will put in care and effort — rather than offloading the task to a bot — if AI becomes the primary audience for their work. 

“There’s incredible value in the human relationship component of learning,” Littman said, “and when you just take humans out of the equation, something is lost.”

This story about AI tutors was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for the Hechinger newsletter.

PROOF POINTS: Asian American students lose more points in an AI essay grading study — but researchers don’t know why

When ChatGPT was released to the public in November 2022, advocates and watchdogs warned about the potential for racial bias. The new large language model was created by harvesting 300 billion words from books, articles and online writing, which include racist falsehoods and reflect writers’ implicit biases. Biased training data is likely to generate biased advice, answers and essays. Garbage in, garbage out. 

Researchers are starting to document how AI bias manifests in unexpected ways. Inside the research and development arm of the giant testing organization ETS, which administers the SAT, a pair of investigators pitted man against machine in evaluating more than 13,000 essays written by students in grades 8 to 12. They discovered that the AI model that powers ChatGPT penalized Asian American students more than other races and ethnicities in grading the essays. This was purely a research exercise and these essays and machine scores weren’t used in any of ETS’s assessments. But the organization shared its analysis with me to warn schools and teachers about the potential for racial bias when using ChatGPT or other AI apps in the classroom.

AI and humans scored essays differently by race and ethnicity

“Diff” is the difference between the average score given by humans and GPT-4o in this experiment. “Adj. Diff” adjusts this raw number for the randomness of human ratings. Source: Table from Matt Johnson & Mo Zhang “Using GPT-4o to Score Persuade 2.0 Independent Items” ETS (June 2024 draft)

“Take a little bit of caution and do some evaluation of the scores before presenting them to students,” said Mo Zhang, one of the ETS researchers who conducted the analysis. “There are methods for doing this and you don’t want to take people who specialize in educational measurement out of the equation.”

That might sound self-serving for an employee of a company that specializes in educational measurement. But Zhang’s advice is worth heeding in the excitement to try new AI technology. There are potential dangers as teachers save time by offloading grading work to a robot.

In ETS’s analysis, Zhang and her colleague Matt Johnson fed 13,121 essays into one of the latest versions of the AI model that powers ChatGPT, called GPT-4 Omni, or simply GPT-4o. (This version was added to ChatGPT in May 2024, but when the researchers conducted this experiment they used the latest AI model through a different portal.)

A little background about this large bundle of essays: students across the nation had originally written these essays between 2015 and 2019 as part of state standardized exams or classroom assessments. Their assignment had been to write an argumentative essay, such as “Should students be allowed to use cell phones in school?” The essays were collected to help scientists develop and test automated writing evaluation.

Each of the essays had been graded by expert raters of writing on a 1-to-6 point scale with 6 being the highest score. ETS asked GPT-4o to score them on the same six-point scale using the same scoring guide that the humans used. Neither man nor machine was told the race or ethnicity of the student, but researchers could see students’ demographic information in the datasets that accompany these essays.
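
ETS hasn’t released its exact prompt, but the setup described (hand GPT-4o the rubric the human raters used and ask for a 1-to-6 score, with no graded examples) is simple to outline. Here is a hedged sketch using OpenAI’s Python client; the rubric text is a placeholder standing in for ETS’s actual scoring guide.

```python
# Sketch of zero-shot essay scoring on a 1-to-6 scale, in the spirit
# of the ETS experiment. The rubric is a placeholder; ETS gave GPT-4o
# the same scoring guide its human raters used.
from openai import OpenAI

client = OpenAI()

RUBRIC = (
    "Score the essay from 1 (lowest) to 6 (highest) based on the "
    "strength of its argument, use of evidence from the sources, "
    "organization, and command of language."
)

def score_essay(essay: str) -> int:
    response = client.chat.completions.create(
        model="gpt-4o",
        temperature=0,  # keep scores as repeatable as possible
        messages=[
            {"role": "system", "content": RUBRIC},
            {"role": "user", "content": f"Essay:\n{essay}\n\nReply with the score only."},
        ],
    )
    return int(response.choices[0].message.content.strip())

print(score_essay("Students should be allowed to use cell phones because..."))
```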

GPT-4o marked the essays almost a point lower than the humans did. The average score across the 13,121 essays was 2.8 for GPT-4o and 3.7 for the humans. But Asian Americans were docked by an additional quarter point. Human evaluators gave Asian Americans a 4.3, on average, while GPT-4o gave them only a 3.2 – roughly a 1.1 point deduction. By contrast, the score difference between humans and GPT-4o was only about 0.9 points for white, Black and Hispanic students. Imagine an ice cream truck that kept shaving off an extra quarter scoop only from the cones of Asian American kids. 

“Clearly, this doesn’t seem fair,” wrote Johnson and Zhang in an unpublished report they shared with me. Though the extra penalty for Asian Americans wasn’t terribly large, they said, it’s substantial enough that it shouldn’t be ignored. 

The researchers don’t know why GPT-4o issued lower grades than humans, and why it gave an extra penalty to Asian Americans. Zhang and Johnson described the AI system as a “huge black box” of algorithms that operate in ways “not fully understood by their own developers.” That inability to explain a student’s grade on a writing assignment makes the systems especially frustrating to use in schools.

This table compares GPT-4o scores with human scores on the same batch of 13,121 student essays, which were scored on a 1-to-6 scale. Numbers highlighted in green show exact score matches between GPT-4o and humans. Unhighlighted numbers show discrepancies. For example, there were 1,221 essays where humans awarded a 5 and GPT-4o awarded a 3. Data source: Matt Johnson & Mo Zhang “Using GPT-4o to Score Persuade 2.0 Independent Items” ETS (June 2024 draft)

This one study isn’t proof that AI is consistently underrating essays or biased against Asian Americans. Other versions of AI sometimes produce different results. A separate analysis of essay scoring by researchers from the University of California, Irvine, and Arizona State University found that AI essay grades were just as frequently too high as they were too low. That study, which used the 3.5 version of ChatGPT, did not scrutinize results by race and ethnicity.

I wondered if AI bias against Asian Americans was somehow connected to high achievement. Just as Asian Americans tend to score high on math and reading tests, Asian Americans, on average, were the strongest writers in this bundle of 13,000 essays. Even with the penalty, Asian Americans still had the highest essay scores, well above those of white, Black, Hispanic, Native American or multi-racial students. 

In both the ETS and UC-ASU essay studies, AI awarded far fewer perfect scores than humans did. For example, in this ETS study, humans awarded 732 perfect 6s, while GPT-4o gave out a grand total of only three. GPT’s stinginess with perfect scores might have affected a lot of Asian Americans who had received 6s from human raters.

ETS’s researchers had asked GPT-4o to score the essays cold, without showing the chatbot any graded examples to calibrate its scores. It’s possible that a few sample essays or small tweaks to the grading instructions, or prompts, given to ChatGPT could reduce or eliminate the bias against Asian Americans. Perhaps the robot would be fairer to Asian Americans if it were explicitly prompted to “give out more perfect 6s.” 
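
That calibration idea can be tested by switching from zero-shot to few-shot prompting: prepend a handful of human-graded essays before the target essay. A sketch of that variant follows, building on the zero-shot sketch above; the sample essays and scores are invented for illustration.

```python
# Few-shot variant of the scoring sketch above: show the model a few
# human-graded essays first so it can calibrate its scale. The example
# essays and scores here are invented for illustration.
CALIBRATION_EXAMPLES = [
    ("Cell phones let students reach parents in emergencies, and ...", 6),
    ("Phones is good because you can call people ...", 2),
]

def score_essay_few_shot(essay: str) -> int:
    # Reuses `client` and `RUBRIC` from the zero-shot sketch above.
    messages = [{"role": "system", "content": RUBRIC}]
    for sample, score in CALIBRATION_EXAMPLES:
        messages.append({"role": "user", "content": f"Essay:\n{sample}"})
        messages.append({"role": "assistant", "content": str(score)})
    messages.append(
        {"role": "user", "content": f"Essay:\n{essay}\n\nReply with the score only."}
    )
    response = client.chat.completions.create(
        model="gpt-4o", temperature=0, messages=messages
    )
    return int(response.choices[0].message.content.strip())
```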

The ETS researchers told me this wasn’t the first time that they’ve noticed Asian students treated differently by a robo-grader. Older automated essay graders, which used different algorithms, have sometimes done the opposite, giving Asians higher marks than human raters did. For example, an ETS automated scoring system developed more than a decade ago, called e-rater, tended to inflate scores for students from Korea, China, Taiwan and Hong Kong on their essays for the Test of English as a Foreign Language (TOEFL), according to a study published in 2012. That may have been because some Asian students had memorized well-structured paragraphs, while humans easily noticed that the essays were off-topic. (The ETS website says it relies on the e-rater score alone only for practice tests, and uses it in conjunction with human scores for actual exams.)

Asian Americans also garnered higher marks from an automated scoring system created during a coding competition in 2021 and powered by BERT, which had been the most advanced algorithm before the current generation of large language models, such as GPT. Computer scientists put their experimental robo-grader through a series of tests and discovered that it gave higher scores than humans did to Asian Americans’ open-response answers on a reading comprehension test. 

It was also unclear why BERT sometimes treated Asian Americans differently. But it illustrates how important it is to test these systems before we unleash them in schools. Based on educator enthusiasm, however, I fear this train has already left the station. In recent webinars, I’ve seen many teachers post in the chat window that they’re already using ChatGPT, Claude and other AI-powered apps to grade writing. That might be a time saver for teachers, but it could also be harming students. 

This story about AI bias was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

TEACHER VOICE: My students are afraid of AI

Since the release of ChatGPT in November 2022, educators have pondered its implications for education. Some have leaned toward apocalyptic projections about the end of learning, while others remain cautiously optimistic.

My students took longer than I expected to discover generative AI. When I asked them about ChatGPT in February 2023, many had never heard of it.

But some caught up, and now our college’s academic integrity office is busier than ever dealing with AI-related cheating. The need for guidelines is discussed in every college meeting, but I’ve noticed a worrying reaction among students that educators are not considering: fear.

Students are bombarded with negative ideas about AI. Punitive policies heighten that fear while failing to recognize the potential educational benefits of these technologies — and that students will need to use them in their careers. Our role as educators is to cultivate critical thinking and equip students for a job market that will use AI, not to intimidate them.

Yet course descriptions include bans on the use of AI. Professors tell students they cannot use it. And students regularly read stories about their peers going on academic probation for using Grammarly. If students feel constantly under suspicion, it can create a hostile learning environment.

Related: Interested in innovations in the field of higher education? Subscribe to our free biweekly Higher Education newsletter.

Many of my students haven’t even played around with ChatGPT because they are scared of being accused of plagiarism. This avoidance creates a paradox in which students are expected to be adept with these modern tools post-graduation, yet are discouraged from engaging with them during their education.

I suspect the profile of my students makes them more prone to fear AI. Most are Hispanic and female, taking courses in translation and interpreting. They see that the overwhelmingly male and white “tech bros” in Silicon Valley shaping AI look nothing like them, and they internalize the idea that AI is not for them and not something they need to know about. I wasn’t surprised that the only male student I had in class this past semester was the only student excited about ChatGPT from the very beginning.

Failing to develop AI literacy among Hispanic students can diminish their confidence and interest in engaging with these technologies. Their fearful reactions will widen the already concerning inequities between Hispanic and non-Hispanic students; the degree completion gap between Latino and white students increased between 2018 and 2021.

The stakes are high. Similar to the internet boom, AI will revolutionize daily activities and, certainly, knowledge jobs. To prepare our students for these changes, we need to help them understand what AI is and encourage them to explore the functionalities of large language models like ChatGPT.

I decided to address the issue head-on. I asked my students to write speeches on a current affairs topic. But first, I asked for their thoughts on AI. I was shocked by the extent of their misunderstanding: Many believed that AI was an omniscient knowledge-producing machine connected to the internet.

After I gave a brief presentation on AI, they expressed surprise that large language models are based on prediction rather than direct knowledge. Their curiosity was piqued, and they wanted to learn how to use AI effectively.

After they drafted their speeches without AI, I asked them to use ChatGPT to proofread their drafts and then report back to me. Again, they were surprised — this time about how much ChatGPT could improve their writing. I was happy (even proud) to see they were also critical of the output, with comments such as “It didn’t sound like me” or “It made up parts of the story.”

Was the activity perfect? Of course not. Prompting was challenging. I noticed a clear correlation between literacy levels and the quality of their prompts.

Students who struggled with college-level writing couldn’t go beyond prompts such as “Make it sound smoother.” Nonetheless, this basic activity was enough to spark curiosity and critical thinking about AI.

Individual activities like these are great, but without institutional support and guidance, efforts toward fostering AI literacy will fall short.

The provost of my college established an AI committee to develop college guidelines. It included professors from a wide range of disciplines (myself included), other staff members and, importantly, students.

Through multiple meetings, we brainstormed the main issues that needed to be included and researched specific topics like AI literacy, data privacy and safety, AI detectors and bias.

We created a document divided into key points that everyone could understand. The draft document was then circulated among faculty and other committees for feedback.

Initially, we were concerned that circulating the guidelines among too many stakeholders might complicate the process, but this step proved crucial. Feedback from professors in areas such as history and philosophy strengthened the guidelines, adding valuable perspectives. This collaborative approach also helped increase institutional buy-in, as everyone’s contribution was valued.

Related: A new partnership paves the way for greater use of AI in higher ed

Underfunded public institutions like mine face significant challenges integrating AI into education. While AI offers incredible opportunities for educators, realizing these opportunities requires substantial institutional investment.

Asking adjuncts in my department, who are grossly underpaid, to find time to learn how to use AI and incorporate it into their classes seems unethical. Yet, incorporating AI into our knowledge production activities can significantly boost student outcomes.

If this happens only at wealthy institutions, we will widen academic performance gaps.

Furthermore, if only students at wealthy institutions and companies get to use AI, the bias inherent in these large language models will continue to grow.

If we want our classes to ensure equitable educational opportunities for all students, minority-serving institutions cannot fall behind in AI adoption.

Cristina Lozano Argüelles is an assistant professor of interpreting and bilingualism at John Jay College, part of the City University of New York, where she researches the cognitive and social dimensions of language learning.

This story about AI literacy was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Hechinger’s newsletter.

PROOF POINTS: Teens are looking to AI for information and answers, two surveys show

Two new surveys, both released this month, show how high school and college-age students are embracing artificial intelligence. There are some inconsistencies and many unanswered questions, but what stands out is how much teens are turning to AI for information and to ask questions, not just to do their homework for them. And they’re using it for personal reasons as well as for school. Another big takeaway is that there are different patterns by race and ethnicity, with Black, Hispanic and Asian American students often adopting AI faster than white students.

The first report, released on June 3, was conducted by three nonprofit organizations, Hopelab, Common Sense Media, and the Center for Digital Thriving at the Harvard Graduate School of Education. These organizations surveyed 1,274 teens and young adults aged 14-22 across the U.S. from October to November 2023. At that time, only half the teens and young adults said they had ever used AI, with just 4 percent using it daily or almost every day. 

Emily Weinstein, executive director for the Center for Digital Thriving, a research center that investigates how youth are interacting with technology, said that more teens are “certainly” using AI now that these tools are embedded in more apps and websites, such as Google Search. Last October and November, when this survey was conducted, teens typically had to take the initiative to navigate to an AI site and create an account. An exception was Snapchat, a social media app that had already added an AI chatbot for its users. 

More than half of the early adopters said they had used AI for getting information and for brainstorming, the first and second most popular uses. This survey didn’t ask teens if they were using AI for cheating, such as prompting ChatGPT to write their papers for them. However, among the half of respondents who were already using AI, fewer than half – 46 percent – said they were using it for help with school work. The fourth most common use was for generating pictures.

The survey also asked teens a couple of open-response questions. Some teens told researchers that they are asking AI private questions that they were too embarrassed to ask their parents or their friends. “Teens are telling us, ‘I have questions that are easier to ask robots than people,’” said Weinstein.

Weinstein wants to know more about the quality and the accuracy of the answers that AI is giving teens, especially those with mental health struggles, and how privacy is being protected when students share personal information with chatbots.

The second report, released on June 11, was conducted by Impact Research and commissioned by the Walton Family Foundation. In May 2024, Impact Research surveyed 1,003 teachers, 1,001 students aged 12-18, 1,003 college students, and 1,000 parents about their use and views of AI.

This survey, which took place six months after the Hopelab-Common Sense survey, demonstrated how quickly usage is growing. It found that 49 percent of students, aged 12-18, said they used ChatGPT at least once a week for school, up 26 percentage points since 2023. Forty-nine percent of college undergraduates also said they were using ChatGPT every week for school, but there was no comparison data from 2023.

Among 12- to 18-year-olds and college students who had used AI chatbots for school, 56 percent said they had used it for help in writing essays and other writing assignments. Undergraduate students were more than twice as likely as 12- to 18-year-olds to say using AI felt like cheating, 22 percent versus 8 percent. Earlier 2023 surveys of student cheating by scholars at Stanford University did not detect an increase in cheating with ChatGPT and other generative AI tools. But as students use AI more, students’ understanding of what constitutes cheating may also be evolving. 


More than 60 percent of college students who used AI said they were using it to study for tests and quizzes. Half of the college students who used AI said they were using it to deepen their subject knowledge, perhaps treating it as an online encyclopedia. There was no indication from this survey if students were checking the accuracy of the information.

Both surveys noticed differences by race and ethnicity. The first Hopelab-Common Sense survey found that 7 percent of Black students, aged 14-22, were using AI every day, compared with 5 percent of Hispanic students and 3 percent of white students. In the open-ended questions, one Black teen girl wrote that, with AI, “we can change who we are and become someone else that we want to become.” 

The Walton Foundation survey found that Hispanic and Asian American students were sometimes more likely to use AI than white and Black students, especially for personal purposes. 

These are all early snapshots that are likely to keep shifting. OpenAI’s technology is expected to become part of the Apple universe in the fall, built into iPhones, computers and iPads. “These numbers are going to go up and they’re going to go up really fast,” said Weinstein. “Imagine that we could go back 15 years in time when social media use was just starting with teens. This feels like an opportunity for adults to pay attention.”

This story about ChatGPT in education was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

PROOF POINTS: AI writing feedback ‘better than I thought,’ top researcher says

Researchers from the University of California, Irvine, and Arizona State University found that human feedback was generally a bit better than AI feedback, but AI was surprisingly good. Credit: Getty Images

This week I challenged my editor to face off against a machine. Barbara Kantrowitz gamely accepted, under one condition: “You have to file early.”  Ever since ChatGPT arrived in 2022, many journalists have made a public stunt out of asking the new generation of artificial intelligence to write their stories. Those AI stories were often bland and sprinkled with errors. I wanted to understand how well ChatGPT handled a different aspect of writing: giving feedback.

My curiosity was piqued by a new study, published in the June 2024 issue of the peer-reviewed journal Learning and Instruction, that evaluated the quality of ChatGPT’s feedback on students’ writing. A team of researchers compared AI with human feedback on 200 history essays written by students in grades 6 through 12 and they determined that human feedback was generally a bit better. Humans had a particular advantage in advising students on something to work on that would be appropriate for where they are in their development as a writer. 

But ChatGPT came close. On a five-point scale that the researchers used to rate feedback quality, with a 5 being the highest quality feedback, ChatGPT averaged a 3.6 compared with a 4.0 average from a team of 16 expert human evaluators. It was a tough challenge. Most of these humans had taught writing for more than 15 years or they had considerable experience in writing instruction. All received three hours of training for this exercise plus extra pay for providing the feedback.

ChatGPT even beat these experts in one aspect; it was slightly better at giving feedback on students’ reasoning, argumentation and use of evidence from source materials – the features that the researchers had wanted the writing evaluators to focus on.

“It was better than I thought it was going to be because I didn’t have a lot of hope that it was going to be that good,” said Steve Graham, a well-regarded expert on writing instruction at Arizona State University, and a member of the study’s research team. “It wasn’t always accurate. But sometimes it was right on the money. And I think we’ll learn how to make it better.”

Average ratings for the quality of ChatGPT and human feedback on 200 student essays

Researchers rated the quality of the feedback on a five-point scale across five different categories. Criteria-based refers to whether the feedback addressed the main goals of the writing assignment, in this case, to produce a well-reasoned argument about history using evidence from the reading source materials that the students were given. Clear directions refers to whether the feedback included specific examples of something the student did well and clear directions for improvement. Accuracy refers to whether the feedback advice was correct without errors. Essential features refers to whether the suggestion on what the student should work on next is appropriate for where the student is in his or her writing development and is an important element of this genre of writing. Supportive tone refers to whether the feedback is delivered with language that is affirming, respectful and supportive, as opposed to condescending, impolite or authoritarian. (Source: Fig. 1 of Steiss et al., “Comparing the quality of human and ChatGPT feedback of students’ writing,” Learning and Instruction, June 2024.)

Exactly how ChatGPT is able to give good feedback is something of a black box even to the writing researchers who conducted this study. Artificial intelligence doesn’t comprehend things in the same way that humans do. But somehow, through the neural networks that ChatGPT’s programmers built, it is picking up on patterns from all the writing it has previously digested, and it is able to apply those patterns to a new text. 

The surprising “relatively high quality” of ChatGPT’s feedback is important because it means that the new artificial intelligence of large language models, also known as generative AI, could potentially help students improve their writing. One of the biggest problems in writing instruction in U.S. schools is that teachers assign too little writing, Graham said, often because teachers feel that they don’t have the time to give personalized feedback to each student. That leaves students without sufficient practice to become good writers. In theory, teachers might be willing to assign more writing or insist on revisions for each paper if students (or teachers) could use ChatGPT to provide feedback between drafts. 

Despite the potential, Graham isn’t an enthusiastic cheerleader for AI. “My biggest fear is that it becomes the writer,” he said. He worries that students will not limit their use of ChatGPT to helpful feedback, but ask it to do their thinking, analyzing and writing for them. That’s not good for learning. The research team also worries that writing instruction will suffer if teachers delegate too much feedback to ChatGPT. Seeing students’ incremental progress and common mistakes remain important for deciding what to teach next, the researchers said. For example, seeing loads of run-on sentences in your students’ papers might prompt a lesson on how to break them up. But if you don’t see them, you might not think to teach it. Another common concern among writing instructors is that AI feedback will steer everyone to write in the same homogenized way. A young writer’s unique voice could be flattened out before it even has the chance to develop.

There’s also the risk that students may not be interested in heeding AI feedback. Students often ignore the painstaking feedback that their teachers already give on their essays. Why should we think students will pay attention to feedback if they start getting more of it from a machine? 

Still, Graham and his research colleagues at the University of California, Irvine, are continuing to study how AI could be used effectively and whether it ultimately improves students’ writing. “You can’t ignore it,” said Graham. “We either learn to live with it in useful ways, or we’re going to be very unhappy with it.”

Right now, the researchers are studying how students might converse back-and-forth with ChatGPT like a writing coach in order to understand the feedback and decide which suggestions to use.

Example of feedback from a human and ChatGPT on the same essay

In the current study, the researchers didn’t track whether students understood or employed the feedback, but only sought to measure its quality. Judging the quality of feedback is a rather subjective exercise, just as feedback itself is a bundle of subjective judgment calls. Smart people can disagree on what good writing looks like and how to revise bad writing. 

In this case, the research team came up with its own criteria for what constitutes good feedback on a history essay. They instructed the humans to focus on the student’s reasoning and argumentation, rather than, say, grammar and punctuation.  They also told the human raters to adopt a “glow and grow strategy” for delivering the feedback by first finding something to praise, then identifying a particular area for improvement. 

The human raters provided this kind of feedback on hundreds of history essays from 2021 to 2023, as part of an unrelated study of an initiative to boost writing at school. The researchers randomly grabbed 200 of these essays and fed the raw student writing – without the human feedback – to version 3.5 of ChatGPT and asked it to give feedback, too.

At first, the AI feedback was terrible, but as the researchers tinkered with the instructions, or the “prompt,” they typed into ChatGPT, the feedback improved. The researchers eventually settled upon this wording: “Pretend you are a secondary school teacher. Provide 2-3 pieces of specific, actionable feedback on each of the following essays…. Use a friendly and encouraging tone.” The researchers also fed the assignment that the students were given, for example, “Why did the Montgomery Bus Boycott succeed?” along with the reading source material that the students were provided. (More details about how the researchers prompted ChatGPT are explained in Appendix C of the study.)
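
Wired together, the pieces the study describes (the quoted prompt, the assignment, the source readings and the raw essay) would look roughly like the sketch below. The function shape, model choice and variable names are assumptions; the study’s Appendix C has the exact configuration.

```python
# Rough reconstruction of the study's feedback setup, using the prompt
# wording quoted above. The plumbing and inputs are assumptions;
# Appendix C of the study has the exact details.
from openai import OpenAI

client = OpenAI()

PROMPT = (
    "Pretend you are a secondary school teacher. Provide 2-3 pieces of "
    "specific, actionable feedback on each of the following essays. "
    "Use a friendly and encouraging tone."
)

def essay_feedback(assignment: str, sources: str, essay: str) -> str:
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # the study used version 3.5 of ChatGPT
        messages=[
            {"role": "system", "content": PROMPT},
            {
                "role": "user",
                "content": (
                    f"Assignment: {assignment}\n\n"
                    f"Source material: {sources}\n\n"
                    f"Student essay: {essay}"
                ),
            },
        ],
    )
    return response.choices[0].message.content

print(essay_feedback(
    "Why did the Montgomery Bus Boycott succeed?",
    "<reading passages given to the students>",
    "<raw student essay text>",
))
```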

The humans took about 20 to 25 minutes per essay. ChatGPT’s feedback came back instantly. The humans sometimes marked up sentences by, for example, showing a place where the student could have cited a source to buttress an argument. ChatGPT didn’t write any in-line comments and only wrote a note to the student. 

Researchers then read through both sets of feedback – human and machine – for each essay, comparing and rating them. (It was supposed to be a blind comparison test and the feedback raters were not told who authored each one. However, the language and tone of ChatGPT were distinct giveaways, and the in-line comments were a tell of human feedback.)

Humans appeared to have a clear edge with the very strongest and the very weakest writers, the researchers found. They were better at pushing a strong writer a little bit further, for example, by suggesting that the student consider and address a counterargument. ChatGPT struggled to come up with ideas for a student who was already meeting the objectives of a well-argued essay with evidence from the reading source materials. ChatGPT also struggled with the weakest writers. The researchers had to drop two of the essays from the study because they were so short that ChatGPT didn’t have any feedback for the student. The human rater was able to parse out some meaning from a brief, incomplete sentence and offer a suggestion. 

In one student essay about the Montgomery Bus Boycott, reprinted above, the human feedback seemed too generic to me: “Next time, I would love to see some evidence from the sources to help back up your claim.” ChatGPT, by contrast, specifically suggested that the student could have mentioned how much revenue the bus company lost during the boycott – an idea that appeared in the reading source materials. ChatGPT also suggested that the student could have mentioned specific actions that the NAACP and other organizations took. But the student had actually mentioned a few of these specific actions in his essay. That part of ChatGPT’s feedback was plainly inaccurate.

In another student writing example, also reprinted below, the human straightforwardly pointed out that the student had gotten a historical fact wrong. ChatGPT appeared to affirm that the student’s mistaken version of events was correct.

Another example of feedback from a human and ChatGPT on the same essay

So how did ChatGPT’s review of my first draft stack up against my editor’s? One of the researchers on the study team suggested a prompt that I could paste into ChatGPT. After a few back-and-forth questions with the chatbot about my grade level and intended audience, it initially spit out some generic advice that had little connection to the ideas and words of my story. It seemed more interested in format and presentation, suggesting a summary at the top and subheads to organize the body. One suggestion would have made my piece too long-winded. Its advice to add examples of how AI feedback might be beneficial was something that I had already done. I then asked for specific things to change in my draft, and ChatGPT came back with some great subhead ideas. I plan to use them in my newsletter, which you can see if you sign up for it here. (And if you want to see my prompt and dialogue with ChatGPT, here is the link.)

My human editor, Barbara, was the clear winner in this round. She tightened up my writing, fixed style errors and helped me brainstorm this ending. Barbara’s job is safe – for now. 

This story about AI feedback was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

OPINION: It’s not just about tech and anxiety. What are kids learning?

Clouds of doom continue to hover over the debate about teens’ mental health and the role of technology. This spring, the warnings come from the bestselling book “The Anxious Generation” by social psychologist Jonathan Haidt. Some parents and educators are calling for a ban on smartphones and laptops in schools. Others are trying to press pause on the panic by pointing to research that needs a longer look.

People feel forced into binary camps of “ban tech” and “don’t ban tech.”

But there is a way to reset the conversation that could help parents, educators and kids themselves make better choices about technology. As writers and researchers who focus on the science of learning, we see a gaping hole in the debate thus far. The problem is that decision-makers keep relying on only two sets of questions and data: One set focuses on questions about how youth are feeling (not so great). The other focuses on how kids are using their time (spending hours on their phones).

A third set of questions is missing and needs to be asked: What and how are children and youth learning? Is technology aiding their learning or getting in the way? Think of data on tech and learning as the third leg of the stool in this debate. Without it, we can’t find our way toward balance.

Haidt’s book focuses primarily on well-being, and it’s great that he recognizes the research on the importance of play and exploration offline to helping children’s mental health. But play and exploration are also critical for learning, and parents and educators need more examples of the many different places where learning happens, whether on screen, off screen or some hybrid of the two. Parents are at risk of becoming either too protectionist or too permissive if they don’t stop to consider whether technology is affording today’s kids opportunities to explore and stretch their minds.

Related: Become a lifelong learner. Subscribe to our free weekly newsletter to receive our comprehensive reporting directly in your inbox.

Harvard professor Michael Rich, author of the recent book “The Mediatrician’s Guide,” argues that our children are growing up in a world in which they move seamlessly between physical and digital information, with mountains of experiences and learning opportunities at their fingertips. This is their reality. Today, even children from under-resourced environments can virtually visit places that in the past were well beyond their reach.

Many parents and teachers know their kids can gain valuable skills and knowledge from using different forms of tech and media. In fact, they are already factoring in the potential for learning when they make decisions about technology. They restrict phones and laptops in certain contexts and make them available in others, depending on what they believe will provide a good learning environment for their children at different ages and stages.

Sometimes the technology, and the way kids explore and build things with it, is integral to what kids need to learn. This year, for example, students have been working in Seattle public libraries with University of Washington professor Jason Yip to build tools and games intended to help other kids identify and avoid disinformation. One game is an online maze built within the world of “Minecraft” that shows what it feels like to fall down rabbit holes of extreme information. “Digital play can open up a number of potentials that allow children to experience unknown and difficult situations, such as misinformation, and experiment with decision-making,” Yip said.

Related: Horticulture, horses and ‘Chill Rooms’: One district goes all-in on mental health support

More focus on the effect of technology on learning — good and bad — is needed at all ages. Studies of young children show that when parents are distracted by their phones, they are less able to help their kids build the language skills that are key for learning how to read; parents, then, may need to model better habits with their own phone use. Also consider a study at the University of Delaware in which researchers read books to 4-year-olds live, via video chat or in a prerecorded video. No significant differences in learning were found between the children who were read to live and those read to via video chat. This study and others provide clear evidence that children can learn when people read storybooks to them online.

Instead of fighting with children over smartphone use, we should be making sure that there are enough teachers and mentors to help all kids use those phones and laptops to support learning, whether they are collaborating on science fair projects or creating video book trailers for YouTube. Kids need teachers and parents who can give them opportunities to explore, play and grapple with hard things in both the digital world and the real world.

Our society is good at creating polarization. But we don’t have to devolve into extreme “ban” or “don’t ban” positions on smartphones, laptops or other technology today.

Parents and teachers should make decisions about technology after viewing the issue from three perspectives: how much the kids are using the devices, how the devices are affecting kids’ well-being and — the missing leg — how the devices are affecting their learning. Maybe adding this new piece could even help adults see more than just an “anxious generation” but also one hungry to learn.

Kathy Hirsh-Pasek, Roberta Golinkoff and Lisa Guernsey are authors of several books on children’s learning and founders of The Learning Sciences Exchange, a fellowship program and problem-solving platform at New America that brings together experts in child development research, media and journalism, entertainment, social entrepreneurship and education leadership.

This story about teens and technology was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Hechinger’s newsletter.

The post OPINION: It’s not just about tech and anxiety. What are kids learning? appeared first on The Hechinger Report.

PROOF POINTS: AI essay grading is already as ‘good as an overburdened’ teacher, but researchers say it needs more work https://hechingerreport.org/proof-points-ai-essay-grading/ Mon, 20 May 2024

Grading papers is hard work. “I hate it,” a teacher friend confessed to me. And that’s a major reason why middle and high school teachers don’t assign more writing to their students. Even an efficient high school English teacher who can read and evaluate an essay in 20 minutes would spend 3,000 minutes, or 50 hours, grading if she’s teaching six classes of 25 students each. There aren’t enough hours in the day. 
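To make that arithmetic concrete, here is a quick back-of-the-envelope check in Python (a minimal sketch; the figures are the ones quoted above, not new data):

```python
# Back-of-the-envelope grading load for one English teacher,
# using the figures quoted above.
minutes_per_essay = 20
classes = 6
students_per_class = 25

essays = classes * students_per_class       # 150 essays per assignment
total_minutes = essays * minutes_per_essay  # 3,000 minutes
total_hours = total_minutes / 60            # 50 hours

print(f"{essays} essays -> {total_minutes} minutes ({total_hours:.0f} hours)")
```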

Could ChatGPT relieve teachers of some of the burden of grading papers? Early research is finding that the new artificial intelligence of large language models, also known as generative AI, is approaching the accuracy of a human in scoring essays and is likely to become even better soon. But we still don’t know whether offloading essay grading to ChatGPT will ultimately improve or harm student writing.

Tamara Tate, a researcher at the University of California, Irvine, and an associate director of her university’s Digital Learning Lab, is studying how teachers might use ChatGPT to improve writing instruction. Most recently, Tate and her seven-member research team, which includes writing expert Steve Graham at Arizona State University, compared how ChatGPT stacked up against humans in scoring 1,800 history and English essays written by middle and high school students.

Tate said ChatGPT was “roughly speaking, probably as good as an average busy teacher” and “certainly as good as an overburdened below-average teacher.” But, she said, ChatGPT isn’t yet accurate enough to be used on a high-stakes test or on an essay that would affect a final grade in a class.

Tate presented her study on ChatGPT essay scoring at the 2024 annual meeting of the American Educational Research Association in Philadelphia in April. (The paper is under peer review for publication and is still undergoing revision.) 

Most remarkably, the researchers obtained these fairly decent essay scores from ChatGPT without training it first with sample essays. That means it is possible for any teacher to use it to grade any essay instantly with minimal expense and effort. “Teachers might have more bandwidth to assign more writing,” said Tate. “You have to be careful how you say that because you never want to take teachers out of the loop.” 

Writing instruction could ultimately suffer, Tate warned, if teachers delegate too much grading to ChatGPT. Seeing students’ incremental progress and common mistakes remains important for deciding what to teach next, she said. For example, seeing loads of run-on sentences in your students’ papers might prompt a lesson on how to break them up. But if you don’t see them, you might not think to teach it.

In the study, Tate and her research team calculated that ChatGPT’s essay scores were in “fair” to “moderate” agreement with those of well-trained human evaluators. In one batch of 943 essays, ChatGPT was within a point of the human grader 89 percent of the time. On a six-point grading scale that researchers used in the study, ChatGPT often gave an essay a 2 when an expert human evaluator thought it was really a 1. But this level of agreement – within one point – dropped to 83 percent of the time in another batch of 344 English papers and slid even farther to 76 percent of the time in a third batch of 493 history essays.  That means there were more instances where ChatGPT gave an essay a 4, for example, when a teacher marked it a 6. And that’s why Tate says these ChatGPT grades should only be used for low-stakes purposes in a classroom, such as a preliminary grade on a first draft.

[Chart: ChatGPT scored an essay within one point of a human grader 89 percent of the time in one batch of essays. Corpus 3 refers to the batch of 943 essays, more than half of the 1,800 scored in this study; exact score matches between ChatGPT and the human rater are highlighted in green, and scores within one point of each other in yellow. Source: Tamara Tate, University of California, Irvine (2024).]

Still, this level of accuracy was impressive because even teachers disagree on how to score an essay and one-point discrepancies are common. Exact agreement, which only happens half the time between human raters, was worse for AI, which matched the human score exactly only about 40 percent of the time. Humans were far more likely to give a top grade of a 6 or a bottom grade of a 1. ChatGPT tended to cluster grades more in the middle, between 2 and 5. 
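The two statistics at issue here, exact agreement and within-one-point (or “adjacent”) agreement, are easy to compute from paired scores. Below is a minimal sketch; the scores are invented for illustration and are not the study’s data. Note how the two raters’ averages can nearly match even when they rarely agree exactly, a distinction that matters later in this story.

```python
# Toy illustration of exact vs. adjacent (within-one-point) agreement
# between a human rater and an AI rater on a 1-6 scale.
# These scores are invented for illustration, not the study's data.
human = [1, 2, 3, 4, 5, 6, 2, 5, 3, 4]
ai    = [2, 2, 4, 4, 4, 4, 3, 5, 3, 3]

n = len(human)
exact    = sum(h == a for h, a in zip(human, ai)) / n           # 0.40
adjacent = sum(abs(h - a) <= 1 for h, a in zip(human, ai)) / n  # 0.90

print(f"exact agreement:  {exact:.0%}")
print(f"within one point: {adjacent:.0%}")
# The averages are nearly identical (3.5 vs. 3.4) even though the raters
# match exactly on only 4 of 10 essays -- agreement "on a population
# level" says little about accuracy on any individual student's essay.
print(f"mean human: {sum(human)/n:.1f}  mean AI: {sum(ai)/n:.1f}")
```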

Tate set up ChatGPT for a tough challenge, competing against teachers and experts with PhDs who had received three hours of training in how to properly evaluate essays. “Teachers generally receive very little training in secondary school writing and they’re not going to be this accurate,” said Tate. “This is a gold-standard human evaluator we have here.”

The raters had been paid to score these 1,800 essays as part of three earlier studies on student writing. Researchers fed these same student essays – ungraded –  into ChatGPT and asked ChatGPT to score them cold. ChatGPT hadn’t been given any graded examples to calibrate its scores. All the researchers did was copy and paste an excerpt of the same scoring guidelines that the humans used, called a grading rubric, into ChatGPT and told it to “pretend” it was a teacher and score the essays on a scale of 1 to 6. 
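The zero-shot setup described here can be sketched in a few lines. To be clear, this is not the study’s code: the model name, prompt wording and rubric text below are placeholders, and the snippet assumes the current openai Python client with an API key in the environment.

```python
# Rough sketch of zero-shot rubric scoring as described above.
# Placeholders, not the study's code: rubric text, prompt wording
# and model name are illustrative.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

RUBRIC = "(excerpt of the human raters' 1-to-6 scoring rubric goes here)"

def score_essay(essay_text: str) -> str:
    prompt = (
        "Pretend you are a teacher. Using the rubric below, score the "
        "following student essay on a scale of 1 to 6. Reply with the "
        "score only.\n\n"
        f"Rubric:\n{RUBRIC}\n\nEssay:\n{essay_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # placeholder; the study compared GPT-3.5 and 4
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content.strip()
```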

Older robo graders

Earlier versions of automated essay graders have had higher rates of accuracy. But they were expensive and time-consuming to create because scientists had to train the computer with hundreds of human-graded essays for each essay question. That’s economically feasible only in limited situations, such as for a standardized test, where thousands of students answer the same essay question. 

Earlier robo graders could also be gamed, once a student understood the features that the computer system was grading for. In some cases, nonsense essays received high marks if fancy vocabulary words were sprinkled in them. ChatGPT isn’t grading for particular hallmarks, but is analyzing patterns in massive datasets of language. Tate says she hasn’t yet seen ChatGPT give a high score to a nonsense essay. 

Tate expects ChatGPT’s grading accuracy to improve rapidly as new versions are released. Already, the research team has detected that the newer 4.0 version, which requires a paid subscription, is scoring more accurately than the free 3.5 version. Tate suspects that small tweaks to the grading instructions, or prompts, given to ChatGPT could improve existing versions. She is interested in testing whether ChatGPT’s scoring could become more reliable if a teacher trained it with just a few, perhaps five, sample essays that she has already graded. “Your average teacher might be willing to do that,” said Tate.
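That kind of light-touch calibration is usually called few-shot prompting: the teacher’s already-graded essays are shown to the model as worked examples before it scores a new one. A minimal sketch of the idea, with placeholder essays and the same hypothetical client setup as above:

```python
# Hypothetical few-shot variant: show the model a few teacher-graded
# essays as examples before asking for a new score. All text here is
# a placeholder, not data from the study.
from openai import OpenAI

client = OpenAI()
RUBRIC = "(same rubric excerpt as in the zero-shot sketch)"
graded_examples = [
    ("(text of an essay the teacher scored)", 2),
    ("(text of another graded essay)", 5),
    # ...perhaps five examples in all, as Tate suggests
]
new_essay = "(the ungraded essay to be scored)"

messages = [{"role": "system",
             "content": f"You are a teacher scoring essays from 1 to 6 "
                        f"with this rubric:\n{RUBRIC}"}]
for essay, score in graded_examples:
    messages.append({"role": "user", "content": f"Score this essay:\n{essay}"})
    messages.append({"role": "assistant", "content": str(score)})
messages.append({"role": "user", "content": f"Score this essay:\n{new_essay}"})

response = client.chat.completions.create(model="gpt-4o-mini", messages=messages)
print(response.choices[0].message.content)
```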

Many ed tech startups, and even well-known vendors of educational materials, are now marketing new AI essay robo graders to schools. Many of them are powered under the hood by ChatGPT or another large language model, and I learned from this study that accuracy rates can be reported in ways that make the new AI graders seem more accurate than they are. Tate’s team calculated that, on a population level, there was no difference between human and AI scores: ChatGPT can already reliably tell you the average essay score in a school or, say, in the state of California.

Questions for AI vendors

At this point, it is not as accurate in scoring an individual student. And a teacher wants to know exactly how each student is doing. Tate advises teachers and school leaders who are considering an AI essay grader to ask specific questions about accuracy rates at the student level: What is the rate of exact agreement between the AI grader and a human rater on each essay? How often are they within one point of each other?

The next step in Tate’s research is to study whether student writing improves after having an essay graded by ChatGPT. She’d like teachers to try using ChatGPT to score a first draft and then see if it encourages revisions, which are critical for improving writing. Tate thinks teachers could make it “almost like a game: how do I get my score up?” 

Of course, it’s unclear if grades alone, without concrete feedback or suggestions for improvement, will motivate students to make revisions. Students may be discouraged by a low score from ChatGPT and give up. Many students might ignore a machine grade and only want to deal with a human they know. Still, Tate says some students are too scared to show their writing to a teacher until it’s in decent shape, and seeing their score improve on ChatGPT might be just the kind of positive feedback they need. 

“We know that a lot of students aren’t doing any revision,” said Tate. “If we can get them to look at their paper again, that is already a win.”

That does give me hope, but I’m also worried that kids will just ask ChatGPT to write the whole essay for them in the first place.

This story about AI essay scoring was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters.

The post PROOF POINTS: AI essay grading is already as ‘good as an overburdened’ teacher, but researchers say it needs more work appeared first on The Hechinger Report.

PROOF POINTS: Many high school math teachers cobble together their own instructional materials from the internet and elsewhere, a survey finds https://hechingerreport.org/proof-points-many-high-school-math-teachers-cobble-together-their-own-instructional-materials-from-the-internet-and-elsewhere-a-survey-finds/ Mon, 29 Apr 2024

Writing lesson plans has traditionally been a big part of a teacher’s job. But that doesn’t mean teachers should start from a blank slate. Ideally, teachers base their lessons on the textbooks, worksheets and digital materials that school leaders have spent a lot of time reviewing and selecting.

But a recent national survey of more than 1,000 math teachers reveals that many are rejecting the materials they should be using and cobbling together their own.

“A surprising number of math teachers, particularly at the high school level, simply said we don’t use the district or school-provided materials, or they claimed they didn’t have any,” said William Zahner, an associate professor of mathematics at San Diego State University, who presented the survey at the April 2024 annual meeting of the American Educational Research Association in Philadelphia. Students, he said, are often being taught through a “bricolage” of materials that teachers assemble themselves from colleagues and the internet. 

“What I see happening is a lot of math teachers are rewriting a curriculum that has already been written,” said Zahner. 

The survey results varied by grade level. More than 75 percent of elementary school math teachers said they used their school’s recommended materials, but fewer than 50 percent of high school math teachers said they did. 

[Chart: Share of math teachers who use their school’s recommended materials. Source: Zahner et al., “Mathematics Teachers’ Perceptions of Their Instructional Materials for English Learners: Results from a National Survey,” presented at AERA 2024.]

The do-it-yourself approach has two downsides, Zahner said, both of which affect students. One problem is that it’s time consuming. Time spent finding materials is time not spent giving students feedback, tailoring existing lessons for students or giving students one-to-one tutoring help. The hunt for materials is also exhausting and can lead to teacher burnout, Zahner said.

The other problem is that teacher-made materials may sacrifice the thoughtful sequencing of topics planned by curriculum designers. When teachers create or take materials from various sources, it is hard to maintain a “coherent development” of ideas, Zahner explained. Curriculum designers may weave in a review of previous concepts to reinforce them even as new ideas are introduced. Teacher-curated materials may be disjointed. Separate research has found that some of the most popular materials that teachers grab from internet sites, such as Teachers Pay Teachers, are not high quality.

The national survey was conducted in 2021 by researchers at San Diego State University, including Zahner, who also directs the university’s Center for Research in Mathematics and Science Education, and the English Learners Success Forum, a nonprofit that seeks to improve the quality of instructional materials for English learners. The researchers sought out the views of teachers who worked in school districts where more than 10 percent of the students were classified as English learners, which is the national average. More than 1,000 math teachers, from kindergarten through 12th grade, responded. On average, 30 percent of their students were English learners, but some teachers had zero English learners and others had all English learners in their classrooms.

Teachers were asked about the drawbacks of their assigned curriculum for English learners. Many said that their existing materials weren’t connected to their students’ languages and cultures. Others said that the explanations of how to tailor a lesson to an English learner were too general to be useful.  Zahner says that teachers have a point and that they need more support in how to help English learners develop the language of mathematical reasoning and argumentation.

It was not clear from this survey whether the desire to accommodate English learners was the primary reason that teachers were putting together their own materials or whether they would have done so anyway. 

Related: Most English lessons on Teachers Pay Teachers and other sites are ‘mediocre’ or ‘not worth using,’ study finds

“There are a thousand reasons why this is happening,” said Zahner. One high school teacher in Louisiana who participated in the survey said his students needed a more advanced curriculum. Supervisors inside a school may not like the materials that officials in a central office have chosen. “Sometimes schools have the materials but they’re all hidden in a closet,” Zahner said.

In the midst of a national debate on how best to teach math, this survey is an important reminder of yet another reason why many students aren’t getting the instruction that they need. 

This story about math lessons was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Proof Points and other Hechinger newsletters. 

The post PROOF POINTS: Many high school math teachers cobble together their own instructional materials from the internet and elsewhere, a survey finds appeared first on The Hechinger Report.

PROOF POINTS: Stanford’s Jo Boaler talks about her new book ‘MATH-ish’ and takes on her critics https://hechingerreport.org/proof-points-stanfords-jo-boaler-book-math-ish-critics/ Mon, 22 Apr 2024

“I am the next target,” says Stanford professor Jo Boaler, who is the subject of an anonymous complaint accusing her of a “reckless disregard for accuracy.” Credit: Photo provided by Jo Boaler

Jo Boaler is a professor at the Stanford Graduate School of Education with a devoted following of teachers who cheer her call to make math education more exciting. But despite all her fans, she has sparked controversy at nearly every stage of her career. Critics say she misrepresents research to make her case and her ideas actually impede students. Now, with a new book coming out in May, provocatively titled “MATH-ish,” Boaler is fighting back. 

“This is a whole effort to shut me down, my research and my writing,” said Boaler. “I see it as a form of knowledge suppression.”

Academic fights usually don’t make it beyond the ivory tower. But Boaler’s popularity and influence have made her a focal point in the current math wars, which also seem to reflect the broader culture wars.  In the last few months, tabloids and conservative publications have turned Boaler into something of an education villain who’s captured the attention of Elon Musk and Texas Sen. Ted Cruz on social media. Critics have even questioned Boaler’s association with a former reality TV star.

“I am the next target,” Boaler said, describing the death threats and abusive email she’s been receiving.

This controversy matters on a much larger level because there is a legitimate debate about how math should be taught in American schools. Cognitive science research suggests that students need a lot of practice and memorization to master math. And once students achieve success through practice, this success will motivate them to learn and enjoy math. In other words, success increases motivation at least as much as motivation produces success. 

Yet, from Boaler’s perspective, too many students feel like failures in math class and hate the subject. That leaves us with millions of Americans who are innumerate. Nearly 2 out of every 5 eighth graders don’t even have the most basic math skills, according to the 2022 National Assessment of Educational Progress (NAEP). On the Program for International Student Assessment (PISA), an international exam, American 15-year-olds rank toward the bottom of economically advanced nations in math achievement.

Boaler draws upon a different body of research about student motivation that looks at the root causes of why students don’t like math based on surveys and interviews. Students who are tracked into low-level classes feel discouraged. Struggling math students often describe feelings of anxiety from timed tests. Many students express frustration that math is just a collection of meaningless procedures. 

Boaler seeks to fix these root causes. She advocates for ending tracking by ability in math classes, getting rid of timed tests and starting with conceptual understanding before introducing procedures. Most importantly, she wants to elevate the work that students tackle in math classes with more interesting questions that spark genuine curiosity and encourage students to think and wonder. Her goal is to expose students to the beauty of mathematical thinking that mathematicians themselves enjoy. Whether students actually learn more math the Boaler way is the crux of this dispute. In other words, how strong is the evidence base?

The latest battle over Boaler’s work began with an anonymous complaint published in March by the Washington Free Beacon, the same conservative website that first surfaced plagiarism accusations against Claudine Gay, the former president of Harvard University. The complaint accuses Boaler of a “reckless disregard for accuracy” by misrepresenting research citations 52 times and asks Stanford to discipline Boaler, a full professor with an endowed chair. Stanford has said it’s reviewing the complaint and hasn’t decided whether to open an investigation, according to news reports. Boaler stands by her research (other than one citation that she says has been fixed) and calls the anonymous complaint “bogus.” (UPDATE: The Hechinger Report learned after this article was published that Stanford has decided not to open an investigation.)

“They haven’t even got the courage to put their name on accusations like this,” Boaler said. “That tells us something.”

Boaler first drew fire from critics in 2005, when she presented new research claiming that students at a low-income school who were behind grade level had outperformed students at higher-achieving schools when they were taught in classrooms that combined students of different math achievement levels. The supposed secret sauce was an unusual curriculum that emphasized group work and de-emphasized lectures. Critics disparaged the findings and hounded her to release her data. Math professors at Stanford and Cal State University re-crunched the numbers and declared they’d found the opposite result.

Boaler, who is originally from England, retreated to an academic post back in the U.K., but returned to Stanford in 2010 with a fighting spirit. She had written a book, “What’s Math Got to Do with It?: How Parents and Teachers Can Help Children Learn to Love Their Least Favorite Subject,” which explained to a general audience why challenging, open-ended problems would help more children to embrace math and how the current approach of boring drills and formulas was turning too many kids off. Teachers loved it.

Boaler accused her earlier critics of academic bullying and harassment. But she didn’t address their legitimate research questions. Instead, she focused on changing classrooms. Tens of thousands of teachers and parents flocked to her 2013 online course on how to teach math. Building on this new fan base, she founded a nonprofit organization at Stanford called youcubed to train teachers, conduct research and spread her gospel. Boaler says a half million teachers now visit youcubed’s website each month.

Boaler also saw math as a lever to promote social justice. She lamented that too many low-income Black and Hispanic children were stuck in discouraging, low-level math classes. She advocated for change. In 2014, San Francisco heeded that call, mixing different achievement levels in middle school classrooms and delaying algebra until ninth grade. Parents, especially in the city’s large Asian community, protested that delaying algebra was holding their children back. Without starting algebra in middle school, it was difficult to progress to high school calculus, an important course for college applications. Parents blamed Boaler, who applauded San Francisco for getting math right. Ten years later, the city is slated to reinstate algebra for eighth graders this fall. Boaler denies any involvement in the unpopular San Francisco reforms.

Before that math experiment unraveled in San Francisco, California education policymakers tapped Boaler to be one of the lead writers of a new math framework, which would guide math instruction throughout the state. The first draft discouraged tracking children into separate math classes by achievement levels, and proposed delaying algebra until high school. It emphasized “social justice” and suggested that students could take data science instead of advanced algebra in high school. Traditional math proponents worried that the document would water down math instruction in California, hinder advanced students and make it harder to pursue STEM careers. And they were concerned that California’s proposed reforms could spread across the nation. 

In the battle to quash the framework, critics attacked Boaler for trying to institute “woke” mathematics. The battle became personal, with some criticizing her for taking $5,000-an-hour consulting and speaking fees at public schools while sending her own children to private school. 

Critics also dug into the weeds of the framework document, which is how this also became a research story. A Stanford mathematics professor catalogued a list of what he saw as research misrepresentations. Those citations, together with additional characterizations of research findings throughout Boaler’s writings, eventually grew into the anonymous complaint that’s now at Stanford.

By the time the most recent complaint against Boaler was lodged, the framework had already been revised in substantial ways. Boaler’s critics had arguably won their main policy battles. College-bound students still need the traditional course sequence and cannot substitute data science for advanced algebra. California’s middle schools will continue to have the option to track children into separate classes and start algebra in eighth grade. 

But the attacks on Boaler continue. In addition to seeking sanctions from Stanford, her anonymous critics have asked academic journals to pull down her papers, according to Boaler. They’ve written to conference organizers to stop Boaler from speaking and, she says, they’ve told her funders to stop giving money to her. At least one, the Valhalla Foundation, the family foundation of billionaire Scott Cook (co-founder of the software giant Intuit), stopped funding youcubed in 2024. In 2022 and 2023, it gave Boaler’s organization more than $560,000. 

Boaler sees the continued salvos against her as part of the larger right-wing attack on diversity, equity and inclusion or DEI. She also sees a misogynistic pattern of taking down women who have power in education, such as Claudine Gay. “You’re basically hung, drawn and quartered by the court of Twitter,” she said.

From my perch as a journalist who covers education research, I see that Boaler has a tendency to overstate the implications of a narrow study. Sometimes she cites a theory that’s been written about in an academic journal but hasn’t been proven and labels it research. While technically true – most academic writing falls under the broad category of research –  that’s not the same as evidence from a well-designed classroom experiment.  And she tends not to factor in evidence that runs counter to her views or adjust her views as new studies arise. Some of her numerical claims seem grandiose. For example, she says one of her 18-lesson summer courses raised achievement by 2.8 years.

“People have raised questions for a long time about the rigor and the care in which Jo makes claims related to both her own research and others,” said Jon Star, a professor of math education at Harvard Graduate School of Education. 

But Star says many other education researchers have done exactly the same, and the “liberties” Boaler takes are common in the field. “That’s not to suggest that taking these liberties is okay,” Star said, “but she is being called out for it.”

Boaler is getting more scrutiny than her colleagues, he said, because she’s influential, has a large following of devoted teachers and has been involved in policy changes at schools. Many other scholars of math education share Boaler’s views. But Boaler has become the public face of nontraditional teaching ideas in math. And in today’s polarized political climate, that’s a dangerous public face to be.

The citation controversy reflects bigger issues with the state of education research. It’s often not as precise as the hard sciences or even social sciences like economics. Academic experts are prone to make wide, sweeping statements. And there are too few studies in real classrooms or randomized controlled trials that could settle some of the big debates. Star argues that more replication studies could improve the quality of evidence for math instruction. We can’t know which teaching methods are most effective unless the method can be reproduced in different settings with different students. 

Credit: Cover image provided by the author Jo Boaler

It’s also possible that more research may never settle these big math debates and we may continue to generate conflicting evidence. There’s the real possibility that traditional methods could be more effective for short-term achievement gains, while nontraditional methods might attract more students to the subject, and potentially lead to more creative problem-solvers in the future. 

Even if Boaler is loose with the details of research studies, she could still be right about the big picture. Maybe advanced students would be better off slowing down on the current racetrack to calculus to learn math with more depth and breadth. Her fun hands-on approach to math might spark just enough motivation to inspire more kids to do their homework. Might we trade off a bit of short-term math achievement for a greater good of a numerate, civic society?

In her new book, “MATH-ish,” Boaler is doubling down on her approach to math with a title that seems to encourage inexactitude. She argues that approaching a problem in a “math-ish” way gives students the freedom to take a guess and make mistakes, to step back and think rather than jumping to numerical calculations. Boaler says she’s hearing from teachers that “ish” is far more fun than making estimates.

“I’m hoping this book is going to be my salvation,” she said, “that I have something exciting to do and focus on and not focus on the thousands of abusive messages I’m getting.”

This story about Jo Boaler was written by Jill Barshay and produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for the Proof Points newsletter.

The post PROOF POINTS: Stanford’s Jo Boaler talks about her new book ‘MATH-ish’ and takes on her critics appeared first on The Hechinger Report.

How AI could transform the way schools test kids https://hechingerreport.org/how-ai-could-transform-the-way-schools-test-kids/ Thu, 11 Apr 2024

Imagine interacting with an avatar that dissolves into tears – and being assessed on how intelligently and empathetically you respond to its emotional display.

Or taking a math test that is created for you on the spot, the questions written to be responsive to the strengths and weaknesses you’ve displayed in prior answers. Picture being evaluated on your scientific knowledge and getting instantaneous feedback on your answers, in ways that help you better understand and respond to other questions.

These are just a few of the types of scenarios that could become reality as generative artificial intelligence advances, according to Mario Piacentini, a senior analyst of innovative assessments with the Programme for International Student Assessment, known as PISA.
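One way to see what “created for you on the spot” could mean in practice: an adaptive test keeps a running estimate of the student’s level and draws each new question near it. Real adaptive engines rely on item-response theory; the toy loop below (all names and numbers are illustrative) only shows the basic control flow.

```python
import random

# Toy adaptive quiz: question difficulty tracks a running ability
# estimate. Real systems use item-response theory; this only
# illustrates the basic loop.
question_bank = {d: [f"sample question at difficulty {d}"] for d in range(1, 7)}

def run_adaptive_quiz(answers_correctly, n_items=5):
    ability = 3  # start mid-scale on a 1-to-6 difficulty range
    for _ in range(n_items):
        question = random.choice(question_bank[ability])
        correct = answers_correctly(question, ability)
        # step toward harder items after a right answer, easier after a miss
        ability = min(6, ability + 1) if correct else max(1, ability - 1)
    return ability

# Simulate a student who can handle anything up to difficulty 4:
print("estimated level:", run_adaptive_quiz(lambda q, d: d <= 4))
```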

He and others argue that AI has the potential to shake up the student testing industry, which has evolved little for decades and which critics say too often falls short of evaluating students’ true knowledge. But they also warn that the use of AI in assessments carries risks.

“AI is going to eat assessments for lunch,” said Ulrich Boser, a senior fellow at the Center for American Progress, where he co-authored a research series on the future of assessments. He said that standardized testing may one day become a thing of the past, because AI has the potential to personalize testing to individual students.

PISA, the influential international test, expects to integrate AI into the design of its 2029 test. Piacentini said the Organization for Economic Cooperation and Development, which runs PISA, is exploring the possible use of AI in several realms.

  • It plans to evaluate students on their ability to use AI tools and to recognize AI-generated information.
  • It’s evaluating whether AI could help write test questions, which could potentially be a major money and time saver for test creators. (Big test makers like Pearson are already doing this, he said.)
  • It’s considering whether AI could score tests. According to Piacentini, there’s promising evidence that AI can accurately and effectively score even relatively complex student work.  
  • Perhaps most significantly, the organization is exploring how AI could help create tests that are “much more interesting and much more authentic,” as Piacentini puts it.

When it comes to using AI to design tests, there are all sorts of opportunities. Career and tech students could be assessed on their practical skills via AI-driven simulations: For example, automotive students could participate in a simulation testing their ability to fix a car, Piacentini said.

Right now those hands-on tests are incredibly intensive and costly – “it’s almost like shooting a movie,” Piacentini said. But AI could help put such tests within reach for students and schools around the world.

AI-driven tests could also do a better job of assessing students’ problem-solving abilities and other skills, he said. It might prompt students when they’d made a mistake and nudge them toward a better way of approaching a problem. AI-powered tests could evaluate students on their ability to craft an argument and persuade a chatbot. And they could help tailor tests to a student’s specific cultural and educational context.

“One of the biggest problems that PISA has is when we’re testing students in Singapore, in sub-Saharan Africa, it’s a completely different universe. It’s very hard to build a single test that actually works for those two very different populations,” said Piacentini. But AI opens the door to “construct tests that are really made specifically for every single student.”

That said, the technology isn’t there yet, and educators and test designers need to tread carefully, experts warn. During a recent panel moderated by Hechinger’s Javeria Salman, Nicol Turner Lee, director of the Center for Technology Innovation at the Brookings Institution, said any conversation about AI’s role in assessments must first acknowledge disparities in access to these new tools.

Many schools still use paper products and struggle with spotty broadband and limited digital tools, she said: The digital divide is “very much part of this conversation.” Before schools begin to use AI for assessments, teachers will need professional development on how to use AI effectively and wisely, Turner Lee said.

There’s also the issue of bias embedded in many AI tools. AI is often sold as if it’s “magic,”  Amelia Kelly, chief technology officer at SoapBox Labs, a software company that develops AI voice technology, said during the panel. But it’s really “a set of decisions made by human beings, and unfortunately human beings have their own biases and they have their own cultural norms that are inbuilt.”

With AI at the moment, she added, you’ll get “a different answer depending on the color of your skin, or depending on the wealth of your neighbors, or depending on the native language of your parents.”  

But the potential benefits for students and learning excite experts such as Kristen Huff, vice president of assessment and research at Curriculum Associates, where she helps develop online assessments. Huff, who also spoke on the panel, said AI tools could eventually not only improve testing but also “accelerate learning” in areas like early literacy, phonemic awareness and early numeracy skills. Huff said that teachers could integrate AI-driven assessments, especially AI voice tools, into their instruction in ways that are seamless and even “invisible,” allowing educators to continually update their understanding of where students are struggling and how to provide accurate feedback.

PISA’s Piacentini said that while we’re just beginning to see the impact of AI on testing, the potential is great and the risks can be managed.  

“I am very optimistic that it is more an opportunity than a risk,” said Piacentini. “There’s always this risk of bias, but I think we can quantify it, we can analyze it, in a better way than we can analyze bias in humans.”

This story about AI testing was produced by The Hechinger Report, a nonprofit, independent news organization focused on inequality and innovation in education. Sign up for Hechinger’s newsletter.

The post How AI could transform the way schools test kids appeared first on The Hechinger Report.
