Eye-Tracking the Cognitive Load of GPT-4o vs Human-Translated Arabic Subtitles

RealEye
August 11, 2025

As AI makes its way into nearly every corner of media production, subtitling is no exception. But how do viewers actually experience these machine-generated subtitles? A recent peer-reviewed study, "Through the Eyes of the Viewer: The Cognitive Load of LLM-Generated vs. Professional Arabic Subtitles" by Hussein Abu-Rayyash and Isabel Lacruz (Kent State University), put that question to the test - using RealEye's web-based eye-tracking platform to measure the cognitive load of Arabic-speaking viewers watching AI- vs human-translated subtitles.

Here’s what the researchers discovered - and why it matters.

The Research Question

The study set out to compare GPT-4o-generated Arabic subtitles with professionally created human translations, measuring which type imposes more cognitive load on viewers. This matters for two key reasons:

  1. Accessibility: Subtitles are essential for millions of viewers around the world.
  2. Quality vs. Speed: AI-generated subtitles offer speed - but do they support smooth, effortless viewing?

The Setup

Participants (82 native Arabic speakers) were shown the same 10-minute episode from the BBC comedy State of the Union. They were split into two groups:

  • One watched with Amazon Prime’s human-translated Arabic subtitles.
  • The other saw subtitles automatically generated by GPT-4o, with no post-editing.

RealEye was used to track viewers' gaze in real time via standard webcams. This setup let the researchers remotely and non-invasively collect accurate data on:

  • Fixation count (how often viewers looked at subtitles)
  • Fixation duration (how long they looked)
  • Gaze distribution (subtitle vs picture area)
  • K-coefficient (a proxy for attention intensity)
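The K-coefficient deserves a word of explanation. Following Krejtz et al. (2016), each fixation's duration and the amplitude of the saccade that follows it are z-scored over the whole recording; the per-fixation difference of those z-scores is then averaged over a region or time window of interest. Positive K indicates focal, effortful processing; negative K indicates ambient scanning. The sketch below shows one common way to compute it; the study's exact pipeline is not described in this post, and the function names and toy numbers here are illustrative assumptions, not RealEye's implementation.

```python
from statistics import mean, stdev

def k_series(durations, amplitudes):
    """Per-fixation K_i = z(fixation duration_i) - z(saccade amplitude_{i+1}),
    with z-scores computed over the whole recording (Krejtz et al., 2016)."""
    mu_d, sd_d = mean(durations), stdev(durations)
    mu_a, sd_a = mean(amplitudes), stdev(amplitudes)
    return [(d - mu_d) / sd_d - (a - mu_a) / sd_a
            for d, a in zip(durations, amplitudes)]

def k_coefficient(ks, window):
    """Mean K over a subset of fixations (e.g., those inside the subtitle AOI).
    K > 0 suggests focal, effortful processing; K < 0 suggests ambient scanning."""
    return mean(ks[i] for i in window)

# Toy data (hypothetical): durations in ms, following-saccade amplitudes in degrees.
durations = [180, 220, 450, 500, 210, 480]
amplitudes = [6.0, 5.5, 1.2, 1.0, 5.8, 1.5]
ks = k_series(durations, amplitudes)

# Fixations 2, 3, and 5 mimic subtitle reading: long dwells, short saccades,
# so their mean K comes out positive (focal attention).
print(k_coefficient(ks, [2, 3, 5]) > 0)
```

Note that because the z-scores are standardized over the full recording, K averaged over *all* fixations is zero by construction; the coefficient is informative only when compared across windows, areas of interest, or conditions, as in the subtitle-area comparison above.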

What Did RealEye Reveal?

The data painted a clear picture: AI-generated subtitles caused significantly more cognitive strain.

  • +48% more fixations in the subtitle area with GPT-4o subtitles.
  • +56% longer fixation durations, suggesting more processing effort.
  • +81.5% more time spent reading subtitles, pulling focus from the visual narrative.
  • The K-coefficient tripled (from 0.10 to 0.30), signaling deeper mental concentration on subtitle reading.

In other words, GPT-4o subtitles didn’t just look readable - they made people work harder to read them.

[Figure: Eye-Tracking Metrics by Proficiency Level and Condition]

Why Did AI Subtitles Take More Effort?

Despite fluent surface quality, GPT-4o struggled with:

  • Cultural references (“pint of London Pride” became “a large bundle of peanuts”)
  • Idioms and metaphors (e.g., misrendering of “effete” or famous quotes)
  • Sexual and humorous expressions
  • Poetic language (e.g., Dylan Thomas’s “Do not go gentle into that good night”)

These issues forced viewers to pause, reread, and mentally reconcile mismatched meanings - especially those with higher English proficiency, who were more sensitive to translation flaws.

Language Proficiency Amplified the Effect

One of the most striking results: the higher the viewer’s English proficiency, the greater the cognitive disruption from GPT-4o subtitles. Advanced users spotted errors more easily - and spent more time trying to make sense of them. Ironically, AI subtitles may be more frustrating for the people best equipped to understand them.

What This Means for the Future

This study is one of the first to offer quantitative evidence that LLM-generated subtitles, while fast, carry a hidden cost: viewer effort. The findings emphasize that:

  • Surface fluency is not enough. Viewers sense when something's “off,” even if it’s grammatically correct.
  • Human subtitlers remain essential - especially for complex, emotional, or culturally nuanced content.
  • Remote eye tracking is a powerful tool for understanding how people really engage with content.

As AI-generated subtitles proliferate across streaming services and educational content, it's essential to ask: Are they actually helping - or are they silently straining our viewers?

This study shows that RealEye can reveal the subtle, invisible ways in which AI output impacts the user experience. It also reminds us that in translation, nuance matters - and the human eye knows it.

Interested in using RealEye for education research?

Check out the RealEye offer for Education!

You can run a similar study!

Follow the steps below to start your own experiment with RealEye:

  1. Go to RealEye Dashboard and create or log in to your account.
  2. Purchase the license of your choice (https://www.realeye.io/pricing). If you need any custom adjustments, contact us at contact@realeye.io. We are happy to help!
  3. Activate your license by following the instructions in the RealEye License Activation Guide.

Ready to set up your own study? Visit RealEye Support page to learn more and keep us posted on your results! 🚀
