Fighting fake news with multimodal AI


AI vs fake news: how combined text, image, and video analysis is making detection smarter

Access to information has reached an unprecedented scale, speed, and convenience, particularly online. However, while this widespread accessibility brings many benefits, it has also given rise to a growing threat: fake news.  


Artificial Intelligence

16 January 2025

Fake news is far from a 21st-century phenomenon; it emerged some 500 years ago with the invention of the printing press. However, with over 2 million news articles published on the web daily, it's hard to deny that the internet has made distinguishing fact from fiction more challenging than ever.  

The rise of AI adds another layer of complexity to this problem.  

AI doesn't just accelerate the production of news; it also plays a role in spreading misinformation. NewsGuard has identified over 1,150 (and counting) unreliable AI-generated news and information websites, underscoring the growing challenge of distinguishing credible sources from misleading ones.  

However, AI is also speeding up the development of faster and more effective solutions to detect and prevent the spread of fake news, particularly through multimodal AI, which combines text, image, and video analysis.  

In this blog, we’ll look at:  

  • What is fake news and why is its detection becoming increasingly difficult?  
  • The role of multimodal AI in fake news detection 
  • AI vs humans: Why we need AI to spot fake news 
  • AI and humans: Why AI complements human judgment 
  • PAPER: Main findings from "Multimodal Attention is All You Need"  

What is fake news and why is its detection becoming increasingly difficult?

Fake news refers to false or misleading information presented as news, often produced to deceive, manipulate, or simply create sensationalism. Ranging from fabricated stories to distorted facts and biased reporting, fake news spans various media—including text, video, audio, images, and more.  

Particularly with the advancement of technology, fake news is becoming incredibly hard to detect, for a number of reasons:  

  1. Volume and speed: With millions of news articles and content generated daily, it’s challenging to keep up with the sheer amount of information circulating online. 
  2. Advanced AI and automation: AI tools, including deepfake technology, can create highly convincing fake news articles, images, and videos, making it harder to differentiate between authentic and fabricated content. 
  3. Bias and echo chambers: Social media platforms and search engines often prioritize content based on algorithms that reinforce users’ existing beliefs, which can spread misinformation within closed networks. 
  4. Multimodal content: Fake news isn’t limited to text. Images, videos, and even audio can be manipulated, complicating detection efforts, especially when different forms of media are used to support misleading narratives. 
  5. Human behavior: People are often drawn to sensational or emotionally charged content, which can spread faster than fact-checked news, further complicating efforts to curb misinformation—particularly when it comes to sensitive subjects like politics, health, and social issues.  

The role of multimodal AI in fake news detection 

Multimodal AI refers to artificial intelligence systems that can process and analyze multiple types of data—such as text, images, audio, and video—simultaneously to gain a more comprehensive understanding of the information.  

Unlike traditional AI, which typically focuses on a single modality (like text alone), multimodal AI combines inputs from different sources to create richer, more accurate insights. 

When it comes to fake news detection, multimodal AI works by analyzing both the content and context of information across different media formats.  

Here's how it works:  

  1. Text analysis: Multimodal AI starts by processing the written content, looking for patterns, language cues, and inconsistencies that might indicate misinformation. It uses natural language processing (NLP) techniques to detect anomalies, misleading statements, or inconsistencies with verified sources.
     
  2. Image and video verification: Fake news often includes modified images or videos designed to mislead the audience. Multimodal AI uses computer vision techniques to analyze visual content for signs of manipulation, such as image forensics to detect pixel-level alterations or deepfake technology to spot altered videos.
     
  3. Cross-referencing multiple modalities: Multimodal AI integrates text, visual, and auditory data to cross-reference and verify information. For instance, if a news article cites a video, the system checks both the credibility of the article and the authenticity of the video to assess the overall reliability of the content.
     
  4. Contextual understanding: By combining various forms of data, multimodal AI can also understand the context in which information is presented. It can detect whether an image or video is used out of context to support false claims, enhancing its ability to spot misleading or fake news. 
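The four steps above can be sketched as a simple scoring pipeline. This is an illustrative toy, not a real detection system: the scoring functions are hypothetical placeholders (a real system would use NLP models and image forensics detectors), but the overall shape—score each modality, then cross-reference and cap the result when context doesn't match—follows the steps described.

```python
# Illustrative sketch of a multimodal verification pipeline.
# All scoring functions below are hypothetical placeholders,
# not a real library API.

def text_score(article_text: str) -> float:
    """Placeholder NLP check: penalize sensational language cues."""
    cues = ["shocking", "you won't believe", "miracle"]
    hits = sum(cue in article_text.lower() for cue in cues)
    return max(0.0, 1.0 - 0.3 * hits)  # 1.0 = looks credible

def image_score(manipulation_prob: float) -> float:
    """Placeholder for image forensics: input is a detector's
    estimated probability that the image was altered."""
    return 1.0 - manipulation_prob

def cross_reference(text_s: float, image_s: float, context_match: bool) -> float:
    """Combine modality scores; an out-of-context image caps the result."""
    combined = 0.5 * text_s + 0.5 * image_s
    return combined if context_match else min(combined, 0.4)

score = cross_reference(
    text_score("SHOCKING miracle cure doctors hate"),
    image_score(manipulation_prob=0.8),
    context_match=False,
)
print(f"credibility score: {score:.2f}")  # low score: both modalities look suspect
```

The key design point is the last step: even if each modality looks plausible on its own, a mismatch between them (an authentic image attached to an unrelated story, say) drags the combined credibility down.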

AI vs humans: Why we need AI to spot fake news

As the volume of online content continues to explode, humans are struggling to keep up with the sheer scale and speed of information flow.  

Traditional fact-checking methods and human intuition alone are no longer sufficient to distinguish between real and fake news at the necessary pace.  

This is why AI is proving to be indispensable for fake news detection. 

AI can process vast amounts of data in real time, identifying patterns and inconsistencies that might go unnoticed by the human eye.  

Machine learning algorithms can quickly scan articles, images, and videos, detecting misleading content, linguistic anomalies, or signs of manipulation.  

AI’s ability to sift through enormous volumes of content makes it a crucial tool in the fight against fake news, ensuring faster detection and greater coverage than any human team could achieve on its own. 

Moreover, AI can continuously learn and improve its detection methods, adapting to emerging trends in misinformation, which makes it an essential ally in the battle to maintain the integrity of online information. 

AI and humans: Why AI complements human judgment

AI is changing the game in fake news detection. However, there's another crucial element in the equation: human judgment.  

While AI is incredibly powerful at detecting patterns and processing data at scale, it lacks the nuanced understanding and contextual awareness that humans bring to the table.  

Fake news detection requires more than just spotting inconsistencies or analyzing language—it involves understanding context, emotions, and intent, which is something that AI can struggle with. 

This is where human judgment is vital.  

Humans can critically evaluate the broader social, cultural, and political implications of a piece of content.  

We can interpret complex language nuances, understand humor, sarcasm, and satire, and spot subtleties that AI might misinterpret.  

Furthermore, humans can make moral and ethical judgments about what content should be flagged or censored, a decision-making process that goes beyond the capabilities of AI alone. 

AI and humans working together create a more robust approach to fake news detection. AI can handle the heavy lifting of processing and analyzing data, while humans can provide the context, critical thinking, and ethical oversight necessary to make accurate, informed decisions about the news we consume. 

The following are some of the fields that can benefit from multimodal AI for fake news detection:  

  • Social media platforms: By analyzing text, images, and videos, AI flags manipulated content, helping platforms address misinformation quickly. 
  • News organizations: AI cross-references text, visuals, and audio to verify content and ensure credibility. 
  • Tech companies: AI detects deceptive ads, deepfakes, and manipulated media, maintaining platform integrity. 
  • Public health organizations: AI analyzes posts and media to identify and flag misleading health claims. 
  • E-Commerce platforms: Multimodal AI helps platforms detect fake reviews and counterfeit listings by analyzing text and images together. 
  • Education: Universities use multimodal AI to verify academic work, identifying plagiarism or manipulated visuals in research papers. 
  • Law enforcement: Law enforcement agencies use multimodal AI to uncover disinformation campaigns by linking fake news, images, and videos. 

[PAPER] Smarter fake news detection: Main findings from "Multimodal Attention is All You Need"

“Multimodal Attention is All You Need” is a critical paper co-authored by Cristina Giannone, Lead AI Engineer at Almawave, and Marco Saioni from Marconi University. Presented at the Tenth Anniversary of the Italian Conference on Computational Linguistics (CLiC-it) in Pisa, the paper introduces an innovative approach to fake news detection by integrating both text and images using a cross-attention mechanism. 

This cross-attention technique enables the model to simultaneously focus on the most relevant aspects of both textual and visual content, rather than analyzing them separately. By combining these modalities, the model enhances its ability to detect subtle patterns indicative of misinformation. 

To build the system, the researchers used BERT for processing and understanding text and ResNet for analyzing images. This dual approach allows the model to detect relationships between text and visuals that might go unnoticed by single-modality models. 
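The core of this cross-attention step can be sketched in a few lines: queries come from the text side and keys/values from the image side, so each text token ends up weighted over image regions. The sketch below uses random vectors as stand-ins for BERT token embeddings and ResNet patch features (the dimensions and the single-head form are illustrative, not the paper's exact configuration).

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64                                     # shared embedding dimension (illustrative)
text_feats = rng.standard_normal((12, d))  # stand-in for BERT token embeddings
img_feats = rng.standard_normal((49, d))   # stand-in for ResNet patch features

# Cross-attention: text tokens attend over image regions.
Q = text_feats                 # queries from the text modality
K, V = img_feats, img_feats    # keys/values from the image modality

scores = Q @ K.T / np.sqrt(d)  # (12, 49) text-token vs image-region similarity
weights = np.exp(scores - scores.max(axis=1, keepdims=True))
weights /= weights.sum(axis=1, keepdims=True)  # softmax over image regions
fused = weights @ V            # (12, 64) image-aware text representations

print(fused.shape)  # → (12, 64)
```

Each row of `fused` is a text token enriched with a weighted summary of the image, which is what lets the model notice text–visual relationships that single-modality models miss.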

The model was tested using the MULTI-Fake-DetectiVE dataset from Evalita 2023, a competition designed to evaluate the truthfulness of multimodal news content. The results showed that the cross-attention model outperformed both text-only and image-only models, as well as the winning model of the Evalita 2023 challenge. By cross-referencing information from both text and images, the model significantly improved accuracy in detecting fake news. 

The findings from this paper highlight the value of combining multiple modalities in news analysis and demonstrate how the cross-attention mechanism enhances the system’s ability to understand and verify content. The authors also suggest further exploration of tools like Visual Transformers to refine the model’s capabilities. This research represents a major step forward in the fight against misinformation, showcasing the potential of multimodal AI in tackling one of today’s most pressing challenges. 

[DOWNLOAD THE PAPER] 

AI is changing the landscape of all industries. Learn more about how Almawave's AI-driven solutions are making it happen.