Business Daily Media


PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand the video content, AI models therefore need not only to identify the objects present, but also to take into account how they change throughout the video. Because the visuals in a video occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.

Prof. Changwen CHEN, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they modelled the framework on the human process of video understanding and introduced a role-based workflow. The four roles included in the framework are: the Planner, to coordinate all other roles for each query; the Grounder, to localise and retrieve relevant moments; the Verifier, to validate the accuracy of the retrieved moments and select the most reliable one; and the Answerer, to generate the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
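The division of labour among the four roles can be illustrated with a minimal Python sketch. It is not the VideoMind implementation: every function name, the fixed 60-second windows used by the toy Grounder, and the caller-supplied `score_fn` are assumptions made purely for illustration, and each role is reduced to a simple stand-in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Moment = Tuple[float, float]  # (start_seconds, end_seconds)


@dataclass
class Result:
    answer: str
    moment: Moment


def planner(query: str) -> List[str]:
    """Toy Planner: decide which roles to call for this query.
    Assumption: every query goes through the full chain."""
    return ["grounder", "verifier", "answerer"]


def grounder(query: str, video_length: float, window: float = 60.0) -> List[Moment]:
    """Toy Grounder: propose candidate moments as fixed-size windows."""
    moments = []
    start = 0.0
    while start < video_length:
        moments.append((start, min(start + window, video_length)))
        start += window
    return moments


def verifier(query: str, moments: List[Moment],
             score_fn: Callable[[str, Moment], float]) -> Moment:
    """Toy Verifier: score each candidate and keep the most reliable one."""
    return max(moments, key=lambda m: score_fn(query, m))


def answerer(query: str, moment: Moment) -> Result:
    """Toy Answerer: produce an answer tied to the selected moment."""
    start, end = moment
    return Result(answer=f"answer grounded in {start:.0f}s-{end:.0f}s", moment=moment)


def run_pipeline(query: str, video_length: float,
                 score_fn: Callable[[str, Moment], float]) -> Result:
    """Run the roles in the order chosen by the Planner."""
    candidates: List[Moment] = []
    best: Moment = (0.0, video_length)  # fall back to the whole video
    result = Result(answer="", moment=best)
    for role in planner(query):
        if role == "grounder":
            candidates = grounder(query, video_length)
        elif role == "verifier":
            best = verifier(query, candidates, score_fn)
        elif role == "answerer":
            result = answerer(query, best)
    return result
```

In this sketch, a `score_fn` that favours the window containing a particular timestamp causes the Answerer to base its reply on that window alone, which mirrors the localise, verify, then answer order described above.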

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without performing full-parameter retraining. The innovative Chain-of-LoRA strategy pioneered by the team involves applying four lightweight LoRA adapters in a unified model, each of which is designed for calling a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling to switch seamlessly among these roles, eliminating the need for, and the cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
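The idea of one shared model with switchable, role-specific adapters can be sketched as follows. This is a toy illustration under stated assumptions, not the team's code: the `SharedModel` class and its methods are hypothetical, the "model" is a single weight matrix, and the usual LoRA scaling factor is omitted. It relies only on the general LoRA formulation, in which a frozen base weight W is augmented by a low-rank product, so the output is W·x + up·(down·x).

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    """Plain-Python matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def add(a: Matrix, b: Matrix) -> Matrix:
    """Element-wise matrix sum."""
    return [[x + y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]


class SharedModel:
    """Hypothetical single-layer 'model': one frozen base weight
    plus a dictionary of low-rank adapters, one per role."""

    def __init__(self, base_weight: Matrix) -> None:
        self.base_weight = base_weight  # shared by every role, never modified
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}
        self.active: Optional[str] = None

    def add_adapter(self, role: str, down: Matrix, up: Matrix) -> None:
        # down has shape (r, d_in) and up has shape (d_out, r), with small rank r
        self.adapters[role] = (down, up)

    def activate(self, role: str) -> None:
        """Switch roles by selecting an adapter; the base weight is untouched."""
        if role not in self.adapters:
            raise KeyError(f"no adapter registered for role {role!r}")
        self.active = role

    def forward(self, x: Matrix) -> Matrix:
        """Compute W x, plus up (down x) if an adapter is active."""
        out = matmul(self.base_weight, x)
        if self.active is not None:
            down, up = self.adapters[self.active]
            out = add(out, matmul(up, matmul(down, x)))
        return out


# Usage: one base model, two rank-1 adapters, switched at inference time.
model = SharedModel(base_weight=[[1.0, 0.0], [0.0, 1.0]])
model.add_adapter("grounder", down=[[1.0, 0.0]], up=[[0.0], [1.0]])
model.add_adapter("answerer", down=[[0.0, 1.0]], up=[[1.0], [0.0]])

x = [[2.0], [3.0]]
model.activate("grounder")
grounder_out = model.forward(x)
model.activate("answerer")
answerer_out = model.forward(x)
```

Because only the small `down` and `up` matrices differ between roles, switching roles costs a dictionary lookup rather than loading a second model, which is the saving the paragraph above describes.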

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with some state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that the grounding accuracy of VideoMind outperformed all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one with a smaller, 2 billion (2B) parameter model, and another with a bigger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind still yielded performance comparable to that of many of the other 7B models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient, with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like humans, while leveraging the Chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework has lowered the technological cost and the threshold for deployment, offering a feasible way to ease the power-consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.
