Business Daily Media

PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand video content, AI models therefore need not only to identify the objects present, but also to take into account how they change throughout the video. Because the visuals in a video occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.

Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they modelled the workflow on the way humans understand videos and introduced a role-based design. The framework comprises four roles: the Planner, which coordinates all other roles for each query; the Grounder, which localises and retrieves relevant moments; the Verifier, which validates the accuracy of the retrieved moments and selects the most reliable one; and the Answerer, which generates the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
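The four-role workflow can be illustrated with a minimal Python sketch. This is not VideoMind's code: the `Moment` type, the `answer_query` function and the toy role functions are hypothetical stand-ins, and a real system would back each role with a video-language model rather than hard-coded values.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass
class Moment:
    start: float  # seconds from the start of the video
    end: float
    score: float = 0.0  # confidence assigned by the Verifier


def answer_query(
    query: str,
    plan: Callable[[str], List[str]],
    ground: Callable[[str], List[Moment]],
    verify: Callable[[str, Moment], float],
    answer: Callable[[str, Optional[Moment]], str],
) -> Tuple[str, Optional[Moment]]:
    """Run Planner -> Grounder -> Verifier -> Answerer for one query."""
    steps = plan(query)  # Planner decides which roles are needed
    best = None
    if "ground" in steps:
        candidates = ground(query)  # Grounder proposes candidate moments
        if "verify" in steps:
            # Verifier scores each candidate moment
            candidates = [Moment(m.start, m.end, verify(query, m)) for m in candidates]
        if candidates:
            best = max(candidates, key=lambda m: m.score)  # keep the most reliable
    return answer(query, best), best  # Answerer uses the selected moment


# Toy role implementations (hypothetical, hard-coded for illustration only).
def toy_plan(query: str) -> List[str]:
    return ["ground", "verify", "answer"]


def toy_ground(query: str) -> List[Moment]:
    return [Moment(10.0, 20.0), Moment(300.0, 330.0), Moment(900.0, 960.0)]


def toy_verify(query: str, m: Moment) -> float:
    return 0.9 if m.start == 300.0 else 0.2


def toy_answer(query: str, m: Optional[Moment]) -> str:
    if m is None:
        return "answered from the whole video"
    return f"answered from {m.start:.0f}s to {m.end:.0f}s"


text, best = answer_query(
    "When does the goal happen?", toy_plan, toy_ground, toy_verify, toy_answer
)
print(text)  # answered from 300s to 330s
```

Passing the roles in as functions keeps the control flow separate from whatever model implements each role, which mirrors the modular design described in the announcement.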

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without full-parameter retraining. The Chain-of-LoRA strategy pioneered by the team applies four lightweight LoRA adapters to a single unified model, each designed for one specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling to switch seamlessly among these roles. This eliminates the need for, and the cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
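The idea of sharing one set of base weights across several role-specific adapters can be sketched in plain Python. This is an illustration under stated assumptions, not VideoMind's implementation: a single linear layer stands in for the whole model, the class name `SharedBackbone` and its methods are hypothetical, and the layer sizes in `lora_params` are chosen only for the example. The one established fact it relies on is the standard LoRA formulation, in which the adapted layer computes W·x + B·(A·x) with low-rank matrices A and B.

```python
from typing import Dict, List, Optional, Tuple

Vector = List[float]
Matrix = List[List[float]]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


class SharedBackbone:
    """One frozen base weight matrix plus swappable low-rank adapters."""

    def __init__(self, base: Matrix, adapters: Dict[str, Tuple[Matrix, Matrix]]):
        self.base = base          # W, shape d x d, shared by every role
        self.adapters = adapters  # role -> (A, B); A is r x d, B is d x r
        self.active: Optional[str] = None

    def activate(self, role: str) -> None:
        """Select the adapter used by later forward calls."""
        if role not in self.adapters:
            raise KeyError(f"no adapter registered for role {role!r}")
        self.active = role

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.base, x)  # base path: W x
        if self.active is not None:
            a, b = self.adapters[self.active]
            delta = matvec(b, matvec(a, x))  # low-rank path: B (A x)
            y = [yi + di for yi, di in zip(y, delta)]
        return y


def lora_params(d: int, r: int) -> int:
    """Parameters in one rank-r adapter for a d x d layer (A plus B)."""
    return 2 * d * r


model = SharedBackbone(
    base=[[1.0, 0.0], [0.0, 1.0]],
    adapters={
        "planner": ([[1.0, 0.0]], [[1.0], [0.0]]),   # rank-1 toy adapter
        "grounder": ([[0.0, 1.0]], [[0.0], [1.0]]),  # rank-1 toy adapter
    },
)
x = [1.0, 2.0]
model.activate("planner")
print(model.forward(x))   # [2.0, 2.0]
model.activate("grounder")
print(model.forward(x))   # [1.0, 4.0]

# Hypothetical sizes: four rank-16 adapters on a 4096 x 4096 layer
# add 524,288 parameters, against 16,777,216 in the shared base layer.
print(4 * lora_params(4096, 16), 4096 * 4096)
```

Switching roles only changes which small pair of matrices is applied; the base weights are never copied, which is the property that lets one deployed model serve all four roles.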

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that VideoMind's grounding accuracy surpassed that of all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one built on a smaller, 2 billion (2B) parameter model, and another on a larger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind delivered performance comparable with that of many 7B models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient, with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like a human, while leveraging the chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework lowers the technological cost and the threshold for deployment, offering a feasible way to ease the power-consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.
