Business Daily Media

PolyU develops novel multi-modal agent to facilitate long video understanding by AI, accelerating development of generative AI-assisted video analysis

HONG KONG SAR - Media OutReach Newswire - 10 June 2025 - While Artificial Intelligence (AI) technology is evolving rapidly, AI models still struggle with understanding long videos. A research team from The Hong Kong Polytechnic University (PolyU) has developed a novel video-language agent, VideoMind, that enables AI models to perform long video reasoning and question-answering tasks by emulating humans' way of thinking.

The VideoMind framework incorporates an innovative Chain-of-Low-Rank Adaptation (LoRA) strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis. The findings have been submitted to world-leading AI conferences.

A research team led by Prof. Changwen Chen, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, has developed a novel video-language agent VideoMind that allows AI models to perform long video reasoning and question-answering tasks by emulating humans’ way of thinking. The VideoMind framework incorporates an innovative Chain-of-LoRA strategy to reduce the demand for computational resources and power, advancing the application of generative AI in video analysis.

Videos, especially those longer than 15 minutes, carry information that unfolds over time, such as the sequence of events, causality, coherence and scene transitions. To understand the video content, AI models therefore need not only to identify the objects present, but also to take into account how they change throughout the video. Because the visuals in a video occupy a large number of tokens, video understanding requires vast amounts of computing capacity and memory, making it difficult for AI models to process long videos.
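To give a rough sense of why token counts grow so quickly, the following back-of-the-envelope sketch multiplies the number of sampled frames by the tokens used per frame. The sampling rate (1 frame per second) and the per-frame token count (256) are hypothetical values chosen for illustration only; they are not figures from the PolyU team.

```python
def visual_token_count(duration_s: float, fps: float, tokens_per_frame: int) -> int:
    """Estimate the visual tokens for a video sampled at a fixed frame rate."""
    num_frames = int(duration_s * fps)
    return num_frames * tokens_per_frame


# Hypothetical settings: 1 sampled frame per second, 256 tokens per frame.
short_clip = visual_token_count(duration_s=30, fps=1.0, tokens_per_frame=256)
long_video = visual_token_count(duration_s=15 * 60, fps=1.0, tokens_per_frame=256)

print(f"30-second clip: {short_clip} tokens")
print(f"15-minute video: {long_video} tokens")
```

Under these assumed settings, the 15-minute video needs 30 times as many visual tokens as the 30-second clip, before any text tokens are counted.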

Prof. Changwen CHEN, Interim Dean of the PolyU Faculty of Computer and Mathematical Sciences and Chair Professor of Visual Computing, and his team have achieved a breakthrough in research on long video reasoning by AI. In designing VideoMind, they made reference to a human-like process of video understanding, and introduced a role-based workflow. The four roles included in the framework are: the Planner, to coordinate all other roles for each query; the Grounder, to localise and retrieve relevant moments; the Verifier, to validate the information accuracy of the retrieved moments and select the most reliable one; and the Answerer, to generate the query-aware answer. This progressive approach to video understanding helps address the challenge of temporal-grounded reasoning that most AI models face.
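The division of labour among the four roles can be pictured with a toy sketch. This is not VideoMind's code: every function, the `Moment` class, the captioned "video" and the word-overlap scoring are hypothetical stand-ins that only mirror the order in which the roles hand work to one another.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Moment:
    start_s: float
    end_s: float
    caption: str  # toy stand-in for the visual content of a segment


# Hypothetical "video": three captioned one-minute segments.
VIDEO = [
    Moment(0, 60, "a chef chops onions"),
    Moment(60, 120, "the chef fries onions in a pan"),
    Moment(120, 180, "the chef plates the dish"),
]


def _shared_words(query: str, caption: str) -> int:
    return len(set(query.lower().split()) & set(caption.lower().split()))


def planner(query: str) -> List[str]:
    """Decide which roles to call; this toy planner always runs the full chain."""
    return ["grounder", "verifier", "answerer"]


def grounder(query: str, video: List[Moment], top_k: int = 2) -> List[Moment]:
    """Retrieve the top_k candidate moments by raw word overlap."""
    ranked = sorted(video, key=lambda m: _shared_words(query, m.caption), reverse=True)
    return ranked[:top_k]


def verifier(query: str, candidates: List[Moment]) -> Moment:
    """Re-check the candidates with a stricter score and keep the most reliable one."""
    def coverage(m: Moment) -> float:
        return _shared_words(query, m.caption) / len(set(m.caption.lower().split()))
    return max(candidates, key=coverage)


def answerer(query: str, moment: Moment) -> str:
    """Produce an answer tied to the verified moment."""
    return f"Between {moment.start_s:.0f}s and {moment.end_s:.0f}s: {moment.caption}"


def run(query: str, video: List[Moment]) -> str:
    plan = planner(query)
    candidates = grounder(query, video) if "grounder" in plan else list(video)
    best = verifier(query, candidates) if "verifier" in plan else candidates[0]
    return answerer(query, best)


answer = run("when does the chef fry the onions in a pan", VIDEO)
print(answer)
```

In the real framework each role is performed by a vision-language model rather than by string matching; the sketch only shows how a query is planned, localised, checked and then answered.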

Another core innovation of the VideoMind framework lies in its adoption of a Chain-of-LoRA strategy. LoRA is a fine-tuning technique that has emerged in recent years. It adapts AI models for specific uses without performing full-parameter retraining. The Chain-of-LoRA strategy pioneered by the team applies four lightweight LoRA adapters to a unified model, each designed for a specific role. With this strategy, the model can dynamically activate role-specific LoRA adapters during inference via self-calling to switch seamlessly among these roles, eliminating the need for, and the cost of, deploying multiple models while enhancing the efficiency and flexibility of the single model.
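The general idea of sharing one frozen model among several switchable LoRA adapters can be sketched in a few lines of plain Python. This is a minimal illustration of the standard LoRA formulation (output = W·x + scale·B·(A·x)), not the team's implementation; the `LoRALinear` class, the tiny 2×2 weight matrix and the rank-1 adapter values are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Matrix = List[List[float]]
Vector = List[float]


def matvec(m: Matrix, v: Vector) -> Vector:
    return [sum(w * x for w, x in zip(row, v)) for row in m]


class LoRALinear:
    """One frozen weight matrix shared by several low-rank adapters."""

    def __init__(self, weight: Matrix, scale: float = 1.0):
        self.weight = weight  # base weights: never modified by any adapter
        self.scale = scale
        self.adapters: Dict[str, Tuple[Matrix, Matrix]] = {}
        self.active: Optional[str] = None

    def add_adapter(self, name: str, a: Matrix, b: Matrix) -> None:
        self.adapters[name] = (a, b)

    def set_adapter(self, name: Optional[str]) -> None:
        if name is not None and name not in self.adapters:
            raise KeyError(name)
        self.active = name

    def forward(self, x: Vector) -> Vector:
        y = matvec(self.weight, x)
        if self.active is not None:
            a, b = self.adapters[self.active]
            delta = matvec(b, matvec(a, x))  # low-rank update: B (A x)
            y = [yi + self.scale * di for yi, di in zip(y, delta)]
        return y


# Hypothetical base layer (identity) and four rank-1 adapters, one per role.
layer = LoRALinear(weight=[[1.0, 0.0], [0.0, 1.0]])
layer.add_adapter("planner", a=[[1.0, 0.0]], b=[[1.0], [0.0]])
layer.add_adapter("grounder", a=[[0.0, 1.0]], b=[[0.0], [1.0]])
layer.add_adapter("verifier", a=[[1.0, 1.0]], b=[[1.0], [1.0]])
layer.add_adapter("answerer", a=[[1.0, -1.0]], b=[[0.5], [0.5]])

x = [1.0, 2.0]
outputs = {}
for role in ["planner", "grounder", "verifier", "answerer"]:
    layer.set_adapter(role)  # switch roles without loading another model
    outputs[role] = layer.forward(x)

layer.set_adapter(None)
base_output = layer.forward(x)
print(outputs, base_output)
```

Only the small `a` and `b` matrices differ between roles, so switching roles means swapping a few adapter parameters rather than loading a separate full model.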

VideoMind is open source on GitHub and Hugging Face. Details of the experiments conducted to evaluate its effectiveness in temporal-grounded video understanding across 14 diverse benchmarks are also available. Comparing VideoMind with state-of-the-art AI models, including GPT-4o and Gemini 1.5 Pro, the researchers found that VideoMind's grounding accuracy exceeded that of all competitors in challenging tasks involving videos with an average duration of 27 minutes. Notably, the team included two versions of VideoMind in the experiments: one built on a smaller, 2 billion (2B) parameter model, and another on a larger, 7 billion (7B) parameter model. The results showed that, even at the 2B size, VideoMind delivered performance comparable to that of many 7B models.

Prof. Chen said, "Humans switch among different thinking modes when understanding videos: breaking down tasks, identifying relevant moments, revisiting these to confirm details and synthesising their observations into coherent answers. The process is very efficient with the human brain using only about 25 watts of power, which is about a million times lower than that of a supercomputer with equivalent computing power. Inspired by this, we designed the role-based workflow that allows AI to understand videos like humans, while leveraging the chain-of-LoRA strategy to minimise the need for computing power and memory in this process."

AI is at the core of global technological development. The advancement of AI models is, however, constrained by insufficient computing power and excessive power consumption. Built upon the unified, open-source model Qwen2-VL and augmented with additional optimisation tools, the VideoMind framework lowers the technological cost and the threshold for deployment, offering a feasible way to ease the power-consumption bottleneck in AI models.

Prof. Chen added, "VideoMind not only overcomes the performance limitations of AI models in video processing, but also serves as a modular, scalable and interpretable multimodal reasoning framework. We envision that it will expand the application of generative AI to various areas, such as intelligent surveillance, sports and entertainment video analysis, video search engines and more."


Hashtag: #PolyU #AI #LLMs #VideoAnalysis #IntelligentSurveillance #VideoSearch

The issuer is solely responsible for the content of this announcement.
