🎬 VEFX-Code

Reference code & inference utilities for the VEFX-Bench benchmark — a comprehensive benchmark for evaluating text-driven video editing and visual effects.


📊 What's in VEFX-Bench

VEFX-Bench contains 5,049 annotated examples spanning 9 categories and 32 subcategories. Edits are scored by VEFX-Reward, a VLM-based reward model that rates each edit on three dimensions, each on a 1–4 scale:

Instructional Following (IF)

Does the edit accurately reflect the editing instruction?

Render Quality (RQ)

Is the output visually clear, temporally consistent, and physically plausible?

Edit Exclusivity (EE)

Were only the intended regions modified, without side-effects?

🏆 Model Leaderboard

All values are VEFX-Reward scores on a 1–4 scale; higher is better. Models are ranked by GeoAgg, which combines the three dimensions with weights α=2 for IF, β=1 for RQ, and γ=1 for EE.

Rank	Model	Type	IF ↑	RQ ↑	EE ↑	GeoAgg ↑
🥇	Kling o3 Omni	Commercial	3.033	3.588	3.043	3.057
🥈	Kling o1	Commercial	3.040	3.534	2.976	2.985
🥉	Runway Gen-4.5	Commercial	2.817	3.319	2.923	2.912
4	Seedance 2.0	Commercial	2.811	3.421	3.088	2.766
5	Grok Imagine	Commercial	2.606	3.346	3.376	2.723
6	Luma Ray 3	Commercial	2.702	3.403	2.705	2.717
7	UniVideo	Open-source	2.294	3.266	3.091	2.516
8	Wan 2.6	Commercial	2.012	3.317	2.446	2.146
9	Luma Ray 2	Commercial	2.038	2.532	1.363	1.804
10	VACE	Open-source	2.027	3.172	1.180	1.775
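
The exact GeoAgg definition is not given in this section. As a rough sketch, assuming it is a weighted geometric mean of the three scores with the weights from the caption above, it could be computed as shown below. The function name `geo_agg` and the range check are illustrative, not part of the repository. If the leaderboard aggregates per example and then averages, plugging a row's mean IF/RQ/EE into this function will not reproduce the GeoAgg column exactly.

```python
import math

# Weights taken from the leaderboard caption: alpha for IF, beta for RQ, gamma for EE.
ALPHA, BETA, GAMMA = 2.0, 1.0, 1.0


def geo_agg(if_score, rq_score, ee_score, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Weighted geometric mean of IF, RQ, and EE (assumed form of GeoAgg)."""
    for name, score in (("IF", if_score), ("RQ", rq_score), ("EE", ee_score)):
        # Scores are reported on a 1-4 scale, so reject anything outside it.
        if not 1.0 <= score <= 4.0:
            raise ValueError(f"{name} score {score} is outside the 1-4 scale")
    total_weight = alpha + beta + gamma
    # Work in log space: exp(sum(w_i * log(s_i)) / sum(w_i)).
    log_mean = (
        alpha * math.log(if_score)
        + beta * math.log(rq_score)
        + gamma * math.log(ee_score)
    ) / total_weight
    return math.exp(log_mean)
```

With α=2, a low IF score pulls the aggregate down more than an equally low RQ or EE score, which matches the ranking's emphasis on following the instruction.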

🎥 Sample Videos

An example pair from examples/sample_videos/ — original input on the left, edited output on the right.

Original (left) | Edited (right)

🚀 Quick Start

pip install -r requirements.txt
python examples/quick_start.py \
    --original examples/sample_videos/original.mp4 \
    --edited   examples/sample_videos/edited.mp4 \
    --instruction "Change the color of the trailer to bright yellow"

See examples/ for batch & multi-GPU scoring scripts.
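
For a handful of pairs, the quick-start command can also be looped from Python. The sketch below only uses the three flags shown above; the helper names `build_command` and `score_pairs` are hypothetical and not part of the repository. Each call starts a new process, which presumably reloads the model every time, so the dedicated batch scripts are likely the better choice for larger runs.

```python
import subprocess
import sys


def build_command(original, edited, instruction, script="examples/quick_start.py"):
    """Build the quick-start command line for one (original, edited, instruction) triple."""
    return [
        sys.executable, script,
        "--original", original,
        "--edited", edited,
        "--instruction", instruction,
    ]


def score_pairs(pairs):
    """Run the quick-start script once per triple, stopping on the first failure."""
    for original, edited, instruction in pairs:
        subprocess.run(build_command(original, edited, instruction), check=True)


# Triples to score; this one reuses the sample from the quick start.
PAIRS = [
    (
        "examples/sample_videos/original.mp4",
        "examples/sample_videos/edited.mp4",
        "Change the color of the trailer to bright yellow",
    ),
]

# Uncomment to run from the repository root after installing requirements:
# score_pairs(PAIRS)
```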