🎬 VEFX-Code

Reference code and inference utilities for VEFX-Bench, a comprehensive benchmark for evaluating text-driven video editing and visual effects.

📂 Browse Files | 📖 Full README | 📦 VEFX-Bench Dataset | 🤖 VEFX-Reward-4B Model

📊 What's in VEFX-Bench

5,049 annotated examples spanning 9 categories and 32 subcategories, evaluated by VEFX-Reward, a VLM-based reward model that scores edits along three dimensions on a 1–4 scale:

Instructional Following (IF)

Does the edit accurately reflect the editing instruction?

Render Quality (RQ)

Visual clarity, temporal consistency, physical plausibility.

Edit Exclusivity (EE)

Were only the intended regions modified, without side-effects?

🏆 Model Leaderboard

VEFX-Reward scores on a 1–4 scale. Ranked by GeoAgg (α=2 for IF, β=1 for RQ, γ=1 for EE). Higher is better.

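| Rank | Model | Type | IF ↑ | RQ ↑ | EE ↑ | GeoAgg ↑ |
|------|-------|------|------|------|------|----------|
| 🥇 | Kling o3 Omni | Commercial | 3.033 | 3.588 | 3.043 | 3.057 |
| 🥈 | Kling o1 | Commercial | 3.040 | 3.534 | 2.976 | 2.985 |
| 🥉 | Runway Gen-4.5 | Commercial | 2.817 | 3.319 | 2.923 | 2.912 |
| 4 | Seedance 2.0 | Commercial | 2.811 | 3.421 | 3.088 | 2.766 |
| 5 | Grok Imagine | Commercial | 2.606 | 3.346 | 3.376 | 2.723 |
| 6 | Luma Ray 3 | Commercial | 2.702 | 3.403 | 2.705 | 2.717 |
| 7 | UniVideo | Open-source | 2.294 | 3.266 | 3.091 | 2.516 |
| 8 | Wan 2.6 | Commercial | 2.012 | 3.317 | 2.446 | 2.146 |
| 9 | Luma Ray 2 | Commercial | 2.038 | 2.532 | 1.363 | 1.804 |
| 10 | VACE | Open-source | 2.027 | 3.172 | 1.180 | 1.775 |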
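
The exact GeoAgg formula is not spelled out here. The sketch below assumes it is a weighted geometric mean of the three scores, with the weights α=2, β=1, γ=1 given above. The function names `geo_agg` and `mean_geo_agg` are illustrative and are not taken from the repository.

```python
import math


def geo_agg(if_score, rq_score, ee_score, alpha=2.0, beta=1.0, gamma=1.0):
    """Weighted geometric mean of IF, RQ, EE (assumed form of GeoAgg)."""
    for score in (if_score, rq_score, ee_score):
        if not 1.0 <= score <= 4.0:
            raise ValueError(f"score {score} is outside the 1-4 scale")
    total = alpha + beta + gamma
    # Work in log space: exp of the weighted mean of the logs.
    log_mean = (
        alpha * math.log(if_score)
        + beta * math.log(rq_score)
        + gamma * math.log(ee_score)
    ) / total
    return math.exp(log_mean)


def mean_geo_agg(samples):
    """Average the per-example GeoAgg over (IF, RQ, EE) triples."""
    values = [geo_agg(*triple) for triple in samples]
    return sum(values) / len(values)


if __name__ == "__main__":
    # Column means of the top leaderboard row, used only as sample inputs.
    print(round(geo_agg(3.033, 3.588, 3.043), 3))
```

Applied to the column means of the top row, this gives roughly 3.17 rather than the reported 3.057. That gap suggests the leaderboard computes GeoAgg per example and then averages, as `mean_geo_agg` does. This is an inference from the numbers, not something the text states.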

🎥 Sample Videos

An example pair from examples/sample_videos/: original input on the left, edited output on the right.

[Video pair: Original (left) | Edited (right)]

🚀 Quick Start

pip install -r requirements.txt
python examples/quick_start.py \
    --original examples/sample_videos/original.mp4 \
    --edited   examples/sample_videos/edited.mp4 \
    --instruction "Change the color of the trailer to bright yellow"

See examples/ for batch & multi-GPU scoring scripts.
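
As a rough illustration of scoring several pairs, the sketch below builds one Quick Start command per row of a small CSV manifest. It uses only the `--original`, `--edited`, and `--instruction` flags shown above. The manifest format and the helper names `build_command` and `commands_from_manifest` are assumptions for this example, not part of the repository. Relaunching the script per pair would presumably reload the model each time, so the repository's own batch scripts are likely the better choice for real workloads.

```python
import csv
import io
import shlex
import sys

# Script path as it appears in the Quick Start command.
QUICK_START = "examples/quick_start.py"


def build_command(original, edited, instruction):
    """Return the argv list that scores one (original, edited) pair."""
    return [
        sys.executable, QUICK_START,
        "--original", original,
        "--edited", edited,
        "--instruction", instruction,
    ]


def commands_from_manifest(csv_text):
    """Parse a CSV manifest (hypothetical format) into one command per row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [
        build_command(row["original"], row["edited"], row["instruction"])
        for row in reader
    ]


if __name__ == "__main__":
    manifest = (
        "original,edited,instruction\n"
        "examples/sample_videos/original.mp4,"
        "examples/sample_videos/edited.mp4,"
        "Change the color of the trailer to bright yellow\n"
    )
    for cmd in commands_from_manifest(manifest):
        # Dry run: only print the command. To execute it, pass cmd to
        # subprocess.run(cmd, check=True) instead.
        print(shlex.join(cmd))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when an instruction contains spaces or punctuation.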