zeroscope_v2_576w#
README(From Huggingface)#
pipeline_tag: text-to-video license: cc-by-nc-4.0#

zeroscope_v2 576w#
A watermark-free Modelscope-based video model optimized for producing high-quality 16:9 compositions and smooth video output. This model was trained from the original weights using 9,923 clips and 29,769 tagged frames at 24 frames and 576x320 resolution.
zeroscope_v2_576w is specifically designed for upscaling with zeroscope_v2_XL using vid2vid in the 1111 text2video extension by kabachuha. Using this model as a preliminary step allows for superior overall compositions at higher resolutions in zeroscope_v2_XL, and permits faster exploration at 576x320 before transitioning to a high-resolution render. See some example outputs that have been upscaled to 1024x576 using zeroscope_v2_XL (courtesy of dotsimulate).
zeroscope_v2_576w uses 7.9 GB of VRAM when rendering 30 frames at 576x320.
Using it with the 1111 text2video extension#
Download files in the zs2_576w folder.
Replace the respective files in the 'stable-diffusion-webui\models\ModelScope\t2v' directory.
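The two steps above can also be scripted. Here is a minimal sketch, assuming the downloaded files sit in a local folder and that you pass in the path to your own `t2v` directory. The function name `install_zeroscope_files` and the `.bak` backup suffix are hypothetical choices for illustration, not part of the extension.

```python
import shutil
from pathlib import Path


def install_zeroscope_files(src_dir, t2v_dir, backup_suffix=".bak"):
    """Copy every file in src_dir into t2v_dir, backing up files it replaces.

    Returns the sorted list of file names that were copied.
    """
    src = Path(src_dir)
    dst = Path(t2v_dir)
    if not src.is_dir():
        raise FileNotFoundError(f"source folder not found: {src}")
    dst.mkdir(parents=True, exist_ok=True)

    copied = []
    for item in sorted(src.iterdir()):
        if not item.is_file():
            continue  # skip subfolders; only plain files are replaced
        target = dst / item.name
        if target.exists():
            # Keep the previous file so the original model can be restored.
            shutil.copy2(target, target.with_name(target.name + backup_suffix))
        shutil.copy2(item, target)
        copied.append(item.name)
    return copied


# Example call (hypothetical paths; adjust to your own setup):
# install_zeroscope_files(
#     "zs2_576w",
#     Path("stable-diffusion-webui") / "models" / "ModelScope" / "t2v",
# )
```

Backing up before overwriting is a design choice of this sketch: it lets you switch back to the original ModelScope weights by renaming the `.bak` files.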
Upscaling recommendations#
For upscaling, it's recommended to use zeroscope_v2_XL via vid2vid in the 1111 extension. It works best at 1024x576 with a denoise strength between 0.66 and 0.85. Remember to use the same prompt that was used to generate the original clip.
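To keep these recommendations in one place when scripting a workflow, a small helper can collect them. This is only a sketch: the function name `build_upscale_settings` and the dictionary keys are hypothetical and do not correspond to the extension's real field names. It rejects denoise strengths outside the recommended range, which is stricter than the text requires.

```python
RECOMMENDED_SIZE = (1024, 576)        # (width, height) recommended above
RECOMMENDED_STRENGTH = (0.66, 0.85)   # denoise strength range recommended above


def build_upscale_settings(original_prompt, strength=0.75):
    """Return an illustrative settings dict for the vid2vid upscaling pass."""
    low, high = RECOMMENDED_STRENGTH
    if not low <= strength <= high:
        raise ValueError(
            f"denoise strength {strength} is outside the recommended "
            f"{low} to {high} range"
        )
    width, height = RECOMMENDED_SIZE
    return {
        "model": "zeroscope_v2_XL",
        "prompt": original_prompt,  # reuse the prompt of the 576x320 clip
        "width": width,
        "height": height,
        "denoise_strength": strength,
    }
```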
Usage in 🧨 Diffusers#
Let's first install the libraries required:
$ pip install diffusers transformers accelerate torch
Now, generate a video:
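import torch
from diffusers import DiffusionPipeline, DPMSolverMultistepScheduler
from diffusers.utils import export_to_video

# Load the pipeline in half precision to reduce memory use.
pipe = DiffusionPipeline.from_pretrained("cerspense/zeroscope_v2_576w", torch_dtype=torch.float16)
pipe.scheduler = DPMSolverMultistepScheduler.from_config(pipe.scheduler.config)
# Offload idle submodules to the CPU (requires accelerate).
pipe.enable_model_cpu_offload()

prompt = "Darth Vader is surfing on waves"
# Generate at the resolution and frame count the model was trained on.
# Note: recent diffusers releases return a batch of videos, so `.frames[0]` may be needed there.
video_frames = pipe(prompt, num_inference_steps=40, height=320, width=576, num_frames=24).frames
video_path = export_to_video(video_frames)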
Known issues#
Lower resolutions or fewer frames than the model was trained on (576x320, 24 frames) could lead to suboptimal output.
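A quick pre-flight check can flag such settings before a long render. The sketch below assumes the training values (576x320, 24 frames) are a sensible baseline. The README does not state exact thresholds, so both the cut-offs and the function name `check_generation_settings` are assumptions made for illustration.

```python
# Baseline taken from the training setup described in this README.
TRAINED_WIDTH, TRAINED_HEIGHT, TRAINED_FRAMES = 576, 320, 24


def check_generation_settings(width, height, num_frames):
    """Return a list of warning strings; an empty list means no concerns."""
    warnings = []
    if width < TRAINED_WIDTH or height < TRAINED_HEIGHT:
        warnings.append(
            f"resolution {width}x{height} is below "
            f"{TRAINED_WIDTH}x{TRAINED_HEIGHT}; output may be suboptimal"
        )
    if num_frames < TRAINED_FRAMES:
        warnings.append(
            f"{num_frames} frames is fewer than {TRAINED_FRAMES}; "
            "output may be suboptimal"
        )
    return warnings


# Example: print any warnings for a reduced-size test render.
for message in check_generation_settings(448, 256, 16):
    print("warning:", message)
```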
Thanks to camenduru, kabachuha, ExponentialML, dotsimulate, VANYA, polyware, and tin2tin.
Model Files#
README.md (3.2 KB)
model_index.json (407.0 B)
scheduler/scheduler_config.json (530.0 B)
text_encoder/config.json (655.0 B)
text_encoder/model_state.pdparams (649.3 MB)
tokenizer/merges.txt (512.3 KB)
tokenizer/special_tokens_map.json (377.0 B)
tokenizer/tokenizer_config.json (826.0 B)
tokenizer/vocab.json (1.0 MB)
unet/config.json (845.0 B)
unet/model_state.pdparams (2.6 GB)
vae/config.json (814.0 B)
vae/model_state.pdparams (159.6 MB)