How to Install Z-Image Turbo Locally

This guide explains how to set up Z-Image Turbo, a 6B-parameter image-generation model, on your local machine. The process requires a computer with a capable graphics card, Python, and a terminal.

Hardware Requirements

Running the model effectively requires a GPU with at least 16 GB of VRAM; recent consumer cards or data-center cards work best. Cards with less memory may still work with the offloading option described below, but generation will be slower. You also need Python 3.9 or newer and a working CUDA installation.
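The 16 GB figure follows from a back-of-the-envelope calculation: the weights alone for a 6B-parameter model in bfloat16 (2 bytes per parameter) occupy roughly 11 GB, before counting the text encoder, VAE, and activations. This sketch is an estimate, not an exact measurement:

```python
# Rough VRAM estimate for a 6B-parameter transformer in bfloat16
# (2 bytes per parameter). The text encoder, VAE, and activations
# add further overhead, which is why 16 GB is the comfortable minimum.
params = 6e9
bytes_per_param = 2  # bfloat16
weights_gb = params * bytes_per_param / 1024**3
print(f"Transformer weights alone: ~{weights_gb:.1f} GB")
```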

Create a Virtual Environment

Isolate the project's dependencies in a virtual environment to prevent conflicts with other Python projects. Open your terminal and run the command below to create an environment named zimage-env.

python -m venv zimage-env

Activate the environment to begin.

# On Linux or macOS
source zimage-env/bin/activate

# On Windows
zimage-env\Scripts\activate

Install PyTorch and Libraries

Install a PyTorch build that supports your GPU. The command below targets CUDA 12.4; adjust the index URL if you use a different CUDA version. You also need to install the diffusers library directly from source, along with the transformers, accelerate, and safetensors packages.

pip install torch --index-url https://download.pytorch.org/whl/cu124
pip install git+https://github.com/huggingface/diffusers
pip install transformers accelerate safetensors
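Before moving on, you can check that each package is importable. This is a small sketch using only the standard library; it verifies the packages are present, not that the GPU is visible:

```python
import importlib.util

# Check that each required package is importable in this environment.
packages = ("torch", "diffusers", "transformers", "accelerate", "safetensors")
status = {name: importlib.util.find_spec(name) is not None for name in packages}
for name, ok in status.items():
    print(f"{name}: {'installed' if ok else 'MISSING'}")
```

To confirm the GPU itself is visible, you can additionally run python -c "import torch; print(torch.cuda.is_available())".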

Load the Z-Image Turbo Pipeline

Create a Python script that loads the model through the ZImagePipeline class, a wrapper that handles the necessary components. The code below loads the weights from the Hugging Face repository in bfloat16 precision to save memory, then moves the pipeline to the CUDA device.

import torch
from diffusers import ZImagePipeline

pipe = ZImagePipeline.from_pretrained(
    "Tongyi-MAI/Z-Image-Turbo",
    torch_dtype=torch.bfloat16,
    low_cpu_mem_usage=False,
)
pipe.to("cuda")

Generate an Image

You can now generate an image. Define a text prompt describing the scene and set the dimensions to 1024 by 1024 pixels. The model works well with as few as 9 inference steps, and the guidance scale should be set to 0.0 for this specific model. The last line saves the resulting image to disk.

prompt = "City street at night with clear bilingual store signs, warm lighting, and detailed reflections on wet pavement."

image = pipe(
    prompt=prompt,
    height=1024,
    width=1024,
    num_inference_steps=9,
    guidance_scale=0.0,
    generator=torch.Generator("cuda").manual_seed(123),
).images[0]

image.save("z_image_turbo_city.png")

Optimization Options

On supported hardware you can optionally improve performance. Enable Flash Attention 2 to speed up the transformer (this requires the flash-attn package to be installed), or compile the transformer module.

# Switch attention backend
pipe.transformer.set_attention_backend("flash")

# Compile the model
# pipe.transformer.compile()

Low Memory Mode

Machines with limited VRAM can use CPU offloading, which moves model components to system RAM when they are not in use. This lets the model run on smaller GPUs at the cost of longer generation times. Call this in place of pipe.to("cuda").

pipe.enable_model_cpu_offload()