Model · Zhipu AI (GLM) · published Dec 11, 2025

zai-org/RealVideo

Task: any-to-any · License: MIT · Downloads: 0 · Likes: 104

RealVideo

RealVideo is a WebSocket-based video calling system that accepts text input. It uses the GLM-4.5-AirX and GLM-TTS models to generate audio responses, and autoregressive diffusion to generate the corresponding video frames. The system has a modular design and a clean code structure. See the project blog for details.

Features

  • Text Input: Supports text message input.
  • AI Voice Response: Integrates GLM-4.5-AirX and GLM-TTS models to generate voice responses.
  • Lip Sync: Generates real-time conversational video based on any input image and audio.
  • Real-time Communication: WebSocket-based real-time bidirectional communication.
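
The features above suggest a flow of text in, then reply text, audio, and video frames streamed back. The sketch below illustrates that flow under loud assumptions: every function name, message field, and chunk size is hypothetical and not taken from the RealVideo code, the model calls are replaced by trivial stubs, and an `asyncio.Queue` stands in for the server-to-client side of the WebSocket.

```python
import asyncio
import json

# All names below are hypothetical stand-ins, not RealVideo's actual API.

async def generate_reply(text: str) -> str:
    """Stub for the language model call (GLM-4.5-AirX in the real system)."""
    return f"echo: {text}"

async def synthesize_speech(text: str, chunk_chars: int = 8):
    """Stub for TTS: yields one fake audio chunk per slice of the reply text."""
    for i in range(0, len(text), chunk_chars):
        yield text[i:i + chunk_chars].encode("utf-8")

async def render_frames(image: bytes, audio_chunk: bytes, frames_per_chunk: int = 2):
    """Stub for video generation: yields placeholder frames for one audio chunk."""
    for n in range(frames_per_chunk):
        yield {"frame": n, "audio_bytes": len(audio_chunk)}

async def handle_message(raw: str, image: bytes, outbox: asyncio.Queue) -> None:
    """Process one client text message and queue the outgoing events in order."""
    msg = json.loads(raw)
    reply = await generate_reply(msg["text"])
    await outbox.put({"type": "text", "text": reply})
    async for chunk in synthesize_speech(reply):
        # Audio is queued before the frames that were rendered from it.
        await outbox.put({"type": "audio", "size": len(chunk)})
        async for frame in render_frames(image, chunk):
            await outbox.put({"type": "video", **frame})
    await outbox.put({"type": "done"})

async def demo(text: str) -> list:
    """Run one request through the stub pipeline and collect the queued events."""
    outbox: asyncio.Queue = asyncio.Queue()  # stands in for the WebSocket send side
    request = json.dumps({"type": "text", "text": text})
    await handle_message(request, b"portrait", outbox)
    events = []
    while not outbox.empty():
        events.append(outbox.get_nowait())
    return events

if __name__ == "__main__":
    for event in asyncio.run(demo("hello")):
        print(event)
```

Interleaving audio and video events per chunk, rather than waiting for the full reply, is one plausible way to keep latency low; whether RealVideo orders its messages this way is not stated in the source.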

Quick Start

See our GitHub repository for quick-start instructions.

Technical Highlights

  • Model Integration: Supports quick, convenient voice cloning and generates audio output from text input.
  • Modular Design: Clear code structure, easy to maintain and extend.
  • Real-time Performance: Optimized audio processing and real-time video generation algorithms.
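
As a rough illustration of what a modular design can look like, the sketch below puts speech synthesis and video generation behind small interfaces so either one can be swapped without touching the session logic. The class and method names (`SpeechSynthesizer`, `VideoGenerator`, `CallSession`, and the fakes) are hypothetical and do not come from the RealVideo repository.

```python
from dataclasses import dataclass
from typing import List, Protocol, Tuple


class SpeechSynthesizer(Protocol):
    """Hypothetical interface: turns reply text into audio bytes."""
    def synthesize(self, text: str) -> bytes: ...


class VideoGenerator(Protocol):
    """Hypothetical interface: turns a reference image plus audio into frames."""
    def generate(self, image: bytes, audio: bytes) -> List[bytes]: ...


@dataclass
class CallSession:
    """Session logic that depends only on the two interfaces above."""
    tts: SpeechSynthesizer
    video: VideoGenerator
    image: bytes

    def respond(self, reply_text: str) -> Tuple[bytes, List[bytes]]:
        audio = self.tts.synthesize(reply_text)
        frames = self.video.generate(self.image, audio)
        return audio, frames


class FakeTTS:
    """Test double: the 'audio' is just the UTF-8 bytes of the text."""
    def synthesize(self, text: str) -> bytes:
        return text.encode("utf-8")


class FakeVideo:
    """Test double: emits one placeholder frame per audio byte."""
    def generate(self, image: bytes, audio: bytes) -> List[bytes]:
        return [image for _ in audio]


if __name__ == "__main__":
    session = CallSession(tts=FakeTTS(), video=FakeVideo(), image=b"img")
    audio, frames = session.respond("hi")
    print(len(audio), len(frames))
```

Because `CallSession` only relies on structural typing, a real TTS or video backend could replace the fakes as long as it exposes the same method signatures.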

Acknowledgements

This project utilizes the following open-source libraries:

Notability

notability 6.0/10

A notable model release from Zhipu AI, though its traction is unknown.