Created on: June 11, 2025
https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar
Is there an API for this model on Replicate, fal.ai, or similar platforms? If not, is there another model API that can turn a single avatar image into a video?
Short answer: Fal.ai already exposes an official `fal-ai/hunyuan-avatar` endpoint that turns a single avatar image + audio into a talking-head video. Replicate, by contrast, only hosts the text-to-video model `tencent/hunyuan-video` and a few unofficial fine-tunes, so there is no first-party HunyuanVideo-Avatar API there. If you just need an "image-to-video avatar" pipeline, several other public APIs (SadTalker, VideoReTalking, AnimateDiff, HeyGen, A2E, and Fal.ai's Kling or PixVerse) can fill the gap.
| Platform | Official Avatar model? | Notes |
|---|---|---|
| Fal.ai | Yes – `fal-ai/hunyuan-avatar` | Accepts one image & audio; playground + REST/SDK; ~8 min per 5 s video, about $1.40 per run (fal.ai) |
| Replicate | No – only `tencent/hunyuan-video` (text→video) and community fine-tunes | `tencent/hunyuan-video` has ≈104 k runs and a full API, but it requires a text prompt, not an avatar image (see the sample call below) (replicate.com). A proof-of-concept fine-tune `lucataco/hunyuan-heygen-woman` exists, but it is not the official Avatar model (replicate.com) |
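For reference, a hedged sketch of what calling Replicate's `tencent/hunyuan-video` looks like; it only takes a text prompt, which is why it cannot stand in for the Avatar model. The `/models/…/predictions` route and the `prompt` input name follow Replicate's usual API shape and are assumptions here; confirm them on the model's API tab.

```bash
# Text-to-video only: tencent/hunyuan-video has no image/audio avatar inputs.
curl -s -X POST \
  https://api.replicate.com/v1/models/tencent/hunyuan-video/predictions \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "a person smiling and talking to the camera"}}'
```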
| Model / Provider | What it does | Typical inputs | Cost / speed (public default) |
|---|---|---|---|
| SadTalker – Replicate | Audio-driven talking head from one photo (example call below) | image + audio | ≈$0.19 / ~2 min for 1 clip (replicate.com) |
| VideoReTalking – Replicate | Lip-syncs or re-voices an existing face video; also works from one still | image or video + audio | ≈1 (replicate.com) |
| AnimateDiff – Replicate | Adds short looped motion to a still image (no audio) | image (+ prompt) | ≈$0.013, ~10 s per 16-frame clip (replicate.com) |
| Kling 2.1 / PixVerse / Veo 3 – Fal.ai | High-quality image-to-video (no audio) with realistic motion | image (+ prompt) | Tiered paid endpoints shown in the Fal gallery (fal.ai) |
| HeyGen API | Commercial avatar generator; create, script, translate & lip-sync videos | image/photo avatar + text or audio | Usage-based pricing; full REST API (`/avatar-video`) (docs.heygen.com) |
| A2E Image-to-Video API | End-to-end avatar, lip-sync, voice-clone | image + audio or text | Free tier + paid plans; REST endpoints (a2e.ai) |
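As a concrete example of the Replicate route, here is a hedged sketch of an audio-driven SadTalker call. The version-hash placeholder and the `source_image` / `driven_audio` input names are assumptions carried over from the upstream SadTalker project; copy the real values from the model's API tab on Replicate.

```bash
# Community models on Replicate are addressed by a version hash.
curl -s -X POST https://api.replicate.com/v1/predictions \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "version": "<SADTALKER_VERSION_ID>",
        "input": {
          "source_image": "https://example.com/avatar.jpg",
          "driven_audio": "https://example.com/voice.mp3"
        }
      }'
# The response carries a prediction id; poll
#   GET https://api.replicate.com/v1/predictions/<id>
# until "status" is "succeeded", then download the "output" URL.
```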
```bash
curl -X POST https://fal.run/fal-ai/hunyuan-avatar \
  -H "Authorization: Key $FAL_API_KEY" \
  -F "image_url=https://example.com/avatar.jpg" \
  -F "audio_url=https://example.com/voice.mp3" \
  -F "emotion=happy" \
  -F "fps=25"
```
The response JSON contains an `output_with_audio.mp4` URL; download or stream it in your app (fal.ai).
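A minimal follow-up sketch for grabbing the clip, assuming the response exposes the video URL under a field such as `video.url`; the exact JSON shape may differ, so inspect one real response first.

```bash
# Submit the job and capture the JSON response (assumed synchronous here).
RESPONSE=$(curl -s -X POST https://fal.run/fal-ai/hunyuan-avatar \
  -H "Authorization: Key $FAL_API_KEY" \
  -F "image_url=https://example.com/avatar.jpg" \
  -F "audio_url=https://example.com/voice.mp3")

# Pull out the video URL (field path is an assumption - adjust to the real schema).
VIDEO_URL=$(echo "$RESPONSE" | jq -r '.video.url')

# Download the finished clip.
curl -L -o avatar_clip.mp4 "$VIDEO_URL"
```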
Use Fal.ai's `fal-ai/hunyuan-avatar` if you want the exact model right now. Otherwise, pick from Replicate (SadTalker, VideoReTalking, AnimateDiff) or commercial APIs such as HeyGen and A2E, depending on whether you need lip-sync, emotion control, or large-scale production. All provide REST endpoints you can integrate in minutes.
It has been some time; please search again, and this time add https://github.com/deepbeepmeep/Wan2GP too.
TL;DR (mid-June 2025): Fal.ai is still the only public hub that exposes an official API for HunyuanVideo-Avatar (`fal-ai/hunyuan-avatar`) (fal.ai). Replicate has not added a first-party Avatar endpoint; it only offers the core text-to-video model (`tencent/hunyuan-video`) and a few hobby fine-tunes (replicate.com).

By contrast, the Wan-2.1 / Wan2GP ecosystem now has multiple hosted options: Fal.ai runs several image-to-video endpoints (`wan-i2v`, `wan-pro/image-to-video`, etc.) (fal.ai), and Replicate lists high-resolution forks such as `wavespeedai/wan-2.1-i2v-720p` and LoRA-enabled variants (replicate.com). The GitHub repo Wan2GP itself remains a self-host kit (10 GB VRAM) rather than a web API, but its checkpoints are binary-compatible with these services (github.com).
| Platform | Endpoint ID | Status | Inputs | Typical cost |
|---|---|---|---|---|
| Fal.ai | `fal-ai/hunyuan-avatar` | Official | `image_url`, `audio_url`, optional `emotion`, `fps` | ~$1.40 per 5 s clip (fal.ai) |
| Replicate | none (core text-to-video only) | – | – | – (replicate.com) |
| Replicate (community) | `lucataco/hunyuan-heygen-*` fine-tunes | Unofficial | still + prompt | hobby forks (replicate.com) |
| Hugging Face | `tencent/HunyuanVideo-Avatar` | Weights only | local inference (download sketch below) | free download (huggingface.co) |
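If you go the weights-only route, a minimal sketch for pulling the published checkpoints with the Hugging Face CLI; the `--local-dir` target is just an example path.

```bash
# Fetch the HunyuanVideo-Avatar weights for local inference.
pip install -U "huggingface_hub[cli]"
huggingface-cli download tencent/HunyuanVideo-Avatar --local-dir ./HunyuanVideo-Avatar
```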
```bash
curl -X POST https://fal.run/fal-ai/hunyuan-avatar \
  -H "Authorization: Key $FAL_API_KEY" \
  -F image_url=https://…/face.jpg \
  -F audio_url=https://…/voice.mp3 \
  -F emotion=happy \
  -F fps=25
```
The JSON response includes an `output_with_audio.mp4` URL (fal.ai).
- `fal-ai/wan-i2v` – 480 p/720 p image-to-video, $0.20–$0.40 per clip (fal.ai) (sample call below)
- `fal-ai/wan-pro/image-to-video` – premium 1080 p, 6 s, 30 fps endpoint (fal.ai)
- `fal-ai/wan-trainer` – lets you fine-tune LoRAs on your own clips (fal.ai)
- `wavespeedai/wan-2.1-i2v-720p` – optimized 720 p generation (replicate.com)
- `fofr/wan2.1-with-lora` – LoRA-ready build, H100 backend (replicate.com)

The GitHub repo packages Wan-2.1 weights plus a TeaCache patch so you can run 720 p clips on a single 10 GB consumer GPU (github.com). It also documents how to export Replicate-style LoRAs for cloud inference (github.com). If you prefer an HTTP API, point your client to the Fal.ai or Replicate endpoints above; both accept the same latent formats.
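A hedged sketch of calling one of the hosted Wan endpoints; the JSON body and the `image_url` / `prompt` parameter names follow Fal.ai's usual request shape and are assumptions here, so check the endpoint's playground for the exact schema.

```bash
# Image-to-video with the hosted Wan 2.1 endpoint on Fal.ai.
curl -s -X POST https://fal.run/fal-ai/wan-i2v \
  -H "Authorization: Key $FAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "image_url": "https://example.com/avatar.jpg",
        "prompt": "the person turns their head and smiles"
      }'
```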
| Model / API | Hub | Strength | Citation |
|---|---|---|---|
| SadTalker | Replicate | Audio-driven talking head from one photo | replicate.com |
| VideoReTalking | Replicate | Lip-syncs or re-voices a still or video | replicate.com |
| HeyGen `/avatar-video` | SaaS | Commercial studio-grade avatar videos | docs.heygen.com |
| PixVerse / Kling 2.x | Fal.ai | High-motion silent clips (image-to-video) | fal.ai |
These remain the main drop-in options if the Hunyuan or Wan stacks don’t meet your quality, licensing, or cost targets.
Let me know and I can produce ready-to-run snippets or benchmark tables.