Created on: June 11, 2025
https://github.com/Tencent-Hunyuan/HunyuanVideo-Avatar
Is there an API for this model on Replicate, fal.ai, or similar platforms? If not, is there another model API that can turn a single avatar image into a video?
Short answer: Fal.ai already exposes an official `fal-ai/hunyuan-avatar` endpoint that turns a single avatar image + audio into a talking-head video. Replicate, by contrast, only hosts the text-to-video model `tencent/hunyuan-video` and a few unofficial fine-tunes, so there is no first-party HunyuanVideo-Avatar API there. If you just need an "image-to-video avatar" pipeline, several other public APIs (SadTalker, VideoReTalking, AnimateDiff, HeyGen, A2E, and Fal.ai's Kling or PixVerse) can fill the gap.
| Platform | Official Avatar model? | Notes |
|---|---|---|
| Fal.ai | Yes – `fal-ai/hunyuan-avatar` | Accepts one image & audio; playground + REST/SDK; ~8 min per 5 s video, about $1.40 per run (fal.ai) |
| Replicate | No – only `tencent/hunyuan-video` (text→video) and community fine-tunes | `tencent/hunyuan-video` has ≈104 k runs and a full API, but it requires a text prompt, not an avatar image (see the sample call below) (replicate.com). A proof-of-concept fine-tune `lucataco/hunyuan-heygen-woman` exists, but it is not the official Avatar model (replicate.com) |
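For reference, a hedged sketch of what calling Replicate's `tencent/hunyuan-video` looks like; it only takes a text prompt, which is why it cannot stand in for the Avatar model. The `/models/…/predictions` route and the `prompt` input name follow Replicate's usual API shape and are assumptions here; confirm them on the model's API tab.

```bash
# Text-to-video only: tencent/hunyuan-video has no image/audio avatar inputs.
curl -s -X POST \
  https://api.replicate.com/v1/models/tencent/hunyuan-video/predictions \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"input": {"prompt": "a person smiling and talking to the camera"}}'
```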
| Model / Provider | What it does | Typical inputs | Cost / speed (public default) |
|---|---|---|---|
| SadTalker – Replicate | Audio-driven talking head from one photo (example call below) | image + audio | ≈$0.19 / ~2 min for 1 clip (replicate.com) |
| VideoReTalking – Replicate | Lip-syncs or re-voices an existing face video; also works from one still | image or video + audio | ≈1 (replicate.com) |
| AnimateDiff – Replicate | Adds short looped motion to a still image (no audio) | image (+ prompt) | ≈$0.013, ~10 s per 16-frame clip (replicate.com) |
| Kling 2.1 / PixVerse / Veo 3 – Fal.ai | High-quality image-to-video (no audio) with realistic motion | image (+ prompt) | Tiered paid endpoints shown in the Fal gallery (fal.ai) |
| HeyGen API | Commercial avatar generator; create, script, translate & lip-sync videos | image/photo avatar + text or audio | Usage-based pricing; full REST API (`/avatar-video`) (docs.heygen.com) |
| A2E Image-to-Video API | End-to-end avatar, lip-sync, voice-clone | image + audio or text | Free tier + paid plans; REST endpoints (a2e.ai) |
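As a concrete example of the Replicate route, here is a hedged sketch of an audio-driven SadTalker call. The version-hash placeholder and the `source_image` / `driven_audio` input names are assumptions carried over from the upstream SadTalker project; copy the real values from the model's API tab on Replicate.

```bash
# Community models on Replicate are addressed by a version hash.
curl -s -X POST https://api.replicate.com/v1/predictions \
  -H "Authorization: Bearer $REPLICATE_API_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
        "version": "<SADTALKER_VERSION_ID>",
        "input": {
          "source_image": "https://example.com/avatar.jpg",
          "driven_audio": "https://example.com/voice.mp3"
        }
      }'
# The response carries a prediction id; poll
#   GET https://api.replicate.com/v1/predictions/<id>
# until "status" is "succeeded", then download the "output" URL.
```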
```bash
curl -X POST https://fal.run/fal-ai/hunyuan-avatar \
  -H "Authorization: Key $FAL_API_KEY" \
  -F "image_url=https://example.com/avatar.jpg" \
  -F "audio_url=https://example.com/voice.mp3" \
  -F "emotion=happy" \
  -F "fps=25"
```
The response JSON contains an `output_with_audio.mp4` URL; download or stream it in your app (fal.ai).
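A minimal follow-up sketch for grabbing the clip, assuming the response exposes the video URL under a field such as `video.url`; the exact JSON shape may differ, so inspect one real response first.

```bash
# Submit the job and capture the JSON response (assumed synchronous here).
RESPONSE=$(curl -s -X POST https://fal.run/fal-ai/hunyuan-avatar \
  -H "Authorization: Key $FAL_API_KEY" \
  -F "image_url=https://example.com/avatar.jpg" \
  -F "audio_url=https://example.com/voice.mp3")

# Pull out the video URL (field path is an assumption - adjust to the real schema).
VIDEO_URL=$(echo "$RESPONSE" | jq -r '.video.url')

# Download the finished clip.
curl -L -o avatar_clip.mp4 "$VIDEO_URL"
```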
Use Fal.ai's `fal-ai/hunyuan-avatar` if you want the exact model right now. Otherwise, pick from Replicate (SadTalker, VideoReTalking, AnimateDiff) or commercial APIs such as HeyGen and A2E, depending on whether you need lip-sync, emotion control, or large-scale production. All provide REST endpoints you can integrate in minutes.
It has been some time; please search again, and this time add https://github.com/deepbeepmeep/Wan2GP too.
TL;DR (mid-June 2025): Fal.ai is still the only public hub that exposes an official API for HunyuanVideo-Avatar (`fal-ai/hunyuan-avatar`) (fal.ai). Replicate has not added a first-party Avatar endpoint; it only offers the core text-to-video model (`tencent/hunyuan-video`) and a few hobby fine-tunes (replicate.com).

By contrast, the Wan-2.1 / Wan2GP ecosystem now has multiple hosted options: Fal.ai runs several image-to-video endpoints (`wan-i2v`, `wan-pro/image-to-video`, etc.) (fal.ai), and Replicate lists high-resolution forks such as `wavespeedai/wan-2.1-i2v-720p` and LoRA-enabled variants (replicate.com). The GitHub repo Wan2GP itself remains a self-host kit (10 GB VRAM) rather than a web API, but its checkpoints are binary-compatible with these services (github.com).
| Platform | Endpoint ID | Status | Inputs | Typical cost |
|---|---|---|---|---|
| Fal.ai | `fal-ai/hunyuan-avatar` | Official | `image_url`, `audio_url`, optional `emotion`, `fps` | ~$1.40 per 5 s clip (fal.ai) |
| Replicate | none (core text-to-video only) | – | – | – (replicate.com) |
| Replicate (community) | `lucataco/hunyuan-heygen-*` fine-tunes | Unofficial | still + prompt | hobby forks (replicate.com) |
| Hugging Face | `tencent/HunyuanVideo-Avatar` | Weights only | local inference (download sketch below) | free download (huggingface.co) |
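If you go the weights-only route, a minimal sketch for pulling the published checkpoints with the Hugging Face CLI; the `--local-dir` target is just an example path.

```bash
# Fetch the HunyuanVideo-Avatar weights for local inference.
pip install -U "huggingface_hub[cli]"
huggingface-cli download tencent/HunyuanVideo-Avatar --local-dir ./HunyuanVideo-Avatar
```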
```bash
curl -X POST https://fal.run/fal-ai/hunyuan-avatar \
  -H "Authorization: Key $FAL_API_KEY" \
  -F image_url=https://…/face.jpg \
  -F audio_url=https://…/voice.mp3 \
  -F emotion=happy \
  -F fps=25
```
The JSON response includes an `output_with_audio.mp4` URL (fal.ai).
- `fal-ai/wan-i2v` – 480 p/720 p image-to-video, $0.20–$0.40 per clip (fal.ai) (sample call below)
- `fal-ai/wan-pro/image-to-video` – premium 1080 p, 6 s, 30 fps endpoint (fal.ai)
- `fal-ai/wan-trainer` – lets you fine-tune LoRAs on your own clips (fal.ai)
- `wavespeedai/wan-2.1-i2v-720p` – optimized 720 p generation (replicate.com)
- `fofr/wan2.1-with-lora` – LoRA-ready build, H100 backend (replicate.com)

The GitHub repo packages Wan-2.1 weights plus a TeaCache patch so you can run 720 p clips on a single 10 GB consumer GPU (github.com). It also documents how to export Replicate-style LoRAs for cloud inference (github.com). If you prefer an HTTP API, point your client to the Fal.ai or Replicate endpoints above; both accept the same latent formats.
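A hedged sketch of calling one of the hosted Wan endpoints; the JSON body and the `image_url` / `prompt` parameter names follow Fal.ai's usual request shape and are assumptions here, so check the endpoint's playground for the exact schema.

```bash
# Image-to-video with the hosted Wan 2.1 endpoint on Fal.ai.
curl -s -X POST https://fal.run/fal-ai/wan-i2v \
  -H "Authorization: Key $FAL_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "image_url": "https://example.com/avatar.jpg",
        "prompt": "the person turns their head and smiles"
      }'
```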
| Model / API | Hub | Strength | Citation |
|---|---|---|---|
| SadTalker | Replicate | Audio-driven talking head from one photo | replicate.com |
| VideoReTalking | Replicate | Lip-syncs or re-voices a still or video | replicate.com |
| HeyGen `/avatar-video` | SaaS | Commercial studio-grade avatar videos | docs.heygen.com |
| PixVerse / Kling 2.x | Fal.ai | High-motion silent clips (image-to-video) | fal.ai |
These remain the main drop-in options if the Hunyuan or Wan stacks don’t meet your quality, licensing, or cost targets.
Let me know and I can produce ready-to-run snippets or benchmark tables.