I want to take part in a hackathon in our company...
Created on: June 9, 2025
I want to take part in a hackathon in our company.
I want to make a humanoid robot in a simulation environment that mimics human actions from video, and maybe make it perform kung fu, like taiji or Baduanjin or something similar.
Help me search for any existing projects like this, and discuss the possibility and how to proceed later.
Below is a quick scan of public, open-source work that already tackles “humanoid robot learns motions from a single RGB-video and reproduces them in simulation (or real hardware)”. None of our internal docs mention such a project, so these external references are probably the fastest starting points.
Project | What it shows | Why it’s useful for your idea |
---|---|---|
HumanPlus (Stanford) | Trains a low-level policy in simulation from 40 h of human MoCap; policy then “shadows” a live person in real time using only an RGB camera. Code, URDF files and training scripts are on GitHub. humanoid-ai.github.io github.com | End-to-end pipeline (pose-estimation → retargeting → RL control) you can strip down for a Tai Chi demo; works in Isaac Gym & RSL-RL. |
OKAMI (UT Austin) | Generates a manipulation plan from one RGB-D video, then derives a policy that adapts to new object positions. Emphasises object-aware retargeting of whole-body + hand poses. arxiv.org ut-austin-rpl.github.io | Shows how to go “single-video → generalisable policy”; useful if you want the robot to interact with a Tai Chi fan, sword, etc. |
ASAP / HumanoidVerse | Multi-sim framework (IsaacGym / IsaacSim / Genesis) separating robot model, task, and algorithm; comes with Human2Humanoid tracking code. github.com humanoid-bench.github.io | Ready-made playground for rapid hackathon iteration—swap in your own humanoid mesh, dataset, or control algorithm. |
Smart Baduanjin mobile app | Uses pose-estimation to score users practising Baduanjin. blog.tensorflow.org | Although not robotics, its open-sourced pose pipeline and labelled sequences are handy as a Tai Chi/Baduanjin dataset. |
Stage | Concrete tool / library | Time budget |
---|---|---|
1. Choose sim + robot model | Isaac Gym (GPU-only, lightning-fast) or MuJoCo (CPU OK). Use the built-in humanoid model or the Unitree H1 URDF from HumanPlus. | ½ day |
2. Extract 3-D pose from video | MoveNet or VideoPose3D → 17- or 25-keypoint skeleton. For Tai Chi/Baduanjin you can pre-record one good performer. | ¼ day |
3. Retarget to robot joints | Leverage the HumanPlus retarget script or an off-the-shelf IK solver (e.g. PyBullet's inverse kinematics). Pay attention to foot-sliding and joint-limit clamping. | ¼ day |
4. Controller / policy | a) Pure trajectory playback (simplest demo). b) RL fine-tune with RSL-RL or PPO to add balance/adaptability. | ¾ day |
5. Polish & demo | Embed the sim in a simple web viewer (Isaac Sim WebRTC) or record a GIF. Bonus: overlay the source video next to the sim for “wow” effect. | ¼ day |
(Add buffer for GPU setup and driver mismatches.)
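If you keep the video route, stage 2 can be prototyped in a few lines. Below is a minimal sketch assuming the publicly released TF-Hub MoveNet Lightning model and OpenCV for frame grabbing; the video path and helper name are illustrative, and the output is 2-D keypoints that you would still lift to 3-D (e.g. with VideoPose3D) before retargeting.

```python
# Minimal single-person 2-D keypoint extraction with MoveNet (TF-Hub).
# Assumptions: tensorflow, tensorflow_hub and opencv-python are installed;
# the video path and function name below are placeholders.
import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

movenet = hub.load(
    "https://tfhub.dev/google/movenet/singlepose/lightning/4"
).signatures["serving_default"]

def keypoints_from_frame(frame_bgr):
    """Return a (17, 3) array of (y, x, confidence), normalised to [0, 1]."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    img = tf.image.resize_with_pad(tf.expand_dims(rgb, 0), 192, 192)
    out = movenet(tf.cast(img, tf.int32))
    return out["output_0"].numpy()[0, 0]          # shape (17, 3)

cap = cv2.VideoCapture("baduanjin_demo.mp4")      # placeholder path
poses = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    poses.append(keypoints_from_frame(frame))
cap.release()
np.save("poses_2d.npy", np.stack(poses))          # feed this to a 2D->3D lifter
```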
Let me know which of those you’d like to dive into, or if you want a deeper technical breakdown of any step!
I wonder if maybe we don't need the video at all. Since Baduanjin is a fixed action sequence, can we make things easier by dropping the video and going straight to actions? And make the robot learn these actions in the sim env?
Because the routine is a fixed eight-section sequence, you only need a time-stamped list of target joint poses. There are three practical ways to get those reference trajectories without ever touching RGB video:
Option | How to obtain the data | Pros | Cons |
---|---|---|---|
1. Grab an existing MoCap file | Open-access BVH/AMC clips of slow martial-arts already exist. <br>• CMU Graphics Lab: subject #12 has “tai chi” recordings that cover most of the stance transitions you need mocap.cs.cmu.edu <br>• Bandai Namco Motion-dataset: 36 k frames in BVH, includes several “martial-arts / stylised fighting” clips that can be trimmed into the Eight Brocades tempo github.com <br>• IMU-recorded Baduanjin: an MDPI paper captured the full routine with a 17-sensor Perception Neuron suit and exported BVH files mdpi.com | Fastest: drag-and-drop into Blender/Isaac Gym, resample to 30 Hz. | May need post-edit: clip lengths, foot-skate cleanup, scaling to your humanoid’s limb lengths. |
2. Key-frame it manually | Pose a skeleton in Blender or MotionBuilder at the “book-picture” angles (there are only ~40 distinct key frames across the 8 sections), then spline-interpolate. | Zero licensing worries; easy to exaggerate movements for clarity. | Human effort up-front; hand-tuned timing. |
3. Record it yourself with IMUs or smartphone LiDAR | Borrow a Perception Neuron / Rokoko suit (exports BVH automatically) or use iPhone-based Move.ai. 10 min shoot → instant skeleton file. | Tailor the pace, breathing cues, and mirror/camera angle exactly for the demo. | Requires hardware booking; raw data usually noisy → needs quick filter + re-target pass. |
Pre-process
Parse the BVH into a continuous joint-angle array `q[t, DoF]`; `bvh_tools` or `smplx` scripts will do this in <100 lines of Python.
Normalise (root at origin, facing +X) and time-warp so the whole routine is 20–30 s; slower motions are easier for balance.
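For reference, here is a minimal sketch of that pre-processing step. It assumes the `bvh` pip package (the `bvh_tools`/`smplx` scripts mentioned above work equally well); the file name, channel ordering and target duration are placeholders for your clip.

```python
# Minimal BVH -> q[t, DoF] pre-processing sketch.
# Assumptions: the `bvh` pip package; file name, channel ordering and target
# duration are placeholders. Root-translation / heading normalisation is
# omitted for brevity.
import numpy as np
from bvh import Bvh

with open("baduanjin.bvh") as f:          # placeholder file
    mocap = Bvh(f.read())

joints = mocap.get_joints_names()
# Collect the three rotation channels of every joint, frame by frame.
q = np.array([
    [mocap.frame_joint_channel(t, j, ch)
     for j in joints
     for ch in ("Zrotation", "Xrotation", "Yrotation")]
    for t in range(mocap.nframes)
])                                        # shape (T, 3 * n_joints), degrees

# Time-warp: resample so the whole routine lasts ~25 s at 30 Hz.
target_len = 25 * 30
t_old = np.linspace(0.0, 1.0, len(q))
t_new = np.linspace(0.0, 1.0, target_len)
q_ref = np.stack([np.interp(t_new, t_old, q[:, d]) for d in range(q.shape[1])],
                 axis=1)

np.save("q_ref.npy", np.deg2rad(q_ref))   # radians, ready for retargeting
```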
Retarget to your robot URDF
Map the BVH skeleton's joint angles onto the robot's DoF (the HumanPlus retarget script covers most of this), rescale for limb-length differences, clamp to joint limits, and check for foot-sliding.
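A naive version of that mapping-plus-clamping step might look like the sketch below; the joint map, limits and file names are hypothetical, and a real pipeline also handles limb-length scaling and foot contact.

```python
# Naive retargeting sketch: copy matching joint angles and clamp to limits.
# Assumptions: JOINT_MAP (BVH column -> robot DoF index), the limit arrays
# and the file names are hypothetical placeholders.
import numpy as np

q_ref = np.load("q_ref.npy")                    # (T, 3 * n_bvh_joints), radians

JOINT_MAP = {                                   # hypothetical mapping
    4: 0,    # BVH left-hip Z rotation -> robot left_hip_pitch
    5: 1,    # BVH left-hip X rotation -> robot left_hip_roll
    # ... one entry per robot DoF
}
LOWER = np.array([-1.57, -0.5])                 # robot joint limits (rad)
UPPER = np.array([ 1.57,  0.5])

q_robot = np.zeros((len(q_ref), len(JOINT_MAP)))
for bvh_col, robot_dof in JOINT_MAP.items():
    q_robot[:, robot_dof] = q_ref[:, bvh_col]
q_robot = np.clip(q_robot, LOWER, UPPER)        # joint-limit clamping

np.save("q_robot.npy", q_robot)
```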
Choose a control strategy
Quick & dirty | Better for robustness |
---|---|
Trajectory playback: PD controller tracks q_ref(t). Good enough for a hack-day gif. | Reference-tracking RL (DeepMimic style): reward = -‖q – q_ref‖ – COM drift. Use RSL-RL or Isaac Gym’s PPO template; 30 min of GPU training usually nails slow Tai Chi-like motions. |
Finite-state PD: split sequence into 8 states, each with its own PD gains. | Bi-Level optimisation: jointly fine-tune the reference motion and the policy so physically infeasible keyframes are nudged into the robot’s capability envelope. See Bi-Level Motion Imitation for Humanoid Robots (2024) arxiv.org |
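To make the table concrete, here is a minimal sketch of the left-hand column (PD trajectory playback) together with the kind of tracking reward the RL column refers to. The `sim` calls, gains and weights are placeholders rather than any particular simulator's API.

```python
# Sketch of (a) PD trajectory playback and (b) a DeepMimic-style tracking
# reward. `get_joint_state` / `apply_torques` / `com_drift` stand in for
# whatever simulator you use (Isaac Gym, MuJoCo, PyBullet, ...); gains and
# weights are illustrative.
import numpy as np

KP, KD = 60.0, 3.0                       # illustrative PD gains

def pd_playback_step(sim, q_ref_t, qd_ref_t):
    """One control step of pure playback: track the reference pose."""
    q, qd = sim.get_joint_state()        # placeholder simulator call
    tau = KP * (q_ref_t - q) + KD * (qd_ref_t - qd)
    sim.apply_torques(tau)               # placeholder simulator call

def tracking_reward(q, q_ref_t, com_drift, w_pose=1.0, w_com=0.5):
    """DeepMimic-style reward: stay on the reference, keep the COM in place."""
    pose_err = np.linalg.norm(q - q_ref_t)
    return -w_pose * pose_err - w_com * com_drift
```

Playback alone is enough for a hack-day GIF; the reward only matters if you fine-tune with RSL-RL or PPO for balance.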
Loop and blend
Add a short “return to neutral” transition after each section so the policy can restart cleanly—use a 0.5 s cubic blend or a phase-function gait network.
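A minimal sketch of that 0.5 s cubic blend, assuming a 30 Hz control rate; the pose arrays and function names are illustrative.

```python
# Cubic (smoothstep) blend from the last pose of a section back to neutral.
# Assumptions: 30 Hz control rate; `q_end` and `q_neutral` are (DoF,) arrays.
import numpy as np

def cubic_blend(q_end, q_neutral, duration_s=0.5, hz=30):
    """Return an (N, DoF) transition with zero velocity at both ends."""
    n = int(duration_s * hz)
    s = np.linspace(0.0, 1.0, n)[:, None]
    alpha = 3 * s**2 - 2 * s**3          # cubic ease-in/ease-out
    return (1 - alpha) * q_end + alpha * q_neutral

# Example: append the transition after each of the 8 sections.
# q_sections is a list of (T_i, DoF) arrays; q_neutral is a (DoF,) rest pose.
def stitch_routine(q_sections, q_neutral):
    parts = []
    for q_sec in q_sections:
        parts.append(q_sec)
        parts.append(cubic_blend(q_sec[-1], q_neutral))
    return np.concatenate(parts, axis=0)
```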
Package the demo
Wrap everything so it runs with a single `python demo.py`, and record a GIF or side-by-side clip for the judges.
Why is dropping the video worth it? A quick comparison:
Criterion | Video-to-pose pipeline | Direct BVH-to-policy pipeline |
---|---|---|
Setup complexity | Needs good lighting, person detector, depth handling, fail-cases for occlusion. | One data file + parser script. |
Motion accuracy | Suffers from 2-D→3-D ambiguity; wrists/ankles often jitter. | Captured or hand-tuned 3-D joint angles = deterministic. |
Hackathon time budget | ~½ day extra just to calibrate pose-estimation. | Use the ½ day to tweak reward weights or add a fun VFX overlay instead. |
So for a weekend hackathon, yes: pre-recorded or hand-crafted BVH is the cleanest path.
Start by downloading the CMU Tai Chi clip or the Perception-Neuron Baduanjin BVH, retargeting it to the Isaac Gym humanoid, and running a trajectory-tracking PPO script; a first crisp playback within 3–4 hours is a realistic target.
Just let me know which piece you’d like drafted first!