I want to take part in a hackathon in our company...
Created on: June 9, 2025
I want to take part in a hackathon in our company.
I want to make a humanoid robot in a simulation environment that mimics human actions from video, and maybe make it perform kung fu, like taiji or Baduanjin or something similar.
Help me search for any existing projects like this, and discuss the possibility and how to proceed later.
Below is a quick scan of public, open-source work that already tackles “humanoid robot learns motions from a single RGB-video and reproduces them in simulation (or real hardware)”. None of our internal docs mention such a project, so these external references are probably the fastest starting points.
Project | What it shows | Why it’s useful for your idea |
---|---|---|
HumanPlus (Stanford) | Trains a low-level policy in simulation from 40 h of human MoCap; policy then “shadows” a live person in real time using only an RGB camera. Code, URDF files and training scripts are on GitHub. humanoid-ai.github.io github.com | End-to-end pipeline (pose-estimation → retargeting → RL control) you can strip down for a Tai Chi demo; works in Isaac Gym & RSL-RL. |
OKAMI (UT Austin) | Generates a manipulation plan from one RGB-D video, then derives a policy that adapts to new object positions. Emphasises object-aware retargeting of whole-body + hand poses. arxiv.org ut-austin-rpl.github.io | Shows how to go “single-video → generalisable policy”; useful if you want the robot to interact with a Tai Chi fan, sword, etc. |
ASAP / HumanoidVerse | Multi-sim framework (IsaacGym / IsaacSim / Genesis) separating robot model, task, and algorithm; comes with Human2Humanoid tracking code. github.com humanoid-bench.github.io | Ready-made playground for rapid hackathon iteration—swap in your own humanoid mesh, dataset, or control algorithm. |
Smart Baduanjin mobile app | Uses pose-estimation to score users practising Baduanjin. blog.tensorflow.org | Although not robotics, its open-sourced pose pipeline and labelled sequences are handy as a Tai Chi/Baduanjin dataset. |
Stage | Concrete tool / library | Time budget |
---|---|---|
1. Choose sim + robot model | Isaac Gym (GPU-only, lightning-fast) or MuJoCo (CPU OK). Use the built-in humanoid model or the Unitree H1 URDF from HumanPlus. | ½ day |
2. Extract 3-D pose from video | MoveNet or VideoPose3D → 17- or 25-keypoint skeleton. For Tai Chi/Baduanjin you can pre-record one good performer. | ¼ day |
3. Retarget to robot joints | Leverage the HumanPlus retarget script or an off-the-shelf IK solver (e.g. PyBullet's inverse kinematics). Pay attention to foot-sliding and joint-limit clamping. | ¼ day |
4. Controller / policy | a) Pure trajectory playback (simplest demo). b) RL fine-tune with RSL-RL or PPO to add balance/adaptability. | ¾ day |
5. Polish & demo | Embed the sim in a simple web viewer (Isaac Sim WebRTC) or record a GIF. Bonus: overlay the source video next to the sim for “wow” effect. | ¼ day |
(Add buffer for GPU setup and driver mismatches.)
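If you keep the video route, stage 2 can be prototyped in a few lines. Below is a minimal sketch assuming the publicly released TF-Hub MoveNet Lightning model and OpenCV for frame grabbing; the video path and helper name are illustrative, and the output is 2-D keypoints that you would still lift to 3-D (e.g. with VideoPose3D) before retargeting.

```python
# Minimal single-person 2-D keypoint extraction with MoveNet (TF-Hub).
# Assumptions: tensorflow, tensorflow_hub and opencv-python are installed;
# the video path and function name below are placeholders.
import cv2
import numpy as np
import tensorflow as tf
import tensorflow_hub as hub

movenet = hub.load(
    "https://tfhub.dev/google/movenet/singlepose/lightning/4"
).signatures["serving_default"]

def keypoints_from_frame(frame_bgr):
    """Return a (17, 3) array of (y, x, confidence), normalised to [0, 1]."""
    rgb = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)
    img = tf.image.resize_with_pad(tf.expand_dims(rgb, 0), 192, 192)
    out = movenet(tf.cast(img, tf.int32))
    return out["output_0"].numpy()[0, 0]          # shape (17, 3)

cap = cv2.VideoCapture("baduanjin_demo.mp4")      # placeholder path
poses = []
while True:
    ok, frame = cap.read()
    if not ok:
        break
    poses.append(keypoints_from_frame(frame))
cap.release()
np.save("poses_2d.npy", np.stack(poses))          # feed this to a 2D->3D lifter
```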
Let me know which of those you’d like to dive into, or if you want a deeper technical breakdown of any step!
I wonder if maybe we don't need the video at all. Since Baduanjin is a fixed action sequence, can we make things easier by dropping the video and going straight to actions? And make the robot learn these actions in the sim env?
Because the routine is a fixed eight-section sequence, you only need a time-stamped list of target joint poses. There are three practical ways to get those reference trajectories without ever touching RGB video:
Option | How to obtain the data | Pros | Cons |
---|---|---|---|
1. Grab an existing MoCap file | Open-access BVH/AMC clips of slow martial-arts already exist. <br>• CMU Graphics Lab: subject #12 has “tai chi” recordings that cover most of the stance transitions you need mocap.cs.cmu.edu <br>• Bandai Namco Motion-dataset: 36 k frames in BVH, includes several “martial-arts / stylised fighting” clips that can be trimmed into the Eight Brocades tempo github.com <br>• IMU-recorded Baduanjin: an MDPI paper captured the full routine with a 17-sensor Perception Neuron suit and exported BVH files mdpi.com | Fastest: drag-and-drop into Blender/Isaac Gym, resample to 30 Hz. | May need post-edit: clip lengths, foot-skate cleanup, scaling to your humanoid’s limb lengths. |
2. Key-frame it manually | Pose a skeleton in Blender or MotionBuilder at the “book-picture” angles (there are only ~40 distinct key frames across the 8 sections), then spline-interpolate. | Zero licensing worries; easy to exaggerate movements for clarity. | Human effort up-front; hand-tuned timing. |
3. Record it yourself with IMUs or smartphone LiDAR | Borrow a Perception Neuron / Rokoko suit (exports BVH automatically) or use iPhone-based Move.ai. 10 min shoot → instant skeleton file. | Tailor the pace, breathing cues, and mirror/camera angle exactly for the demo. | Requires hardware booking; raw data usually noisy → needs quick filter + re-target pass. |
Pre-process
Parse the BVH into a continuous joint-angle array `q[t, DoF]`; `bvh_tools` or `smplx` scripts will do this in <100 lines of Python.
Normalise (root at origin, facing +X) and time-warp so the whole routine is 20–30 s; slower motions are easier for balance.
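For reference, here is a minimal sketch of that pre-processing step. It assumes the `bvh` pip package (the `bvh_tools`/`smplx` scripts mentioned above work equally well); the file name, channel ordering and target duration are placeholders for your clip.

```python
# Minimal BVH -> q[t, DoF] pre-processing sketch.
# Assumptions: the `bvh` pip package; file name, channel ordering and target
# duration are placeholders. Root-translation / heading normalisation is
# omitted for brevity.
import numpy as np
from bvh import Bvh

with open("baduanjin.bvh") as f:          # placeholder file
    mocap = Bvh(f.read())

joints = mocap.get_joints_names()
# Collect the three rotation channels of every joint, frame by frame.
q = np.array([
    [mocap.frame_joint_channel(t, j, ch)
     for j in joints
     for ch in ("Zrotation", "Xrotation", "Yrotation")]
    for t in range(mocap.nframes)
])                                        # shape (T, 3 * n_joints), degrees

# Time-warp: resample so the whole routine lasts ~25 s at 30 Hz.
target_len = 25 * 30
t_old = np.linspace(0.0, 1.0, len(q))
t_new = np.linspace(0.0, 1.0, target_len)
q_ref = np.stack([np.interp(t_new, t_old, q[:, d]) for d in range(q.shape[1])],
                 axis=1)

np.save("q_ref.npy", np.deg2rad(q_ref))   # radians, ready for retargeting
```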
Retarget to your robot URDF
Map the BVH skeleton's joint angles onto the robot's DoF (the HumanPlus retarget script covers most of this), rescale for limb-length differences, clamp to joint limits, and check for foot-sliding.
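A naive version of that mapping-plus-clamping step might look like the sketch below; the joint map, limits and file names are hypothetical, and a real pipeline also handles limb-length scaling and foot contact.

```python
# Naive retargeting sketch: copy matching joint angles and clamp to limits.
# Assumptions: JOINT_MAP (BVH column -> robot DoF index), the limit arrays
# and the file names are hypothetical placeholders.
import numpy as np

q_ref = np.load("q_ref.npy")                    # (T, 3 * n_bvh_joints), radians

JOINT_MAP = {                                   # hypothetical mapping
    4: 0,    # BVH left-hip Z rotation -> robot left_hip_pitch
    5: 1,    # BVH left-hip X rotation -> robot left_hip_roll
    # ... one entry per robot DoF
}
LOWER = np.array([-1.57, -0.5])                 # robot joint limits (rad)
UPPER = np.array([ 1.57,  0.5])

q_robot = np.zeros((len(q_ref), len(JOINT_MAP)))
for bvh_col, robot_dof in JOINT_MAP.items():
    q_robot[:, robot_dof] = q_ref[:, bvh_col]
q_robot = np.clip(q_robot, LOWER, UPPER)        # joint-limit clamping

np.save("q_robot.npy", q_robot)
```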
Choose a control strategy
Quick & dirty | Better for robustness |
---|---|
Trajectory playback: PD controller tracks q_ref(t). Good enough for a hack-day gif. | Reference-tracking RL (DeepMimic style): reward = -‖q – q_ref‖ – COM drift. Use RSL-RL or Isaac Gym’s PPO template; 30 min of GPU training usually nails slow Tai Chi-like motions. |
Finite-state PD: split sequence into 8 states, each with its own PD gains. | Bi-Level optimisation: jointly fine-tune the reference motion and the policy so physically infeasible keyframes are nudged into the robot’s capability envelope. See Bi-Level Motion Imitation for Humanoid Robots (2024) arxiv.org |
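To make the table concrete, here is a minimal sketch of the left-hand column (PD trajectory playback) together with the kind of tracking reward the RL column refers to. The `sim` calls, gains and weights are placeholders rather than any particular simulator's API.

```python
# Sketch of (a) PD trajectory playback and (b) a DeepMimic-style tracking
# reward. `get_joint_state` / `apply_torques` / `com_drift` stand in for
# whatever simulator you use (Isaac Gym, MuJoCo, PyBullet, ...); gains and
# weights are illustrative.
import numpy as np

KP, KD = 60.0, 3.0                       # illustrative PD gains

def pd_playback_step(sim, q_ref_t, qd_ref_t):
    """One control step of pure playback: track the reference pose."""
    q, qd = sim.get_joint_state()        # placeholder simulator call
    tau = KP * (q_ref_t - q) + KD * (qd_ref_t - qd)
    sim.apply_torques(tau)               # placeholder simulator call

def tracking_reward(q, q_ref_t, com_drift, w_pose=1.0, w_com=0.5):
    """DeepMimic-style reward: stay on the reference, keep the COM in place."""
    pose_err = np.linalg.norm(q - q_ref_t)
    return -w_pose * pose_err - w_com * com_drift
```

Playback alone is enough for a hack-day GIF; the reward only matters if you fine-tune with RSL-RL or PPO for balance.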
Loop and blend
Add a short “return to neutral” transition after each section so the policy can restart cleanly—use a 0.5 s cubic blend or a phase-function gait network.
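A minimal sketch of that 0.5 s cubic blend, assuming a 30 Hz control rate; the pose arrays and function names are illustrative.

```python
# Cubic (smoothstep) blend from the last pose of a section back to neutral.
# Assumptions: 30 Hz control rate; `q_end` and `q_neutral` are (DoF,) arrays.
import numpy as np

def cubic_blend(q_end, q_neutral, duration_s=0.5, hz=30):
    """Return an (N, DoF) transition with zero velocity at both ends."""
    n = int(duration_s * hz)
    s = np.linspace(0.0, 1.0, n)[:, None]
    alpha = 3 * s**2 - 2 * s**3          # cubic ease-in/ease-out
    return (1 - alpha) * q_end + alpha * q_neutral

# Example: append the transition after each of the 8 sections.
# q_sections is a list of (T_i, DoF) arrays; q_neutral is a (DoF,) rest pose.
def stitch_routine(q_sections, q_neutral):
    parts = []
    for q_sec in q_sections:
        parts.append(q_sec)
        parts.append(cubic_blend(q_sec[-1], q_neutral))
    return np.concatenate(parts, axis=0)
```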
Package the demo
Wrap everything so it runs with a single `python demo.py`, and record a GIF or side-by-side clip for the judges.
Why is dropping the video worth it? A quick comparison:
Criterion | Video-to-pose pipeline | Direct BVH-to-policy pipeline |
---|---|---|
Setup complexity | Needs good lighting, person detector, depth handling, fail-cases for occlusion. | One data file + parser script. |
Motion accuracy | Suffers from 2-D→3-D ambiguity; wrists/ankles often jitter. | Captured or hand-tuned 3-D joint angles = deterministic. |
Hackathon time budget | ~½ day extra just to calibrate pose-estimation. | Use the ½ day to tweak reward weights or add a fun VFX overlay instead. |
So for a weekend hackathon, yes: pre-recorded or hand-crafted BVH is the cleanest path.
Start by downloading the CMU Tai Chi clip or the Perception-Neuron Baduanjin BVH, retargeting it to the Isaac Gym humanoid, and running a trajectory-tracking PPO script; a first crisp playback within 3–4 hours is a realistic target.
Just let me know which piece you’d like drafted first!