TiPToP
🌐 Project Website · 📝 Paper · 💻 Code
TiPToP is a Task and Motion Planning (TAMP) system that performs complex robot manipulation tasks such as sorting, rearranging, and packing from images and natural language instructions. Using a modular architecture that separates perception, planning, and execution, TiPToP works out of the box with zero training, zero demonstrations, and zero object-specific 3D models, yet it matches or exceeds the performance of vision-language models trained on 350 hours of robot data.
The system combines learned perception models (depth prediction, VLMs, segmentation, grasp detection) with GPU-parallelized TAMP to reason explicitly about physical constraints and object interactions.
These docs make it easy to get TiPToP running on the DROID hardware setup. With a few hours of additional effort, TiPToP can also be adapted to work with new embodiments zero-shot.
Install TiPToP and its modules, including perception models and Task and Motion Planners.
Configure your robot, calibrate cameras, and run your first TiPToP demo!
Run TiPToP in the DROID simulation based on IsaacLab.
Set up scenes, run evaluations, and label results.
Add support for a new robot arm or camera to TiPToP.
Detailed documentation for TiPToP CLI commands including helper commands.
Solutions to common issues with cameras, networking, motion planning, and perception.
Limitations of the current TiPToP system.
How to contribute to TiPToP, including development setup and code style.
Blog
Nishanth Kumar · May 08, 2026
TiPToP achieves 46.1% on MolmoSpaces, outperforming every approach not trained on MolmoBot data and nearly doubling the next-best result.
License
TiPToP is released as open source under the MIT License.
However, key dependencies including cuRobo, cuTAMP, M2T2, and FoundationStereo are licensed under the NVIDIA Source Code License, which permits non-commercial use for research or evaluation purposes only.