TiPToP

🌐 Project Website · 📝 Paper · 💻 Code

TiPToP is a Task and Motion Planning (TAMP) system that performs complex robot manipulation tasks like sorting, rearranging, and packing from images and natural language instructions. Using a modular architecture that separates perception, planning, and execution, TiPToP works out-of-the-box with zero training, zero demonstrations, and zero object-specific 3D models—yet matches or exceeds vision-language models trained on 350 hours of robot data.

The system combines learned perception models (depth prediction, VLMs, segmentation, grasp detection) with GPU-parallelized TAMP to reason explicitly about physical constraints and object interactions.
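The separation of perception, planning, and execution described above can be illustrated with a minimal sketch. This is not TiPToP's API: every name here (`SceneObject`, `Action`, `perceive`, `plan`, `execute`, `run_pipeline`) is hypothetical, and each stage is a toy stand-in for the real learned models and GPU-parallelized planner. It only shows how a modular pipeline lets one stage be swapped without touching the others.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class SceneObject:
    """Hypothetical perception output: a named object and its position."""
    name: str
    position: Tuple[float, float, float]


@dataclass
class Action:
    """Hypothetical plan step, e.g. pick or place on a named target."""
    kind: str
    target: str


def perceive(image, instruction: str) -> List[SceneObject]:
    # Toy stand-in for the perception stage; returns a fixed scene.
    return [
        SceneObject("red_block", (0.4, 0.1, 0.02)),
        SceneObject("bin", (0.5, -0.2, 0.0)),
    ]


def plan(objects: List[SceneObject], instruction: str) -> List[Action]:
    # Toy stand-in for the planning stage; only plans if both objects were seen.
    names = {obj.name for obj in objects}
    if {"red_block", "bin"} <= names:
        return [Action("pick", "red_block"), Action("place", "bin")]
    return []


def execute(actions: List[Action]) -> List[str]:
    # Toy stand-in for the execution stage; logs instead of moving a robot.
    return [f"{action.kind}:{action.target}" for action in actions]


def run_pipeline(image, instruction, perceive_fn=perceive, plan_fn=plan,
                 execute_fn=execute) -> List[str]:
    # Each stage only sees the previous stage's output, so any one stage
    # can be replaced independently.
    objects = perceive_fn(image, instruction)
    actions = plan_fn(objects, instruction)
    return execute_fn(actions)


log = run_pipeline(image=None, instruction="put the red block in the bin")
print(log)
```

Passing a different `perceive_fn` (for example, one wrapping a different camera) leaves the planning and execution stand-ins unchanged, which is the property the modular design relies on.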

These docs make it easy to get TiPToP running on the DROID hardware setup. With a few hours of additional effort, TiPToP can also be adapted to work with new embodiments zero-shot.


📦 Installation

Install TiPToP and its modules, including perception models and Task and Motion Planners.

Installation
🚀 Getting Started

Configure your robot, calibrate cameras, and run your first TiPToP demo!

Getting Started
🖥️ Simulation

Run TiPToP in a DROID simulation based on IsaacLab.

Running in Simulation
📊 Evaluation Workflow

Set up scenes, run evaluations, and label results.

Evaluation Workflow
🤖 New Embodiment

Add support for a new robot arm or camera to TiPToP.

Adding a New Embodiment
📚 Command Reference

Detailed documentation for TiPToP CLI commands, including helper commands.

Command Reference
🔧 Troubleshooting

Solutions to common issues with cameras, networking, motion planning, and perception.

Troubleshooting
⚠️ Limitations

Limitations of the current TiPToP system.

Limitations
🤝 Contributing

How to contribute to TiPToP, including development setup and code style.

Contributing to TiPToP

Blog

Achieving SOTA on the MolmoSpaces benchmark with Inference-Time Search

Nishanth Kumar · May 08, 2026

TiPToP achieves 46.1% on MolmoSpaces — outperforming every approach not trained on MolmoBot data and nearly doubling the next-best result.

Chinese version


See all posts →


License

TiPToP is released open-source under the MIT License.

However, key dependencies including cuRobo, cuTAMP, M2T2, and FoundationStereo are licensed under the NVIDIA Source Code License, which permits non-commercial use for research or evaluation purposes only.