
Training Datasets

Pretraining corpora, instruction-tuning datasets, DPO preference data, and multimodal datasets that the open-source community uses to train and fine-tune AI models. Each entry lists size, license, languages, and content type.

Machine-readable JSON: /api/training-datasets

For agents: the same data is available at /api/training-datasets. Filter by training stage with ?stage=pretraining|instruction-tuning|dpo|rlhf|multimodal (pick one value). Free, no authentication required, responses cached for 10 minutes.
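
A minimal sketch of how a client might build requests for the endpoint described above. The host `https://example.com` is a placeholder, since the page gives only the relative path, and the helper names `build_url` and `fetch_datasets` are hypothetical. The shape of the JSON response is not documented here, so the sketch returns the parsed payload without assuming any fields.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

# Assumption: placeholder host. Replace with the site's real origin.
BASE_URL = "https://example.com"
ENDPOINT = "/api/training-datasets"

# Stage values taken from the filter description on this page.
STAGES = {"pretraining", "instruction-tuning", "dpo", "rlhf", "multimodal"}


def build_url(stage=None, base_url=BASE_URL):
    """Return the endpoint URL, optionally filtered by one training stage."""
    url = urljoin(base_url, ENDPOINT)
    if stage is None:
        return url
    if stage not in STAGES:
        raise ValueError(f"unknown stage {stage!r}; expected one of {sorted(STAGES)}")
    return url + "?" + urlencode({"stage": stage})


def fetch_datasets(stage=None, base_url=BASE_URL, timeout=10):
    """Fetch and parse the JSON payload. No auth header is sent.

    The response structure is an unknown here, so the parsed JSON
    is returned as-is.
    """
    with urlopen(build_url(stage, base_url), timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Because responses are cached for 10 minutes, polling more often than that is unlikely to return fresher data; a client can call `fetch_datasets("dpo")` once and reuse the result within that window.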