Distributed Jobs
Manage multi-node distributed training jobs.
Total: 7,742
Running: 4,050
Completed: 8,240
Failed: 9,655
| Name | Status | Framework | Nodes | GPUs/Node | Total GPUs | Duration | Owner | |
|---|---|---|---|---|---|---|---|---|
| ray-train-21 | Failed | Horovod | 16 | 4 | 85 | | alice.liu | |
| ray-train-46 | Completed | Horovod | 3 | 4 | 43 | | henry.zhao | |
| ray-train-3 | Failed | Horovod | 11 | 4 | 45 | | alice.liu | |
| pretrain-17 | Queued | Horovod | 3 | 8 | 58 | | alice.liu | |
| ray-train-50 | Queued | Horovod | 8 | 4 | 69 | | alice.liu | |
| deepspeed-43 | Failed | Horovod | 7 | 4 | 21 | | alice.liu | |
| deepspeed-27 | Completed | Megatron-LM | 7 | 8 | 80 | | bob.zhang | |
| megatron-38 | Failed | DeepSpeed | 10 | 4 | 113 | | bob.zhang | |
| ray-train-29 | Failed | Megatron-LM | 14 | 4 | 32 | | alice.liu | |
| fsdp-20 | Completed | Horovod | 15 | 8 | 108 | | alice.liu | |