AI Workload Survey
Hardware
Which hardware do you currently use?
*
Yes
No
Not Sure
NVIDIA GPUs (DGX/Cloud/PCIe)
NVIDIA H100
NVIDIA Grace Hopper (GH200)
NVIDIA B100 (GB100 chip)
NVIDIA GB200 NVL72
AMD GPUs (MI300/MI350, etc.)
Google TPU
AWS Trainium
AWS Inferentia
Groq
Cerebras
Tenstorrent
SambaNova
Any other hardware you use not mentioned above (please specify):
Low-Level Toolchains
Which low-level toolchains and technologies do you currently use?
*
Yes
No
Not Sure
CUDA C++
CUDA Graphs
CUDA JIT (NVRTC) - compiles CUDA source to PTX
Thrust (header-only parallel primitives library)
CUB (header-only library)
CUTLASS (header-only library)
GPUDirect (GPU-to-GPU)
GPUDirect Storage
NVSHMEM
NCCL
AMD HIP
AMD CK (Composable Kernel)
SCALE (compiles CUDA source for AMD GPUs)
Any other low-level toolchains and technologies you use not mentioned above (please specify):
Performance Libraries
Which performance libraries do you currently use?
*
Yes
No
Not Sure
cuBLAS (dense linear algebra)
cuBLASLt (mixed-precision support)
cuBLASXt (multi-GPU)
cuBLASMp (multi-node)
cuBLASDx (device-side)
cuDNN
cuTENSOR
cuSPARSE
cuFFT
cuRAND
cuSOLVER
cuDSS (direct sparse solver library)
Any other performance libraries you use not mentioned above (please specify):
Elaborate on your use of these libraries (e.g., which are used the most):
Python Libraries
Which Python libraries do you currently use?
*
Yes
No
Not Sure
RAPIDS cuDF
RAPIDS cuML
RAPIDS cuGraph
cuVS
Dask
RAPIDS Accelerator for Apache Spark
Any other Python libraries in use by your team not mentioned above (please specify):
Domain-Specific Languages (DSLs)
Which domain-specific languages do you currently use?
*
Yes
No
Not Sure
Triton Language
Mojo
cuTile
Any other DSLs in use by your team not mentioned above (please specify):
Graph Compilers
Which graph compilers do you currently use?
*
Yes
No
Not Sure
MLIR
XLA (Accelerated Linear Algebra)
TorchInductor
TVM Unity
GLOW
Any other graph compilers in use by your team not mentioned above (please specify):
Frameworks
Which frameworks do you currently use?
*
Yes
No
Not Sure
PyTorch
TensorFlow/Keras
JAX
PaddlePaddle
MindSpore
MXNet
Any other frameworks? (please specify):
Distributed Frameworks
Which distributed frameworks do you currently use?
*
Yes
No
Not Sure
DeepSpeed
FSDP (PyTorch, originally FairScale)
Ray Train
Megatron-LM
Any other distributed frameworks? (please specify):
Runtimes
Which runtimes do you currently use?
*
Yes
No
Not Sure
Dynamo-Triton (formerly Triton Inference Server, originally TensorRT Inference Server)
Dynamo
ONNX Runtime
TVM Runtime
TFLite
IREE
OpenVINO
Any other runtimes? (please specify):
Serving
Which serving technologies do you currently use?
*
Yes
No
Not Sure
KServe
Dynamo-Triton (formerly Triton Inference Server)
TorchServe
BentoML
MLflow
Any other serving technologies in use (please specify):
Utility Computing Vendors for ML
Which utility computing vendors do you currently use for ML?
*
Yes
No
Not Sure
Amazon Web Services
Microsoft Azure
Google Cloud
CoreWeave
Hot Aisle
TensorWave
Other computing vendors in use for ML (please specify):
Tools
Which developer tools do you currently use?
*
Yes
No
Not Sure
Nsight
cuda-gdb
Any other tools in use (please specify):
Rate the following properties of the technologies your team uses on a scale of 0-10, where 0 is not important and 10 is essential.
Ability to perform low-level hardware-specific optimizations: (0-10)
Availability and usability of robust debug tools: (0-10)
Ability to run on specific hardware (NVIDIA, AMD, etc.): (0-10)
Ability to run on multiple hardware platforms: (0-10)
Ability to run compiled binaries unmodified: (0-10)
Willingness to recompile for another platform with little to no code changes: (0-10)
Willingness to refactor if the gains are substantial enough: (0-10)
Describe the threshold that would motivate significant code changes (for example: 10x performance, half the cost per token, "never in a million years", or "I do it for fun often").
To what extent does your team consider the underlying hardware architecture? (0-10)