
AI Workload Survey

Hardware

Which hardware do you currently use?

Yes
No
Not Sure
NVIDIA GPUs (DGX/Cloud/PCIe)
NVIDIA H100
NVIDIA Grace Hopper (GH200)
NVIDIA B100 (GB100 chip)
NVIDIA GB200 NVL72
AMD Instinct GPUs (MI300/MI350, etc.)
Google TPU
AWS Trainium
AWS Inferentia
Groq
Cerebras
Tenstorrent
SambaNova

Any other hardware you use not mentioned above (please specify):

Which low-level toolchains and technologies do you currently use?

Yes
No
Not Sure
CUDA C++
CUDA Graphs
CUDA JIT (NVRTC) - compiles source to PTX
Thrust (header-only parallel primitives library)
CUB (header-only library)
CUTLASS (header-only library)
GPUDirect (GPU-to-GPU)
GPUDirect Storage
NVSHMEM
NCCL
AMD HIP
AMD CK (composable kernels)
SCALE (compiles CUDA source for AMD GPUs)

Any other low-level toolchains and technologies you use not mentioned above (please specify):

Performance Libraries

Which performance libraries do you currently use?

Yes
No
Not Sure
cuBLAS (dense linear algebra)
cuBLASLt (mixed-precision support)
cuBLASXt (multi-GPU)
cuBLASMp (multi-node)
cuBLASDx (device-side)
cuDNN
cuTENSOR
cuSPARSE
cuFFT
cuRAND
cuSOLVER
cuDSS (direct sparse solver library)

Any other performance libraries you use not mentioned above (please specify):

Elaborate on your use of these libraries (e.g., which are used the most):

Python Libraries

Yes
No
Not Sure
RAPIDS cuDF
RAPIDS cuML
RAPIDS cuGraph
cuVS
Dask
RAPIDS Accelerator for Apache Spark

Any other Python libraries in use by your team not mentioned above (please specify):

Domain-specific languages (DSLs)

Yes
No
Not Sure
Triton Language
Mojo
cuTile

Any other DSLs in use by your team not mentioned above (please specify):

Graph Compilers

Yes
No
Not Sure
MLIR
XLA (Accelerated Linear Algebra)
TorchInductor
TVM Unity
GLOW

Any other graph compilers in use by your team not mentioned above (please specify):

Frameworks

Yes
No
Not Sure
PyTorch
TensorFlow/Keras
JAX
PaddlePaddle
MindSpore
MXNet

Any other frameworks (please specify):

Distributed Frameworks

Yes
No
Not Sure
DeepSpeed
FSDP (PyTorch Fully Sharded Data Parallel; originally FairScale)
Ray-Train
Megatron-LM

Any other distributed frameworks? (please specify):

Runtimes

Yes
No
Not Sure
Dynamo-Triton (formerly Triton Inference Server)
Dynamo
ONNX Runtime
TVM Runtime
TFLite
IREE
OpenVINO

Any other runtimes (please specify):

Serving

Yes
No
Not Sure
KServe
Dynamo-Triton (formerly Triton Inference Server)
TorchServe
BentoML
MLflow

Any other serving technologies in use (please specify):

Utility Computing Vendors for ML

Yes
No
Not Sure
Amazon Web Services
Microsoft Azure
Google Cloud
CoreWeave
Hot Aisle
TensorWave

Other computing vendors in use for ML (please specify):

Tools

Yes
No
Not Sure
Nsight
cuda-gdb

Any other tools in use (please specify):


Rate the following properties of the technologies your team uses on a scale of 0-10, where 0 is "not important" and 10 is "essential".

Ability to perform low-level, hardware-specific optimizations:

Availability and usability of robust debug tools:

Ability to run on specific hardware (NVIDIA, AMD, etc.):

Ability to run on multiple hardware platforms:

Ability to run compiled binaries unmodified:

Willingness to recompile for another platform with little to no code changes:

Willingness to refactor if the gains are substantial enough:

Describe the threshold that would motivate significant code changes (for example: 10x performance, half the cost per token, "never in a million years," or "I do it for fun often"):

Does your team consider the underlying hardware architecture?