tupone@gentoo.org
Tupone Alfredo

Use the CUDA/HIP Sparse Matrix Multiplication
Support distributed applications
Use sci-ml/FBGEMM
Enable flash attention
Use sci-ml/gloo
Use sci-ml/kineto profiling library
Enable memory-efficient attention
Use dev-libs/mimalloc as replacement for system malloc
Use sci-libs/mkl for BLAS, LAPACK and sparse BLAS routines
Use dev-libs/rccl (NCCL compatible) backend for distributed operations
Use sci-ml/NNPACK
Add support for math operations through numpy
Use sci-ml/oneDNN
Use sci-libs/openblas for BLAS routines
Use QNNPACK
Enable ROCm GPU computing support
Use sci-ml/XNNPACK

pytorch/pytorch