Commit Graph

  • f20f48a255 Move Jokeren 2022-12-06 13:29:29 -08:00
  • 3eff110fbc Restore Jokeren 2022-12-06 13:28:43 -08:00
  • 5f85b79718 Merge branch 'triton-mlir' into keren/insert-slice-other-nonzero Jokeren 2022-12-06 13:25:20 -08:00
  • bab7338965 Fix Jokeren 2022-12-06 13:24:50 -08:00
  • 74f3d7a80f Fix Jokeren 2022-12-06 12:53:25 -08:00
  • 115cd3ac47 [FRONTEND] Added reshape as an alias for view (for now) (#956) Philippe Tillet 2022-12-06 09:57:05 -08:00
  • 532e10cf87 [FRONTEND][BACKEND] Clean-up transpositions (#953) Philippe Tillet 2022-12-06 09:32:13 -08:00
  • 16e973edf2 [BACKEND] Fix dependency analysis in pipeline (#946) Keren Zhou 2022-12-06 09:08:55 -08:00
  • b539e031e8 Add test Jokeren 2022-12-05 23:38:54 -08:00
  • 46fa29496c Init Jokeren 2022-12-05 23:18:13 -08:00
  • 9490252261 [FRONTEND] Support alternative install locations of system libdevice.10.bc (#951) Crutcher Dunnavant 2022-12-05 19:41:44 -08:00
  • e419781978 [Triton-MLIR][BACKEND] Make mmav1 works on basic cases (#944) Yan Chunwei 2022-12-06 10:57:08 +08:00
  • 189491727a [FRONTEND] Extract and unify @builtin/@extern (#913) Crutcher Dunnavant 2022-12-05 14:59:41 -08:00
  • e0072d210a [FRONTEND] Propagate mypy types through @jit, @builtin, etc (#915) Crutcher Dunnavant 2022-12-05 14:41:02 -08:00
  • 2fa17588f7 [FRONTEND] Expand __init__ * imports, add __all__ (#912) Crutcher Dunnavant 2022-12-05 14:22:55 -08:00
  • e057c65cf0 [BACKEND] Porting the legacy heuristic rule in assigning shared layout for A/B of MMAv1 (#948) goostavz 2022-12-06 03:30:23 +08:00
  • 99c7e0e008 [BUILD] Change default build type (#945) Philippe Tillet 2022-12-03 17:47:33 -08:00
  • f2fcaeabf3 [BACKEND] Support dot op when the output is mma encoding and allowtf32 is true (#937) Keren Zhou 2022-12-03 11:14:12 -08:00
  • 8edfe813a5 [FRONTEND][BACKEND] Added trans instruction; made flash attention bwd pass work (#943) Philippe Tillet 2022-12-03 09:58:24 -08:00
  • 4d64589b22 [Triton-MLIR][Backend] Fix the definition of MmaEncodingAttr v1, and the output sequence of DotConversion in MMAv1 (#941) goostavz 2022-12-03 21:12:48 +08:00
  • 8650b4d1cb [DRIVER] Fix typos (#939) (ref: legacy-backend) Yang Hau 2022-12-03 03:13:46 +08:00
  • 521ff9ad74 [TRITON-MLIR][FRONTEND]fix scf.if to run through layernorm tutorial (#938) donproc 2022-12-02 17:45:29 +08:00
  • c280ebda1b [Triton-MLIR][BACKEND] Fix the membar pass to add missing barriers caused by scf.for (#933) Keren Zhou 2022-12-01 11:54:18 -08:00
  • 9def1bcebf [TRITON-MLIR][FRONTEND]minor fix to run through atomic_cas test (#925) donproc 2022-12-01 21:43:26 +08:00
  • 7d90a07d0b [Triton-MLIR][BACKEND] Refactor decompose insert_slice_async (#929) Keren Zhou 2022-11-30 10:07:34 -08:00
  • 6461254fb5 [BACKEND] Make flash attention forward pass work (#928) Philippe Tillet 2022-11-30 11:13:24 +01:00
  • 4e6a8209ed [Triton-MLIR] Two fixes on allocation and backend related with MMA v1 (#930) goostavz 2022-11-30 17:27:26 +08:00
  • 9bb54402b3 [FRONTEND][BACKEND] Small fixes to multiple_of, num_programs, axisinfo; enable block-sparse tests (#927) Philippe Tillet 2022-11-29 20:00:34 +01:00
  • 66c36c4378 [BACKEND] Fixed bounds-wrapping issues (#926) Philippe Tillet 2022-11-29 17:56:45 +01:00
  • 661be523c0 [Triton-MLIR][BACKEND] Minor fixes of shared memory in ReduceOpConversion (#924) Qingyi Liu 2022-11-29 11:50:31 +08:00
  • c87fbf886e [Triton-MLIR][BACKEND] Remove static and unnamed namespace in Utility.h (#923) Yan Chunwei 2022-11-29 09:06:06 +08:00
  • dfc8e7fb95 Fix (ref: keren/perf-debug) Jokeren 2022-11-28 13:47:13 -08:00
  • 2f9aef1132 Fix Jokeren 2022-11-28 13:00:26 -08:00
  • f605d95b82 unroll_2 Jokeren 2022-11-28 12:59:05 -08:00
  • b378118647 c64 Jokeren 2022-11-28 12:19:52 -08:00
  • cfcf042e55 Init Jokeren 2022-11-28 11:55:41 -08:00
  • 0c1d4d764e [Triton-MLIR][Backend] support MMA v1 in ConvertLayout (#922) goostavz 2022-11-28 16:10:30 +08:00
  • 9d31998a9d [Triton-MLIR][BACKEND] Add argmin / argmax implementation for ReduceOp (#918) Qingyi Liu 2022-11-28 14:59:27 +08:00
  • 04ec5deb41 [Triton-MLIR][BACKEND] decouple the dot code (#921) Yan Chunwei 2022-11-28 13:30:27 +08:00
  • 630dc315ee [Triton-MLIR] uncomment the UT in test_gemm that has already been fixed (#920) goostavz 2022-11-28 11:23:20 +08:00
  • 35c9ec1103 [Triton-MLIR][Backend] Fix number of warps and threads per warp when matrices are small (#917) Keren Zhou 2022-11-26 12:30:38 -08:00
  • ee098d0341 Merge branch 'master' into keren/improve-hook Jokeren 2022-11-25 15:04:59 -08:00
  • f63be0e9b5 [TRITON-MLIR][BACKEND]support atomic_cas (#914) donproc 2022-11-25 12:02:08 +08:00
  • feef58ee8a Pass fn to CompiliedKernel Jokeren 2022-11-24 14:22:35 -08:00
  • 153aecb339 [Triton-MLIR][BACKEND] insert_slice_async on GPUs < sm80 (#908) Keren Zhou 2022-11-24 14:05:54 -08:00
  • f98aed1258 [Triton-MLIR][RUNTIME] Add /usr/bin/ptxas as a search path (#909) Crutcher Dunnavant 2022-11-24 10:49:16 -08:00
  • ace7d28736 [Triton-MLIR][RUNTIME] Fix ir metadata lookup bug (#910) Crutcher Dunnavant 2022-11-24 00:27:23 -08:00
  • b688f7b7b8 [Triton-MLIR] add_volta_warpsPerTile (#907) ben-zhang-609 2022-11-24 09:44:29 +08:00
  • 8925c2cd11 [TRITON-MLIR][BACKEND]AtomicRMWOp supports scalar (#903) donproc 2022-11-23 15:59:09 +08:00
  • 2e33352419 [Triton-MLIR] Fix side effects (#906) Keren Zhou 2022-11-22 23:29:18 -08:00
  • 037f9efa95 [Triton-MLIR][BACKEND] Fix wpt overflow issue in mma v2 (#904) Yan Chunwei 2022-11-23 11:27:15 +08:00
  • 07786dc932 [Triton-MLIR] Add compute capability (#902) ben-zhang-609 2022-11-23 03:08:23 +08:00
  • 2afebcd79b [Triton-MLIR][Backend] Remove unnecessary barriers (#901) Keren Zhou 2022-11-22 10:03:29 -08:00
  • 136668bac3 [Triton-MLIR][BACKEND] tiny code cleanup (#899) Yan Chunwei 2022-11-21 16:00:46 +08:00
  • 04b852e031 [Triton-MLIR] Fix warnings and variable names (#898) Keren Zhou 2022-11-20 22:25:27 -08:00
  • 85cccfb81f [BUILD] Fix compilation problems in the release build (#897) Keren Zhou 2022-11-20 21:40:36 -08:00
  • 23f71daa27 [OPTIMIZER] Fixed up order of shared layouts (#881) Philippe Tillet 2022-11-21 06:25:02 +01:00
  • 44f577984d Fix format double substitution bug: {i} => {{i}} (#886) Crutcher Dunnavant 2022-11-20 11:44:42 -08:00
  • 4d64ffb5fe [FRONTEND] Handle for loops with negative constant steps (#896) Philippe Tillet 2022-11-20 11:37:38 +01:00
  • 6c5f646f4e [WIP][Triton-MLIR] Prefetch pass fixup (#873) Keren Zhou 2022-11-19 19:57:16 -08:00
  • e8994209f4 [Triton-MLIR][Backend]fix mma-v2 transpose error (#888) Yan Chunwei 2022-11-20 11:29:09 +08:00
  • 8a5647782d [Triton-MLIR][Testing]Fix tests warning, with small code clean-up (#894) Jun Yang 2022-11-19 22:33:59 +08:00
  • afaf59b0c9 [TRITON-MLIR][BACKEND] Atomic support mask (#889) donproc 2022-11-19 19:57:19 +08:00
  • 46fd581b0a Merge pull request #29 from ROCmSoftwarePlatform/parse_amdgcn_from_rocminfo (ref: rocm) rsanthanam-amd 2022-11-18 12:53:25 -06:00
  • 8cc448d92e Changes to eliminate the need for the MI_GPU_ARCH environment variable. Rohit Santhanam 2022-11-18 12:58:51 +00:00
  • dab4855bdf [TESTING] Added infrastructure for executing TTGIR program and test for layout conversions (#885) Philippe Tillet 2022-11-18 07:46:45 +01:00
  • 9ea6135eb5 [Triton-MLIR][Backend] Some cleanup in getMultiDimIndex/getLinearIndex (#880) goostavz 2022-11-18 09:19:21 +08:00
  • 0e4691e6dd [FRONTEND] Fix ExternLibrary(format=) bug; type annotate build_extern.py (#883) Crutcher Dunnavant 2022-11-17 09:45:30 -08:00
  • 5eee738df7 [Triton-MLIR][FRONTEND] [BACKEND] fix atomics (#879) donproc 2022-11-16 12:25:15 +08:00
  • 37f5846280 [Triton-MLIR][Backend] Minor fix for allocation and backend in handling tt.ptr tensors (#878) goostavz 2022-11-15 18:08:07 +08:00
  • a22ff39017 [Triton-MLIR][BACKEND] Refine/add codegen for get_promgram_id and get_num_programs Op (#877) Yan Chunwei 2022-11-15 15:45:24 +08:00
  • 4c4159c6fa [Triton-MLIR] Add ex2.approx implementation for ExpOp and fix smem allocation for ReduceOpConversion (#875) Qingyi Liu 2022-11-15 09:27:32 +08:00
  • c28cfd821b [Triton-MLIR][Backend] Fix convert_layout blocked->shared in non-default order (#876) goostavz 2022-11-15 09:02:46 +08:00
  • 1eedaf7bec [Triton-MLIR][BACKEND] adapt DotOp layout for FMADot (#872) Yan Chunwei 2022-11-14 16:56:30 +08:00
  • 516a241234 [Triton-MLIR] Fix some typos (#874) Chenggang Zhao 2022-11-14 10:15:53 +08:00
  • f40c63fb03 [Triton-MLIR][OPTIMIZER] Cleaned up swizzling (#869) Philippe Tillet 2022-11-10 12:05:46 -08:00
  • 2aa538ec2e [BACKEND] Added support for mma layouts in reductions (#863) Philippe Tillet 2022-11-10 09:58:07 -08:00
  • 57fd1864a7 [Triton-MLIR] Support FP8 (#864) Chenggang Zhao 2022-11-10 15:53:06 +08:00
  • 4946167241 [Triton-MLIR] tt.dot operands now must have DotOperand layout; also added prefetch pass prototype (#712) Da Yan 2022-11-10 13:57:27 +08:00
  • 8832e32683 [Triton-MLIR][BACKEND] Refine ptxbuilder (#867) Yan Chunwei 2022-11-10 13:41:52 +08:00
  • 4640023d9b [Triton-MLIR][Backend]add atomic rmw without mask (#842) donproc 2022-11-10 08:15:58 +08:00
  • 0c87360657 [Triton-MLIR][Backend] Port FMADot conversion for DotOp (#844) Yan Chunwei 2022-11-09 12:57:50 +08:00
  • de5b84c476 [Triton-MLIR][Backend] Fix mma<v2> int8 precision error (#850) Yan Chunwei 2022-11-09 12:23:43 +08:00
  • e517b58d59 [Triton-MLIR] Minor fixes to enable fused-softmax and layer-norm tutorials (#835) Qingyi Liu 2022-11-09 10:18:56 +08:00
  • 2da71b2aaa [Triton-MLIR] Increase block size K to completely eliminate shared memory bank conflicts (#862) Keren Zhou 2022-11-08 17:39:23 -08:00
  • 080b4addf8 [Triton-MLIR][Backend] Fix the order in linear/delinear and a few bugs in reduce conversion (#851) goostavz 2022-11-09 02:10:09 +08:00
  • 303790da88 [BUILD] use Python Var In Tests (#859) Ian Bearman 2022-11-08 09:44:19 -08:00
  • 137344946f [OPTIMIZER] Fix the load-mask issue with the pipeline pass (#857) Da Yan 2022-11-09 01:29:53 +08:00
  • 976cf12af1 [OPTIMIZER] Fixed memory coalescing (#847) Philippe Tillet 2022-11-07 06:22:18 -08:00
  • b6f15e214b [FRONTEND] Fixed up type cast in atomics codegen (#853) Philippe Tillet 2022-11-07 05:46:24 -08:00
  • 84ad215268 [Triton-MLIR] Enable libdevice for ptx backend when has external functions. (#848) ben-zhang-609 2022-11-07 16:01:50 +08:00
  • fdd59900f7 [Triton-MLIR] Replace triton.extract_slice with tensor.extract_slice and support more general tensor slicing (#837) Keren Zhou 2022-11-06 22:59:03 -08:00
  • a4ff0c362c [FRONTEND] Fix issues with atomics (#849) Philippe Tillet 2022-11-06 20:52:11 -08:00
  • d767919bc1 [OPTIMIZER] Not using MMA on FP32 when allowTF32 is false (ref: port-fma) Phil Tillet 2022-11-04 23:16:28 -07:00
  • 0d7e753227 [TESTING] use torch.int for autotuning cache (#840) Natalia Gimelshein 2022-11-04 18:05:16 -07:00
  • b6dbe959f0 [RUNTIME] Re-vamped cache so users can manually patch IR / ptx / cubin files (#845) Philippe Tillet 2022-11-04 10:57:29 -07:00
  • b39cc56f93 up Superjomn 2022-11-04 18:04:20 +08:00
  • db64477153 Merge remote-tracking branch 'origin/triton-mlir' into port-fma Superjomn 2022-11-04 17:43:54 +08:00
  • 95d8e383cb add testing Superjomn 2022-11-04 17:43:09 +08:00
  • 1ed6ee34ba finish coding Superjomn 2022-11-04 16:54:05 +08:00