triton

Author	SHA1	Message	Date
Rohit Santhanam	8cc448d92e	Changes to eliminate the need for the MI_GPU_ARCH environment variable. The AMDGPU arch is now parsed out of the rocminfo dump.	2022-11-18 18:51:57 +00:00
Michael Melesse	15886b5ffc	skip segfault	2022-11-01 17:52:18 +00:00
Michael Melesse	d5830b4b6a	Merge branch 'master' into IFU_11_1_2022	2022-11-01 17:29:10 +00:00
Michael Melesse	dfad6bdf36	reduce the skips for test_reduce functions	2022-11-01 15:00:12 +00:00
Michael Melesse	4fb9d4904e	fix 6/7 dot tests	2022-11-01 14:18:06 +00:00
Michael Melesse	d024f0cfb8	update test_dot to use float 32	2022-10-31 18:58:10 +00:00
Michael Melesse	9b3f2487b5	fix minor bug	2022-10-31 18:33:47 +00:00
Michael Melesse	15683986cd	unskip most bfloat tests	2022-10-31 18:04:54 +00:00
Keren Zhou	3ca667dfa8	[Frontend] Return a scalar if all input args are scalar (#816 )	2022-10-28 23:27:06 -07:00
Michael Melesse	8d9572bc63	add similar fixes two addition tests	2022-10-28 20:34:58 +00:00
Michael Melesse	ffb30cdc52	skip ptx assert	2022-10-28 20:23:11 +00:00
rsanthanam-amd	531ef18cb6	Fix for binop % (mod) unit test failures. (#13 ) If either data type is fp, then fmod should be used for the reference computation.	2022-10-28 15:06:17 -04:00
Michael Melesse	6e50f8b2c0	print irs	2022-10-28 17:46:52 +00:00
Michael Melesse	ed9638801a	fix for test_cast	2022-10-26 21:34:58 +00:00
Michael Melesse	8ecab462f6	skip segfaults on ROCM	2022-10-26 20:46:47 +00:00
Michael Melesse	648e4cfe89	skip test_atomic_rmw on rocm	2022-10-26 18:22:23 +00:00
Michael Melesse	0cae0168ec	fix bfloat failure	2022-10-26 17:40:28 +00:00
Michael Melesse	39381d99f8	send amdgcn to cache	2022-10-26 17:18:33 +00:00
Michael Melesse	61c85c18b2	try to load binary	2022-10-25 20:29:43 +00:00
Michael Melesse	09302f0106	fix linking bug	2022-10-25 18:31:10 +00:00
Yanbo Liang	5ca1ed0101	Add bf16/fp16/fp64 support for ty_to_cpp (#800 ) In ```torch._inductor```, we [convert 0d CPU tensor to scalar during triton codegen](https://github.com/pytorch/pytorch/pull/87329), so we need to add the missing triton support for bf16/fp16/fp64.	2022-10-24 19:41:25 -07:00
Michael Melesse	9184b5cf65	add prints	2022-10-24 18:28:28 +00:00
Michael Melesse	8da4323514	write hipmodule bytes	2022-10-24 17:58:25 +00:00
Michael Melesse	8785793445	fix typo	2022-10-21 17:58:38 +00:00
Michael Melesse	d022f5cf2c	add compiling back to gcn	2022-10-21 17:54:31 +00:00
Michael Melesse	4624fd4e1d	save compiler	2022-10-19 20:39:32 +00:00
Michael Melesse	41144f927f	fix hip launch	2022-10-17 20:41:28 +00:00
Michael Melesse	4d6d4c9431	hip src	2022-10-17 20:18:44 +00:00
Michael Melesse	4f21501def	add fixes	2022-10-17 18:21:14 +00:00
Michael Melesse	5c548fb57e	Merge branch 'master' into rcom52_fixes	2022-10-17 17:53:48 +00:00
Daniil Fukalov	406d03bfaf	Improve ROCm support. (#780 ) - updates to support ROCm 5.2 - workarounds in tests where NV tools were used unconditionally - implemented `get_num_blocks()` and `add_memfence()` for AMD GPU - backported from history some atomics - added bf16 support - minor warnings cleanup - added dockerfile to run on a ROCm enabled machine Co-authored-by: B1tway <andrew.shukshov@gmail.com> Co-authored-by: Andrey Shukshov <36711069+B1tway@users.noreply.github.com>	2022-10-14 11:33:42 -07:00
Keren Zhou	db3aa1d1fb	[FRONTEND] Fix libdevice (#776 ) Fix two problems in libdevice and external dispatch: 1. Use static triton types (e.g., tl.int32) instead of creating new types. Otherwise, `tl.int32` and `tl.dtype('int32')` are not the same thing. 2. The name of an extern inst should be empty but not the symbol name of the inst. TTIR generator will assign names automatically. Otherwise, we have the same variable name when there are multiple same extern insts. Before the PR: ```bash __nv_exp = extern_elementwise f64<1024> %11; __nv_exp = extern_elementwise f64<1024> %11; ``` After the PR: ```bash %12 = extern_elementwise f64<1024> %11; %13 = extern_elementwise f64<1024> %11; ```	2022-10-13 17:18:16 -07:00
Keren Zhou	bc98aead33	[Backend] Fix for mov.u8 (#766 ) Initial potential fix for mov.u8, which is not supported by ptx for now. Use mov.u16 instead and cast it to u8.	2022-10-12 14:32:27 -07:00
Yu Guo	71b46acc42	[IR] Added special-purpose `dequantize` instruction (#759 ) It is currently necessary for optimal performance in quantized workloads to add a special-purpose instruction in the IR. Backward compatibility with this instruction is NOT guaranteed.	2022-10-12 14:14:45 -07:00
Philippe Tillet	af76c989eb	[RUNTIME] Make entry point cache key depend on triton version hash (#765 )	2022-10-11 13:24:30 -07:00
Bin Bao	09cc2d454b	[FRONTEND] Fix a bool tensor storing problem (#746 )	2022-10-10 12:11:50 -07:00
Felipe Petroski Such	5d4b26d380	[RUNTIME] support multiple devices in the same process (#757 )	2022-10-09 20:30:04 -07:00
Chris	9a11a567ce	[DOCS] Fixed typos in 01-vector-add.py (#751 )	2022-10-09 18:12:46 -07:00
Keren Zhou	11345e9b74	[RUNTIME] Add callback functions for external tools (#738 )	2022-10-05 14:46:55 -07:00
Philippe Tillet	bdfdb9a1d2	[RUNTIME] Fixed JIT bug that led some constexpr values to be overridden by specialization parameters (#742 )	2022-10-05 11:00:32 -07:00
shenggan	77c752dc78	[RUNTIME] remove fixed cu_include_dir (#739 ) Use environment variable `CUDA_HOME` with default value `/usr/local/cuda` for `cu_include_dir` #731	2022-10-04 19:49:57 -07:00
Natalia Gimelshein	d3c925db8a	[FRONTEND] properly broadcast scalar where condition (#736 )	2022-10-04 12:44:03 -07:00
fdrocha	2b0f877fad	[RUNTIME] Support environments with multiple cudalibs (#733 )	2022-10-03 18:36:24 +00:00
Keren Zhou	4a2d3b7d79	[RUNTIME] Dump llvm, ttir, and sass to help debugging (#732 )	2022-10-03 00:39:52 +00:00
Natalia Gimelshein	f55960e773	[FRONTEND] fix broadcasting for where (#729 ) Fixes #532, all 3 inputs to where have to be broadcast together.	2022-10-01 13:18:47 -07:00
Phil Tillet	b244db06da	[TUTORIALS] Attention tutorial fixup	2022-09-30 19:31:43 -07:00
Shintaro Iwasaki	ae59f51c2d	[CODEGEN] Fix an inliner to call a function with a phi-node (#727 )	2022-09-29 21:36:40 -07:00
albanD	f45e31ba7c	[FRONTEND] Make sure to hold the gil when creating python objects (#726 ) Without this patch, a debug version of python complains that: ``` Fatal Python error: Python memory allocator called without holding the GIL Python runtime state: initialized ```	2022-09-29 18:06:22 -07:00
Philippe Tillet	dad97528b2	[TESTING] allclose fixup (#724 )	2022-09-28 22:49:05 +00:00
Jason Ansel	998fd5f9af	[FRONTEND] Make triton.compile work without a cuda context (#708 ) This allows compiling in a subprocess. I'm not seeing a ton of speedup from this, but figure it is a good change anyway.	2022-09-24 13:41:47 -07:00


429 Commits