Rohit Santhanam
8cc448d92e
Changes to eliminate the need for the MI_GPU_ARCH environment variable.
The AMDGPU arch is now parsed out of the rocminfo dump.
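The sketch below shows the idea. It assumes `rocminfo` prints each GPU agent's ISA name on a `Name:` line, e.g. `Name: gfx90a`. The helpers `parse_amdgpu_archs` and `detect_amdgpu_arch` are hypothetical and are not the functions added by this commit.

```python
import re
import subprocess
from typing import List, Optional

# Assumption: each GPU agent in the rocminfo dump has a line like "  Name:  gfx90a".
# Lines such as "Marketing Name:" or "Name: amdgcn-amd-amdhsa--gfx90a:..." do not match.
_GFX_RE = re.compile(r"^\s*Name:\s+(gfx[0-9a-f]+)\b", re.MULTILINE)


def parse_amdgpu_archs(rocminfo_text: str) -> List[str]:
    """Return the unique gfx arch names found in a rocminfo dump, in order of appearance."""
    archs: List[str] = []
    for arch in _GFX_RE.findall(rocminfo_text):
        if arch not in archs:
            archs.append(arch)
    return archs


def detect_amdgpu_arch() -> Optional[str]:
    """Run rocminfo and return the first GPU arch, or None if it cannot be determined."""
    try:
        out = subprocess.run(
            ["rocminfo"], capture_output=True, text=True, check=True
        ).stdout
    except (OSError, subprocess.CalledProcessError):
        return None
    archs = parse_amdgpu_archs(out)
    return archs[0] if archs else None
```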
2022-11-18 18:51:57 +00:00
Michael Melesse
15886b5ffc
skip segfault
2022-11-01 17:52:18 +00:00
Michael Melesse
d5830b4b6a
Merge branch 'master' into IFU_11_1_2022
2022-11-01 17:29:10 +00:00
Michael Melesse
dfad6bdf36
reduce the skips for test_reduce functions
2022-11-01 15:00:12 +00:00
Michael Melesse
4fb9d4904e
fix 6/7 dot tests
2022-11-01 14:18:06 +00:00
Michael Melesse
d024f0cfb8
update test_dot to use float32
2022-10-31 18:58:10 +00:00
Michael Melesse
9b3f2487b5
fix minor bug
2022-10-31 18:33:47 +00:00
Michael Melesse
15683986cd
unskip most bfloat tests
2022-10-31 18:04:54 +00:00
Keren Zhou
3ca667dfa8
[Frontend] Return a scalar if all input args are scalar (#816)
2022-10-28 23:27:06 -07:00
Michael Melesse
8d9572bc63
add similar fixes to additional tests
2022-10-28 20:34:58 +00:00
Michael Melesse
ffb30cdc52
skip ptx assert
2022-10-28 20:23:11 +00:00
rsanthanam-amd
531ef18cb6
Fix for binop % (mod) unit test failures. (#13)
If either data type is fp, then fmod should be used for the
reference computation.
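The choice matters for negative operands. `np.fmod` takes the sign of the dividend, while `np.remainder` (NumPy's `%`) takes the sign of the divisor. The sketch below shows how the reference might be selected. `reference_mod` is a hypothetical helper. The commit only specifies the fp case, so the non-fp branch simply keeps NumPy's `%` as a placeholder assumption.

```python
import numpy as np


def reference_mod(x: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Hypothetical reference for `x % y` in a binop test."""
    if np.issubdtype(x.dtype, np.floating) or np.issubdtype(y.dtype, np.floating):
        # C-style remainder: the result has the sign of the dividend x.
        return np.fmod(x, y)
    # Placeholder assumption: keep NumPy's `%` (sign of the divisor) otherwise.
    return np.remainder(x, y)
```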
2022-10-28 15:06:17 -04:00
Michael Melesse
6e50f8b2c0
print IRs
2022-10-28 17:46:52 +00:00
Michael Melesse
ed9638801a
fix for test_cast
2022-10-26 21:34:58 +00:00
Michael Melesse
8ecab462f6
skip segfaults on ROCm
2022-10-26 20:46:47 +00:00
Michael Melesse
648e4cfe89
skip test_atomic_rmw on ROCm
2022-10-26 18:22:23 +00:00
Michael Melesse
0cae0168ec
fix bfloat failure
2022-10-26 17:40:28 +00:00
Michael Melesse
39381d99f8
send amdgcn to cache
2022-10-26 17:18:33 +00:00
Michael Melesse
61c85c18b2
try to load binary
2022-10-25 20:29:43 +00:00
Michael Melesse
09302f0106
fix linking bug
2022-10-25 18:31:10 +00:00
Yanbo Liang
5ca1ed0101
Add bf16/fp16/fp64 support for ty_to_cpp (#800)
In `torch._inductor`, we [convert 0-d CPU tensors to scalars during
triton codegen](https://github.com/pytorch/pytorch/pull/87329), so we need
to add the missing triton support for bf16/fp16/fp64.
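The sketch below is a hypothetical mapping from Triton type strings to C types that includes the three added entries. `ty_to_cpp_sketch` is not Triton's function. The type strings and the chosen C types are assumptions for illustration. The main assumption is that fp16/bf16 scalars are widened to `float` on the host side.

```python
def ty_to_cpp_sketch(ty: str) -> str:
    """Hypothetical mapping from a Triton type string to a C type name."""
    # Pointer types (e.g. "*fp16") are passed as raw device pointers.
    if ty.startswith("*"):
        return "CUdeviceptr"
    return {
        "i1": "int32_t",
        "i8": "int8_t",
        "i16": "int16_t",
        "i32": "int32_t",
        "i64": "int64_t",
        "u32": "uint32_t",
        "u64": "uint64_t",
        # Assumption: half-precision scalars are widened to float on the host.
        "fp16": "float",
        "bf16": "float",
        "fp32": "float",
        "fp64": "double",
    }[ty]
```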
2022-10-24 19:41:25 -07:00
Michael Melesse
9184b5cf65
add prints
2022-10-24 18:28:28 +00:00
Michael Melesse
8da4323514
write hipmodule bytes
2022-10-24 17:58:25 +00:00
Michael Melesse
8785793445
fix typo
2022-10-21 17:58:38 +00:00
Michael Melesse
d022f5cf2c
add compiling back to gcn
2022-10-21 17:54:31 +00:00
Michael Melesse
4624fd4e1d
save compiler
2022-10-19 20:39:32 +00:00
Michael Melesse
41144f927f
fix hip launch
2022-10-17 20:41:28 +00:00
Michael Melesse
4d6d4c9431
hip src
2022-10-17 20:18:44 +00:00
Michael Melesse
4f21501def
add fixes
2022-10-17 18:21:14 +00:00
Michael Melesse
5c548fb57e
Merge branch 'master' into rcom52_fixes
2022-10-17 17:53:48 +00:00
Daniil Fukalov
406d03bfaf
Improve ROCm support. (#780)
- updates to support ROCm 5.2
- workarounds in tests where NV tools were used unconditionally
- implemented `get_num_blocks()` and `add_memfence()` for AMD GPU
- backported some atomics from history
- added bf16 support
- minor warnings cleanup
- added dockerfile to run on a ROCm enabled machine
Co-authored-by: B1tway <andrew.shukshov@gmail.com>
Co-authored-by: Andrey Shukshov <36711069+B1tway@users.noreply.github.com>
2022-10-14 11:33:42 -07:00
Keren Zhou
db3aa1d1fb
[FRONTEND] Fix libdevice (#776)
Fix two problems in libdevice and external dispatch:
1. Use static triton types (e.g., `tl.int32`) instead of creating new
types. Otherwise, `tl.int32` and `tl.dtype('int32')` are not the same
thing.
2. The name of an extern inst should be empty rather than the symbol name
of the inst; the TTIR generator assigns names automatically. Otherwise,
multiple identical extern insts end up with the same variable name.
Before the PR:
```
__nv_exp = extern_elementwise f64<1024> %11;
__nv_exp = extern_elementwise f64<1024> %11;
```
After the PR:
```
%12 = extern_elementwise f64<1024> %11;
%13 = extern_elementwise f64<1024> %11;
```
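The toy builder below shows why the name must be left empty. It is hypothetical and is not the TTIR generator. With an empty name, each result gets a fresh `%N`. Passing the symbol name gives every call the same name, as in the "before" output.

```python
import itertools
from typing import List, Optional


class ToyIRBuilder:
    """Hypothetical builder illustrating automatic SSA value naming."""

    def __init__(self, first_id: int = 0) -> None:
        self._ids = itertools.count(first_id)
        self.lines: List[str] = []

    def extern_elementwise(self, operand: str, name: Optional[str] = None) -> str:
        # An empty name lets the builder assign a fresh "%N"; reusing the extern
        # symbol name (e.g. "__nv_exp") gives every call the same result name.
        result = name if name else f"%{next(self._ids)}"
        self.lines.append(f"{result} = extern_elementwise f64<1024> {operand};")
        return result
```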
2022-10-13 17:18:16 -07:00
Keren Zhou
bc98aead33
[Backend] Fix for mov.u8 (#766)
Initial potential fix for mov.u8, which PTX does not currently support:
use mov.u16 instead and cast the result to u8.
2022-10-12 14:32:27 -07:00
Yu Guo
71b46acc42
[IR] Added special-purpose dequantize instruction (#759)
It is currently necessary for optimal performance in quantized workloads to add a special-purpose instruction in the IR. Backward compatibility with this instruction is *NOT* guaranteed.
2022-10-12 14:14:45 -07:00
Philippe Tillet
af76c989eb
[RUNTIME] Make entry point cache key depend on triton version hash (#765)
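One way to build such a key is sketched below. It assumes the version hash is available as a string. The hash is combined with the function source and signature, so upgrading Triton invalidates previously cached entry points. `entry_point_cache_key` and its parameters are hypothetical.

```python
import hashlib


def entry_point_cache_key(fn_source: str, signature: str, triton_version_hash: str) -> str:
    """Hypothetical cache key that changes whenever the Triton version hash changes."""
    h = hashlib.sha256()
    for part in (triton_version_hash, fn_source, signature):
        h.update(part.encode("utf-8"))
        h.update(b"\0")  # separator so ("ab", "c") and ("a", "bc") hash differently
    return h.hexdigest()
```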
2022-10-11 13:24:30 -07:00
Bin Bao
09cc2d454b
[FRONTEND] Fix a bool tensor storing problem (#746)
2022-10-10 12:11:50 -07:00
Felipe Petroski Such
5d4b26d380
[RUNTIME] Support multiple devices in the same process (#757)
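A minimal sketch of per-device caching follows. It assumes compiled binaries are keyed by the device index plus a kernel key. `PerDeviceKernelCache` and `compile_fn` are hypothetical names and are not the runtime's API. A module loaded in one GPU's context generally cannot be launched on another GPU, so the device index is made part of the key.

```python
from typing import Callable, Dict, Hashable, Tuple


class PerDeviceKernelCache:
    """Hypothetical cache that keeps one compiled binary per (device, key) pair."""

    def __init__(self, compile_fn: Callable[[Hashable, int], object]) -> None:
        self._compile_fn = compile_fn
        self._cache: Dict[Tuple[int, Hashable], object] = {}

    def get(self, key: Hashable, device: int) -> object:
        # Including the device index keeps a binary loaded for GPU 0 from
        # being reused on GPU 1.
        slot = (device, key)
        if slot not in self._cache:
            self._cache[slot] = self._compile_fn(key, device)
        return self._cache[slot]
```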
2022-10-09 20:30:04 -07:00
Chris
9a11a567ce
[DOCS] Fixed typos in 01-vector-add.py (#751)
2022-10-09 18:12:46 -07:00
Keren Zhou
11345e9b74
[RUNTIME] Add callback functions for external tools (#738)
2022-10-05 14:46:55 -07:00
Philippe Tillet
bdfdb9a1d2
[RUNTIME] Fixed JIT bug that led some constexpr values to be overridden by specialization parameters (#742)
2022-10-05 11:00:32 -07:00
shenggan
77c752dc78
[RUNTIME] remove fixed cu_include_dir (#739)
Use the environment variable `CUDA_HOME`, with default value `/usr/local/cuda`, for `cu_include_dir` (#731).
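The lookup could look like the sketch below. The function name mirrors `cu_include_dir` from the message, but the body is an assumption, not the committed code.

```python
import os


def cu_include_dir() -> str:
    """Return the CUDA include directory, honoring CUDA_HOME if it is set."""
    # Fall back to /usr/local/cuda when CUDA_HOME is not set.
    cuda_home = os.environ.get("CUDA_HOME", "/usr/local/cuda")
    return os.path.join(cuda_home, "include")
```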
2022-10-04 19:49:57 -07:00
Natalia Gimelshein
d3c925db8a
[FRONTEND] properly broadcast scalar where condition (#736)
2022-10-04 12:44:03 -07:00
fdrocha
2b0f877fad
[RUNTIME] Support environments with multiple cudalibs (#733)
2022-10-03 18:36:24 +00:00
Keren Zhou
4a2d3b7d79
[RUNTIME] Dump LLVM, TTIR, and SASS to help with debugging (#732)
2022-10-03 00:39:52 +00:00
Natalia Gimelshein
f55960e773
[FRONTEND] fix broadcasting for where (#729)
Fixes #532: all 3 inputs to `where` have to be broadcast together.
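Below is a NumPy reference sketch of this rule. `where_reference` is a hypothetical helper. It first computes the common shape of all three inputs, where a scalar condition counts as shape `()`. It then broadcasts each input to that shape and selects. `np.where` already broadcasts this way; the explicit steps just make the rule visible.

```python
import numpy as np


def where_reference(cond, x, y):
    """Hypothetical reference for `where` with cond, x and y broadcast together."""
    # Common shape of all three operands; scalars contribute shape ().
    out_shape = np.broadcast_shapes(np.shape(cond), np.shape(x), np.shape(y))
    cond_b = np.broadcast_to(np.asarray(cond, dtype=bool), out_shape)
    x_b = np.broadcast_to(x, out_shape)
    y_b = np.broadcast_to(y, out_shape)
    return np.where(cond_b, x_b, y_b)
```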
2022-10-01 13:18:47 -07:00
Phil Tillet
b244db06da
[TUTORIALS] Attention tutorial fixup
2022-09-30 19:31:43 -07:00
Shintaro Iwasaki
ae59f51c2d
[CODEGEN] Fix the inliner when calling a function with a phi-node (#727)
2022-09-29 21:36:40 -07:00
albanD
f45e31ba7c
[FRONTEND] Make sure to hold the GIL when creating Python objects (#726)
Without this patch, a debug build of Python fails with:
```
Fatal Python error: Python memory allocator called without holding the GIL
Python runtime state: initialized
```
2022-09-29 18:06:22 -07:00
Philippe Tillet
dad97528b2
[TESTING] allclose fixup (#724)
2022-09-28 22:49:05 +00:00
Jason Ansel
998fd5f9af
[FRONTEND] Make triton.compile work without a CUDA context (#708)
This allows compiling in a subprocess. I'm not seeing much speedup from this, but I figure it is a good change anyway.
2022-09-24 13:41:47 -07:00