Michael Melesse
6e50f8b2c0
print irs
2022-10-28 17:46:52 +00:00
Michael Melesse
aa556d4f1b
update script
2022-10-26 21:51:15 +00:00
Michael Melesse
61e88efb23
ignore logs
2022-10-26 21:42:41 +00:00
Michael Melesse
ed9638801a
fix for test_cast
2022-10-26 21:34:58 +00:00
Michael Melesse
8ecab462f6
skip segfaults on ROCM
2022-10-26 20:46:47 +00:00
Michael Melesse
648e4cfe89
skip test_atomic_rmw on rocm
2022-10-26 18:22:23 +00:00
Michael Melesse
abe0d3e1b1
cast to amd device when as_nvidia shows up
2022-10-26 18:12:18 +00:00
Michael Melesse
4464dfcc18
save scripts
2022-10-26 17:42:58 +00:00
Michael Melesse
0cae0168ec
fix bfloat failure
2022-10-26 17:40:28 +00:00
Michael Melesse
88d57ef9c9
add cache print
2022-10-26 17:19:30 +00:00
Michael Melesse
39381d99f8
send amdgcn to cache
2022-10-26 17:18:33 +00:00
Michael Melesse
df925f7187
add cache print script
2022-10-25 20:48:36 +00:00
Michael Melesse
e84297ca79
print cache
2022-10-25 20:44:42 +00:00
Michael Melesse
61c85c18b2
try to load binary
2022-10-25 20:29:43 +00:00
Michael Melesse
da5c24ffcb
just clean cache
2022-10-25 20:27:13 +00:00
Michael Melesse
09302f0106
fix linking bug
2022-10-25 18:31:10 +00:00
Michael Melesse
9184b5cf65
add prints
2022-10-24 18:28:28 +00:00
Michael Melesse
8da4323514
write hipmodule bytes
2022-10-24 17:58:25 +00:00
Michael Melesse
eb89e9bdd9
fix generator.cc: generator::visit_function: segfault
2022-10-24 17:41:20 +00:00
Michael Melesse
56a06f7a06
add debug steps
2022-10-21 20:17:30 +00:00
Michael Melesse
6a31c43774
update backtrace
2022-10-21 19:56:19 +00:00
Michael Melesse
8785793445
fix typo
2022-10-21 17:58:38 +00:00
Michael Melesse
d022f5cf2c
add compiling back to gcn
2022-10-21 17:54:31 +00:00
Michael Melesse
4624fd4e1d
save compiler
2022-10-19 20:39:32 +00:00
Michael Melesse
41144f927f
fix hip launch
2022-10-17 20:41:28 +00:00
Michael Melesse
4d6d4c9431
hip src
2022-10-17 20:18:44 +00:00
Michael Melesse
32dbc08c05
fix llvm build errors
2022-10-17 18:29:15 +00:00
Michael Melesse
4f21501def
add fixes
2022-10-17 18:21:14 +00:00
Michael Melesse
5c548fb57e
Merge branch 'master' into rcom52_fixes
2022-10-17 17:53:48 +00:00
Michael Melesse
fa4d0fd1ef
add scripts
2022-10-17 17:28:48 +00:00
Daniil Fukalov
406d03bfaf
Improve ROCm support. ( #780 )
- updates to support ROCm 5.2
- workarounds in tests where NV tools were used unconditionally
- implemented `get_num_blocks()` and `add_memfence()` for AMD GPU
- backported from history some atomics
- added bf16 support
- minor warnings cleanup
- added dockerfile to run on a ROCm enabled machine
Co-authored-by: B1tway <andrew.shukshov@gmail.com >
Co-authored-by: Andrey Shukshov <36711069+B1tway@users.noreply.github.com >
2022-10-14 11:33:42 -07:00
Keren Zhou
db3aa1d1fb
[FRONTEND] Fix libdevice ( #776 )
Fix two problems in libdevice and external dispatch:
1. Use static triton types (e.g., tl.int32) instead of creating new
types. Otherwise, `tl.int32` and `tl.dtype('int32')` are not the same
thing.
2. The name of an extern inst should be empty rather than the symbol name
of the inst. The TTIR generator assigns names automatically. Otherwise,
multiple identical extern insts end up with the same variable name.
Before the PR:
```bash
__nv_exp = extern_elementwise f64<1024> %11;
__nv_exp = extern_elementwise f64<1024> %11;
```
After the PR:
```bash
%12 = extern_elementwise f64<1024> %11;
%13 = extern_elementwise f64<1024> %11;
```
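As a rough illustration of the first point, the sketch below shows how a freshly constructed type object can fail to match a static one when lookups rely on object identity. `DType`, `INT32`, `DISPATCH`, and the symbol string are hypothetical stand-ins, not Triton's classes or libdevice names, and the identity-based comparison is an assumption about the failure mode rather than something the commit message states.

```python
class DType:
    """Hypothetical stand-in for a frontend dtype (not Triton's class)."""

    def __init__(self, name):
        self.name = name
    # No __eq__/__hash__ defined: comparison and hashing use object identity.


# Static, module-level instance, playing the role of something like tl.int32.
INT32 = DType("int32")

# Hypothetical dispatch table keyed by the static type object.
DISPATCH = {INT32: "extern_symbol_for_int32"}


def lookup(dtype):
    """Return the registered symbol for dtype, or None if there is no match."""
    return DISPATCH.get(dtype)


static_hit = lookup(INT32)           # same object as the key: found
fresh_miss = lookup(DType("int32"))  # new object with the same name: not found
```

Reusing the static instance everywhere sidesteps the mismatch without needing custom equality on the type class.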
2022-10-13 17:18:16 -07:00
Twizzes
ddae106c0e
[DOCS] Update installation.rst to fix windows build error ( #747 )
2022-10-13 13:27:15 -07:00
Keren Zhou
bc98aead33
[Backend] Fix for mov.u8 ( #766 )
Initial potential fix for mov.u8, which PTX does not currently support.
Use mov.u16 instead and cast the result to u8.
2022-10-12 14:32:27 -07:00
Yu Guo
71b46acc42
[IR] Added special-purpose dequantize instruction ( #759 )
It is currently necessary for optimal performance in quantized workloads to add a special-purpose instruction in the IR. Backward compatibility with this instruction is *NOT* guaranteed.
2022-10-12 14:14:45 -07:00
Philippe Tillet
33e6f0df7f
[DRIVER] Bumped CUDA requirement to 11.4+. This is to avoid bad performance surprises as older ptxas are much slower. ( #769 )
This also makes codegen simpler by avoiding special handling of eviction policies
2022-10-12 12:02:30 -07:00
Philippe Tillet
af76c989eb
[RUNTIME] Make entry point cache key depend on triton version hash ( #765 )
2022-10-11 13:24:30 -07:00
Bin Bao
09cc2d454b
[FRONTEND] Fix a bool tensor storing problem ( #746 )
2022-10-10 12:11:50 -07:00
Felipe Petroski Such
5d4b26d380
[RUNTIME] support multiple devices in the same process ( #757 )
2022-10-09 20:30:04 -07:00
Chris
9a11a567ce
[DOCS] Fixed typos in 01-vector-add.py ( #751 )
2022-10-09 18:12:46 -07:00
Keren Zhou
11345e9b74
[RUNTIME] Add callback functions for external tools ( #738 )
2022-10-05 14:46:55 -07:00
Philippe Tillet
bdfdb9a1d2
[RUNTIME] Fixed JIT bug that led some constexpr values to be overridden by specialization parameters ( #742 )
2022-10-05 11:00:32 -07:00
shenggan
77c752dc78
[RUNTIME] remove fixed cu_include_dir ( #739 )
Use environment variable `CUDA_HOME` with default value `/usr/local/cuda` for `cu_include_dir` #731
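A minimal sketch of the lookup described here, assuming the include directory is the `include` subdirectory of the CUDA root. The helper name and its `environ` parameter are hypothetical and only there to make the example testable; the actual code in the commit may be structured differently.

```python
import os


def cu_include_dir(environ=None):
    """Resolve the CUDA include directory from CUDA_HOME.

    Falls back to /usr/local/cuda when CUDA_HOME is unset. The "include"
    suffix is an assumption, not something stated in the commit message.
    """
    environ = os.environ if environ is None else environ
    cuda_home = environ.get("CUDA_HOME", "/usr/local/cuda")
    return os.path.join(cuda_home, "include")


default_dir = cu_include_dir({})                          # CUDA_HOME unset
custom_dir = cu_include_dir({"CUDA_HOME": "/opt/cuda"})   # CUDA_HOME set
```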
2022-10-04 19:49:57 -07:00
Natalia Gimelshein
d3c925db8a
[FRONTEND] properly broadcast scalar where condition ( #736 )
2022-10-04 12:44:03 -07:00
fdrocha
2b0f877fad
[RUNTIME] Support environments with multiple cudalibs ( #733 )
2022-10-03 18:36:24 +00:00
Keren Zhou
4a2d3b7d79
[RUNTIME] Dump llvm, ttir, and sass to help debugging ( #732 )
2022-10-03 00:39:52 +00:00
Natalia Gimelshein
f55960e773
[FRONTEND] fix broadcasting for where ( #729 )
Fixes #532: all 3 inputs to `where` have to be broadcast together.
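To make "broadcast together" concrete, here is a small sketch that computes one common shape from the condition and both value operands at once. It assumes NumPy-style right-aligned broadcasting rules; `broadcast_shapes` is a hypothetical helper written for this illustration, not a Triton function.

```python
from itertools import zip_longest


def broadcast_shapes(*shapes):
    """Return the common shape of all inputs under right-aligned broadcasting."""
    result = []
    # Walk the dimensions from the trailing axis, padding short shapes with 1.
    for dims in zip_longest(*(reversed(s) for s in shapes), fillvalue=1):
        non_unit = {d for d in dims if d != 1}
        if len(non_unit) > 1:
            raise ValueError(f"incompatible dimensions: {dims}")
        result.append(non_unit.pop() if non_unit else 1)
    return tuple(reversed(result))


# A scalar condition, a column operand, and a row operand share one shape.
common = broadcast_shapes((), (4, 1), (1, 8))
```

Broadcasting only two of the three operands pairwise could leave the third with a different shape, which is the situation the fix avoids.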
2022-10-01 13:18:47 -07:00
Phil Tillet
b244db06da
[TUTORIALS] Attention tutorial fixup
2022-09-30 19:31:43 -07:00
Shintaro Iwasaki
7b61303ea1
[CODEGEN] Fix extract_N_bufferable in layout analysis ( #728 )
2022-09-30 12:21:22 -07:00
Shintaro Iwasaki
ae59f51c2d
[CODEGEN] Fix an inliner to call a function with a phi-node ( #727 )
2022-09-29 21:36:40 -07:00