Michael Melesse
277b712284
save changes
2022-10-31 19:11:58 +00:00
Michael Melesse
d024f0cfb8
update test_dot to use float 32
2022-10-31 18:58:10 +00:00
Michael Melesse
1811791665
add failures in report
2022-10-31 18:39:58 +00:00
Michael Melesse
9b3f2487b5
fix minor bug
2022-10-31 18:33:47 +00:00
rsanthanam-amd
14730a2352
Merge pull request #15 from ROCmSoftwarePlatform/bfloat_enable
...
unskip most bfloat tests
2022-10-31 13:10:30 -05:00
Michael Melesse
15683986cd
unskip most bfloat tests
2022-10-31 18:04:54 +00:00
rsanthanam-amd
48fcd8c987
Merge pull request #14 from ROCmSoftwarePlatform/fix_vectorization
...
fix test_vectorization and test_load_cache_modifier
2022-10-28 16:12:57 -05:00
Michael Melesse
8d9572bc63
add similar fixes to additional tests
2022-10-28 20:34:58 +00:00
Michael Melesse
ffb30cdc52
skip ptx assert
2022-10-28 20:23:11 +00:00
Michael Melesse
7fce2bc5f1
add print_llvm_module
2022-10-28 20:07:35 +00:00
rsanthanam-amd
531ef18cb6
Fix for binop % (mod) unit test failures. ( #13 )
...
If either data type is fp, then fmod should be used for the
reference computation.
2022-10-28 15:06:17 -04:00
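A minimal sketch of why the reference for `%` matters when an operand is floating point. It assumes the kernel follows C-style `fmod` semantics (result takes the sign of the dividend), which is what the commit message implies; `reference_mod` is a hypothetical helper, not a function from the test suite, and it uses the standard library's `math.fmod` rather than whatever the tests actually call.

```python
import math


def reference_mod(x, y):
    """Hypothetical helper: choose the reference semantics for '%'.

    If either operand is a float, use C-style fmod (sign follows the
    dividend). Otherwise fall back to Python's '%', kept here only as a
    placeholder for the integer path.
    """
    if isinstance(x, float) or isinstance(y, float):
        return math.fmod(x, y)
    return x % y


# Python's '%' follows the sign of the divisor, fmod the sign of the
# dividend, so the two disagree for a negative float dividend.
print(reference_mod(-7.5, 2.0))  # fmod-style result
print(-7.5 % 2.0)                # Python floor-mod result
```

With a negative dividend the two conventions give different values, so a reference built on the wrong one reports spurious mismatches.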
Michael Melesse
5f0d90db7e
tab prints
2022-10-28 19:05:42 +00:00
Michael Melesse
03ae41b310
add print helper
2022-10-28 17:55:28 +00:00
Michael Melesse
bd61338b31
update scripts
2022-10-28 17:48:26 +00:00
Michael Melesse
6e50f8b2c0
print irs
2022-10-28 17:46:52 +00:00
Michael Melesse
aa556d4f1b
update script
2022-10-26 21:51:15 +00:00
Michael Melesse
61e88efb23
ignore logs
2022-10-26 21:42:41 +00:00
Michael Melesse
ed9638801a
fix for test_cast
2022-10-26 21:34:58 +00:00
Michael Melesse
8ecab462f6
skip segfaults on ROCM
2022-10-26 20:46:47 +00:00
Michael Melesse
648e4cfe89
skip test_atomic_rmw on rocm
2022-10-26 18:22:23 +00:00
Michael Melesse
abe0d3e1b1
cast to amd device when as_nvidia shows up
2022-10-26 18:12:18 +00:00
Michael Melesse
4464dfcc18
save scripts
2022-10-26 17:42:58 +00:00
Michael Melesse
0cae0168ec
fix bfloat failure
2022-10-26 17:40:28 +00:00
Michael Melesse
88d57ef9c9
add cache print
2022-10-26 17:19:30 +00:00
Michael Melesse
39381d99f8
send amdgcn to cache
2022-10-26 17:18:33 +00:00
Michael Melesse
df925f7187
add cache print script
2022-10-25 20:48:36 +00:00
Michael Melesse
e84297ca79
print cache
2022-10-25 20:44:42 +00:00
Michael Melesse
61c85c18b2
try to load binary
2022-10-25 20:29:43 +00:00
Michael Melesse
da5c24ffcb
just clean cache
2022-10-25 20:27:13 +00:00
Michael Melesse
09302f0106
fix linking bug
2022-10-25 18:31:10 +00:00
Michael Melesse
9184b5cf65
add prints
2022-10-24 18:28:28 +00:00
Michael Melesse
8da4323514
write hipmodule bytes
2022-10-24 17:58:25 +00:00
Michael Melesse
eb89e9bdd9
fix generator.cc: generator::visit_function: segfault
2022-10-24 17:41:20 +00:00
Michael Melesse
56a06f7a06
add debug steps
2022-10-21 20:17:30 +00:00
Michael Melesse
6a31c43774
update backtrace
2022-10-21 19:56:19 +00:00
Michael Melesse
8785793445
fix typo
2022-10-21 17:58:38 +00:00
Michael Melesse
d022f5cf2c
add compiling back to gcn
2022-10-21 17:54:31 +00:00
Michael Melesse
4624fd4e1d
save compiler
2022-10-19 20:39:32 +00:00
Michael Melesse
41144f927f
fix hip launch
2022-10-17 20:41:28 +00:00
Michael Melesse
4d6d4c9431
hip src
2022-10-17 20:18:44 +00:00
Michael Melesse
32dbc08c05
fix llvm build errors
2022-10-17 18:29:15 +00:00
Michael Melesse
4f21501def
add fixes
2022-10-17 18:21:14 +00:00
Michael Melesse
5c548fb57e
Merge branch 'master' into rcom52_fixes
2022-10-17 17:53:48 +00:00
Michael Melesse
fa4d0fd1ef
add scripts
2022-10-17 17:28:48 +00:00
Daniil Fukalov
406d03bfaf
Improve ROCm support. ( #780 )
...
- updates to support ROCm 5.2
- workarounds in tests where NV tools were used unconditionally
- implemented `get_num_blocks()` and `add_memfence()` for AMD GPU
- backported from history some atomics
- added bf16 support
- minor warnings cleanup
- added dockerfile to run on a ROCm enabled machine
Co-authored-by: B1tway <andrew.shukshov@gmail.com>
Co-authored-by: Andrey Shukshov <36711069+B1tway@users.noreply.github.com>
2022-10-14 11:33:42 -07:00
Keren Zhou
db3aa1d1fb
[FRONTEND] Fix libdevice ( #776 )
...
Fix two problems in libdevice and external dispatch:
1. Use static triton types (e.g., tl.int32) instead of creating new
types. Otherwise, `tl.int32` and `tl.dtype('int32')` are not the same
thing.
2. The name of an extern inst should be empty rather than the symbol name of
the inst. The TTIR generator assigns names automatically. Otherwise, multiple
identical extern insts end up with the same variable name.
Before the PR:
```bash
__nv_exp = extern_elementwise f64<1024> %11;
__nv_exp = extern_elementwise f64<1024> %11;
```
After the PR:
```bash
%12 = extern_elementwise f64<1024> %11;
%13 = extern_elementwise f64<1024> %11;
```
2022-10-13 17:18:16 -07:00
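The naming problem in point 2 can be illustrated with a toy allocator. This is only a sketch of the idea, not the TTIR generator: `NameAllocator` and `name_for` are hypothetical names, and the starting index 12 is chosen only to mirror the output shown in the commit message.

```python
import itertools


class NameAllocator:
    """Hypothetical stand-in for an IR generator that numbers unnamed values."""

    def __init__(self, start=0):
        self._counter = itertools.count(start)

    def name_for(self, explicit_name=""):
        # An explicit name is used verbatim, so repeated insts collide.
        if explicit_name:
            return explicit_name
        # An empty name gets the next fresh number, so each inst is unique.
        return f"%{next(self._counter)}"


alloc = NameAllocator(start=12)
named = [alloc.name_for("__nv_exp") for _ in range(2)]
unnamed = [alloc.name_for() for _ in range(2)]
print(named)
print(unnamed)
```

Passing the symbol name yields two values with the same name, while leaving the name empty lets the allocator hand out distinct ones.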
Twizzes
ddae106c0e
[DOCS] Update installation.rst to fix windows build error ( #747 )
2022-10-13 13:27:15 -07:00
Keren Zhou
bc98aead33
[Backend] Fix for mov.u8 ( #766 )
...
Initial potential fix for mov.u8, which PTX does not currently support.
Use mov.u16 instead and cast the result to u8.
2022-10-12 14:32:27 -07:00
Yu Guo
71b46acc42
[IR] Added special-purpose dequantize instruction ( #759 )
...
It is currently necessary for optimal performance in quantized workloads to add a special-purpose instruction in the IR. Backward compatibility with this instruction is *NOT* guaranteed.
2022-10-12 14:14:45 -07:00
Philippe Tillet
33e6f0df7f
[DRIVER] Bumped CUDA requirement to 11.4+. This is to avoid bad performance surprises as older ptxas are much slower. ( #769 )
...
This also makes codegen simpler by avoiding special handling of eviction policies
2022-10-12 12:02:30 -07:00