Yan Da
2239ac1998
more progress on TritonGPU
2022-04-28 18:51:31 +08:00
Philippe Tillet
012e8c5b2b
fixup
2022-04-27 16:39:27 -07:00
Philippe Tillet
513bcaee50
Added some ASCII art for encoding documentation
2022-04-27 16:28:27 -07:00
Yan Da
38d13ae618
Some progress on TritonGPU
2022-04-27 21:16:45 +08:00
Yan Da
edca91bf8f
Update traits (NoSideEffect)
2022-04-27 19:41:07 +08:00
Yan Da
8dfe78f6cf
Add TritonCombineOps
2022-04-27 19:28:21 +08:00
Yan Da
c70f6b666e
Merge previous changes
2022-04-27 14:06:55 +08:00
Yan Da
74585fb970
Add Triton CombineOps
2022-04-27 13:45:56 +08:00
Philippe Tillet
81001d318c
Putting Triton dialect in its own folder
2022-04-26 14:39:27 -07:00
Philippe Tillet
62a64ff29b
Fixed Python link bug in CMakeLists
2022-04-26 14:39:18 -07:00
Yan Da
fcbbb3c10e
Fix visit_While issues
2022-04-10 16:16:13 +08:00
Yan Da
f1cc67bbc3
triton -> tt
2022-04-10 12:07:19 +08:00
Yan Da
28e96bbfd1
Remove the dependency on TensorDialect
2022-04-08 19:43:09 +08:00
Yan Da
62f7609612
More on type inference & assembly format
2022-04-08 19:37:57 +08:00
Yan Da
13aead4808
Use TableGen to define new types
2022-04-08 16:32:46 +08:00
Yan Da
6002340456
Better textual representation
2022-04-07 20:44:41 +08:00
Yan Da
62f772123c
now kernel functions return nothing (instead of none)
2022-04-07 20:22:17 +08:00
Yan Da
040a2b6c75
Fix OpBuilder
2022-04-07 20:01:31 +08:00
Yan Da
6b4da6f016
Documentation
2022-04-07 16:00:53 +08:00
Yan Da
16d44e5c4c
Verify power-of-2
2022-04-07 15:28:02 +08:00
Yan Da
9cf4107990
Add TensorSizeTrait
2022-04-07 15:18:43 +08:00
Yan Da
9dafa0e2e3
Update triton dependencies
2022-04-01 20:16:07 +08:00
Yan Da
2041b67fbf
Now vecadd works
2022-03-30 20:21:47 +08:00
Yan Da
38e67b4293
Add more Ops
2022-03-28 19:50:23 +08:00
Yan Da
0d139ec460
Introducing SCF
2022-03-26 17:02:32 +08:00
Yan Da
5e117966d0
CatOp
2022-03-25 14:17:17 +08:00
Yan Da
f2ab318614
New python binding
2022-03-22 21:53:22 +08:00
Yan Da
419bbe0f6e
Reverts back to MLIR 14 & updates CMakeLists
2022-03-20 16:41:48 +08:00
Yan Da
a2c31ff434
Init commit
2022-03-17 20:40:55 +08:00
daadaada
539961072c
[FRONTEND] Semantic analysis refactor (#473)
Moved dispatch.cc to semantic.py
Integer signedness now moved from C++ to python
Cleaner frontend type
Co-authored-by: Phil Tillet <phil@openai.com>
2022-03-16 21:25:30 -07:00
Philippe Tillet
bb5765df5c
[CODEGEN] Now padding shared memory for layout conversion (#468)
2022-03-03 22:19:05 -08:00
daadaada
d9dd97492f
Use unique_ptr in ir::context_impl (#462)
Co-authored-by: Philippe Tillet <Phil.Tillet@gmail.com>
2022-02-24 16:07:10 -08:00
Philippe Tillet
98ed7db8c1
[CODEGEN] Improvements and bugfixes (#463)
2022-02-24 14:56:24 -08:00
Philippe Tillet
807d8a1945
[ALL] Merge master (#447)
2022-01-30 20:21:20 -08:00
Philippe Tillet
bef76b142a
[BACKEND] float division is now approximate by default (#446)
2022-01-29 18:29:29 -08:00
daadaada
59d371c6eb
[BACKEND] Added Int8 mma (#440)
2022-01-27 09:12:44 -08:00
Benjamin Lefaudeux
3a23c1dd33
[BACKEND] minor, hotfix for gcc compilation (#439)
2022-01-23 14:24:02 -08:00
Philippe Tillet
4c94359199
[FRONTEND] Alignment fix-up (#428)
2022-01-11 23:11:58 -08:00
daadaada
94a2e10fe5
[BACKEND] Add bf16 & tf32 mma support (on A100) (#426)
2022-01-11 10:20:31 -08:00
Madeleine Thompson
0ab9d67bad
uint8, uint16, uint32, and uint64 in kernels (#413)
A forthcoming PR will update the RNG to use these types.
Also:
- Add tests for the `//`, `<<`, and `>>` operators.
- Change `TensorWrapper` to unwrap objects when the resulting object would be simpler.
- Clean up `throw_unreachable`, since it was triggering compiler warnings.
2022-01-05 15:27:17 -08:00
Philippe Tillet
03f1256f60
[FRONTEND] Added volatile flag for load (#407)
2021-12-30 22:33:24 -08:00
Madeleine Thompson
985798f101
add missing bfloat16 repr and improve assertions (#403)
- `BF16TyID` was missing a repr implementation.
- Throw a better exception on impossible casts.
- Add a few assertions. Tested with a debug build.
- Add `pointer_dtype.__str__` to aid kernel debugging.
2021-12-23 17:01:17 -08:00
daadaada
39d4bfed83
[OPS] Add performance model for gemm/gemv (#397)
Significantly improves the performance of `triton.ops.matmul` in memory-bound settings via the use of many more block configs coupled with a performance model to drive the auto-tuning process.
2021-12-21 09:56:10 -08:00
Madeleine Thompson
fa62b4a8f6
[FRONTEND] better stringification (#394)
- Don't override `self.args` in `CompilationError`, and show the line number and column in error messages. This causes it to generate an easier-to-read backtrace.
- Better `__str__` on `TensorWrapper`, `dtype`, and `block`.
2021-12-17 20:11:45 -08:00
Philippe Tillet
558555630f
[FRONTEND] Added xor_sum
2021-12-16 17:55:35 -08:00
Philippe Tillet
5ce1b726dc
[CODEGEN] Various bugfixes that make it possible to fuse RNG in a matmul epilogue (#356)
2021-10-24 02:30:46 -07:00
daadaada
858dec8372
[CODEGEN] Add cache modifier to tl.load (#351)
* Add cache modifier to tl.load
* Add comment to cache_modifier
* Remove force_nc_cache
* Update test
2021-10-17 22:14:04 -07:00
Stephen McGroarty
c2e6b90ff1
[CODEGEN] Fixes masked load exception (#342)
2021-10-13 13:31:52 -07:00
Philippe Tillet
6e5b0b4301
[FRONTEND] Added on-disk cache for compiled kernels (#287)
2021-09-18 22:48:26 -07:00
Philippe Tillet
94c83d30ce
[GENERAL] Removed deprecated driver files and added basic compatibility with rocm (#268)
- Removed driver module -- accelerator runtime is handled by pytorch
- Added basic support for ROCM based on @micmelesse's PR -- an empty kernel can now execute on AMD devices without any compile-time changes
- Now only using PREFER_SHARED for kernels when the size of shared memory is greater than 49k. Otherwise there can be poor L1 performance for broadcast tensors
2021-09-09 00:04:28 -07:00