Philippe Tillet
|
25e9a10917
|
changed auto-tuner parameter ranges
|
2019-06-25 19:27:49 -07:00 |
|
Philippe Tillet
|
d945ce5e1b
|
Now showing valid parameter for NN
|
2019-06-25 19:18:43 -07:00 |
|
Philippe Tillet
|
64513fb407
|
[codegen] added fallback when tensor cores cannot be used
|
2019-06-25 15:49:58 -07:00 |
|
Philippe Tillet
|
62000738f0
|
[codegen] renamed axis_info -> alignment_info
|
2019-06-25 15:10:47 -07:00 |
|
Philippe Tillet
|
d52abc9379
|
[codegen] bugfix in alignment inference
|
2019-06-25 15:06:15 -07:00 |
|
Philippe Tillet
|
edc31cabb0
|
[codegen] rough template for axis_info pass
|
2019-06-24 18:57:32 -07:00 |
|
Philippe Tillet
|
72867d17d4
|
more cleaning
|
2019-06-24 12:37:13 -07:00 |
|
Philippe Tillet
|
f257884eb7
|
some cleaning
|
2019-06-24 09:31:34 -07:00 |
|
Philippe Tillet
|
67989e7d18
|
fixup
|
2019-06-13 20:03:28 -07:00 |
|
Philippe Tillet
|
f7dcea1187
|
Now doing double-buffering
|
2019-06-13 19:48:02 -07:00 |
|
Philippe Tillet
|
36e3667a9a
|
removed shared conflicts for 8x32x4 and 32x8x4 configurations
|
2019-06-13 17:51:54 -07:00 |
|
Philippe Tillet
|
21a9b92c87
|
disabling interleaving
|
2019-06-13 17:16:00 -07:00 |
|
Philippe Tillet
|
d487cf31ce
|
trying 128 bits loads
|
2019-06-12 21:07:01 -07:00 |
|
Philippe Tillet
|
1c6372711b
|
added interleaving
|
2019-06-12 20:30:28 -07:00 |
|
Philippe Tillet
|
a6b580ec05
|
interleaving fails with B
|
2019-06-12 19:46:43 -07:00 |
|
Philippe Tillet
|
1b5a742a88
|
[triton/codegen] added shared memory padding for HMMA arguments and vectorized loads
|
2019-06-11 19:51:08 -07:00 |
|
Philippe Tillet
|
7d50b87681
|
[selection/codegen] bugfix in distributed tile indices initialization
|
2019-06-11 10:45:19 -07:00 |
|
Philippe Tillet
|
06b5992509
|
[feature] added basic tensor core support
|
2019-06-11 10:24:49 -07:00 |
|
Philippe Tillet
|
d074a166e2
|
[feature] basic tensor core utilization works
|
2019-06-08 14:39:45 -07:00 |
|
Philippe Tillet
|
5f3d48c1d0
|
[tensor cores] added basic codegen template for using wmma
|
2019-06-07 21:19:47 -07:00 |
|
Philippe Tillet
|
ec4c6aaaaa
|
Added inline PTX for mma.sync
|
2019-06-07 19:39:33 -07:00 |
|
Philippe Tillet
|
6fce9f28ae
|
added fragmented axis
|
2019-06-07 10:32:56 -07:00 |
|
Philippe Tillet
|
6045209d5b
|
Now find correct tuning configuration
|
2019-06-06 20:13:26 -07:00 |
|
Philippe Tillet
|
0a0b48e9a2
|
adding hmma tuning parameters
|
2019-06-06 19:51:02 -07:00 |
|
Philippe Tillet
|
81eba3e1ec
|
ugh
|
2019-06-06 19:36:41 -07:00 |
|
Philippe Tillet
|
cdf5a0d011
|
[codegen/tune]: added fragmentation types
|
2019-06-06 16:48:32 -07:00 |
|
Philippe Tillet
|
f6fe9492e4
|
[dnn/conv] added triton-c code for wgrad
|
2019-05-11 18:09:23 -04:00 |
|
Philippe Tillet
|
f80441017c
|
[codegen] added leading dimension padding for transposition in shared memory
|
2019-05-06 11:53:35 -04:00 |
|
Philippe Tillet
|
4813bb007c
|
[codegen] bugfix in builder insert point for predicated instructions
|
2019-05-04 12:09:27 -04:00 |
|
Philippe Tillet
|
30833c18f1
|
[codegen/tune] bugfix in heuristics for nano-tile sizes
|
2019-05-04 01:32:34 -04:00 |
|
Philippe Tillet
|
af58b8bd81
|
[triton-c] predicate in assignment statement now propagates to rhs computations
|
2019-04-27 14:00:15 -04:00 |
|
Philippe Tillet
|
4b77b764ba
|
[triton-c] added support for while loops
|
2019-04-26 15:08:02 -04:00 |
|
Philippe Tillet
|
3413aad582
|
[general] major overhaul of triton-c/triton-ir/triton-jit:
- Added alloc const
- Added atomics
- Pruning tuning space
- Added example for dot/conv/shift
- Bugfixes
|
2019-04-25 16:18:15 -04:00 |
|
Philippe Tillet
|
0c607c9392
|
[examples] normalize benchmark by max_clock / current_clock
|
2019-03-28 07:58:37 -04:00 |
|
Philippe Tillet
|
2c3ae0675e
|
[JIT] re-added nvidia compatibility
|
2019-03-27 21:12:01 -04:00 |
|
Philippe Tillet
|
fdf8559806
|
[general] added missing files
|
2019-03-27 20:01:35 -04:00 |
|
Philippe Tillet
|
bc2a257d5c
|
[code generation] more flexibility in backend selection
|
2019-03-27 11:29:42 -07:00 |
|
Philippe Tillet
|
e04253c0dd
|
[code generation] basic CPU backend
|
2019-03-27 11:13:36 -07:00 |
|
Philippe Tillet
|
9d6fc1c051
|
[code generation] bugfix in single buffering
|
2019-03-26 15:55:48 -07:00 |
|
Philippe Tillet
|
8d35c98920
|
[code generation] search space pruning
|
2019-03-25 14:10:24 -07:00 |
|
Philippe Tillet
|
deb7a1cc5c
|
Hack to make OpenCL for AMD work
|
2019-03-23 18:58:25 -07:00 |
|
Philippe Tillet
|
9de9feff4a
|
[jit] added runtime for host but compilation still needs to be implemented
|
2019-03-23 13:40:42 -07:00 |
|
Philippe Tillet
|
49fd6ece99
|
some cleaning
|
2019-03-21 23:51:47 -07:00 |
|
Philippe Tillet
|
b73c3bdd25
|
[examples] removed dependency on isaac for auto-tuning
|
2019-03-11 22:22:43 -04:00 |
|
Philippe Tillet
|
87c85ed50d
|
[code generation] reparameterization
|
2019-03-11 19:30:21 -04:00 |
|
Philippe Tillet
|
614f83baee
|
[jit] basic auto-tuning support
|
2019-03-11 12:00:50 -04:00 |
|
Philippe Tillet
|
94e315ea8a
|
Reparameterized in terms of micro- and nano- tiles
|
2019-03-10 23:10:17 -04:00 |
|
Philippe Tillet
|
c96a263896
|
[jit] changed default metaparameter ranges
|
2019-03-10 10:45:21 -04:00 |
|
Philippe Tillet
|
9a3537662d
|
[jit] can now infer launch parameters from triton module
|
2019-03-09 14:44:13 -05:00 |
|
Philippe Tillet
|
b721202812
|
[code generation] uniformized shape and layout metaparameters
|
2019-03-09 12:31:21 -05:00 |
|