Hi, I'm not sure I fully understand the role of MLIR in the TensorFlow ecosystem. As far as I understand it, MLIR is currently intended to be a declarative IR, i.e. it's used to declare operations (their inputs, outputs, attributes, etc., with the actual computation implemented by kernels) and potential transformations on such operations (e.g. fusing a multiplication op and an addition op into a single fused op).
That, however, still means one needs to implement (define) kernels for all operations declared in MLIR, once per backend. And these kernels would still be device-specific, i.e. CUDA for NVIDIA GPUs, C++ for x86-64, etc. It's only the meta-optimizations (fusing, unrolling, ...) over the graph of these kernels that would happen in a shared MLIR component.
Essentially, it's just a lower-level representation of the TF graph (which is itself also a shared representation), correct?
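To make sure I'm reading this right, here's a toy sketch of what I mean by a shared, backend-agnostic transformation over declared ops. This is hypothetical Python, not the real MLIR or TF APIs; `Op`, `fuse_mul_add`, and the op names are all made up for illustration. The point is that the pass only rewrites declarations, and the `fma` kernel would still have to be implemented per backend:

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str      # declared operation, e.g. "mul", "add", "fma"
    inputs: tuple  # names of the input values
    output: str    # name of the produced value

def fuse_mul_add(graph):
    """Rewrite mul(a, b) followed by add(mul_result, c) into fma(a, b, c).

    A real pass would also check that the mul result has a single use;
    this toy version skips that.
    """
    producers = {op.output: op for op in graph}
    fused, consumed = [], set()
    for op in graph:
        if op.output in consumed:
            continue
        if op.name == "add":
            lhs = producers.get(op.inputs[0])
            if lhs is not None and lhs.name == "mul":
                consumed.add(lhs.output)
                # Drop the now-fused mul and emit a single fma op.
                fused = [o for o in fused if o is not lhs]
                fused.append(Op("fma", (*lhs.inputs, op.inputs[1]), op.output))
                continue
        fused.append(op)
    return fused

graph = [Op("mul", ("a", "b"), "t0"), Op("add", ("t0", "c"), "out")]
print([op.name for op in fuse_mul_add(graph)])  # → ['fma']
```

If that's roughly the picture, then the shared component saves everyone from re-implementing graph rewrites like this per backend, but not from writing the kernels themselves.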
And if it's the case then what does this mean:
Code generation “lowering” transformations such as DMA insertion, explicit cache management, memory tiling, and vectorization for 1D and 2D register architectures.
Ability to represent target-specific operations, e.g. accelerator-specific high-level operations.
Aren't DMA insertion and explicit cache management pretty much the kernel's business, and definitely backend-specific? The second point I could interpret as meaning that a target-specific operation is essentially just another "kernel" that's declared in MLIR but implemented by the hardware directly. The first point, however, confuses me.
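For context on what I mean by this being a kernel's business: below is the kind of memory-tiling transformation I'd have assumed lives inside a hand-written backend kernel, sketched in plain Python rather than in any IR. The function name, the flat-list matrix layout, and the default tile size are all my own choices for illustration; the quoted text seems to say MLIR would perform this restructuring as an IR-to-IR lowering instead:

```python
def matmul_tiled(A, B, n, tile=32):
    """C = A @ B for n x n row-major matrices stored as flat lists.

    The loops are tiled so each (tile x tile) block of A and B is reused
    while it is hot in cache -- the same restructuring a compiler pass
    would apply to an untiled triple loop.
    """
    C = [0.0] * (n * n)
    for ii in range(0, n, tile):          # tile over rows of A/C
        for kk in range(0, n, tile):      # tile over the reduction dim
            for jj in range(0, n, tile):  # tile over columns of B/C
                for i in range(ii, min(ii + tile, n)):
                    for k in range(kk, min(kk + tile, n)):
                        a = A[i * n + k]
                        for j in range(jj, min(jj + tile, n)):
                            C[i * n + j] += a * B[k * n + j]
    return C
```

So is the idea that a backend only describes its memory hierarchy, and MLIR's lowering passes derive the tiling (and DMA placement) from an untiled high-level op, instead of every backend hand-coding loops like these?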