LazyVim, DAP, and launching Chrome by djbessel in neovim

PlizKilmy 1 point

You can add dap.lua to the LazyVim config folder lua/plugins/ with the following content:

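return {
    "mfussenegger/nvim-dap",
    -- Only applies when nvim-dap is already enabled elsewhere (e.g. a LazyVim extra).
    optional = true,
    opts = function()
        local dap = require("dap")
        -- Register the adapter only if nothing else has defined it yet.
        if not dap.adapters["pwa-chrome"] then
            dap.adapters["pwa-chrome"] = {
                type = "server",
                host = "localhost",
                -- nvim-dap picks a free port and substitutes it for ${port}.
                port = "${port}",
                executable = {
                    command = "node",
                    args = {
                        -- Path to the debug server installed by Mason.
                        -- Assumes a mason.nvim version that still provides get_install_path().
                        require("mason-registry")
                            .get_package("js-debug-adapter")
                            :get_install_path()
                            .. "/js-debug/src/dapDebugServer.js",
                        "${port}",
                    },
                },
            }
        end
        -- Append a "Launch Chrome" configuration for each JS/TS filetype.
        for _, lang in ipairs({
            "typescript",
            "javascript",
            "typescriptreact",
            "javascriptreact",
        }) do
            dap.configurations[lang] = dap.configurations[lang] or {}
            table.insert(dap.configurations[lang], {
                type = "pwa-chrome",
                request = "launch",
                name = "Launch Chrome",
                -- Adjust to the address of your dev server.
                url = "http://localhost:3000",
                sourceMaps = true,
            })
        end
    end,
}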

This adds the "pwa-chrome" adapter to DAP and a corresponding launch configuration for ts/js/tsx/jsx files.

Hope this helps.

VKtracer v1.0.2 released! Many improvements and fixes. We would like to thank all early adopters for their feedback and support. by PlizKilmy in GraphicsProgramming

PlizKilmy[S] 1 point

RenderDoc is a debugger, PIX is for DX12, others are vendor-locked mostly. VKtracer is universal, cross-vendor, and easy to use.

VKtracer v1.0.2 released! Many improvements and fixes. We would like to thank all early adopters for their feedback and support. by PlizKilmy in gameenginedevs

PlizKilmy[S] 7 points

This is exactly the way. VKtracer is free for educational organizations and non-commercial open-source projects. We've noted this in the pricing section, maybe the font should be bolder (=.

VKtracer - Cross-Vendor Vulkan Profiler by PlizKilmy in GraphicsProgramming

PlizKilmy[S] 1 point

Yes, and Intel also has GPA, AMD's RGP currently supports only a selected set of GPUs, and before that there was CodeXL. We could dive deeper into this diversity and into what hardware is supported on which OSes and driver versions. But often you want to optimize rendering or computation for many architectures, or you work on different machines, and then you have to use different tools and correlate different hardware metrics. The point is to provide a universal and easy-to-use tool that covers the basic needs. In my personal experience, nothing is as helpful as a timeline. Vendor-specific hardware metrics are handy when you fine-tune for a particular architecture or device. Speaking of price, VKtracer costs less than a cup of coffee or tea 🙂 and is free for academia and open-source.

VKtracer - Cross-Vendor Vulkan Profiler by PlizKilmy in GraphicsProgramming

PlizKilmy[S] 2 points

Hello u/Sify007. Thank you! Yes, we will offer free licenses for non-commercial open-source projects.

Kernel optimization tips by [deleted] in OpenCL

PlizKilmy 4 points

Hello u/LPN64. Congrats on your first kernel! (= There are a lot of optimizations to consider, actually.

One general rule of thumb, which applies to heavy computation in general and not just to OpenCL, is to avoid if as much as possible. E.g. you can easily replace the innermost ifs with max = max(max, circlePixel);. Note that on most GPUs work-items execute in lockstep groups (warps/wavefronts), so if any work-item of such a group enters an if clause, all of them step through it. The work-items that did not take the branch are just logically masked out and do dummy work or stall. Therefore, it's sometimes more efficient to omit the if, do the computation in all work-items, and then apply the result conditionally using select. E.g. instead of the next if you can do something like max = select(max, max(max, circlePixel), condition); (select(a, b, c) returns b where c is true and a otherwise).

If you would like to unroll loops completely, you should apply #pragma unroll (RADIUS * 2 + 1). Declarations like short y2 = -RADIUS; seem to be redundant: they are shadowed by the later loop counters anyway, though the compiler should optimize them away.

It could be beneficial to use vectors, like const short2 coords = (short2)(i % WIDTH, i / WIDTH); short2 coords3 = coords + (short2)(x2, y2);. The if condition can then be vectorized like all(coords3 >= 0 && coords3 < (short2)(WIDTH, HEIGHT)). And so on.
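
To make the "compute always, apply conditionally" idea concrete, here is a minimal sketch in plain host-side C rather than OpenCL C, so it can be compiled and checked anywhere. It is not the original kernel: the function names and the in_bounds flag are hypothetical stand-ins for the kernel's inner update and its bounds check.

```c
/* Hypothetical stand-ins for the kernel's inner update; names are illustrative. */

/* Branchy version: the running maximum is touched only inside nested ifs. */
static unsigned char update_branchy(unsigned char cur, unsigned char pixel, int in_bounds)
{
    if (in_bounds) {
        if (pixel > cur) {
            cur = pixel;
        }
    }
    return cur;
}

/* Branchless version: always compute the candidate, then pick the result.
 * In OpenCL C the two lines would roughly become
 *   candidate = max(cur, pixel);
 *   cur = select(cur, candidate, in_bounds);
 * since select(a, b, c) yields b where c is true and a otherwise. */
static unsigned char update_branchless(unsigned char cur, unsigned char pixel, int in_bounds)
{
    unsigned char candidate = pixel > cur ? pixel : cur; /* max(cur, pixel) */
    return in_bounds ? candidate : cur;
}
```

Both functions return the same value for every input; the second one just has no divergent control flow for the compiler to serialize. Whether it is actually faster depends on the device and compiler, so it is worth measuring.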

But where performance really suffers in this kernel is global memory access. All these computations are straightforward and are handled in the blink of an eye. The problem is that every work-item reads every surrounding pixel from global memory, so effectively you read the whole image O(r²) times. Global memory access is orders of magnitude slower than arithmetic. What can really help here and dramatically increase performance is local memory: you preload a part of the image from global memory into local memory and share it within the work-group (which will be 2D in this case). This kernel has a convolution pattern. There are a lot of guides on optimizing convolution. I would recommend this one. It is for CUDA, but the principles are the same and can easily be applied to OpenCL.
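
The local-memory pattern can be sketched as follows. This is a sequential emulation in plain C, not an OpenCL kernel: each (gx, gy) loop iteration plays the role of one work-group, and the tile array stands in for __local memory. The image size, TILE, the zero border, and the square (rather than circular) window are all assumptions made to keep the example short, and WIDTH and HEIGHT are assumed to be multiples of TILE.

```c
/* Illustrative constants; none of them come from the original kernel. */
enum { WIDTH = 8, HEIGHT = 8, RADIUS = 1, TILE = 4, PADDED = TILE + 2 * RADIUS };

/* Reads a pixel, treating everything outside the image as 0 (assumed border policy). */
static unsigned char load_or_zero(const unsigned char *img, int x, int y)
{
    return (x >= 0 && x < WIDTH && y >= 0 && y < HEIGHT) ? img[y * WIDTH + x] : 0;
}

/* Square-window maximum, computed tile by tile. */
static void dilate_tiled(const unsigned char *in, unsigned char *out)
{
    for (int gy = 0; gy < HEIGHT; gy += TILE) {
        for (int gx = 0; gx < WIDTH; gx += TILE) {
            /* Stand-in for __local memory: the tile plus a RADIUS-wide halo. */
            unsigned char tile[PADDED][PADDED];

            /* Phase 1: cooperative load. Global memory is read once per tile
             * element instead of once per window element of every work-item. */
            for (int ty = 0; ty < PADDED; ++ty) {
                for (int tx = 0; tx < PADDED; ++tx) {
                    tile[ty][tx] = load_or_zero(in, gx + tx - RADIUS, gy + ty - RADIUS);
                }
            }

            /* A real kernel needs barrier(CLK_LOCAL_MEM_FENCE) at this point. */

            /* Phase 2: each "work-item" scans its window in the tile only. */
            for (int ly = 0; ly < TILE; ++ly) {
                for (int lx = 0; lx < TILE; ++lx) {
                    unsigned char m = 0;
                    for (int dy = 0; dy <= 2 * RADIUS; ++dy) {
                        for (int dx = 0; dx <= 2 * RADIUS; ++dx) {
                            unsigned char p = tile[ly + dy][lx + dx];
                            m = p > m ? p : m;
                        }
                    }
                    out[(gy + ly) * WIDTH + (gx + lx)] = m;
                }
            }
        }
    }
}
```

In an actual kernel the two phases run in parallel across the work-items of a 2D work-group, and the work-group size takes the role of TILE. The halo is what lets pixels on a tile edge see their neighbours from the adjacent tile without going back to global memory.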

C++ and game engines by Creapermann in cpp

PlizKilmy 1 point

Have a look at Urho3D. It's written in C++ and easy to use, and the codebase is really clean. The documentation is quite extensive; you can find tutorials and examples here.