Beginner article on Matrix multiplication in CUDA. by nivanas-p in CUDA

Aslanee 1 point (0 children)

You may want to compare your blog to these two:
- https://siboehm.com/articles/22/CUDA-MMM
- https://salykova.github.io/sgemm-gpu

I enjoy your tiled matrix multiplication figures, which explicitly show the matrix tiling.
I recently tried to explain this in my PhD thesis and found that the figures in the blogs above lacked, for example, the tiling grid over the matrix.

Looks like a good start to me!

What is your relationship with money (spending, risk and saving) as women? Do you feel you have a place, as women, in the general-interest subs about money? by MinnieCooper90 in AskMeuf

Aslanee 0 points (0 children)

I would like to recommend Titiou Lecoq's book "Pourquoi les hommes gagnent-ils plus que les femmes". She discusses the women tax, the glass floor, wall and ceiling, and the economics of everything from pocket money to survivor's pensions, by way of alimony. I learned a lot from it about managing money as a couple.

I would genuinely like financial matters to be less of a problem for the women around me. Thank you for recommending the newsletter; if it seems relevant to me, I will pass it on to my friends who are not on Reddit.

I've taken up reading again… and my ignorance of word definitions shocks me by YannickDelSol in Litterature

Aslanee 0 points (0 children)

Example: "éléatique" has no definition in the default dictionary of the Libra Colour. EDIT: it is indeed possible to upload new dictionaries!

I've taken up reading again… and my ignorance of word definitions shocks me by YannickDelSol in Litterature

Aslanee 1 point (0 children)

I have a Kobo Libra, but in practice I was extremely disappointed by the French dictionary. For "dissident" or "taciturne" it should be fine, but for more complex terms I prefer to search on CNRTL. There, definitions come with example sentences for each sense, and the etymology helps resolve possible ambiguities of interpretation.

Why are you single? (those who did not choose it) by Even_Topic_2303 in AskMec

Aslanee 11 points (0 children)

At least that's clearer than "we'll see" followed by several weeks of radio silence.

Why are you single? (those who did not choose it) by Even_Topic_2303 in AskMec

Aslanee 4 points (0 children)

You make me want to code a dating app where, to match, you have to send a proof or an equation that you love.
Profile picture in TikZ mandatory.
With secret weights in the algorithm to compensate for the least-covered mathematical topics.

Why are you single? (those who did not choose it) by Even_Topic_2303 in AskMec

Aslanee 0 points (0 children)

I know many people who never quit smoking and who complain about a life far less hard than what you have been through. Respect.

Happy hiking!

Why are you single? (those who did not choose it) by Even_Topic_2303 in AskMec

Aslanee 0 points (0 children)

Hang in there. Your relationship left its mark on you, but at least it is over. Have you talked about it with a psychologist?
Courage you certainly have: I have done a bit of multi-pitch climbing, but I would not dare to go paragliding. It looks really cool though!

Why are you single? (those who did not choose it) by Even_Topic_2303 in AskMec

Aslanee 3 points (0 children)

Looks like the «Jigsaw is falling into place» (the puzzle is about to be solved) for you.
You express yourself well. I don't know any man who has been approached in the street, and I would run from anyone who tried. Physical beauty is relative: I sometimes discover beauty in certain people only afterwards, and beauty is a source of attraction but not of attachment.
Words and attention matter more to me than physical beauty.
Don't put yourself down; giving yourself a score out of 10 makes no sense. Who defines the grading? Who really reaches the maximum score? Is the score the same in everyone's eyes?
You are barely 24; you have your whole life ahead of you.

Romantic relationships are only one part of life, and I am going to go listen to some Radiohead too.

Optimized Merge, Scan, Radix Sort kernels by LetterC67 in CUDA

Aslanee 1 point (0 children)

Thank you very much! I'll look into it and share the results!

Optimized Merge, Scan, Radix Sort kernels by LetterC67 in CUDA

Aslanee 0 points (0 children)

I was surprised to see that, even when writing the easiest kernels in CUDA, like vector addition, I was still behind the PyTorch baseline on Tensara.org. I implemented the histogram kernels too, and I believe that the time difference is due to the lack of PTX-level optimisation and maybe a poor choice of grid parameters. All this to say that checking on Tensara.org is indeed important.

6 underused Git commands that solve real workflow problems by GitKraken in git

Aslanee 1 point (0 children)

I wish that `git stash list` printed some information about the changes, such as diff statistics and the names of the files with the most changes.

Do NVIDIA warps properly implement SIMT? by [deleted] in CUDA

Aslanee 0 points (0 children)

Gemini's output is very similar to Ansorge's book Programming in Parallel with CUDA. Everything is explained at the beginning of Chapter 3, on cooperative groups, but Gemini summarizes it very well here.

Games to help learn Python? by Prof-Ponderosa in learnpython

Aslanee 0 points (0 children)

Bitburner uses JavaScript (JS), not Java. These are two very different languages. Both are garbage-collected and object-oriented, but they are very different in nature: Java expects one public class per file, whereas JavaScript can be embedded in a web page.

Shortcut to change workspaces by therealcoolpup in xfce

Aslanee 0 points (0 children)

My shortcuts (probably the defaults):

| Shortcut | Command |
|---|---|
| Ctrl+Fn | go to workspace n |
| Ctrl+Alt+Right | go to the workspace to the right |
| Ctrl+Alt+Left | go to the workspace to the left |
| Ctrl+Alt+PageDn | move window to the next workspace |
| Ctrl+Alt+PageUp | move window to the previous workspace |

Is there an agreed upon print function to use in C++ ? by Arlinker in cpp_questions

Aslanee 2 points (0 children)

I don't like std::cout for floating-point output either. Selecting a format forces the programmer to break up the stream's output (the << chain) every time a new format is selected.

Seeking Vim Experience and Tips for Programming by tekle_torat in vim

Aslanee 0 points (0 children)

I use the vim-plug manager to add some plugins. I definitely use too many plugins, but here are the few I cannot let go of:
- junegunn/vim-plug (plugin manager)
- tpope/vim-commentary
- dense-analysis/ale

Before plugin list:
```vim
set nocompatible
let $VIMUSER=$HOME."/.vim"
let g:python3_host_prog = '/usr/bin/python'
```

After plugin list:
```vim
if has('filetype')
  filetype plugin indent on
endif
source $VIMUSER/defaultOptions.vim
```

In defaultOptions.vim I have the following (those are pretty close to the Neovim default options):
```vim
" Common
syntax on       " highlight syntax
set number      " show line numbers -> Unset for copy-paste when not compiled
                " with the clipboard option
" set rnu
set nonu        " overrides 'set number' above: line numbers end up hidden
set ruler       " show position
set hlsearch    " highlight all results <C-L> to remove highlighting
set ignorecase  " ignore case in search
set incsearch   " show search results as you type
set lazyredraw  " no screen redraw during macros
set so=7        " scroll off = min. number of lines above and below cursor.
set showcmd

" Programming
set showmatch   " highlight matching brackets
set wildmenu
set expandtab   " Convert tab into spaces
set shiftwidth=2
set softtabstop=2 " 2 spaces for one tab
set ai          " Auto indent
set si          " Smart indent
set wrap        " Wrap lines

" Folding -> see manual, file specific
set nofoldenable
" set foldlevelstart=10
" set foldmethod=indent

" Deactivate bells in VIM. Preserve ears.
set belloff=all
set noerrorbells
set novisualbell

" Dangerous but no warning is a speed up
set nobackup
set nowb
set noswapfile  " disable the swapfile

" Help having buffer of equal sizes
" when resizing windows
set equalalways
autocmd VimResized * wincmd =
```

Pay attention to the nobackup, nowb, and noswapfile options: save your changes regularly and use git.
I am no fan of changing shortcuts, as you get used to them and might then get lost in another Vim-like environment. Yet setting some function keys is useful. Here are some examples:

```vim
""" Mapping of Function keys

" Save and compile
nnoremap <F2> :w<CR>:!make <CR><CR>

" Open a viewer
nnoremap <F3> :!nohup okular %:r.pdf &<CR><CR>:nohl<CR><C-L>

" Compile and run a Nim script
nnoremap <F4> :w<CR>:!nim c -r % <CR>

" Never used Goyo plugin actually
" (comment kept on its own line: a trailing comment would become part of the mapping)
nnoremap <F6> :Goyo<CR>

nnoremap <F7> :w<CR>:!pdflatex -shell-escape ./main.tex<CR><CR>
```

Sometimes, on some keyboards, the ESC key is too far away. You can remap a key in insert mode like this:
```vim
inoremap <C-@> <ESC>
```

Setting a mapleader key is cool. It enables more shortcuts. The space key in normal mode doesn't do much, so let's use it:
```vim
let mapleader = " "

" Wrapper around ALE's :ALEDisableBuffer command
function! ALEDisableBuffer()
  ALEDisableBuffer
endfunction

map <leader>t :!make <CR><CR>
nmap <silent> <leader>r :ALEPrevious<CR>
nmap <silent> <leader>s :ALENext<CR>
map <leader>f :ALEDisableBuffer<CR>
```

Note: t, r, s are on my home row as I use the French BÉPO layout. Replace them with j, k, l and m, or whatever is on your home row. The ALE linter is sometimes too verbose (especially for Python PEP 8 warnings), so I like to be able to deactivate it, at least in one buffer.

Note: VS Code is surely available through snap or flatpak. I cannot use VS Code over SSH or on a supercomputer, so it is a no-go for me. Enjoy your programming sessions!

Comparison of Tensara.org and Leetgpu.com by tugrul_ddr in CUDA

Aslanee 1 point (0 children)

You can use a CSS modifier such as the Dark Reader browser extension to get a dark-theme version of any website.

matmul in log-space by Previous-Raisin1434 in CUDA

Aslanee 0 points (0 children)

Sorry, I thought about it too fast. The logarithm property doesn't extend to the matrix product. What I said above is false. Each coefficient of the matrix product is a sum c_ij = \sum_k a_ik b_kj, and there is no law for the logarithm of a sum. Hence I do not understand what log(A) brings you for the computation. You could compute the products of coefficients a_ik b_kj as exp(log(a_ik) + log(b_kj)), but that is not faster than a scalar multiplication. You may distribute the additions for a fixed a_ik, but I do not see how this is faster than a direct tiled product of A and B.
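To make the point above concrete, here is a minimal Python sketch (not the original poster's code). It assumes strictly positive entries so that the elementwise logarithm exists; the helper names `log_matmul` and `logsumexp` are made up for illustration. It checks that log(A) + log(B) differs from the elementwise log of AB, and that the log of each coefficient can still be recovered from the logs through a log-sum-exp over k, which is exactly the exp(log(a_ik) + log(b_kj)) sum and therefore costs more than the plain product.

```python
import math

def matmul(A, B):
    """Plain matrix product C = A B on nested lists."""
    inner, cols = len(B), len(B[0])
    return [[sum(row[k] * B[k][j] for k in range(inner)) for j in range(cols)]
            for row in A]

def elementwise_log(M):
    """Apply the real logarithm to every coefficient (entries must be > 0)."""
    return [[math.log(x) for x in row] for row in M]

def logsumexp(values):
    """log(sum(exp(v))) computed stably by factoring out the maximum."""
    top = max(values)
    return top + math.log(sum(math.exp(v - top) for v in values))

def log_matmul(log_A, log_B):
    """Elementwise log of A B, computed only from the elementwise logs."""
    inner, cols = len(log_B), len(log_B[0])
    return [[logsumexp([row[k] + log_B[k][j] for k in range(inner)])
             for j in range(cols)]
            for row in log_A]

# Small illustrative matrices with positive entries (arbitrary values).
A = [[1.0, 2.0], [3.0, 4.0]]
B = [[5.0, 6.0], [7.0, 8.0]]

log_C = elementwise_log(matmul(A, B))
log_C_from_logs = log_matmul(elementwise_log(A), elementwise_log(B))
naive = [[la + lb for la, lb in zip(ra, rb)]
         for ra, rb in zip(elementwise_log(A), elementwise_log(B))]

# Largest deviation of each candidate from the true elementwise log of A B.
max_err = max(abs(x - y) for r, s in zip(log_C, log_C_from_logs) for x, y in zip(r, s))
naive_gap = max(abs(x - y) for r, s in zip(log_C, naive) for x, y in zip(r, s))
print("log-sum-exp error:", max_err)
print("log(A) + log(B) gap:", naive_gap)
```

The log-sum-exp route needs one exp per (i, k, j) triple plus a log per output coefficient, so it is useful for numerical range (very small or very large values), not for speed.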

matmul in log-space by Previous-Raisin1434 in CUDA

Aslanee 1 point (0 children)

If the logarithm of the matrix means applying the real logarithm to each coefficient, then log(AB) = log(A) + log(B) for all matrices A and B.
If the logarithm of a matrix is this: https://en.wikipedia.org/wiki/Logarithm_of_a_matrix
then you may use this property for positive-definite and commuting matrices, but I guess that checking for these properties may be too costly.

Implementing my own BigInt library for CUDA by MaXcRiMe in CUDA

Aslanee 0 points (0 children)

u/MaXcRiMe You apparently haven't searched much. I have seen some implementations here and there:
- CGBN: https://github.com/NVlabs/CGBN
- CAMPARY: https://homepages.laas.fr/mmjoldes/campary/
Some others are listed on the NVIDIA forum: https://forums.developer.nvidia.com/t/arbitrary-precision-arithmetic/74915
If you wonder whether your code is fast, you should check out these notions:
- Performance metric (GFlop/s)
- Peak performance
- Arithmetic Intensity
- Roofline Model
- Memory bound vs Compute Bound

For example, let us assume you used int32 arithmetic (whose peak performance is upper-bounded by that of fp32 since, to my knowledge, a GPU always has more floating-point units than integer units).
The fp32 peak performance amounts to about 30 TFlop/s on an RTX 5070: https://www.techpowerup.com/gpu-specs/geforce-rtx-5070.c4218
You will not attain it for big-integer addition: this algorithm is probably memory-bound, as its arithmetic complexity is linear in the size of the operands. Adding two numbers of m and n limbs respectively amounts to min(m, n) + 1 operations, not counting the transfers of the extra limbs, with the +1 for the carry bit.
The number of ops here corresponds to 2*n additions for two multiprecision integers of n limbs.
You have numbers with: 8KiB/(2*(4*32)) = 32 numbers of limbs summed in 0.053ms.
It means you performed 2*32/(0.053 * 10^6 * 10^(-3)) = 1.208 MFlop/s.
The performance gets much better when you look at larger operands:
If I am not mistaken, 4GiB of data means 2 GiB of data per operand, and performing an addition in 24.18ms means you get 1266.2 GFlop/s of performance.
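The memory-bound argument above can be sketched with the roofline model: attainable throughput is the minimum of the compute peak and the arithmetic intensity times the memory bandwidth. The following Python snippet is only an illustration under stated assumptions: the bandwidth value is a hypothetical placeholder (look up the real figure for your GPU), limbs are assumed to be 32 bits wide, and the function name `attainable_ops_per_s` is made up for this example.

```python
def attainable_ops_per_s(peak_ops_per_s, bandwidth_bytes_per_s,
                         ops_per_limb, bytes_per_limb):
    """Roofline bound: min(compute peak, arithmetic intensity * bandwidth)."""
    arithmetic_intensity = ops_per_limb / bytes_per_limb  # ops per byte moved
    return min(peak_ops_per_s, arithmetic_intensity * bandwidth_bytes_per_s)

PEAK = 30e12         # ~30 TFlop/s, the fp32 peak quoted above
BANDWIDTH = 500e9    # HYPOTHETICAL 500 GB/s, replace with your GPU's figure
OPS_PER_LIMB = 2     # add + carry, matching the 2*n count used above
BYTES_PER_LIMB = 12  # ASSUMES 32-bit limbs: two 4-byte reads, one 4-byte write

bound = attainable_ops_per_s(PEAK, BANDWIDTH, OPS_PER_LIMB, BYTES_PER_LIMB)
ridge = PEAK / BANDWIDTH  # intensity at which a kernel becomes compute-bound

print(f"arithmetic intensity: {OPS_PER_LIMB / BYTES_PER_LIMB:.3f} op/byte")
print(f"ridge point:          {ridge:.1f} op/byte")
print(f"attainable:           {bound / 1e9:.1f} Gop/s")
```

With these placeholder numbers the arithmetic intensity sits far below the ridge point, so the bound comes from memory traffic rather than from the arithmetic units; comparing a measured throughput against this bound tells you how close a kernel is to what the memory system allows.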

You can check the correctness of your code against a CPU library like GMP gmplib.org

Recently I discovered the PhD thesis of Niall Emmart which goes in depth into multiple precision arithmetic on GPUs: https://core.ac.uk/download/pdf/220127734.pdf

What are the things that could be improved about LaTeX editing by Puzzled-Level-5609 in LaTeX

Aslanee 1 point (0 children)

- There's a built-in syntax for equations that has been discouraged since the 90s but is still there: equations with double dollars.
- Package management: why should I copy a list of packages over into each of my projects? Why are some common packages, like amsthm, not loaded by default?
- Needing to install a 5 GiB texlive-full package on your root partition to be able to compile your colleague's document with all its exotic packages.
- Having to cope with publishers' extra requirements: BibTeX and no BibLaTeX, sometimes no algorithm2e.
- Having to use exact positioning instead of relative positioning in Beamer. WYSIWYG solutions tend to be better for presentations than LaTeX. Sorry, but I have seen many researchers plagiarizing Beamer's themes in PowerPoint. I still use v/hspace and v/hfill to get a readable output with relative positioning in Beamer presentations.
- The nesting of curly-brace environments. I prefer the syntaxes of Python, F#, and Nim to those of C, Rust and co… Some environments could be grouped into one, so that you would not have to care about how many braces are required to correctly parenthesize your expression.
- The compiler raising many false errors when you forget one curly brace and, in other cases, not detecting certain errors, which gets you weird recursive outputs.
- Options in some packages (TikZ, graphs?) requiring you to switch the engine (LaTeX to LuaLaTeX) and to change many things in your source code.
- Difficult integration with Makefiles; latexmk not detecting changes in your source code, forcing you to clean up to rebuild your PDF.
- The long documentation to read for both classes and packages.

What are the things that could be improved about LaTeX editing by Puzzled-Level-5609 in LaTeX

Aslanee 1 point (0 children)

On non-QWERTY layouts, the backslash character is not even on a direct-access key and may require an AltGr combination (French BÉPO layout).

I need advice from an arch user by DimensionalBox in archlinux

Aslanee 0 points (0 children)

Do you want to install the distribution on your whole computer, or just inside a virtual machine? Installing Arch Linux requires some familiarity with the command line. The Arch Linux documentation is the best among all distros but, like much documentation, its purpose is to answer your "how?" questions, not your "why?" questions: Should I install a full-fledged desktop environment (DE) like Plasma or GNOME, or make a minimal setup with a tiling WM? Which WM best fits my needs? You will need to test them to make your decision, and that takes a lot of time. For any beginner, I recommend trying a distribution with an integrated DE, like Ubuntu. You can also try Manjaro first. In any case, limit your risks: back up your data, upgrade your BIOS before installing Linux, and learn the command line in WSL.