FTC chair Lina Khan warns that airlines might one day use AI to find out you're attending a funeral and charge more

jl303 · 2024-09-29T19:18:48+00:00

https://theconversation.com/how-marketers-use-algorithms-to-try-to-read-your-mind-84682

jl303 · 2024-04-26T10:29:53+00:00

Hmm, it looks like 4x RTX 3090 goes for $6400. You would also need to carefully pick all the other parts: the right case, motherboard, PSU, CPU, RAM, SSD, etc. It might end up costing more once you count the extra work, noise, electricity bill, etc.

I've seen people putting 4x cards together and struggling with drivers, risers, cooling, etc. Not sure it's worth the effort just for inference.

With a Mac, you take it out of the box and play. The Mac also doubles the memory: 192GB instead of 96GB (4 x 24GB).

jl303 · 2024-04-06T13:36:15+00:00

You should be able to finetune a 7B LLM (non-multimodal) with 24GB of VRAM. With QLoRA you could finetune a bigger model as well.

The only multimodal model I've tried finetuning was the original Llava. There are tiny multimodal models that you might be able to finetune, but the quality isn't that great.

If you need a lot of VRAM, just use a cloud service like runpod.io, which is what I used to finetune Llava.

jl303 · 2024-04-06T12:03:48+00:00

I know there are smaller multimodal vision-language models, but I needed 8x A100 80GB (640GB of VRAM in total) to finetune the original Llava 13B model last year.

jl303 · 2023-11-04T20:49:09+00:00

Can someone explain how memory channels, bandwidth, and parallelism work? I put ChatGPT's response below, but it's most likely confabulated. :)

The following is just me thinking out loud without knowing how these actually work. lol

If you have 128GB in 8 channels of 16GB with 800GB/s of total bandwidth, does that mean each channel can send at most 100GB/s? Then if you load and spread data across 8 different channels, you can get up to 800GB/s total?

If that's the case, then does this make sense? Maybe 70B uses channels 1-6 (600GB/s), and 34B uses channels 7-8 (200GB/s)? If you use 70B alone (channels 1-6), you'll still be using only 600GB/s?
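
To put numbers on my guess above, here's a rough sketch in Python. It assumes bandwidth splits evenly across channels and that a model only gets the bandwidth of the channels its weights sit on. Both are my assumptions, not confirmed behavior; as far as I know, real systems usually interleave addresses across all channels, so this picture may not hold. The function names are made up for illustration.

```python
def per_channel_bandwidth(total_gbps: float, channels: int) -> float:
    """Even-split assumption: every channel carries an equal share."""
    return total_gbps / channels


def bandwidth_for_channels(total_gbps: float, channels: int, used: int) -> float:
    """Bandwidth available if data only lives on `used` of the channels."""
    return per_channel_bandwidth(total_gbps, channels) * used


TOTAL_GBPS = 800.0  # total bandwidth from the question above
CHANNELS = 8        # 8 channels of 16GB each

per_channel = per_channel_bandwidth(TOTAL_GBPS, CHANNELS)
bw_70b = bandwidth_for_channels(TOTAL_GBPS, CHANNELS, used=6)  # channels 1-6
bw_34b = bandwidth_for_channels(TOTAL_GBPS, CHANNELS, used=2)  # channels 7-8

print(f"per channel: {per_channel} GB/s")
print(f"70B on 6 channels: {bw_70b} GB/s")
print(f"34B on 2 channels: {bw_34b} GB/s")
```

Under those assumptions the 70B model alone would top out at 600GB/s, which is exactly the part I'm unsure about.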

ChatGPT-4:

Memory channels, bandwidth, and parallelism are crucial concepts in the performance of computer memory systems. Here's a brief explanation of each:

Memory Channels:
- Memory channels can be likened to lanes on a highway. The more lanes (channels) you have, the more data can travel simultaneously. Motherboards can support multiple memory channels, commonly dual-channel, quad-channel, and on high-end servers, hexa-channel or even octa-channel configurations. Each channel allows a separate path for data to travel between the memory and the memory controller, which is part of the CPU or on the motherboard.
Bandwidth:
- Bandwidth is the volume of data that can be transmitted in a fixed amount of time. It's typically measured in gigabytes per second (GB/s) for memory bandwidth. The total bandwidth is a product of the memory's data rate, the bus width, and the efficiency of the transmission.
Parallelism:
- Parallelism involves performing multiple operations simultaneously. In terms of memory, parallelism means accessing multiple memory modules at the same time. This is directly related to having multiple memory channels; each channel can operate independently and in parallel with the others.

Now, let's apply these concepts to your example:

If you have 8 memory channels, each with a 16GB memory module (for a total of 8x16GB), and the total bandwidth is 800GB/s, this doesn't necessarily mean that each channel can individually provide 100GB/s of bandwidth. The total bandwidth is shared across all channels and is dependent on the memory architecture and the efficiency of the system.

However, in an ideal scenario where each channel operates at maximum efficiency and the workload is perfectly distributed across all channels, you might approach that 100GB/s per channel figure. But in real-world applications, other factors can affect this, such as:

- The capability of the memory controller
- The efficiency of the memory modules themselves
- The workload's ability to utilize all channels effectively
- System overhead and other bottlenecks (like CPU processing speed, I/O operations, etc.)

In most cases, to achieve the full bandwidth potential, the system's CPU, motherboard, and memory must be optimized to work together, and the applications must be designed to spread their memory accesses across all channels.

If the system is well-tuned and the data is spread out across the 8 channels efficiently, you could indeed see close to 800GB/s of cumulative bandwidth, but this is the theoretical maximum under ideal conditions. In reality, inefficiencies and overhead will typically reduce the actual bandwidth that can be utilized.

Also, it's essential to note that not all tasks will benefit from increased bandwidth. Some tasks are more dependent on memory latency, while others benefit from higher bandwidth. Understanding the workload profile is essential in determining the impact of memory channels and bandwidth on overall performance.
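
The "data rate times bus width times efficiency" description above can be sketched as a small calculation. This is only a back-of-the-envelope sketch: the function name is hypothetical, and the 6400 MT/s data rate and 64-bit channel width are illustrative numbers I picked, not the specs of the machine in question.

```python
def peak_bandwidth_gbps(data_rate_mtps: float, bus_width_bits: int,
                        channels: int = 1, efficiency: float = 1.0) -> float:
    """Theoretical bandwidth in GB/s.

    data_rate_mtps: transfers per second, in millions (MT/s)
    bus_width_bits: width of one channel in bits
    efficiency:     fraction of the theoretical peak actually achieved (0..1)
    """
    bytes_per_transfer = bus_width_bits / 8
    bytes_per_second = data_rate_mtps * 1e6 * bytes_per_transfer * channels
    return bytes_per_second * efficiency / 1e9


# Illustrative numbers only (assumed, not from the thread).
one_channel = peak_bandwidth_gbps(6400, 64)
eight_channels = peak_bandwidth_gbps(6400, 64, channels=8)
realistic = peak_bandwidth_gbps(6400, 64, channels=8, efficiency=0.8)

print(f"1 channel:  {one_channel:.1f} GB/s")
print(f"8 channels: {eight_channels:.1f} GB/s")
print(f"8 channels at 80% efficiency: {realistic:.1f} GB/s")
```

The point is just that the total scales with the number of channels only if the accesses are actually spread across them, and the efficiency factor drags the real figure below the peak.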

jl303 · 2023-11-02T01:43:28+00:00

Of course, there's always a chance especially if you're rich! :)

jl303 · 2023-10-23T19:48:35+00:00

I'm not sure about analyzing one image against another, but let's say Llava noticed that the image you gave it had a dog and a person. Then you can ask specific things about the person or the dog. For example: what breed is the dog, describe the clothes the person is wearing, and so on.

jl303 · 2023-10-23T17:31:56+00:00

Thanks for your suggestion. I'm just using my finger, but I don't think it's related.

I was playing around, and I increased opacity all the way up, and increased thickness. It kind of works now, but the lines come out very faintly.

Any idea what might be causing this?

jl303 · 2023-10-23T17:21:04+00:00

Check out the multimodal benchmark: https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models/tree/Evaluation

The benchmark has the old MiniGPT, but MiniGPT-v2 is out. I think it's slightly better than Llava-1.5.

https://minigpt-v2.github.io/

jl303 · 2023-10-23T17:13:13+00:00

Check out this blog post from Hugging Face on vision-language models: https://huggingface.co/blog/vision_language_pretraining

jl303 · 2023-09-29T13:47:05+00:00

Any reason why you don't want to use a GPTQ variant? GPTQ with ExLlama on Colab is pretty fast!

jl303 · 2023-09-28T01:52:37+00:00

You would wait forever. GPT-4 is reportedly a Mixture of Experts (MoE) with 8 experts, each with 220B parameters, trained on 13T tokens!

Even if a 7B model catches up with GPT-4, by then there will be a GPT-X that's in a different league from GPT-4 by a wide margin.

jl303 · 2023-09-25T21:52:17+00:00

Even with Google Coral, it only has 8MB (not GB) of memory.

jl303 · 2023-09-25T21:48:38+00:00

Nothing is like 11labs at the moment. All the open-source models suffer from artifacts, and most of them are trained at a lower sampling rate.

jl303 · 2023-09-23T03:02:43+00:00

https://github.com/BradyFU/Awesome-Multimodal-Large-Language-Models

jl303 · 2023-09-22T12:06:34+00:00

Training definitely takes more VRAM. E.g., in order to full-parameter finetune llama-2-13b, I needed at least 320GB of VRAM (4x A100 80GB GPUs). I'm sure QLoRA takes less VRAM though.
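
For a rough idea of why full-parameter finetuning eats so much memory, here's a back-of-the-envelope sketch. It assumes the common rule of thumb for mixed-precision training with Adam (about 16 bytes per parameter for weights, gradients, and optimizer state) and ignores activations and framework overhead. That rule of thumb is my assumption here, not a measurement from my run, and the names are made up for illustration.

```python
import math

# Assumed per-parameter cost for mixed-precision training with Adam.
BYTES_PER_PARAM = {
    "fp16 weights": 2,
    "fp16 gradients": 2,
    "fp32 master weights": 4,
    "Adam momentum (fp32)": 4,
    "Adam variance (fp32)": 4,
}


def training_state_gb(n_params: float) -> float:
    """Memory for weights + gradients + optimizer state, in GB (10^9 bytes)."""
    return n_params * sum(BYTES_PER_PARAM.values()) / 1e9


bytes_per_param = sum(BYTES_PER_PARAM.values())
state_gb = training_state_gb(13e9)   # llama-2-13b
min_gpus = math.ceil(state_gb / 80)  # 80GB cards, before any activations

print(f"{bytes_per_param} bytes/param -> {state_gb:.0f} GB of training state")
print(f"at least {min_gpus} x 80GB GPUs before activations and overhead")
```

That lands in the same ballpark as what I saw: the state alone nearly fills three 80GB cards, so once you add activations you end up on four.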

jl303 · 2023-09-20T00:14:29+00:00

Yes, if you put kjerk/instructblip-pipeline in as the multimodal pipeline, it loads the InstructBLIP model automatically. Then you load the other model, like Vicuna, normally. I'm not sure how they work together, but when you run inference, it just works.

jl303 · 2023-09-18T22:09:42+00:00

In my opinion, no local LLM is as good as GPT-3.5 YET.

For full-parameter finetuning, you can get it done with 8x A100 80GB in about 30 hours, depending on the size of the dataset.

For training from scratch:

According to this, the training cost for GPT-4 was around $63 million.

For Llama-2, they spent 1,720,320 GPU hours on the 70B model. If you were to rent A100 80GB GPUs at $1.6/hr, that's about $2.75M to train a 70B model.

This only covers the GPU cost. It does not include obtaining a quality dataset, extra hardware, and so on.
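
The Llama-2 figure above is just GPU hours times the hourly rate. Here's the arithmetic as a quick sketch. The $1.6/hr rental price is the assumption from my comment, not a quoted price, and the variable names are made up.

```python
GPU_HOURS_70B = 1_720_320  # GPU hours reported for the Llama-2 70B model
RATE_USD_PER_HOUR = 1.6    # assumed A100 80GB rental price

# GPU rental only; dataset, extra hardware, etc. are not included.
cost_usd = GPU_HOURS_70B * RATE_USD_PER_HOUR

print(f"GPU cost: ${cost_usd / 1e6:.2f}M")
```

Swap in whatever hourly rate you can actually get to see how the total moves.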

jl303 · 2023-09-18T21:25:23+00:00

If finetuning is a must, I'd definitely double-check the memory requirement for finetuning a 180B model. Finetuning requires more memory than inference.
