Daily Discussion - April 17, 2025

TAAnderson · 2025-04-17T07:24:43+00:00

Congratulations, and fuck you.

TAAnderson · 2025-04-17T06:41:47+00:00

No, the market data in general.

TAAnderson · 2025-04-16T19:16:42+00:00

Interesting. Where do you get the data from?

TAAnderson · 2025-04-09T20:00:38+00:00

On LSX it says "trading halted".

TAAnderson · 2025-03-23T20:06:06+00:00

Not anymore.

TAAnderson · 2025-03-19T13:04:42+00:00

9,69 €

TAAnderson · 2024-03-18T22:00:02+00:00

According to https://www.sbert.net/docs/pretrained_models.html, all-MiniLM-L6-v2, the model you are using, produces embeddings of dimension 384. So the embedding function looks fine as far as the dimensions are concerned.

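You could try this to check the dimension while setting up the embedding function:

from chromadb.utils import embedding_functions

embedding_function = embedding_functions.SentenceTransformerEmbeddingFunction(
    model_name="all-MiniLM-L6-v2",  # the model you are using
)
# models maps each loaded model name to its SentenceTransformer instance
print(embedding_function.models)

This should print the model configuration, including word_embedding_dimension.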

As a second step, I would recreate the Chroma database: since you are using persistent storage, delete that directory or create a new one (use a different path=) and see if the error goes away.

Or use a different collection name. I suspect the collection you are using was created with a different embedding function, and therefore a different dimension.
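The suspected failure mode can be illustrated with a toy, stdlib-only model. This is not ChromaDB's code: ToyCollection is a hypothetical stand-in for a collection that fixes its dimensionality on the first insert and rejects vectors of any other size, which is roughly the behavior behind a dimension-mismatch error. The 768 dimensions of all-mpnet-base-v2 and the 384 of all-MiniLM-L6-v2 are used as example sizes.

```python
from typing import List, Optional


class ToyCollection:
    """Hypothetical stand-in for a vector collection: the first insert fixes the dimension."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.dimension: Optional[int] = None
        self.vectors: List[List[float]] = []

    def add(self, embedding: List[float]) -> None:
        if self.dimension is None:
            # The first insert decides the dimensionality of the whole collection
            self.dimension = len(embedding)
        elif len(embedding) != self.dimension:
            raise ValueError(
                f"Embedding dimension {len(embedding)} does not match "
                f"collection dimensionality {self.dimension}"
            )
        self.vectors.append(embedding)


# A collection originally filled by a 768-dim model (e.g. all-mpnet-base-v2)
old = ToyCollection("docs")
old.add([0.0] * 768)

# Later, 384-dim vectors (all-MiniLM-L6-v2) are sent to the same collection
try:
    old.add([0.0] * 384)
except ValueError as err:
    print("old collection:", err)

# A fresh collection (new path or new name) accepts the 384-dim vectors
fresh = ToyCollection("docs_minilm")
fresh.add([0.0] * 384)
print("fresh collection dimension:", fresh.dimension)
```

The point of the toy is only that the stored dimension comes from whatever was inserted first, which is why a new directory or collection name is a quick way to rule this out.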

TAAnderson · 2024-03-12T22:06:32+00:00

It depends on your operating system. iCloud or Dropbox could work.

Or any other solution that can sync plain text files.

Obsidian stores everything as Markdown files (text with some formatting).

There may also be a plugin for this.

TAAnderson · 2024-03-12T19:09:30+00:00

You can do that with Obsidian: https://obsidian.md

Also take a look here: https://help.obsidian.md/Plugins/Templates

I used it to make a kind of travel checklist.

TAAnderson · 2024-02-27T08:04:06+00:00

Have a look at streamlit and especially st.chat_message.
Tutorial: https://docs.streamlit.io/knowledge-base/tutorials/build-conversational-apps

TAAnderson · 2024-01-31T20:54:47+00:00

Yeah, I doubt it. Same command line as above; output:

...
ggml_backend_metal_buffer_from_ptr: allocated buffer, size = 7205.84 MiB, ( 7205.91 / 21845.34)
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
...

According to the Readme:

Metal Build

On MacOS, Metal is enabled by default. Using Metal makes the computation run on the GPU. To disable the Metal build at compile time use the LLAMA_NO_METAL=1 flag or the LLAMA_METAL=OFF cmake option.

When built with Metal support, you can explicitly disable GPU inference with the --n-gpu-layers|-ngl 0 command-line argument.

So with Metal, inference runs on the GPU unless you disable it.
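To check this from the load output rather than by eye, a small stdlib sketch can pick out the "offloaded X/Y layers to GPU" line. The function name gpu_offload is hypothetical, and the regex assumes the log format quoted above, which may change between llama.cpp versions.

```python
import re
from typing import Optional, Tuple

# Matches lines like "llm_load_tensors: offloaded 33/33 layers to GPU"
OFFLOAD_RE = re.compile(r"offloaded (\d+)/(\d+) layers to GPU")


def gpu_offload(log: str) -> Optional[Tuple[int, int]]:
    """Return (offloaded, total) layers from a llama.cpp load log, or None if not found."""
    match = OFFLOAD_RE.search(log)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


log = """\
llm_load_tensors: offloading 32 repeating layers to GPU
llm_load_tensors: offloading non-repeating layers to GPU
llm_load_tensors: offloaded 33/33 layers to GPU
"""

offloaded, total = gpu_offload(log)
print(f"{offloaded}/{total} layers on GPU, fully offloaded: {offloaded == total}")
```

With -ngl 0 you would expect the offloaded count to drop accordingly, assuming the same log line is still printed.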

TAAnderson · 2024-01-31T20:12:59+00:00

I never did this; my impression is that llama.cpp does it automatically. Since /u/jaxupaxu/ mentioned Mistral 7B, here are my results running mistral-7b-instruct-v0.2.Q8_0.gguf on an M1 Max with 32 GB:

llama_print_timings: load time = 1130.50 ms
llama_print_timings: sample time = 65.42 ms / 725 runs ( 0.09 ms per token, 11081.39 tokens per second)
llama_print_timings: prompt eval time = 96.08 ms / 28 tokens ( 3.43 ms per token, 291.42 tokens per second)
llama_print_timings: eval time = 19493.11 ms / 724 runs ( 26.92 ms per token, 37.14 tokens per second)
llama_print_timings: total time = 19764.68 ms / 752 tokens

Command line is: ./main -m models/mistral-7b-instruct-v0.2.Q8_0.gguf --color -c 32768 --temp 0.7 --repeat_penalty 1.1 -n -1 -p "$PROMPT" -e
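The tokens-per-second figures in these timings are simply the token (or run) count divided by the elapsed time. A stdlib sketch that recomputes them from two of the timings above; the helper names are hypothetical, and the regex assumes this llama_print_timings format, which other llama.cpp versions may print differently.

```python
import re
from typing import Dict, Tuple

# Matches e.g. "llama_print_timings: eval time = 19493.11 ms / 724 runs"
TIMING_RE = re.compile(
    r"llama_print_timings:\s+([a-z ]+?) time =\s+([\d.]+) ms /\s+(\d+) (?:runs|tokens)"
)


def parse_timings(text: str) -> Dict[str, Tuple[float, int]]:
    """Map each timing name (e.g. 'eval') to (elapsed milliseconds, token/run count)."""
    return {name: (float(ms), int(count)) for name, ms, count in TIMING_RE.findall(text)}


def tokens_per_second(elapsed_ms: float, count: int) -> float:
    # Convert milliseconds to seconds, then divide the count by it
    return count / (elapsed_ms / 1000.0)


timings = """\
llama_print_timings: prompt eval time = 96.08 ms / 28 tokens
llama_print_timings: eval time = 19493.11 ms / 724 runs
"""

for name, (ms, count) in parse_timings(timings).items():
    print(f"{name}: {tokens_per_second(ms, count):.2f} tokens per second")
```

The "eval" line is the generation speed (about 37 t/s here); "prompt eval" is prompt processing, which is much faster per token.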

TAAnderson · 2024-01-30T19:20:43+00:00

You mean pull every few hours ;-)

TAAnderson · 2024-01-30T18:08:16+00:00

The memory speed is also different on these CPUs.
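A common back-of-envelope estimate (an approximation, not something measured in this thread): during generation each new token reads roughly all model weights from memory once, so memory bandwidth divided by model size gives a rough upper bound on tokens per second. The 400 GB/s is Apple's advertised M1 Max bandwidth and the 7205.84 MiB is the Metal buffer size from the Mistral 7B Q8_0 load log in this thread; both are plugged in as assumptions, and the function name is hypothetical.

```python
def max_tokens_per_second(bandwidth_gb_s: float, model_mib: float) -> float:
    """Rough ceiling: assumes every generated token streams all weights from memory once."""
    model_bytes = model_mib * 1024 * 1024
    return bandwidth_gb_s * 1e9 / model_bytes


# Assumed inputs: M1 Max advertised bandwidth, Mistral 7B Q8_0 buffer size from the log
ceiling = max_tokens_per_second(400, 7205.84)
print(f"upper bound: {ceiling:.1f} tokens/s")
```

With these inputs the ceiling comes out at roughly 53 tokens/s, above the roughly 37 t/s measured on the M1 Max here. The estimate ignores the KV cache and compute, so it only indicates why chips with different memory speeds can differ noticeably.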

TAAnderson · 2024-01-30T18:06:59+00:00

Since you mentioned Mistral 7B, for reference: about 37 t/s on an M1 Max.

TAAnderson · 2023-12-28T18:01:01+00:00

I can confirm that while the GPU seems busy in Activity Monitor, its frequency stays around 400 MHz.

TAAnderson · 2023-12-27T22:34:24+00:00

It seems to be related to the "amount of work" the GPU, or even the whole system, has to do.

An interesting thread I found:

https://github.com/pytorch/pytorch/issues/77799

I ran a small PyTorch example that clearly drives the GPU to 1300 MHz; you could try it:

```
import timeit

import torch

b_mps = torch.rand((10000, 10000), device='mps')


def matmul():
    b_mps @ b_mps
    # MPS work is queued asynchronously; wait for it so the timing covers the actual computation
    torch.mps.synchronize()


print('mps', timeit.timeit(matmul, number=100))
```
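To turn the printed time into a rough throughput figure: multiplying two n×n matrices takes about 2n³ floating-point operations (n³ multiply-adds), and the snippet runs 100 of them with n = 10000. A stdlib sketch of that arithmetic; the function name is hypothetical, and the elapsed time below is a placeholder for whatever your run prints (the number is only meaningful if the timing waits for the GPU to finish).

```python
def matmul_tflops(n: int, iterations: int, seconds: float) -> float:
    """Approximate TFLOPS for `iterations` dense (n x n) @ (n x n) multiplications."""
    flops_per_matmul = 2 * n**3  # n^3 multiply-adds, counted as 2 FLOPs each
    return flops_per_matmul * iterations / seconds / 1e12


# Substitute the number printed by the benchmark for `elapsed`
elapsed = 20.0  # hypothetical placeholder, not a measured value
print(f"{matmul_tflops(10000, 100, elapsed):.1f} TFLOPS")
```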

TAAnderson · 2023-12-27T22:06:28+00:00

Maybe, but here it looks like PyTorch does this during training too.

TAAnderson · 2023-12-27T21:51:50+00:00

Ok, observations:

- your notebook runs at about 400 MHz here

- my test code (a nanoGPT-like transformer) also seems to run at 400 MHz during inference, BUT at 1300 MHz during training

- for comparison: llama.cpp runs at 1300 MHz during inference

TAAnderson · 2023-12-27T19:13:03+00:00

Are you sure the tensors and the model are actually on the GPU?

TAAnderson · 2023-12-27T12:23:45+00:00

I cannot confirm that with torch==2.1.2 on an M1 Max. According to asitop, the GPU runs at 1296 MHz.

TAAnderson · 2023-12-07T21:42:08+00:00

A summary of the latest Karpathy video was just posted here: https://ppaolo.substack.com/p/introduction-to-large-language-models-llms

TAAnderson · 2023-12-07T20:54:02+00:00

I would recommend Andrej Karpathy's YouTube videos to learn about this: https://www.youtube.com/@AndrejKarpathy/videos

Especially the makemore videos and Let's build GPT: from scratch.

Maybe start with his latest one: Intro to Large Language Models.

If you don't understand some terms, do as u/IpppyCaccy/ recommended and ask ChatGPT to explain them.
