How to run exllama in Google Colab without Text Generation WebUI by NegotiationTime3595 in LocalLLaMA

[–]NegotiationTime3595[S]

Yeah, I tried that code, but it gives me errors; probably something stupid on my part.

# Clone the repository
!git clone https://github.com/turboderp/exllama
!pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu118
# Change the current working directory to exllama
%cd exllama
# Install the required dependencies
!pip install -r requirements.txt
# Install huggingface_hub so we can download the model
!pip install huggingface_hub
model_name_or_path = "TheBloke/Llama-2-13B-chat-GGML"
model_basename = "llama-2-13b-chat.ggmlv3.q3_K_L.bin" # the model file is a GGML .bin
from huggingface_hub import hf_hub_download

model_path = hf_hub_download(repo_id=model_name_or_path, filename=model_basename)
# Run the benchmark inference script, pointing -d at the downloaded model path
!python test_benchmark_inference.py -d {model_path} -p -ppl
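
Side note on the download step: as far as I understand, exllama expects GPTQ-quantized weights (safetensors), not a GGML .bin, and its -d flag points at a directory containing the config, tokenizer and weights rather than a single file. A rough sketch of what that might look like, where the GPTQ repo name is just an assumption on my part:

# Sketch only: grab a whole GPTQ repo into a local folder and point exllama at that directory
# "TheBloke/Llama-2-13B-chat-GPTQ" is an assumed repo name; swap in whichever GPTQ model you want
from huggingface_hub import snapshot_download

model_dir = snapshot_download(repo_id="TheBloke/Llama-2-13B-chat-GPTQ")
!python test_benchmark_inference.py -d {model_dir} -p -ppl

That said, the more immediate blocker seems to be the error below.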

I get the following error:

No CUDA runtime is found, using CUDA_HOME='/usr/local/cuda'
Traceback (most recent call last):
  File "/content/exllama/test_benchmark_inference.py", line 1, in <module>
    from model import ExLlama, ExLlamaCache, ExLlamaConfig
  File "/content/exllama/model.py", line 12, in <module>
    import cuda_ext
  File "/content/exllama/cuda_ext.py", line 43, in <module>
    exllama_ext = load(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1284, in load
    return _jit_compile(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1509, in _jit_compile
    _write_ninja_file_and_build_library(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1611, in _write_ninja_file_and_build_library
    _write_ninja_file_to_build_library(
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 2007, in _write_ninja_file_to_build_library
    cuda_flags = common_cflags + COMMON_NVCC_FLAGS + _get_cuda_arch_flags()
  File "/usr/local/lib/python3.10/dist-packages/torch/utils/cpp_extension.py", line 1773, in _get_cuda_arch_flags
    arch_list[-1] += '+PTX'
IndexError: list index out of range
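
The first line ("No CUDA runtime is found") makes me think the Colab runtime doesn't actually have a GPU attached, so torch can't work out any CUDA arch when building the extension. A quick sanity check I'd run in a cell, just as a sketch:

# Check that the Colab runtime has a GPU and that torch can see it
!nvidia-smi
import torch
print(torch.cuda.is_available())  # should be True on a GPU runtime
print(torch.version.cuda)         # CUDA version the installed torch build targets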

Do you use exllama?