saahityaedams

26 Dec 2024

llm.c rough notes

I - Getting the defaults working

  • Fix the broken linux-tools install with sudo dpkg -i --force-overwrite /var/cache/apt/archives/linux-tools-common_*.deb, then sudo apt --fix-broken install

  • Install nvcc (the CUDA compiler driver) with sudo apt install nvidia-cuda-toolkit

  • The initial run of ./train_gpt2fp32cu fails with an out-of-memory (OOM) error.

  • Decreasing the batch size and sequence length (./train_gpt2fp32cu -b 1 -t 512) still OOMs.

  • See GPU usage stats with nvidia-smi and sudo fuser -v /dev/nvidia*, then kill the process holding the most GPU memory with kill -9 <pid>. In my case, it was ollama.

  • With the GPU freed, the same command finally runs: ./train_gpt2fp32cu -b 1 -t 512
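The "find the GPU hog" step above can be scripted; a minimal sketch (the helper below is my own, and it assumes the comma-separated `pid, process_name, used_memory` output that nvidia-smi's --query-compute-apps CSV mode produces):

```python
def heaviest_gpu_process(csv_text):
    """Return (pid, name, mib) for the process using the most GPU memory.

    Expects lines like "12345, ollama, 7168 MiB" -- the shape produced by
    nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader
    Assumes process names contain no commas; returns None for empty input.
    """
    best = None
    for line in csv_text.strip().splitlines():
        pid, name, mem = [field.strip() for field in line.split(",")]
        mib = int(mem.split()[0])  # "7168 MiB" -> 7168
        if best is None or mib > best[2]:
            best = (int(pid), name, mib)
    return best

# Against a live GPU (assumes nvidia-smi is on PATH):
# import subprocess
# out = subprocess.check_output(
#     ["nvidia-smi", "--query-compute-apps=pid,process_name,used_memory",
#      "--format=csv,noheader"], text=True)
# pid, name, mib = heaviest_gpu_process(out)  # then: kill -9 <pid>
```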

II - Fine-tuning with a dataset (likes and bookmarks from personal Twitter)

  • Use the Twitter Web Exporter Tampermonkey script to export likes and bookmarks from Twitter into CSV files.
  • Make a folder called tweets in dev/data and move the CSV files there.
  • Extract just the tweet text into one big formatted text file:
    1. python -c "import csv; print('\n'.join(row[2] for row in csv.reader(open('twitter-Bookmarks-1724168766206.csv'))))" > bookmark.txt
    2. python -c "import csv; print('\n'.join(row[2] for row in csv.reader(open('twitter-Likes-1724178823579.csv'))))" > like.txt
    3. awk '{print $0 "\n"}' like.txt bookmark.txt > tweets.txt (adds a blank line after each tweet)
  • Refactor tinyshakespeare.py into a new file tweets.py that tokenizes the tweets; it reports:
    writing 32,768 tokens to /home/saahityaedams/workspace/llm.c/dev/data/tweets/tweets_val.bin (66,560 bytes) in the gpt-2 format
    writing 247,119 tokens to /home/saahityaedams/workspace/llm.c/dev/data/tweets/tweets_train.bin (495,262 bytes) in the gpt-2 format
  • Run train_gpt2fp32cu with the appropriate flags, i.e. ./train_gpt2fp32cu -b 1 -t 512 -i dev/data/tweets/tweets_train.bin -j dev/data/tweets/tweets_val.bin
  • Modify dev/eval/export_hf.py to set attn_implementation="eager" in the spin function, and change the test prompt in the same function to see the new behaviour. Then run python dev/eval/export_hf.py -i gpt2_124M.bin -o gpt2_tweets
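The three extraction commands above (two python -c one-liners plus the awk step) can be folded into one small script; a sketch, assuming as above that the tweet text lives in the third CSV column (the function name is mine):

```python
import csv

def tweets_to_text(csv_paths, out_path, text_col=2):
    """Concatenate one column from each CSV into a text file, with a blank
    line after every tweet -- same result as the one-liners plus awk."""
    with open(out_path, "w", encoding="utf-8") as out:
        for path in csv_paths:
            with open(path, newline="", encoding="utf-8") as f:
                for row in csv.reader(f):
                    out.write(row[text_col] + "\n\n")

# Usage with this run's export files (likes first, matching the awk order):
# tweets_to_text(["twitter-Likes-1724178823579.csv",
#                 "twitter-Bookmarks-1724168766206.csv"], "tweets.txt")
```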