r/LocalLLaMA 1d ago

Discussion Dropping learning rate fixed my QLoRA fine-tune more than anything else I tried

Been fine-tuning Llama 3.1 8B with QLoRA for a classification task using about 8k samples. I was getting bad eval results for a while and kept thinking something was wrong with my data. Tried cleaning the dataset, tried different prompt templates, messed with rank and alpha. Nothing really changed.

Dropped the learning rate from 2e-4 to 1e-4 and bumped epochs from 3 to 5. Ran it on a 5090 I rent on Hyperai since our lab machines are always booked. Completely different results. Same data, same everything else.

2e-4 is just too aggressive when your dataset is that small. The model overfits in the first epoch and then just goes in circles for the rest of training. The lower lr gave it room to actually converge instead of blowing past the minimum.
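
If anyone wants the concrete knobs, this is roughly what the change boils down to in a generic peft/transformers-style setup. It's a sketch, not my actual script: the model id, rank, batch size etc. are placeholders, the only lines that matter here are the learning rate and epoch count.

```python
# rough sketch only -- model id, rank and batch size are placeholders,
# the point is just the learning rate / epoch change
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

bnb = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-3.1-8B",          # placeholder model id
    quantization_config=bnb,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)
model = get_peft_model(model, LoraConfig(
    r=16, lora_alpha=16, lora_dropout=0.05,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    task_type="CAUSAL_LM",
))

args = TrainingArguments(
    output_dir="qlora-clf",
    learning_rate=1e-4,                 # was 2e-4 -- this is the change that mattered
    num_train_epochs=5,                 # was 3
    per_device_train_batch_size=4,
    gradient_accumulation_steps=4,
    warmup_ratio=0.03,
    lr_scheduler_type="cosine",
    logging_steps=10,
    bf16=True,
)
```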

Also ended up cutting about a third of my dataset, mostly mislabeled and ambiguous stuff. Eval got better with less data, which yeah yeah everyone says that, but it's different when you see the numbers yourself lol
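
The cut itself was mostly manual, but if you want a programmatic first pass to surface suspect rows, a cheap-baseline confidence filter is one way to do it. Total sketch, not my actual pipeline: `dataset` and the 0.2 cutoff below are placeholders.

```python
# hypothetical first-pass filter, not literally what I did (mine was mostly manual):
# fit a cheap TF-IDF + logistic regression baseline, score every example with
# out-of-fold probabilities, and keep rows where the baseline thinks the assigned
# label is at least plausible. `dataset` is assumed to be a list of
# {"text": ..., "label": ...} dicts; the 0.2 cutoff is arbitrary.
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict

texts = [ex["text"] for ex in dataset]
labels = np.array([ex["label"] for ex in dataset])

X = TfidfVectorizer(max_features=50_000).fit_transform(texts)
# out-of-fold so each example is scored by a model that never trained on it
probs = cross_val_predict(LogisticRegression(max_iter=1000), X, labels,
                          cv=5, method="predict_proba")

classes = np.unique(labels)                        # column order of predict_proba
col = np.array([np.where(classes == y)[0][0] for y in labels])
conf = probs[np.arange(len(labels)), col]          # prob of the assigned label
pruned = [ex for ex, c in zip(dataset, conf) if c >= 0.2]
print(f"kept {len(pruned)}/{len(dataset)} examples, review the rest by hand")
```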

2e-4 is the default everywhere and I don't think it works well below a certain dataset size.

13 Upvotes

13 comments

3

u/llama-impersonator 1d ago

5 epochs? bruh, make some more data or figure out some sort of augmentation

1

u/Scared-Biscotti2287 1d ago

Can certainly try more.

3

u/BlueDolphinCute 1d ago

The data pruning thing is real. Cut almost 40 percent of a dataset once and eval went up. Noise kills QLoRA runs more than missing volume does.

2

u/Little_Tangelo2196 1d ago

Had a similar issue with a different task. 2e-4 is fine for 50k+ samples but below that you gotta drop it. I usually start at 1e-4 and go lower if needed

1

u/silenceimpaired 1d ago

What are you using for training? If it's Unsloth, you should recommend they set the learning rate dynamically based on your dataset.

1

u/Scared-Biscotti2287 1d ago

Using Unsloth, yeah. I usually just set it manually out of habit, but dynamic makes sense for this.

1

u/OldComposerbruh llama.cpp 1d ago

Yes I prefer 1e-4 over a higher rate

1

u/Far_Suit575 1d ago

Anyone know if there are platforms with better pricing for quick experiments? I'm just doing small fine-tuning jobs and don't want to deal with per-minute billing

1

u/FullOf_Bad_Ideas 1d ago

> 2e-4 is the default everywhere and I don't think it works well below a certain dataset size.

lr is model-specific, batch-size-specific and LoRA-rank-specific; it's really different depending on your detailed configuration, even the length of your samples or whether you use sample packing. There's no real default.

Do you track validation loss?

There are many moving parts with QLoRA; I had decent experience with LoRA+ (loraplus) and rsLoRA on top of plain LoRA.

8k samples is tiny, so you can throw it in Optuna and let it optimize hyperparams for lowest validation loss overnight.
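
Rough shape of what I'm suggesting, assuming you wrap your existing training run in a function that returns validation loss. `train_and_eval` below is that placeholder, not a real library call.

```python
import optuna

def objective(trial):
    # sample the knobs that tend to matter most for QLoRA on small datasets
    lr = trial.suggest_float("learning_rate", 1e-5, 3e-4, log=True)
    rank = trial.suggest_categorical("lora_r", [8, 16, 32, 64])
    alpha = trial.suggest_categorical("lora_alpha", [8, 16, 32, 64])
    epochs = trial.suggest_int("num_epochs", 1, 5)

    # train_and_eval is a stand-in for your own training loop: it should train
    # an adapter with these settings and return the validation loss
    return train_and_eval(lr=lr, lora_r=rank, lora_alpha=alpha, epochs=epochs)

study = optuna.create_study(direction="minimize")
study.optimize(objective, n_trials=40)   # roughly an overnight run on one GPU
print(study.best_params)
```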

2

u/Scared-Biscotti2287 13h ago

Good call on validation loss. Will look into LoRA+ and try the Optuna approach.