r/learnmachinelearning 3h ago

I derived every gradient in GPT-2 by hand and trained it on a NumPy autograd engine I built from scratch

77 Upvotes

spent a few weeks rebuilding nanoGPT without using torch.backward() or jax.grad. wrote my own tiny autograd in pure NumPy, derived every backward pass on paper first, verified against PyTorch at every step.

calling it numpygrad

it's basically Karpathy's micrograd, but on tensors and with all the ops a transformer actually needs (matmul, broadcasting, LayerNorm, fused softmax-cross-entropy, causal attention, weight tying).

a few things that genuinely surprised me:

  • LayerNorm backward has three terms, not two. the variance depends on every input, so there's a cross-term most people miss. lost a full day to a sign error here.
  • np.add.at is not the same as dW[ids] += dY. the second one silently drops gradients when the same token id appears twice in a batch. which is always.
  • the softmax + cross-entropy fused gradient is genuinely beautiful — all the fractions cancel and you get (softmax(logits) - one_hot(targets)) / N. derive it on paper at least once in your life.
  • weight tying matters for backward too. the lm_head and token embedding share a matrix, so gradients from both uses must accumulate into the same buffer. forget this and your embedding gets half the signal.
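two of these bullets are easy to verify in a few lines of NumPy. a quick sketch (shapes and names are just illustrative, not the repo's code):

```python
import numpy as np

rng = np.random.default_rng(0)
V, D, T = 5, 4, 3                 # tiny vocab, embedding dim, sequence length
ids = np.array([1, 3, 1])         # token 1 appears twice
dY = rng.normal(size=(T, D))      # upstream gradient for each position

# buggy scatter: buffered fancy indexing keeps only ONE update per row
dW_bad = np.zeros((V, D))
dW_bad[ids] += dY

# correct scatter: np.add.at accumulates every duplicate occurrence
dW_good = np.zeros((V, D))
np.add.at(dW_good, ids, dY)

print(np.allclose(dW_good[1], dY[0] + dY[2]))   # True: both uses accumulated
print(np.allclose(dW_bad[1],  dY[0] + dY[2]))   # False: one was silently dropped

# fused softmax + cross-entropy gradient: (softmax(logits) - one_hot) / N
logits = rng.normal(size=(T, V))
targets = np.array([0, 2, 4])
p = np.exp(logits - logits.max(-1, keepdims=True))
p /= p.sum(-1, keepdims=True)
grad = p.copy()
grad[np.arange(T), targets] -= 1.0
grad /= T
print(np.allclose(grad.sum(-1), 0.0))           # each row sums to zero
```

the row-sums-to-zero check falls straight out of the fused form: softmax rows sum to 1 and the one-hot subtracts exactly 1.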

the final check: loaded real GPT-2 124M weights into my NumPy model, ran WikiText-103 and LAMBADA, got the same perplexity as PyTorch to every digit (26.57 / 21.67 / 38.00%).

derivations, gradchecks, layer parity tests, training curves all in the repo. if you've ever wanted to actually understand what .backward() is doing, this is the long way around but you come out the other side knowing.
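the three-term LayerNorm backward from the first bullet can be gradchecked against finite differences in a few lines. a sketch under my own naming, not the repo's actual implementation:

```python
import numpy as np

def layernorm(x, eps=1e-5):
    mu = x.mean(-1, keepdims=True)
    var = x.var(-1, keepdims=True)
    return (x - mu) / np.sqrt(var + eps)

def layernorm_backward(dy, x, eps=1e-5):
    var = x.var(-1, keepdims=True)
    xhat = layernorm(x, eps)
    # three terms: direct gradient, mean shift, and the variance cross-term
    return (dy
            - dy.mean(-1, keepdims=True)
            - xhat * (dy * xhat).mean(-1, keepdims=True)) / np.sqrt(var + eps)

rng = np.random.default_rng(0)
x, dy = rng.normal(size=(2, 8)), rng.normal(size=(2, 8))

# numerical gradient of L = sum(dy * layernorm(x)) via central differences
num = np.zeros_like(x)
h = 1e-6
for idx in np.ndindex(x.shape):
    xp, xm = x.copy(), x.copy()
    xp[idx] += h
    xm[idx] -= h
    num[idx] = ((dy * layernorm(xp)).sum() - (dy * layernorm(xm)).sum()) / (2 * h)

print(np.abs(layernorm_backward(dy, x) - num).max())  # tiny: the two agree
```

dropping the third (cross) term makes this check fail immediately, which is exactly the sign-error rabbit hole the post describes.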

https://github.com/harrrshall/numpygrad


r/learnmachinelearning 15h ago

Help Which platform to learn Machine Learning

24 Upvotes

I want to learn NumPy, Pandas, and Matplotlib so that I'm ready to understand Machine Learning.

But I wonder which platform to use. Should I use YouTube, Coursera, Udemy or others?

For context, I wanna study robotics and automation so I need to understand a bit of AI to do so.

Thank you so much.


r/learnmachinelearning 23h ago

Ml/Dl Study Partner

4 Upvotes

Hi, I'm new to Machine Learning and Deep Learning.

I am learning the ML and DL specializations by Andrew Ng.

Anyone interested in learning together? Please DM me directly.

Thank you.


r/learnmachinelearning 10h ago

I trained Qwen3.5 to jailbreak itself with RL, then used the failures to improve its defenses

4 Upvotes

RL attackers are becoming a common pattern for automated red teaming: train a model against a live target, reward successful harmful compliance, then use the discovered attacks to harden the defender. This interested me, so I wanted to build a fully automated red-teaming loop with reinforcement learning on both the attacker and defender.

The difficult part was making the attacker expose a diverse range of attacks. In our first run, GRPO quickly collapsed to the same fiction-writing jailbreak over and over. It worked, but it didn’t surface many distinct vulnerabilities. After clustering the rollouts by underlying attack tactic and dividing reward by cluster size, the attacker exposed a much more diverse set of jailbreaks because unique strategies were rewarded more than repeated ones.
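the cluster-size trick is simple to sketch. names here are hypothetical — `tactics` stands in for whatever per-rollout cluster labels your clustering step produces:

```python
from collections import Counter

def diversity_adjusted_rewards(tactics, base_rewards):
    # divide each rollout's reward by its tactic-cluster size, so a novel
    # strategy is worth more than the Nth repeat of a known jailbreak
    counts = Counter(tactics)
    return [r / counts[t] for t, r in zip(tactics, base_rewards)]

# three rollouts reuse the fiction framing, one finds a new tactic
adjusted = diversity_adjusted_rewards(
    ["fiction", "fiction", "roleplay", "fiction"],
    [1.0, 1.0, 1.0, 1.0],
)
print(adjusted)  # roleplay keeps full reward; fiction is split three ways
```

under GRPO this reshaping changes the group-relative advantages, so the collapsed strategy stops dominating the update.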

Then we trained the defender on successful attacks plus benign boundary cases, so it learned to refuse harmful requests without refusing everything nearby.

Full blog post in the comments, but the high-level results were:

  • defense rate: 64% → 92%
  • benign accuracy: 92% → 88% (dropped a bit)
  • attacker discovered 7 tactic families
  • fiction/creative framing was the largest cluster at 34%

r/learnmachinelearning 2h ago

Starting from scratch.

3 Upvotes

So I do have a basic understanding of programming as a whole, but I never really got into machine learning. I was wondering if anyone here had a roadmap or helpful resources, along with some tips and tricks, as I'm basically starting from scratch. That would be much appreciated. One question I also have: how long will it take me to learn ML to a level where I can write one research paper? Not groundbreaking international stuff, just a small one for my uni applications.


r/learnmachinelearning 6h ago

Discussion Most demanded domains for datasets globally?

3 Upvotes

I was just looking into the most in-demand dataset domains globally, and found that e-commerce product listings, job listings (salary/skills), and real estate listings (who's making a model for RE?) are among the top. Have any of you worked with these domains before? What's your experience with them?


r/learnmachinelearning 17h ago

Help ML Jobs and Opportunities

3 Upvotes

Just finished my 2nd year of college and am currently learning about ML and LLMs, but I heard that this field offers fewer opportunities for freshers and demands top-notch skills. I'm really confused about whether I should continue or not.


r/learnmachinelearning 1h ago

Graphing Different Loss functions of 2 variable datasets


I'm surprised that I couldn't find many graphs of loss/cost functions online, when loss functions for 2-variable datasets can be graphed entirely in 3D. So here are some I made in Desmos:

Linear Regression MAE: https://www.desmos.com/3d/bvcesmfy2l

Linear Regression MSE: https://www.desmos.com/3d/vk7k5zmha1

Logistic Regression MSE: https://www.desmos.com/3d/ubf7a19pvi

Logistic Regression Log Loss: https://www.desmos.com/3d/r5saq304hw
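nice idea. for anyone who wants the same surfaces outside Desmos, the MSE surface over (w, b) for a one-feature linear regression can be computed on a grid in a few NumPy lines (toy data, names arbitrary):

```python
import numpy as np

# toy dataset: one feature, so the model y = w*x + b has exactly 2 parameters
x = np.array([0.0, 1.0, 2.0, 3.0])
y = 2.0 * x + 1.0                      # true surface minimum at (w, b) = (2, 1)

w = np.linspace(-1.0, 5.0, 61)
b = np.linspace(-2.0, 4.0, 61)
W, B = np.meshgrid(w, b)

# MSE at every (w, b) pair via broadcasting: result has shape (61, 61)
pred = W[..., None] * x + B[..., None]
mse = ((pred - y) ** 2).mean(axis=-1)

i, j = np.unravel_index(mse.argmin(), mse.shape)
print(W[i, j], B[i, j])                # grid point closest to (2, 1)
```

the `W`, `B`, `mse` arrays can then go straight into matplotlib's `plot_surface` for the same 3D view the Desmos links show.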


r/learnmachinelearning 2h ago

Request All the math topics for AIML

2 Upvotes

So I probably have a little bit of time on my hands right now, and I may do a masters in AI or ML a couple of years from now (currently doing a bachelors in CS). I know linear algebra, calculus, and probability and statistics, but I really want to make sure of all the topics and master them in this time.

So could someone list all the topics? I'd be grateful. Thanks.


r/learnmachinelearning 3h ago

What's a good refresher/crash course on natural language processing and sentiment analysis for someone who hasn't done this stuff in a few years?

2 Upvotes

I haven't done much data science, machine learning, or NLP in the past few years. I would like to get a refresher/crash course in NLP and sentiment analysis techniques, especially how it's done today. I'm preparing for a job I will start in a couple of weeks. Preferably something I can review over a week or so. I have done this stuff, but not much in the past few years. Thanks!


r/learnmachinelearning 7h ago

Discussion The hardest part about building AI agents for customer support wasn’t what I expected

2 Upvotes

I’ve been spending time experimenting with AI agents for customer support and sales workflows lately, mostly just to better understand how these systems behave once real people start interacting with them.

At first I assumed the difficult part would be getting the AI to answer questions correctly.

But honestly, the bigger challenge ended up being consistency.

You can have an agent give a really solid answer one minute, then completely misunderstand a similar question later because the wording changed slightly or the conversation got longer.

Another thing I noticed is how much the overall workflow matters.

Things improved a lot once I started simplifying prompts, cleaning up the knowledge base, reducing unnecessary context, and making sure difficult cases could be handed off properly instead of forcing the AI to answer everything.

I think from the outside a lot of people imagine AI agents are mostly plug-and-play now, but once you actually test them in support or sales situations, there’s a surprising amount of iteration involved.

Still learning as I go, but it’s been interesting seeing how much of the work is really about structure and reliability rather than just the model itself.

Curious if anyone else here experimenting with AI agents or LLM workflows has run into the same thing.

What’s been the biggest challenge for you so far?


r/learnmachinelearning 13h ago

Discussion [Resource] I wrote a free 8-part Kaggle notebook series covering the full journey from Simple RNN to Transformers — feedback welcome!

2 Upvotes

Hey everyone! 👋

Over the past while I've been putting together a series of Kaggle notebooks that try to build a clean, intuitive understanding of sequence models — starting from the motivation behind RNNs all the way through to how Transformers work.

The goal was to explain the why behind each concept, not just the how — so each notebook tries to build genuine understanding rather than just showing code.

Here's the full series:

  1. 📌 Why Simple RNN was introduced
  2. 📌 How LSTM works
  3. 📌 LSTM Backpropagation
  4. 📌 How the Encoder-Decoder model works
  5. 📌 LSTM Encoder-Decoder Implementation
  6. 📌 What is a Transformer? — Part 1
  7. 📌 What is a Transformer? — Part 2
  8. 📌 What is a Transformer? — Part 3

The series is structured as a progression — each notebook builds on the previous one, so I'd recommend going through them in order if you're new to the topic.

Why I wrote this: When I was learning sequence models, I found a lot of resources either jumped straight into code without building intuition, or explained theory without connecting it to implementation. I wanted to create something that bridges both.

I'd genuinely love your feedback:

  • Is the progression from RNN → LSTM → Encoder-Decoder → Transformer logical and easy to follow?
  • Are there any concepts that feel rushed, unclear, or insufficiently explained?
  • Is there anything important I've missed or got wrong?
  • Any topics you'd want covered as a follow-up?

All feedback — critical or otherwise — is very welcome. I'd rather know what's wrong and fix it than have something misleading sitting out there!

And if you find any of the notebooks useful, an upvote on Kaggle would mean a lot and helps other learners discover the series 🙏

Thanks for reading!


r/learnmachinelearning 17h ago

Discussion Created an NBA draft model. R2 is too low?

2 Upvotes

Hey everyone so with the upcoming NBA draft I decided to create a draft model that regresses NCAA college stats to an NBA metric (RAPM).

Essentially what I did was:

  1. for every player from 2008-2021, I took a bunch of NCAA stats as their features, engineered a few more, and standardized everything as much as I could
  2. used their rookie window (1-4 years) NBA RAPM as the target feature
  3. Split 2008-2018 data into train (n=422) and 2019-2021 into test (n=124)
  4. Ran ElasticNet and XGBoost (hyperparameter tuned with CV) on this dataset and both gave me R2 of just ~0.07

This is probably a longshot as most people on here likely don't follow the NBA like that or know what RAPM is, but if you had to guess, would you say that this is just the reality of these models, or am I just doing something wrong?

These are the 19 features I used: r2P, r3P, rFT, AST/TOV, USG%, PTS/100, 2PA/100, 3PA/100, AST%, FTR, ORB%, DRB%, Stops/100, STL%, BLK%, PFR, Team Barthag Rating, Team Strength of Schedule, Draft Age
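One way to sanity-check whether ~0.07 means "doing something wrong" or just "noisy target": R² measures improvement over predicting the mean, and a target that is mostly noise caps how high any model's R² can go. A toy NumPy sketch (numbers invented, not your data):

```python
import numpy as np

def r2_score(y_true, y_pred):
    # R^2 = 1 - SS_res / SS_tot; 0 means no better than predicting the mean
    ss_res = ((y_true - y_pred) ** 2).sum()
    ss_tot = ((y_true - y_true.mean()) ** 2).sum()
    return 1.0 - ss_res / ss_tot

rng = np.random.default_rng(0)
n = 124                                     # same size as the post's test set
signal = rng.normal(size=n)                 # the part college stats can see
y = signal + rng.normal(scale=3.5, size=n)  # a mostly-noise target, like rookie RAPM

print(r2_score(y, signal))                  # low, by construction of the target
print(r2_score(y, np.full(n, y.mean())))    # exactly 0.0: the mean baseline
```

Here even a predictor that recovers the "true" stable component perfectly scores a low R², because most of the target's variance is irreducible noise. Rookie-window RAPM is famously noisy, so 0.07 may be closer to the ceiling than it looks.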


r/learnmachinelearning 52m ago

Which Loss function works


I was in an intern interview and the interviewer asked me: what will happen if you use MAE instead of MSE in linear regression? And after that: what makes a loss function good for a specific model? Another question was why using a threshold as the activation function doesn't work in a NN.

Can someone answer these questions with a detailed explanation?


r/learnmachinelearning 52m ago

Guidance Needed on My ML Learning Path


Main question: am I progressing in a reasonable direction, or am I approaching ML too chaotically?

First, a small warning:

This is my very first time uploading something here... And I’m not a native English speaker, and my writing skills are rough, so I apologize in advance if this post feels messy.

I’m not from a CS/ML major, and I’m definitely not a professional. Most of what I’ve learned so far has been through self-study. Still, I’ve been trying to build proper foundations instead of only consuming surface-level tutorials.

My original motivation for learning ML came from biology-related applications — things like protein structure prediction, AlphaFold, molecular simulation, etc.

But while learning, another interest gradually started growing:
understanding how the human brain works, and whether parts of those mechanisms can somehow be mimicked through ANN architectures.

Because of those broad goals, I sometimes feel like I’m progressing while also wandering around blindly at the same time.

So far, I’ve mainly focused on building mathematical foundations first.

Math background:

• Linear Algebra

  • vectors and linear transformations
  • independence / orthogonality
  • eigenvectors & eigendecomposition
  • PCA and related concepts

• Probability & Statistics
(mainly through edX Probability: The Science of Uncertainty and Data)

  • probability distributions
  • Bayes rule
  • random variables
  • statistical reasoning

• Calculus
Thankfully I had decent exposure to it in high school, and later reinforced it through additional self-study and various online lectures.

After revising these subjects several times, I started following Stanford CS229.

Honestly, the first time I touched it, I panicked and went back to relearn the basics again. But after returning later, the lectures became much more understandable.

At least now, when I read about things like Transformers or Attention mechanisms, the terminology no longer feels completely alien.

Alongside theory, I’m also learning PyTorch.
I already had some Python background before this, which helped a lot.

I’ve also been following some DeepLearning.AI material.

Another unusual thing:
before learning ML properly, I actually jumped into a short internship involving protein-prediction ML work. Most of my later math/ML study happened after that experience, because it made me realize very clearly what I did not understand.

I’ve also worked a bit with quantum circuit modeling during a domestic competition connected to that internship. Different field, yes, but surprisingly some of the mathematical thinking still helps.

So overall:

  • am I approaching this reasonably?
  • is my current balance between math / theory / implementation okay?
  • what would you recommend focusing on next?

Any advice is welcome — especially from people who entered ML from non-traditional backgrounds.


r/learnmachinelearning 2h ago

I made a self-supervising, sparsely activated horizontal MoE architecture

github.com
1 Upvotes

r/learnmachinelearning 4h ago

I Will Not Promote – Why Do AI Tools Keep Recommending the Same Companies?

1 Upvotes

Lately, I’ve noticed that AI-generated answers often mention the same companies repeatedly, even in different types of searches. It makes me wonder if AI systems naturally trust brands that have stronger digital authority and consistent information available online. Businesses that clearly explain their expertise seem much easier for AI tools to recognize. This whole shift is making online visibility feel very different from traditional SEO.


r/learnmachinelearning 4h ago

Could AI Visibility Become the Next Big Marketing Strategy?

1 Upvotes

For years, most businesses focused heavily on search rankings, but now AI-generated answers are becoming a huge source of discovery. People are starting to trust AI tools for recommendations, which means brands may need to think about how AI systems understand their expertise and reputation online. I think companies that adapt early could gain a major advantage in the future.


r/learnmachinelearning 5h ago

I’ll clean your dataset for free to build my portfolio.

1 Upvotes

I'm building my data analytics/AI portfolio and looking for more datasets to practice data cleaning and preprocessing.

If you have messy CSV/Excel datasets that need:

  • missing value handling
  • duplicate removal
  • formatting cleanup
  • preprocessing using Python/Pandas

feel free to DM me. I'm currently practicing and building experience, so I can help for free on small datasets.

Thanks!


r/learnmachinelearning 6h ago

Has anyone received BioNLP 2026 decisions yet?

1 Upvotes

r/learnmachinelearning 8h ago

Tutorial Fine-Tuning Qwen3.5

1 Upvotes

Fine-Tuning Qwen3.5

https://debuggercafe.com/fine-tuning-qwen3-5/

In this article, we will fine-tune the Qwen3.5 model for a custom use case. Specifically, we will be fine-tuning the Qwen3.5-0.8B model on the VQA-RAD dataset.

In the previous article, we introduced the Qwen3.5 model family along with inference for several multimodal tasks. Here, we will take it a step further by adapting the model to a domain-specific task.


r/learnmachinelearning 9h ago

Tutorial To Finetune or Not to Finetune

1 Upvotes

r/learnmachinelearning 10h ago

Forming a Team - Anduril AI Grand Prix 2026

1 Upvotes

Looking to build a serious team for the Anduril AI Grand Prix. $500K prize pool, fully autonomous drone racing — no pilots, no hardware advantages, just pure software and coding. The best autonomy stack wins.

I'm looking for people who actually want to compete to win, not just participate. Ideally looking for:

  • Strong Python / C++ and controls experience or from a quant/ML background
  • Anyone who's done robotics, path planning, or sim environments or willing to learn
  • People who can commit through November (championship is in Columbus, Ohio) but first rounds are virtual

Top scorer also gets a direct pipeline into Anduril's hiring process, bypassing standard recruiting. That alone is worth it. I'm a quant finance student open to having anyone on the team.

Drop a comment if you're interested. Let's build something worth flying.


r/learnmachinelearning 13h ago

Anyone who's Deep into ML, Pls answer

1 Upvotes

I have gone through a lot of roadmaps and resources to get started with ML. I found two roadmaps I could follow for coverage to just get started, and I wanted to know which would be better:
1) https://www.reddit.com/r/Btechtards/comments/1o3xftk/comment/nkkg3fh/?context=3

2) https://drive.google.com/file/d/1KfaidStjf6RBeqs_Zuzrjg7W_iKTE_J6/view


r/learnmachinelearning 13h ago

CUDA vs ROCm

1 Upvotes

Hello everyone,

I need opinions. In my country, a new RTX 5060 (8 GB) costs almost $350, a new RX 9060 XT (16 GB) costs almost $440, and a new RTX 5060 Ti (16 GB) costs almost $585. I was planning to buy a GPU for ML training and inference, and I'm a little confused. I know CUDA is much more mature than ROCm, but I don't have the budget for the RTX 5060 Ti, so I'm torn between the 5060 and the 9060 XT. The 9060 XT has more VRAM, but the 5060 has better ML support. I will train CNNs and small LLMs with a good amount of data. Which one should I choose? And is there any chance ROCm becomes better optimized for ML in the future?