Avoid Overfitting: A Practical Guide to CS7641 Machine Learning (2024)

Table of Contents:

  • Important note: Do not lynch me.

  • Let's Clear The Air: Develop this mindset before you enter this course

  • The Need For Analysis: what analysis? Let me CODE!!

  • Assignment 1 | Supervised Learning

  • Assignment 2 | Randomized Optimizations

  • Assignment 3 | Unsupervised Learning

  • Assignment 4 | Reinforcement Learning

  • Other reviews

  • My scores: showcases why assignments are important

  • Lectures can be found here.

  • Changelog

  • I took this class in Spring of 2024; the requirements of specific assignments could change in upcoming semesters.

  • If you have any doubts about whether the suggestions below apply to your cohort, check with the TAs; I claim no responsibility for lost marks or points.

  • Use the OMSCS study slack channel #cs7641 or study groups. 😎

Before we get into the nitty-gritty of how to navigate the course, you need to get rid of a couple of internal biases that can limit what you learn.

Let’s clear the air

  • You shall reap what you sow: ML is one of those courses where the amount of work you put in to understand concepts is highly correlated with the learning you get out. In other words, the number of training examples needed to achieve a low-bias, low-variance mindset is high.

  • Start early: This advice is not new, but it cannot be stressed enough. Each assignment comes with its own unexpected challenges, and it takes time for your mind to absorb new information while tackling the assignment at the same time.

  • Don’t focus solely on assignments: While assignments are the main push, definitely do the lectures and read the textbook and readings. You should not end up in a scenario where you fully perform the experiments in the assignments but are not able to understand what’s going on. Plus, the book conveys good intuition on what to focus on for analysis.

    Song lyrics like these are what you will understand if you go in blindly tuning hyperparameters without understanding the material: fhdsddhsjshjkjasaskjdihdknkamkam? 🥸

  • You’re lucky, work with what you have: I am a firm believer in keeping a positive mindset, and that has helped me through the course. It is important to note that most of the libraries used in the course (sklearn, mlrose, bettermdptools) didn’t exist in the past, and we’re lucky to have them. The FAQs and TA blogs are really helpful too; past cohorts did not have them. So, let’s be a bit grateful to the TAs for improving this course for us! :)

  • Follow the assignment PDF and FAQs: An extension of the above point, follow the blogs, FAQs and assignment PDFs exclusively and you will be fine.

  • The TAs are not the enemy: The TAs are here to help. While they may make mistakes, it’s important to understand that, like you, they’re juggling full-time jobs + an OMSCS course or two + being a TA.

  • This course will eat your time: I know that most people will not be able to cover everything here because of real life. But come in with the mindset that you need to spend time on this course. Take it easy, do even simple tasks one day at a time, start early, and you should be fine. Don’t compare your progress with others, just keep going. Remember to keep a positive mindset, as this course calls for one to persevere and struggle.

  • Use Overleaf: Use Kyle Nakamura's LaTeX template. It’s much easier to write reports, don’t waste time using Microsoft Word.

  • OHs are helpful: You may not be able to attend all of them and that’s okay but view the recordings or summarize the transcripts. Sometimes you find some good POVs of analysis from the OHs.

I leave it to Prof. Isbell to answer this; it will help your analysis: Charles Isbell and Michael Littman: Machine Learning and Education | Lex Fridman Podcast #148 (the link starts at a specific time; watch until 14:50).

WARNING on the above video: Do not steal code written specifically for the CS7641 course (e.g., ChedCode). Steal anything else, for example: how to implement a Decision Tree in sklearn, how to make a line/bar/scatter plot, how to use pandas. You get the idea.

More information on analysis:

  • What is an initial hypothesis? I struggled with this a lot in my first assignment. It made no sense to me what an initial hypothesis was, and I had never written research papers to know what it means. Let me put it simply: looking at your data, you make an assumption about how a specific algorithm may perform. Then, once you perform the experiment, you either accept or reject your initial assumption (hypothesis) and justify why the result occurs.

  • My model has bad accuracy, so I keep trying other datasets: This is a mistake many students make. The aim of the course is not to have the top-notch, tip-top, greatest model with the best accuracy. The aim is to record what you see and explain why what you see occurs. The analysis checks whether you have developed a deep intuition about the hyperparameters: why changing a given one affects the algorithm the way it does, and how the data can amplify or negate that effect.

    THE BEST THE BEST THE BEST THE BEST?????? NO, try not to go this route.

  • Grid search: On the above point, you don’t need THE BEST model, so you can avoid grid search. Instead, develop an intuition for how validation and learning curves show generalizability, and follow the point below on scrutinizing religiously.

  • Scrutinize everything: Leave no stone unturned. Why does this occur? Could it be the formula? The data? A specific hyperparameter? Why did you make this choice? What was the intended gain with that choice, and how much of that gain was achieved? The scope of scrutiny is up to you, but don’t miss the requirements laid out in the assignment PDF and FAQ.

  • Scrutinizing / questioning everything may feel redundant. It is not: If you feel you can convey your intuition differently, go for it. But I will tell you this: questioning everything and going into depth in the analysis will make you a better ML engineer. I am not in tech and may be wrong, but people in this industry seem to be questioned a lot about their choices of SL algorithm and hyperparameters. Questioning everything, along with forming an initial hypothesis and justifying results, seems to help indirectly with this.

  • Examples of initial hypotheses: I did not use an ML example, as that may just be giving you a free hypothesis 🙃

    • I throw a ball with an assumption (my hypothesis) that my friend will be able to catch it. However, he misses the catch (this being the experiment) because he was too short or my throw was too fast (the justification).

    • Hypothesis: Let me have a coffee because it will help my productivity. Experiment: the amount of work completed is measured after caffeine is taken.
      Analysis: statistical / ML analysis of whether there is a big difference with and without caffeine.
      Conclusion: what did the analysis show?
      Justification: investigate the conclusion (scrutinize everything).

  • Use what helps your analysis: You feel adding another image might help your justification? Do it. You feel adding a table of some information may help showcase your work better, do it.

  • A tip on saving time on what to talk about: Learn the concepts well and find a friend or a group of friends. Talk and discuss about your understanding and ask them to share their understanding on the topic. This will help unveil more on the subject which can be nice points in your analysis.

Assignment 1 - Supervised Learning (SL)

Library to use: Scikit-learn
(for NN you could look at Keras or PyTorch, but it’s best to stick with sklearn’s MLPClassifier, as it is used again in A2)

Books / Readings:

YouTube:

  • StatQuest videos, 3Blue1Brown, CampusX - his videos are in Hindi but they provide good intuition on how the algorithms work, especially SVM and Boosting.

  • There are many videos on Supervised Learning, explore YouTube/MIT OCW, etc.

In this assignment, you pick 2 “interesting” datasets and apply SL algorithms on them. Here are some tips.

  1. What is interesting, and why 2 datasets? Look at the podcast video in the section above if you have not. To summarize: you pick 2 datasets so that you can compare and contrast between them. You may need to try a few datasets before you settle on 2 that show differences. Try to pick simpler datasets; it’s not necessary to pick very complex ones and run into long run times. Also, remember that the same datasets will be used in some form in A2 and A3, so keep that in mind when selecting.

  2. Understanding for analysis: Learn how to understand and interpret validation and learning curves.

    1. You will later learn that the bias-variance trade-off and the concepts of underfitting and overfitting are not limited to SL algorithms but can be applied to other algorithms as well, if you think hard enough.

Assignment 2 - Randomized Optimization (RO)

Library to use: mlrose-hiive lib | mlrose “non-hiive” outdated docs
YouTube:

Books / Readings:

  • TA Blogs : Randomized Optimizations Category

  • Prof Deepak Khemani has a book as well; it conveys the same intuition as the lectures and can be used to understand the algorithms.

  • Mitchell chapters: Chapter 9 explains only genetic algorithms.

  • Another good book: Genetic Algorithms by David E Goldberg

  • Artificial Intelligence by Russell and Norvig

  • 2 papers by Prof Isbell on MIMIC

  • And many more? Do message me if you have anything interesting. These are all the ones I perused in my A2 assignment.

In this assignment, you implement RHC, SA, GA, MIMIC on a problem of your choice.

  • No docs: You’ve been spoilt by Scikit-learn in A1; now it’s time for some torture. The first thing you will come across is that the “hiive” version of mlrose has no documentation. Dive directly into the source code and read the comments; use the old docs just to understand how to create an optimization problem.

  • Use the runners: Use the runners in hiive; they give a nice output DataFrame that can be used for plotting.

  • Code tip: Since the runtimes for some of these RO algorithms can be really long (~12-13 hours, no joke), there are 2 ways to save your results:

    • Save the output as a CSV and load it in another .py file or notebook for plotting.

    • Pickle your objects, literally saving the Python objects to disk. I prefer the CSV route since pickling can take time; I didn’t try it in A2, but in A4 the pickle files were huge.

  • As always, just remember again to scrutinize everything based on assignment requirements.

Assignment 3 - Unsupervised Learning

Library to use: Scikit-learn
YouTube:

  • StatQuest videos, 3Blue1Brown, CampusX - his videos are in Hindi but they provide good intuition on how the algorithms work, especially PCA.

  • Good intuition on GMM

Books / Readings:

In this assignment, you use unsupervised learning techniques such as clustering and dimensionality reduction alongside supervised learning algorithms.

This assignment wants you to develop an intuition on:

  • Clustering: How to uncover groupings in data (clustering) when target labels are not present.

    • Based on your input data, which clustering techniques are more useful? K-Means, K-Modes, GMM (any other EM technique)?

    • Do the natural groupings in the data align with the actual target labels? If not, why?

    • Will appending clustering information to the existing dataset give your supervised learning techniques more context to work with? Can this additional context help the SL technique generalize better?

      • You can look at this by different feature selection techniques as well such as decision trees, forward feature selection or backward feature selection.

  • Dimension Reduction:

    • Can reducing dimensions help improve performance of supervised learning techniques?

    • Do the reduced dimensions represent the same amount of information as the actual dataset?

Assignment 4 - Reinforcement Learning (RL)

Library to use: BetterMDPTools and Gymnasium
YouTube:

Books / Readings:

In this assignment, you model problems as Markov Decision Processes and understand how different model-based and model-free algorithms work on them.

  • MC, MRP, MDPs: It is important to understand the difference between Markov Chains, Markov Reward Processes, and Markov Decision Processes; each one builds on the previous. The David Silver videos illustrate the differences well.

  • Value and Policy Iteration: Understand how these algorithms work. How is the policy derived in value iteration and approximated in policy iteration.

  • Q-Learner: How the agent explores and exploits the space.

Congratulations!

You’re done with the assignments, wish you all the best for the Finals.


My scores were A1 (98/100), A2 (100/100), A3 (96/100), A4 (95/100) which led to an overall of 98.00 % from assignments.

I could have literally skipped the finals and still scored an A in the class, which shows the importance of giving it your all in the assignments!

In the finals, I scored 32.5/57, which dropped my overall to 88.70 %, still an A. The finals did not go so well since the questions were tricky and confusing, but they were mostly from the lectures. Just do the lectures + George’s notes and you should be fine.

Changelog:

  • First draft completed, 19th April 2024.

  • First edit, 20th April 2024.

  • Second edit: added TOC, more content, updated confusing information, 9th May 2024.

