Numerous Reasons for a Neural Network to go Wrong
- March 23, 2018
- Posted by: CellStrat
- Category: Artificial Intelligence, Deep Learning, Machine Learning
![](http://learning.cellstrat.com/wp-content/uploads/2018/03/Neural-network.jpg)
How many times has it happened: you build a neural network and set it training, which takes half a day, a full day, or even several days, only to discover that something went wrong and the model is outputting garbage. Where do you turn then? Whom do you ask, and where do you look for the error?
There could be several reasons; a lot of things can go wrong. But some are more likely to be broken than others, so this short list usually works as an emergency first response:
- Start with a simple model that is known to work for this type of data (for example, VGG for images). Use a standard loss if possible.
- Turn off all bells and whistles, e.g. regularization and data augmentation.
- If finetuning a model, double-check the pre-processing; it should match what was used for the original model's training.
- Verify that the input data is correct.
- Start with a minuscule dataset (2–20 samples). Overfit on it and gradually add more data (a minimal sketch follows this list).
- Gradually add back all the pieces that were omitted: augmentation, regularization, custom loss functions; then try more complex models.
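To make the "overfit a minuscule dataset" step concrete, here is a minimal sketch. The post does not prescribe a framework, so PyTorch is assumed, and the model, shapes, and hyperparameters are purely illustrative. If a small network cannot drive the training loss on ten samples to near zero, something upstream (data, loss, or wiring) is likely broken:

```python
# Overfit a minuscule dataset as a pipeline sanity check (PyTorch assumed;
# the model, shapes, and hyperparameters are illustrative).
import torch
import torch.nn as nn

torch.manual_seed(0)

# Ten samples, 32 features, 3 classes -- small enough to memorize.
X = torch.randn(10, 32)
y = torch.randint(0, 3, (10,))

model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.CrossEntropyLoss()

for step in range(500):
    opt.zero_grad()
    loss = loss_fn(model(X), y)
    loss.backward()
    opt.step()

# A healthy pipeline should drive this loss to near zero.
print(f"final training loss: {loss.item():.4f}")
```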
If the steps above don’t solve the problem, start going down the following big list and verify things one by one:
- Dataset issues:
- Check that the input data you are feeding the network makes sense; it is easy to accidentally feed the same batch over and over. Print or display a couple of batches of inputs and target outputs and make sure they are OK (a batch sanity-check sketch follows this list).
- Try passing random numbers instead of actual data and see if the error behaves the same way. If it does, it’s a sure sign that your net is turning data into garbage at some point. Try debugging layer by layer and see where things go wrong.
- Your data might be fine but the code that passes the input to the net might be broken. Print the input of the first layer before any operations and check it.
- Check if data is labelled correctly.
- Is the relationship between input and output too random?
- Is there too much noise in the dataset?
- If your dataset hasn't been shuffled and has a particular order to it (e.g. ordered by label), this can negatively impact learning.
- Are there 1,000 class A images for every class B image? Then you might need to balance your loss function or try other class-imbalance approaches.
- If you are training a net from scratch (i.e. not finetuning), you probably need lots of data.
- Make sure your batches don’t contain a single label.
- Reduce the batch size; very large batches can hurt generalization.
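Here is a quick sanity-check sketch for the batch-level issues above (garbage inputs, single-label batches, class imbalance). PyTorch's DataLoader is assumed, and the TensorDataset is a stand-in for a real dataset:

```python
# Sanity-check the batches a DataLoader actually yields (PyTorch assumed;
# the TensorDataset below is a stand-in for your real dataset).
from collections import Counter

import torch
from torch.utils.data import DataLoader, TensorDataset

X = torch.randn(1000, 3, 32, 32)   # stand-in inputs
y = torch.randint(0, 10, (1000,))  # stand-in labels, 10 classes
loader = DataLoader(TensorDataset(X, y), batch_size=64, shuffle=True)

for i, (xb, yb) in enumerate(loader):
    if i == 3:  # inspect the first few batches only
        break
    # Shapes, value range, and label spread per batch.
    print(f"batch {i}: x {tuple(xb.shape)}, "
          f"min {xb.min().item():.2f}, max {xb.max().item():.2f}, "
          f"labels {Counter(yb.tolist())}")
    # A batch containing a single label is a red flag (shuffling off? sorted data?).
    assert len(set(yb.tolist())) > 1, f"batch {i} contains only one label"

# Overall class balance: a heavy imbalance may call for a weighted loss.
print("class counts:", Counter(y.tolist()))
```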
- Data Normalization / Augmentation:
- Did you standardize your input to have zero mean and unit variance?
- Augmentation has a regularizing effect. Too much of this combined with other forms of regularization (weight L2, dropout, etc.) can cause the net to underfit.
- If you are using a pretrained model, make sure you apply the same normalization and pre-processing that were used when the model was trained. For example, should an image pixel be in the range [0, 1], [-1, 1], or [0, 255]?
- Check that the pre-processing is consistent across the training, validation, and test sets (see the sketch below).
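A minimal sketch of consistent normalization, assuming PyTorch tensors and illustrative image shapes: the mean and standard deviation are computed on the training set only and then reused, unchanged, for validation and test data:

```python
# Standardize with statistics computed on the *training* set only, then apply
# the exact same transform to validation/test data (shapes and values illustrative).
import torch

train_x = torch.rand(5000, 3, 32, 32) * 255  # stand-in training images in [0, 255]
val_x = torch.rand(1000, 3, 32, 32) * 255    # stand-in validation images

# Per-channel mean and std from the training data alone.
mean = train_x.mean(dim=(0, 2, 3), keepdim=True)
std = train_x.std(dim=(0, 2, 3), keepdim=True)

def normalize(x: torch.Tensor) -> torch.Tensor:
    # Zero mean, unit variance -- always with the training-set statistics.
    return (x - mean) / std

train_x = normalize(train_x)
val_x = normalize(val_x)  # never recompute the statistics on val/test

print(f"train mean {train_x.mean().item():.3f}, std {train_x.std().item():.3f}")  # ~0, ~1
```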
- Implementation issues:
- Try solving a simpler version of the problem.
- Check your loss function.
- If your loss is composed of several smaller loss functions, make sure their magnitudes relative to each other are sensible. This might involve testing different combinations of loss weights (see the sketch after this list).
- Sometimes the loss is not the best predictor of whether your network is training properly. If you can, use other metrics like accuracy.
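For composite losses, here is a small sketch (PyTorch assumed; the loss terms and weights are illustrative) that logs each component separately, which makes it obvious when one term dominates the total:

```python
# Log each component of a composite loss separately so you can see whether
# one term dominates (PyTorch assumed; the weights below are illustrative).
import torch
import torch.nn as nn

logits = torch.randn(16, 10)          # stand-in classifier outputs
labels = torch.randint(0, 10, (16,))  # stand-in class labels
recon = torch.randn(16, 32)           # stand-in reconstruction
recon_target = torch.randn(16, 32)

ce = nn.CrossEntropyLoss()(logits, labels)
mse = nn.MSELoss()(recon, recon_target)

# If one raw term is orders of magnitude larger, the other is effectively
# ignored; tune the weights (or rescale the terms) until both contribute.
w_ce, w_mse = 1.0, 0.1
total = w_ce * ce + w_mse * mse
print(f"ce: {ce.item():.4f}  mse: {mse.item():.4f}  total: {total.item():.4f}")
```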
- Training issues:
- Overfit a small subset of the data and make sure it works. For example, train with just one or two examples per class and see if your network can learn to differentiate them; then move on to more samples per class.
- Check your weight initialization. If unsure, use Xavier or He initialization.
- Too much regularization can cause the network to underfit badly. Reduce regularization such as dropout, batch norm, weight/bias L2 regularization, etc.
- Maybe your network needs more time to train before it starts making meaningful predictions. If your loss is steadily decreasing, let it train some more.
- Monitor the activations, weights, and updates of each layer. Make sure their magnitudes are in line: a common rule of thumb is that each parameter update should be roughly 1e-3 of the parameter's own magnitude (a sketch follows this list).
- A low learning rate will cause your model to converge very slowly.
- A high learning rate will quickly decrease the loss in the beginning but might have a hard time finding a good solution.
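A sketch of the update-magnitude check mentioned above, assuming PyTorch; the ~1e-3 ratio is a common rule of thumb rather than a hard rule, and the model, data, and learning rate are illustrative. It compares each parameter's change after one optimizer step to the parameter's own norm:

```python
# Compare each parameter's update after one optimizer step to the parameter's
# own norm (PyTorch assumed; model, data, and learning rate are illustrative).
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 3))
opt = torch.optim.SGD(model.parameters(), lr=1e-2)

x, y = torch.randn(16, 32), torch.randint(0, 3, (16,))
before = {name: p.detach().clone() for name, p in model.named_parameters()}

opt.zero_grad()
nn.CrossEntropyLoss()(model(x), y).backward()
opt.step()

for name, p in model.named_parameters():
    update_norm = (p.detach() - before[name]).norm()
    ratio = (update_norm / p.detach().norm()).item()
    # Much larger than ~1e-3 hints at too high a learning rate; much smaller, too low.
    print(f"{name}: update/weight ratio = {ratio:.2e}")
```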
(Ref: Excerpted from Kaggle)