torch7 nn docs: https://github.com/torch/nn/tree/master/doc

==nets==
*googlenet
*lenet
*alexnet
*overfeat

==operations==
Many operations can be found here: https://github.com/torch/nn/blob/master/doc/transfer.md

*Max Pooling: http://deeplearning.net/tutorial/lenet.html : downsamples a feature map by keeping only the maximum value in each pooling window (see the pooling sketch below).
*softmax: http://en.wikipedia.org/wiki/Softmax_function : "squashes" a vector of scores into values in (0,1) that sum to 1, i.e. softmax(x)_i = exp(x_i) / sum_j exp(x_j).
*ReLU: The rectifier is the activation function f(x) = max(0, x), which can be used by neurons just like any other activation function; a node using the rectifier activation function is called a ReLU node. It is used mainly because it is much cheaper to compute than more conventional activation functions like the sigmoid and hyperbolic tangent, without a significant loss in generalisation accuracy. A rectifier (rather than a linear activation) is used to add non-linearity to the network; otherwise the network would only ever be able to compute a linear function of its input (ReLU and softmax both appear as nn modules in the net sketch below).
*Dropout: Randomly ignoring nodes during training is useful because it prevents inter-dependencies from emerging between nodes (i.e. nodes do not learn functions that rely on the output of one particular other node), which lets the network learn a more robust relationship. Dropout has much the same effect as taking the average of a committee of networks, but at a far lower cost in both time and storage (see the dropout sketch below).
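
A minimal max-pooling sketch in Lua/torch nn (the 4x4 values below are made up purely for illustration): pooling over 2x2 windows with stride 2 halves each spatial dimension, keeping the maximum of each window.

```lua
require 'nn'

-- Max pool over 2x2 windows with stride 2: a 1x4x4 map becomes 1x2x2,
-- each output entry being the maximum of one window.
local pool = nn.SpatialMaxPooling(2, 2, 2, 2)
local x = torch.Tensor{{{1, 2, 5, 6},
                        {3, 4, 7, 8},
                        {9, 1, 2, 3},
                        {4, 5, 6, 7}}}
print(pool:forward(x))   -- 1x2x2 tensor: {{4, 8}, {9, 7}}
```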
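
A small dropout sketch showing the training/evaluation switch; this assumes the default nn.Dropout behaviour, where the surviving activations are rescaled during training so nothing extra needs to be done at test time.

```lua
require 'nn'

local drop = nn.Dropout(0.5)    -- drop each input element with probability 0.5
local x = torch.ones(10)

drop:training()                 -- training mode: a random subset of inputs is zeroed
print(drop:forward(x))          -- roughly half the entries are 0, survivors are rescaled

drop:evaluate()                 -- evaluation mode: dropout is switched off
print(drop:forward(x))          -- the input passes through unchanged
```

Switching the random masking off at evaluation time is what gives the cheap "committee average": the full network is run once instead of sampling many thinned networks.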
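
Finally, a sketch of how these modules fit together in a small LeNet-style net; the layer sizes are illustrative (chosen for 1x32x32 inputs) rather than taken from any particular reference.

```lua
require 'nn'

local net = nn.Sequential()
net:add(nn.SpatialConvolution(1, 6, 5, 5))   -- 1 input plane -> 6 feature maps, 5x5 kernels
net:add(nn.ReLU())                           -- f(x) = max(0, x), applied element-wise
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))    -- 28x28 -> 14x14
net:add(nn.SpatialConvolution(6, 16, 5, 5))
net:add(nn.ReLU())
net:add(nn.SpatialMaxPooling(2, 2, 2, 2))    -- 10x10 -> 5x5
net:add(nn.View(16 * 5 * 5))                 -- flatten for the fully connected layers
net:add(nn.Linear(16 * 5 * 5, 120))
net:add(nn.ReLU())
net:add(nn.Dropout(0.5))                     -- only active in training mode
net:add(nn.Linear(120, 10))
net:add(nn.SoftMax())                        -- class scores -> probabilities in (0,1) summing to 1

net:evaluate()                               -- disable dropout for a test-time forward pass
local x = torch.randn(1, 32, 32)             -- a single 1x32x32 input
print(net:forward(x))                        -- a 10-element probability vector
```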