On the Saddle Point Problem for Non-convex Optimization