How to handle overfitting/underfitting?

What is regularization?

That is, we add a regularization term to the loss function to balance generalization against overfitting. This term is usually lambda * w^T w, where lambda is the regularization coefficient, a hyperparameter that we set ourselves. With this term in place, the larger lambda is, the more weight the term carries, so the learned parameters w become correspondingly smaller and the fitted curve becomes smoother; in the extreme, a very large lambda drives the components of w toward 0, so the fitted pattern degenerates into a nearly straight line (underfitting). Conversely, the smaller lambda is, the less the term matters, so w can grow larger and the curve becomes less smooth; in the extreme, lambda = 0 reduces to the unregularized loss function, which overfits. The choice of a hyperparameter like lambda usually comes with no mathematical guarantee, which is why many people liken picking an ML model to ancient Taoist alchemy (mostly experience and luck). Of course, we can plot the loss produced by different choices of lambda and pick a suitable one (which seems only slightly better than the alchemists).
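As a minimal sketch of this trade-off (the toy data, feature count, and lambda values below are made up for illustration), the snippet fits ridge regression, i.e. least squares with the lambda * w^T w penalty, in closed form and prints how the weight norm shrinks as lambda grows:

```python
import numpy as np

# Toy regression data (made up for illustration): y = X @ true_w + noise.
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(50, 5))           # 50 samples, 5 features
true_w = np.array([2.0, -1.0, 0.5, 0.0, 0.0])
y = X @ true_w + 0.1 * rng.standard_normal(50)

def ridge_fit(X, y, lam):
    """Minimize ||Xw - y||^2 + lam * w^T w via the closed-form solution."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

for lam in [0.0, 0.1, 10.0, 1000.0]:
    w = ridge_fit(X, y, lam)
    print(f"lambda = {lam:>7}: ||w|| = {np.linalg.norm(w):.4f}")

# As lambda grows, ||w|| shrinks toward 0 (smoother fit, eventually underfitting);
# lambda = 0 recovers ordinary least squares (prone to overfitting).
```

In practice, as the note above suggests, one would sweep lambda, plot the training and validation loss for each value, and pick from the plot, since no choice comes with a mathematical guarantee.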

What is convex optimization?

Convex optimization is a subfield of optimization that studies the problem of minimizing convex functions over convex sets. The convexity makes optimization easier than the general case, since a local minimum must be a global minimum, and first-order conditions are sufficient conditions for optimality.[1]

Convex minimization has applications in a wide range of disciplines, such as automatic control systems, estimation and signal processing, communications and networks, electronic circuit design,[2] data analysis and modeling, finance, statistics (optimal experimental design),[3] and structural optimization.[4] With recent improvements in computing and in optimization theory, convex minimization is nearly as straightforward as linear programming. Many optimization problems can be reformulated as convex minimization problems. For example, the problem of maximizing a concave function f can be re-formulated equivalently as a problem of minimizing the function -f, which is convex.

The simplex algorithm is a classic algorithm for linear programming, which is a special case of convex optimization; general convex problems are more commonly solved with gradient-based or interior-point methods.
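A small sketch of why convexity makes things easier: for a convex function, gradient descent reaches the same global minimum from any starting point. The quadratic, the step size, and the starting points below are arbitrary choices for illustration:

```python
import numpy as np

# Convex quadratic f(w) = 0.5 * w^T A w - b^T w, with A positive definite.
A = np.array([[3.0, 1.0],
              [1.0, 2.0]])
b = np.array([1.0, -1.0])

def grad(w):
    """Gradient of f: A w - b."""
    return A @ w - b

def gradient_descent(w0, lr=0.1, steps=500):
    w = np.array(w0, dtype=float)
    for _ in range(steps):
        w -= lr * grad(w)
    return w

# Different starting points all converge to the unique global minimum A^{-1} b.
print(gradient_descent([10.0, -10.0]))
print(gradient_descent([-5.0, 7.0]))
print(np.linalg.solve(A, b))  # closed-form optimum, for comparison
```

The same machinery handles maximizing a concave function f: simply minimize -f, as noted above.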

What is the vanishing gradient problem?

How many layers should an LSTM use?

What is the structure of the keep gate?

What is the structure of drop-out...

There is one question I don't understand: machine learning is divided into two kinds, numerical and categorical; what distribution does each of them assume?
