Regularisation

This time, let's regularise the model parameters. Apply an appropriate regulariser, such as vecnorm, to each model parameter and add the result to the overall loss.

For example, consider the following simple regression.

julia> using Flux

julia> m = Dense(10, 5)
Dense(10, 5)

julia> loss(x, y) = Flux.crossentropy(softmax(m(x)), y)
loss (generic function with 1 method)

We can regularise this by taking the L2 norm of the parameters m.W and m.b.

julia> penalty() = vecnorm(m.W) + vecnorm(m.b)
penalty (generic function with 1 method)

julia> loss(x, y) = Flux.crossentropy(softmax(m(x)), y) + penalty()
loss (generic function with 1 method)
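To make the arithmetic behind the regularised loss concrete, here is a minimal sketch in plain Julia, without Flux. It assumes Julia 1.0 or later, where LinearAlgebra.norm computes the same entrywise L2 norm that vecnorm computed in older versions. The arrays W and b and the helpers softmax_vec, xent, data_loss and reg_loss are hypothetical stand-ins for m.W, m.b and the Flux functions used above, not Flux's own implementation.

```julia
using LinearAlgebra  # provides norm; assumes Julia 1.0 or later

# Hypothetical stand-ins for the layer parameters m.W and m.b.
W = [1.0 0.0; 0.0 1.0]
b = [0.0, 0.0]

# Numerically stable softmax over a vector.
softmax_vec(z) = (e = exp.(z .- maximum(z)); e ./ sum(e))

# Cross-entropy between predicted probabilities p and a target distribution y.
xent(p, y) = -sum(y .* log.(p))

# Unregularised loss, then the same loss with the L2 penalty added.
data_loss(x, y) = xent(softmax_vec(W * x .+ b), y)
reg_loss(x, y)  = data_loss(x, y) + norm(W) + norm(b)

x = [0.0, 0.0]
y = [1.0, 0.0]

# With zero input the logits are equal, so data_loss is log(2);
# the penalty adds norm(W) = sqrt(2) and norm(b) = 0.
println(data_loss(x, y))
println(reg_loss(x, y))
```

The penalty depends only on the parameters, not on x or y, so it shifts the loss by the same amount for every sample.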

When working with layers, Flux provides the params function, which fetches all parameters at once. Everything can then be penalised easily with sum(vecnorm, params(m)).

julia> params(m)
2-element Array{Any,1}:
 param([-0.61839 -0.556047 … -0.460808 -0.107646; 0.346293 -0.375076 … -0.608704 -0.181025; … ; -0.2226 -0.0992159 … 0.0707984 -0.429173; -0.331058 -0.291995 … 0.383368 0.156716])
 param([0.0, 0.0, 0.0, 0.0, 0.0])

julia> sum(vecnorm, params(m))
2.4130860599427706 (tracked)
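The sum(f, collection) form used here applies f to each element and adds the results. The following is a small sketch of that pattern in plain Julia, assuming Julia 1.0 or later with LinearAlgebra.norm in place of vecnorm. The collection ps and the helper total_loop are hypothetical, standing in for params(m).

```julia
using LinearAlgebra  # provides norm; assumes Julia 1.0 or later

# Hypothetical parameter collection, standing in for params(m).
ps = [[3.0 4.0], [0.0, 0.0], [6.0 0.0; 0.0 8.0], [1.0]]

# Apply norm to each array and add the results.
total = sum(norm, ps)

# The same quantity written as an explicit loop.
function total_loop(ps)
    acc = 0.0
    for p in ps
        acc += norm(p)
    end
    return acc
end

println(total)
println(total_loop(ps))
```

Both forms give the same value; the one-line version simply avoids naming each parameter by hand.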

As a larger example, here is a multi-layer perceptron.

julia> m = Chain(
         Dense(28^2, 128, relu),
         Dense(128, 32, relu),
         Dense(32, 10), softmax)
Chain(Dense(784, 128, NNlib.relu), Dense(128, 32, NNlib.relu), Dense(32, 10), NNlib.softmax)

julia> loss(x, y) = Flux.crossentropy(m(x), y) + sum(vecnorm, params(m))
loss (generic function with 1 method)

julia> loss(rand(28^2), rand(10))
39.128892409412174 (tracked)