Given that you can restrict $f$ and $g$ to any form (convex, monotonic, etc.), what can be said about $\log(f(g(x)))$, if anything?
For context:
I am considering replacing weight updates in neural network backpropagation with log weight updates as a way to deal with vanishing gradients in long chains of partial derivatives. The forward pass of a neural network looks like:
$$g(W_2\, g(W_1 x)) = \hat{y}$$
with $g$ and $f$ being arbitrary non-linear functions. During backpropagation you compute $\Delta W_i = \frac{\partial L}{\partial W_i}$, which ends up being a long chain of partial derivatives, $\Delta W_i = \frac{\partial L}{\partial h}\frac{\partial h}{\partial a}\frac{\partial a}{\partial W_i}$. Taking $\log \Delta W_i$ lets you add the logs of those partial derivatives instead of multiplying the derivatives themselves, but it leaves you with $\log \Delta W_i$ instead of $\Delta W_i$.
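To make the product-to-sum point concrete, here is a toy numerical sketch; the `factors` array is just a hypothetical stand-in for the per-layer partial-derivative terms in the chain, and the sign has to be tracked separately because those terms can be negative:

```python
import numpy as np

# Toy sketch: treat each entry of `factors` as one scalar partial derivative
# in a long backprop chain of the kind described above.
rng = np.random.default_rng(0)
factors = rng.uniform(0.01, 0.5, size=500)    # many small slopes -> vanishing product

# Multiplying directly underflows to exactly 0.0 for a chain this long.
direct = np.prod(factors)

# Log-domain accumulation: sum the logs of the magnitudes and track the sign
# separately, since individual partial derivatives may be negative.
sign = np.prod(np.sign(factors))
log_magnitude = np.sum(np.log(np.abs(factors)))

# Converting back still requires exp(), and here it underflows again --
# which is exactly the step the question asks whether we can avoid.
recovered = sign * np.exp(log_magnitude)

print("direct product:   ", direct)           # 0.0
print("sum of logs:      ", log_magnitude)    # a perfectly usable finite number
print("exp(sum of logs): ", recovered)        # 0.0 again
```

The sum of logs stays finite and informative even after the raw product has underflowed to zero, but recovering $\Delta W_i$ itself still requires that final $\exp$.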
I think the question is ultimately about whether it is possible to constrain the forward model in such a way (perhaps limiting its expressiveness) that we could use $\log \Delta W_i$ to update the weights without needing to take $e^{\log \Delta W_i}$. One of my first thoughts was to take $\log \hat{y}$ and see what happens, but I realized I didn't know much about what I could do with $\log(f(g(x)))$.
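As a sanity check on the $\log \hat{y}$ idea, the chain rule (assuming a positive scalar output $\hat{y}$) shows that logging the output only prepends a single $\frac{1}{\hat{y}}$ factor and leaves the inner product of partials untouched:

$$\frac{\partial \log \hat{y}}{\partial W_i} = \frac{1}{\hat{y}}\,\frac{\partial \hat{y}}{\partial W_i} = \frac{1}{\hat{y}}\,\frac{\partial \hat{y}}{\partial h}\,\frac{\partial h}{\partial a}\,\frac{\partial a}{\partial W_i},$$

so a genuine product-to-sum conversion seems to need structure inside the composition itself; for example, if $f(u) = e^{u}$ then $\log f(g(x)) = g(x)$.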
I'm thinking there might be something like Jensen's inequality but for composite functions, so that we could minimize the loss function $L$ through an upper or lower bound.
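For reference, the textbook facts that seem closest to what I'm after are Jensen's inequality and the convexity composition rule:

$$\varphi\!\left(\mathbb{E}[X]\right) \le \mathbb{E}\!\left[\varphi(X)\right] \text{ for convex } \varphi, \qquad f \circ g \text{ is convex if } f \text{ is convex and non-decreasing and } g \text{ is convex.}$$

In particular, if $f$ is log-convex and non-decreasing and $g$ is convex, then $\log(f(g(x)))$ is convex, which is the kind of structure that would let us minimize $L$ through an upper bound rather than directly.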