Given that you can restrict $f$ and $g$ to any form (convex, monotonic, etc.), what can be said about $\log(f(g(x)))$, if anything?
For context:
I am considering replacing weight updates in neural network backpropagation with log weight updates as a way to deal with vanishing gradients in long chains of partial derivatives. The forward pass of a neural network looks like:
$$g(W_2\, g(W_1 x)) = \hat{y}$$
with $g$ and $f$ being arbitrary non-linear functions. During backpropagation you compute $\Delta W_i = \frac{\partial L}{\partial W_i}$, which ends up being a long chain of partial derivatives, $\Delta W_i = \frac{\partial L}{\partial h}\frac{\partial h}{\partial a}\frac{\partial a}{\partial W_i}$. Taking $\log \Delta W_i$ lets you add the logs of those partial derivatives instead of multiplying the derivatives themselves, but it leaves you with $\log \Delta W_i$ instead of $\Delta W_i$.
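To make the product-to-sum point concrete, here is a toy numerical sketch; the `factors` array is just a hypothetical stand-in for the per-layer partial-derivative terms in the chain, and the sign has to be tracked separately because those terms can be negative:

```python
import numpy as np

# Toy sketch: treat each entry of `factors` as one scalar partial derivative
# in a long backprop chain of the kind described above.
rng = np.random.default_rng(0)
factors = rng.uniform(0.01, 0.5, size=500)    # many small slopes -> vanishing product

# Multiplying directly underflows to exactly 0.0 for a chain this long.
direct = np.prod(factors)

# Log-domain accumulation: sum the logs of the magnitudes and track the sign
# separately, since individual partial derivatives may be negative.
sign = np.prod(np.sign(factors))
log_magnitude = np.sum(np.log(np.abs(factors)))

# Converting back still requires exp(), and here it underflows again --
# which is exactly the step the question asks whether we can avoid.
recovered = sign * np.exp(log_magnitude)

print("direct product:   ", direct)           # 0.0
print("sum of logs:      ", log_magnitude)    # a perfectly usable finite number
print("exp(sum of logs): ", recovered)        # 0.0 again
```

The sum of logs stays finite and informative even after the raw product has underflowed to zero, but recovering $\Delta W_i$ itself still requires that final $\exp$.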
I think the question is ultimately about whether it is possible to constrain the forward model in such a way (perhaps limiting its expressiveness) that we could use $\log \Delta W_i$ to update the weights without needing to take $e^{\log \Delta W_i}$. One of my first thoughts was to take $\log \hat{y}$ and see what happens, but I realized I didn't know much about what I could do with $\log(f(g(x)))$.
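As a sanity check on the $\log \hat{y}$ idea, the chain rule (assuming a positive scalar output $\hat{y}$) shows that logging the output only prepends a single $\frac{1}{\hat{y}}$ factor and leaves the inner product of partials untouched:

$$\frac{\partial \log \hat{y}}{\partial W_i} = \frac{1}{\hat{y}}\,\frac{\partial \hat{y}}{\partial W_i} = \frac{1}{\hat{y}}\,\frac{\partial \hat{y}}{\partial h}\,\frac{\partial h}{\partial a}\,\frac{\partial a}{\partial W_i},$$

so a genuine product-to-sum conversion seems to need structure inside the composition itself; for example, if $f(u) = e^{u}$ then $\log f(g(x)) = g(x)$.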
I'm thinking there might be something like Jensen's inequality but for composite functions, so that we could minimize the loss function $L$ through an upper or lower bound.
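For reference, the textbook facts that seem closest to what I'm after are Jensen's inequality and the convexity composition rule:

$$\varphi\!\left(\mathbb{E}[X]\right) \le \mathbb{E}\!\left[\varphi(X)\right] \text{ for convex } \varphi, \qquad f \circ g \text{ is convex if } f \text{ is convex and non-decreasing and } g \text{ is convex.}$$

In particular, if $f$ is log-convex and non-decreasing and $g$ is convex, then $\log(f(g(x)))$ is convex, which is the kind of structure that would let us minimize $L$ through an upper bound rather than directly.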