There is no need to approximate a ReLU or tanh well. Machine learning is statistical. The accuracy of these functions is not that important.
ReLU is, strictly speaking, a "buggy" activation function for deep learning because it's not differentiable everywhere. In practice, that rarely matters. It's used because it's faster to implement the buggy function than to use something proper.
The exact shape of tanh is not important either. It's enough for it to be monotone, roughly S-shaped, and easy to differentiate. Tanh is implemented in hardware, so it's used.
Basically anything monotone and approximately differentiable works.
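As a concrete illustration (a minimal NumPy sketch, not something from the comment above): softsign, x/(1+|x|), is monotone, S-shaped, and only mildly non-smooth at the origin, yet it is known to work as a drop-in for tanh; likewise ReLU's kink at 0 is handled by just picking a subgradient there.

```python
import numpy as np

def relu(x):
    # not differentiable at x = 0; frameworks just pick a subgradient (0 or 1) there
    return np.maximum(x, 0.0)

def softsign(x):
    # monotone, S-shaped, bounded in (-1, 1); its second derivative jumps at 0,
    # which gradient-based training doesn't care about
    return x / (1.0 + np.abs(x))
```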
> There is no need to approximate a ReLU or tanh well
Similarly, there might not be a need to emulate neurons well to get the circuits in the brain to work. However, when someone argues that biological neurons are equivalent to x artificial neurons, it is necessary to choose a bound for the comparison (e.g. L2 error of the activation) between the emulations you compare.
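To make that concrete, here is one way such a bound could be operationalized (a minimal NumPy sketch; the traces are placeholders for whatever activation recordings the comparison actually uses):

```python
import numpy as np

def l2_activation_error(reference, emulation):
    # Root-mean-square (L2-style) error between a reference neuron's activation
    # trace and an emulation's trace, sampled at the same time points.
    reference = np.asarray(reference, dtype=float)
    emulation = np.asarray(emulation, dtype=float)
    return np.sqrt(np.mean((reference - emulation) ** 2))
```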
Also, the nonlinearity only needs to be differentiable because ANNs are trained with gradient descent. With other, more biologically plausible learning mechanisms, this might matter even less (or have other constraints/requirements).
Meanwhile, if we actually understood brains, I bet we would find endless examples of 'improper' behavior. Evolution picks up what seems to work, and sloooowly improves the parts that break, leaving good enough alone. (After all, if it doesn't affect reproductive probabilities, it doesn't matter.)
Activation functions will almost certainly not be the crux move for solving AGI.
Tanh is _not_ generally implemented in hardware, and it’s one of the fussier functions in math.h to implement well. Its only real virtues are that implementations are available everywhere, its derivative is relatively simple, and it has the right symmetries.
You're right that neural networks don't care too much about the shape of most activation functions. I assume that splicing together two decaying exponential functions at the origin would work just as well in practice.
However, tanh is a bit more special than just having the right symmetries. Sigmoid is the correct function to turn an additive value (a log-odds) into a probability (range 0 to 1). Tanh is a rescaled and shifted sigmoid which serves the same purpose for the -1 to +1 interval.
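Concretely, tanh(x) = 2·sigmoid(2x) − 1, which is easy to sanity-check numerically (small NumPy sketch):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-5.0, 5.0, 101)
# tanh is sigmoid rescaled from (0, 1) to (-1, 1) with the argument doubled
assert np.allclose(np.tanh(x), 2.0 * sigmoid(2.0 * x) - 1.0)
```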
I sometimes wonder if clamped linear or exponential functions would work better than tanh/sigmoid in places where they're currently used (like LSTM/GRU gates).
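A sketch of the kind of clamped-linear stand-ins I mean (the slope/offset constants here are arbitrary choices for illustration, not any particular library's convention):

```python
import numpy as np

def hard_sigmoid(x, slope=0.25, offset=0.5):
    # linear near 0, clamped to [0, 1] at the ends; a cheap gate candidate
    return np.clip(slope * x + offset, 0.0, 1.0)

def hard_tanh(x):
    # linear near 0, clamped to [-1, 1]
    return np.clip(x, -1.0, 1.0)
```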
Note that tanh saturates to ±1 faster than most of the alternatives (erf excepted) when normalized to have slope 1 at the origin: its expansion at +infinity is 1 - 2e^{-2x} + O(e^{-4x}), while many of the other options have polynomially decaying tails, so they don't approach 1 nearly as fast.
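A quick numeric check of that tail (NumPy sketch): 1 − tanh(x) should track 2e^{-2x} closely once x is a few units out.

```python
import numpy as np

x = np.array([2.0, 4.0, 6.0, 8.0])
print(1.0 - np.tanh(x))        # actual distance from the +1 asymptote
print(2.0 * np.exp(-2.0 * x))  # leading term of the expansion
```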
I suspect some applications would in theory rather use erf, but erf is even worse to compute than tanh (on the other hand, erf's derivative is really nice, so who knows?)
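(For reference, erf'(x) = (2/√π)·e^{-x²}; a quick check against a finite difference, using only Python's standard library:)

```python
import math

def erf_prime(x):
    # derivative of erf: 2/sqrt(pi) * exp(-x^2)
    return 2.0 / math.sqrt(math.pi) * math.exp(-x * x)

h = 1e-6
for x in (0.0, 0.5, 1.5):
    numeric = (math.erf(x + h) - math.erf(x - h)) / (2.0 * h)
    assert abs(numeric - erf_prime(x)) < 1e-6
```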
By splicing together I mean a piecewise function which is `exp(x) - 1` on the left and `1 - exp(-x)` on the right, which should be similar enough to tanh for most purposes.
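Something like this (NumPy sketch of the splice described above):

```python
import numpy as np

def spliced_exp(x):
    # exp(x) - 1 for x < 0, 1 - exp(-x) for x >= 0; continuous at 0 with slope 1
    return np.where(x < 0.0, np.expm1(x), -np.expm1(-x))

x = np.linspace(-4.0, 4.0, 81)
print(np.max(np.abs(spliced_exp(x) - np.tanh(x))))  # worst-case gap vs tanh on this range
```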
Sure, it's even continuously differentiable with the right slope at the origin (though the second derivative jumps there). It just doesn't saturate to +/-1 as fast, which probably doesn't matter.