I realized that derivatives are linear (codesmoothie.com)
103 points by jasonszhao on June 26, 2018 | hide | past | favorite | 69 comments


So the point of this article is that differentiation is linear. That is, the operator D which takes f to df/dx is linear. The author points out that one can write this down as a matrix with respect to a basis of polynomials, which works for suitably well-behaved functions and is, I think, nice for understanding. Other operators one might look at are integration, Fourier or Laplace transforms, or more exotic integral transforms, which are all linear. One can view a Fourier transform as a change of basis.
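To make the "derivative as a matrix" idea concrete, here is a small NumPy sketch; the basis (monomials 1, x, x², ...) and degree bound are illustrative choices, not anything from the article:

```python
import numpy as np

def derivative_matrix(n):
    """Matrix of D on polynomials of degree < n, in the monomial basis.

    Since d/dx x^j = j * x^{j-1}, column j has the entry j in row j-1.
    """
    D = np.zeros((n, n))
    for j in range(1, n):
        D[j - 1, j] = j
    return D

# p(x) = 3 + 2x + 5x^2, coefficients listed in increasing degree
p = np.array([3.0, 2.0, 5.0])
D = derivative_matrix(3)
print(D @ p)  # coefficients of p'(x) = 2 + 10x, i.e. [2., 10., 0.]
```

The matrix is nilpotent (D^n = 0 on degree-< n polynomials), which matches the fact that differentiating a polynomial enough times gives zero.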

In another sense, derivatives themselves are linear: for a function f: U -> V between vector spaces, the derivative (at some point) is a linear map from U to V (i.e. the derivative of the function is a function Df: U -> L(U,V)), and this extends the concept of derivative to multiple dimensions via f(x+h) = f(x) + (Df)(x)h + o(h).

This seems fine for first derivatives but can become unwieldy for higher derivatives, which become tensors of higher rank.

Another question one might ask on learning that differentiation is a linear operator is what its eigenfunctions are. For differentiation these are functions of the form f(x) = exp(ax), with eigenvalue a. But one can construct other linear operators, and from this you get Sturm–Liouville theory, which is fantastic.
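A quick numerical sanity check (not a proof) that exp(ax) behaves as an eigenfunction of d/dx with eigenvalue a; the value a = 0.7 and the grid are illustrative choices:

```python
import numpy as np

a = 0.7
x = np.linspace(-1.0, 1.0, 201)
f = np.exp(a * x)
df = np.gradient(f, x)  # finite-difference approximation to f'

# df should agree with a * f up to discretization error
print(np.max(np.abs(df - a * f)))
```

The residual is small everywhere, consistent with D exp(ax) = a exp(ax).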

One final note is that much of this multidimensional-derivative and tensor stuff becomes a lot easier if one learns suffix notation (aka Einstein notation, aka index notation, aka summation convention), as well as perhaps a few identities with the Kronecker delta or Levi-Civita symbol. Notation can break down a bit with arbitrary-rank tensors: $a_{i_1,...,i_k}$ becomes unwieldy, but writing $a_{pq...r}$ is ok.


The derivative is a linear operator, but it's not a bounded operator. That is, for example, the sup norm of f(x) = k·sin(x/k) tends to 0 as k→0, but the norm of its derivative cos(x/k) does not. This also means that it's not continuous.

Of the mappings between vector spaces, the best behaved are the bounded linear operators, and the derivative doesn't belong to these. But yes, it's linear.

Edit: Originally wrote f(x) = k·sin(k·x), but meant f(x) = k·sin(x/k).
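A numerical sketch of this example (the grid and values of k are illustrative): the functions shrink to zero in the sup norm while their derivatives keep norm 1.

```python
import numpy as np

x = np.linspace(-10.0, 10.0, 100001)
for k in (1.0, 0.1, 0.01):
    f = k * np.sin(x / k)
    df = np.cos(x / k)  # exact derivative of f
    # sup|f| shrinks with k, sup|f'| stays at 1
    print(k, np.max(np.abs(f)), np.max(np.abs(df)))
```

No constant C can satisfy ||f'|| ≤ C·||f|| for all these f at once, which is exactly the failure of boundedness.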


Depends on what space you define the derivative on. It is of course a bounded (and therefore continuous) operator from C^k to C^{k-1} for any positive integer k.

Additionally, it only really makes sense to talk about bounded operators between topological vector spaces (as you need to make sense of what it means to be bounded), of which the most commonly dealt with are Banach spaces.


It also depends on what topology you put on your space: while the derivative operator is defined on all of C^k, that does not make it continuous. In fact, as the GP's example shows, the topologies you put on C^k and C^{k-1} must be such that either uniform convergence does not imply convergence in C^k (which differs from many people's intuitive notion of convergence of functions) or cos(x/k) converges to 0 in C^{k-1} (which is just plain weird).


> Additionally, it only really makes sense to talk about bounded operators between topological vectors spaces

Reading your comment, I wondered how you can define bounded sets in a topological vector space (where you don't have a norm). The definition is cute: a set X is bounded if any neighborhood of 0 can be inflated to include the whole of X.


You cannot speak of boundedness of operators if you don't specify the domain and codomain spaces. The differentiation operator is, for example, bounded between C^1(R) and C^0(R), or between C^\infty(R) and C^\infty(R). Your example correctly proves that it is not bounded between C^0(R) and C^0(R).


Anyone have a concrete example of an open subset of C^k whose preimage under differentiation is not open?

(Also, what's the most natural norm on C^k? The sup norm? I haven't done any functional analysis in years)


If you want Cᵏ to be a normed space you do

|f| = ∑ᵢ₌₀ᵏ sup|f⁽ⁱ⁾|.
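As a concrete sketch (my choice of function and interval, approximating the sups on a grid): the C¹ norm of sin on [0, 2π] is sup|sin| + sup|cos| = 1 + 1 = 2.

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 100001)
# |f|_{C^1} = sup|f| + sup|f'|, approximated by sampling on the grid
norm = np.max(np.abs(np.sin(x))) + np.max(np.abs(np.cos(x)))
print(norm)  # approximately 2.0
```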

On open domains you can also use the topology that forces uniform convergence on all compact subsets. But this will only give you a metric space, not a Banach space (though you'll include unbounded functions). This is needed for studying Brownian motion with an unbounded time domain.

Regarding the other question: if you take Cᵏ⁻¹ as the codomain, differentiation will be continuous. IIRC, to get unbounded linear maps defined on the whole Banach space you need the axiom of choice; you won't be able to write one down.

The problem with differential operators is that they are usually only defined on a dense subset of the domain, and there they are not bounded. E.g. in quantum mechanics the space of states is L², but all the interesting observables are differential operators. You can weasel out of this situation by defining them on a subset of "physical states" (e.g. smooth wave functions of rapid decay). But they aren't continuous anymore (the spectrum is unbounded). On Hilbert spaces, though, everything mostly works out fine. Physicists usually ignore those technical problems and still don't make mistakes.


So what level of mathematics is needed to understand the following thread? Undergrad real analysis? Grad math courses?


> So what level of mathematics is needed to understand the following thread?

First course in functional analysis.

Before that, people usually take linear algebra, all the calculus courses, some kind of theorem-proving course for calculus (either analysis or real analysis or really rigorous versions of the calculus courses) and measure theory, maybe also topology.


hoping for an answer to this too


This was an example used in my linear algebra class as soon as they started introducing the vector spaces in an abstract sense.

I think this post may still be too wedded to the idea of linear spaces and vectors being arrays of objects - specifically in insisting on decomposing functions like sin and cos to Taylor Series. In fact, you can have a vector space where, in addition to polynomial terms, there are also dimensions for sin(x), tan(x), sin(x - pi), e^x, etc. The fact that you can't enumerate these dimensions, or even describe the set of them until given a set of vectors you're trying to describe, doesn't keep this from being a vector space.


Hm. Interesting.

I always viewed real functions as infinite-dimensional vectors in the "canonical" basis, that is, shifted Dirac impulses. I guess it can be transformed into your representation with a change of basis with some handwaving.


What you're describing may not be a subspace of the space I'm describing. Mine is definitely a subspace of yours - e.g. you can project functions into sums of shifted Dirac impulses by representing each function dimension as a linear combination of the "Dirac vectors" for that function's values.


Hmmm. I think it’s easy to fix that:

Suppose that there’s a function f, that can be written as an infinite sum (integral) of shifted Dirac impulses, but cannot be written in your representation as a sum of those “base functions”. Then simply add a new dimension to your representation that will correspond to f, so that f will be represented as 1 at this new dimension, and zero everywhere else. (In other words: add f to the base functions)

Repeat until you have covered every function.


Well this is the whole point of derivatives (i.e. tangent maps): to be linear approximations of functions.

So yes, a linear approximation of a linear function is the function itself.


You are right, but what you are saying has nothing to do with the author's point: what he is saying is that the differentiation operator itself is linear, which is a meaningful and true fact even in spaces where you have no idea of what a linear function is.


I think you'll find that derivatives are overwhelmingly defined as operators on vector spaces. It is definitely correct to talk about a linear function on a vector space.


I agree with you, but this is not what GGP is saying. GGP is saying another true fact (i.e., that the derivative of a linear function is that same function), which is a different thing from stating, as the article says, that the differentiation operator is linear. On a manifold there is no concept of a linear function, so you cannot say that the derivative of a linear function is the same function, but the differentiation operator is still defined and linear.


Yes, you are right. My apologies.


Of course you are correct, even though one could argue the linearity of differentiation is a property you can obtain by differentiating in coordinate charts, where the reasoning is still valid.

In any case, it is probably a good thing to get a good intuition of what differentiation and derivatives are in the vector space setting before digging into differential geometry.


Not all vector spaces are manifolds (in fact most aren't), and you don't need charts to define differential operators (just see functional analysis).


I most definitely agree with you, however this brings us even further away from the author's setting.


That is my feeling after reading this article. For people who have learned a little bit of Analysis, it is like saying "the earth is round". There is nothing new. The Wikipedia page of Derivative (https://en.wikipedia.org/wiki/Derivative) has a detailed description on the linearity. (Well, I do appreciate the author's way of presenting the idea, but I don't think it deserves an in-depth discussion on Hacker News.)


If a function f passes through some point (a,b), then the tangent to f through that point is given by

    (y-b) = f'(a)·(x-a)
and that function is affine but usually not linear. (For the tangent curve to be a linear function, you would need a·f'(a) = b, so that the tangent goes through the point (0,0).)

It's not at all obvious to me that this means that the function d(f) = df/dx is linear. It is linear, but I don't see how the tangent curve demonstrates it.
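A tiny worked example of the affine-vs-linear distinction above (my choice of f and a): the tangent to f(x) = x² at a = 1 is t(x) = b + f'(a)(x−a) = 2x − 1, which fails both linearity tests.

```python
def t(x):
    """Tangent line to f(x) = x^2 at a = 1: t(x) = 1 + 2*(x - 1)."""
    return 2 * x - 1

print(t(0))                   # -1: a linear map must send 0 to 0
print(t(1 + 2), t(1) + t(2))  # 5 vs 4: additivity fails
```

So the tangent is affine; linearity here is a property of the operator f ↦ f', not of any individual tangent line.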


It is common in this context to say "linear" to also mean "affine", because after all affine functions are not much more complicated than linear functions.


The derivative is only the direction of the tangent line. The affine part does not come into play.


Differentiation respects sums: the derivative of a sum is the sum of the derivatives. The same goes for multiplication by a scalar.

From this you obtain that the derivative of any linear combination is the linear combination of the derivatives: differentiation is linear.
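A quick numerical check of that linear-combination property (the sample functions sin(x) and x², the coefficients, and the grid are all illustrative choices):

```python
import numpy as np

x = np.linspace(0.0, 2 * np.pi, 10001)
f, g = np.sin(x), x**2
a, b = 3.0, -2.0

# d/dx (a*f + b*g) versus a*f' + b*g', via finite differences
lhs = np.gradient(a * f + b * g, x)
rhs = a * np.gradient(f, x) + b * np.gradient(g, x)
print(np.max(np.abs(lhs - rhs)))  # ~0 up to floating-point rounding
```

Finite differences are themselves linear maps on the sampled values, so the two sides agree to rounding error, mirroring the linearity of D.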


The first headline for this was something along the lines of "I realized that derivatives are linear" making it clear that this is not a new discovery, but rather a person sharing a lightbulb moment.

I feel a lot of comments are saying "well of course they are!", not realizing that this is not about a new discovery.


Be careful when using knowledge from this post. These are special cases, not the general rules of differentiaton (or calculus).

For example, for multivariable calculus, the results would be very different.

Let's take the example of W·X.

d/dx (W·X) = X·(dW/dx) + W·(dX/dx)

Since W does not depend on x, the first term is zero and we get the answer the author got.
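A numerical sketch of that reduction (the particular W and X(x) below are my illustrative choices): with W constant, d/dx (W·X(x)) equals W·X'(x).

```python
import numpy as np

W = np.array([3.0, -1.0])          # constant weights
x = np.linspace(0.0, 1.0, 10001)
X = np.vstack([x, x**2])           # X(x) = (x, x^2), shape (2, N)

dot = W @ X                        # W . X(x) at each grid point
lhs = np.gradient(dot, x)          # d/dx (W . X), finite differences
rhs = W @ np.vstack([np.ones_like(x), 2 * x])  # W . X'(x) exactly
print(np.max(np.abs(lhs - rhs)))   # small: the W' term has dropped out
```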

Before drawing conclusions from the post, please remember the assumptions the author has taken.


Integrals and Fourier Transforms are also linear...


There's a ² missing. Should be dC/dx = sum d/dx |...|².


Yep, fixed!


>Most of the other non-polynomial functions have an equivalent Taylor polynomial

_Analytic_ functions have a Taylor _series_, but it would be incorrect to say that "most" functions have a Taylor series, and a Taylor series is not a polynomial.


There seems to be a misconception that linear transformations have to look like lines.


One easy way to see that "linear" does not mean "line-like" is looking at matrix transforms: All matrix transforms are linear. And we can do a lot of stuff easily with those, like rotating, distorting, even perspective (with w-normalization).
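A rotation is a good example of this (the angle and test vectors below are illustrative): nothing "line-like" about turning the plane by 90°, yet it satisfies the linearity law exactly.

```python
import numpy as np

theta = np.pi / 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotate by 90 degrees

u, v = np.array([1.0, 0.0]), np.array([0.0, 2.0])
# Linearity: rotating a sum equals summing the rotations
print(R @ (u + v), R @ u + R @ v)
```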


All matrix transforms are linear, and few people understand matrix transforms. The term "linear transform" itself stems more from the contrast with nonlinear equations, but it still "feels" like a bit of a misnomer. Perhaps it should be called a "proportional" or "relatable" transform.

Ehh, then again, it doesn't really matter, people who don't understand it still won't with a nomenclature change.


HN continues to confuse me to no end.

Mention some mathematically advanced idea: out come the pitchforks about how you don't need that, all you need is code/market size/scalability/product fit/investment/execution.

Mention a banality that anyone who studied algebra knows: frontpage.


Could you please not post like this? It's a shallow dismissal and it's tediously meta.

There's nothing wrong with the OP at all—someone sharing the excitement of discovering something for themselves.


You may be overestimating the banality. I studied CS with many math courses and some of the article goes over my head. (It would have been clearer 10 years ago.) I don't expect many people here actually studied math as their main subject.


That aside I think there's something bigger here.

I think learning is really hard work, and so most people's first reaction to hard work is to say No, and then go and construct an a posteriori rationale for why actually they shouldn't do that hard work (it's not that useful, you're never gonna use it, you're an expert at something else, etc).

Similar story for why asking data structures in job interviews is a bad idea when you're an applicant (but the people who have been hired and are hiring, do think it's good to ask)


Yeah it seems analogous to bike shedding or Paul Graham's 'blub' paradox. I.e., "we don't understand that and haven't [realized that we've] needed it to date therefore it's probably not important and certainly not of interest".

I like your conservation-of-mental-energy interpretation.


TIL about the Blub paradox https://en.wikipedia.org/wiki/Blub_paradox#The_Blub_paradox

Thanks for that.


Or... you know, this is actually new information to some people. Why does there have to be something bigger behind it?


Or... both can be true.


My sentiments exactly. I see how the author felt thrilled by his discovery. But HN is getting less interesting each month. OTOH, Lobsters and Lbda the Ultimate keep going strong.


Oh how interesting. Just went on lobste.rs and at the bottom they have the moderation log made public. Really cool. I guess it's unusual because moderators don't like accountability.

What's Lbda though?


Ltu, lambda the ultimate


This is why you take linear algebra and calculus before doing machine learning.


On the contrary - ML is a great motivator to finally grapple with the "prerequisites".

During school I never understood what the math was for, so my unconscious brain never saw the necessity to actually learn it. Now I want to learn - with hugely better results.

This mechanism should be utilized much more often, instead of shoving seemingly unrelated knowledge into people's ears without letting them feel the need for it first.


Right. There are a lot of comments along the lines of "Duhhh, that's the point of calculus" here. But you know what? Screw that. People learn when people learn, and discovering it for yourself is a lot cooler than someone telling you about it, especially if it's related to investigating something you care about.

I can't tell you the number of 'trivial' math facts that I have (re)discovered because they were in the context of something I cared deeply about.

The point isn't to remember D_x is a linear operator--math isn't about memorization. It's about understanding the context where this is a useful fact and knowing how to figure it out.

Learn it in Calc I and you can half-heartedly reference it (...isn't differentiation linear? I feel like I remember that from senior year of high school...).

Figure it out on your own and you own it for life.

Post it to the internet and you get ridiculed and mocked for it so that you wish you could forget it.


> The point isn't to remember D_x is a linear operator--math isn't about memorization. It's about understanding the context where this is a useful fact and knowing how to figure it out.

Totally agree with this part.

> Post it to the internet and you get ridiculed and mocked for it so that you wish you could forget it.

Telling everybody you meet about a basic fact that you just learned is mildly cute when a 6-year-old does it.

It's great that this person learned something that was new to them, sure. But that doesn't mean that they need to shout about it to the tens of thousands of their closest friends who read the front page of HN.


Agreed. This is the approach taken by the generally brilliant '3Blue1Brown' YouTube channel.

His video on determinants (for example) teaches what they actually do, rather than emphasising a seemingly arbitrary algorithm for you to blindly follow (which is how I was taught determinants).


> On the contrary - ML is a great motivator to finally grapple the "prerequisites".

Your comment doesn't make sense, considering that you acknowledge math is in fact a prior requirement. Just because you didn't make heads or tails of math during school, that doesn't mean math ceased to be a prerequisite for learning applied math.


calculus fundamentals pretty often lead to extremal problems. Those seem quite useful for optimization at any rate.


"Please don't post shallow dismissals, especially of other people's work. A good critical comment teaches us something."

https://news.ycombinator.com/newsguidelines.html


Hm, I feel like how that comment is read is in the eye of the beholder... it just depends on whether you see the glass half empty or half full. You could read it as a dismissal, or you could read it as an advice or a warning that could very well teach someone (like the OP) something... in this case, I suppose that might be something like: "Here's the important thing to really take away from this incident: it's worth taking the courses in the recommended manner so you can learn more insightful things like this with less effort."


I think it's great that machine learning is exciting enough that people want to learn the math behind it.


The dude/dudette figured it out on their own which is what math is actually about. Who cares if it's well known? Knowing a fact because you learned it in a class is cool and all but it's not the only way to acquire knowledge, and I would argue not even the best way---just the most efficient fact/hour ratio.


You are right. Sometimes the facts that stick best in my head are those that I rediscovered by myself, because that helped me see their actual importance, which might not have been apparent while hearing about them in classes. There are teaching theories in which the student is not supposed to be told about something, but is guided to rediscover it by their tutors.


I used to teach fencing. In the old days people would do months of footwork before being allowed to have a go at the actual sport. This changed because, once the possibility of fatal encounters disappeared, almost no one was motivated enough to go through the prerequisite.

But to be clear, the footwork is fundamental and can't be skipped.


[flagged]


This is HN, they call it "backpropagation".



Hey, it is better to be nice in that case! Because you don't know anything when you are born, every day thousands of people learn about well-known cool things that are new to them: https://www.xkcd.com/1053/

This guy even explained his rediscovery in a beautiful form.


> Because you don't know anything when your are born, everyday thousands of people learn about well-known cool things that are new for them: https://www.xkcd.com/1053/

That doesn't mean that we should praise everyone who announces they discovered or invented basic math properties which for some reason were ignored, although they are basic prerequisites for a specific task.

Should we also praise carpenters who discover the pythagorean theorem?


> Should we also praise carpenters who discover the pythagorean theorem?

Most certainly!


Heyyy man, what if the universe is LINEAR?


"Classification of mathematical problems as linear and nonlinear is like classification of the Universe as bananas and non-bananas."


Quantum computing is expressible as linear transformations of qubit superpositions



