Wait, is this implementation just wrong? "num_channel" is the embed_d, and K and Q should be applied to every embedded vector. Here each embedded vector has length 1, so K, Q and V should each be a single scalar, applied like torch.matmul(x_i, self.K), i.e. [sequence_dim, 1] x [1, 1]. This is also intuitive because the sequence can have a different size and self-attention needs to work anyway.
If you instead want to treat "num_inputs" as the embed_d, then the attention matrix W would be 1x1, because there is only one 768-dimensional vector; but instead the matrix W here is [num_inputs, num_inputs].
Now that you have updated the code, I think you have just flipped the error; it's still there, just doing the opposite (still wrong) thing. For example, if "num_steps" is 768 and "num_inputs" is 1, then K, Q, V are 1x1, fine, but then the attention matrix cannot also be 1x1; it needs to be 768x768. You can also do the opposite: K, Q, V being 768x768 and the attention matrix 1x1. Before, the error was that you made both the K, Q, V and the attention matrix 768x768 by applying the matrices on the wrong dimension; now the error is that you have made both the K, Q, V and the attention matrix 1x1.
For a simple fix, just make the attention matrix W [num_steps, num_steps] instead of [num_inputs, num_inputs]; then "y_i = torch.matmul(x_v_i, w_i)" becomes "y_i = torch.matmul(w_i, x_v_i)", and it should be correct.
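To make the shapes concrete, here is a minimal sketch of the suggested fix in numpy (the repo uses PyTorch, but the math is the same; the toy sizes and variable names are made up for illustration, not taken from the repo):

```python
import numpy as np

num_steps, embed_d = 4, 3  # toy sizes; the thread's actual case is 768 steps

rng = np.random.default_rng(0)
x = rng.standard_normal((num_steps, embed_d))

# K, Q, V project each embedded vector, so they live on the embed dim
K = rng.standard_normal((embed_d, embed_d))
Q = rng.standard_normal((embed_d, embed_d))
V = rng.standard_normal((embed_d, embed_d))

x_q, x_k, x_v = x @ Q, x @ K, x @ V  # each [num_steps, embed_d]

# The attention matrix W is [num_steps, num_steps], NOT [num_inputs, num_inputs]
scores = x_q @ x_k.T / np.sqrt(embed_d)
w = np.exp(scores - scores.max(axis=-1, keepdims=True))
w /= w.sum(axis=-1, keepdims=True)  # row-wise softmax

# Attention is applied on the step dimension: y = W @ V(x)
y = w @ x_v  # [num_steps, embed_d]

assert w.shape == (num_steps, num_steps)
assert y.shape == (num_steps, embed_d)
```

Because W is built from the steps themselves, this works for any sequence length, which is the whole point raised above.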
I just checked and the attention matrix is 768x768. Basically, I reshape each sample to be a single step with an embed size of 768. I hope this makes sense. I may rewrite the README later tonight to make this point clear.
If you have a single embedded vector, it should be clear that the self-attention matrix cannot be an arbitrary NxN but only 1x1; follow the fix I added to my previous comment.
Also, if K, Q, V and the self-attention matrix have the same dimensions even when the embed_d and the sequence dimension have different sizes, there is something wrong.
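This gives a quick sanity check one could write; a numpy sketch (function and variable names are illustrative, not from the repo):

```python
import numpy as np

def attention_shapes(num_steps, embed_d):
    """Return the K/Q/V weight shape and the attention-matrix shape."""
    x = np.zeros((num_steps, embed_d))
    Wq = np.zeros((embed_d, embed_d))   # K, Q, V live on the embed dim
    scores = (x @ Wq) @ (x @ Wq).T      # attention lives on the step dim
    return Wq.shape, scores.shape

kqv, attn = attention_shapes(num_steps=768, embed_d=1)
assert kqv == (1, 1)        # K, Q, V are 1x1 scalars here...
assert attn == (768, 768)   # ...while the attention matrix is 768x768
```

When embed_d and the sequence length differ, the two shapes must differ too; if they come out equal, a matrix is being applied on the wrong dimension.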
I made the same mistake recently! It doesn't help that there are so many poorly documented repos that make similar mistakes (and, worse yet, half-baked Medium articles that blatantly give false information). Directly translating papers to PyTorch is hard too; I really wish there were a better API to express these models in, one that aligns better with how they're actually documented in papers.
No one here actually checks anything before upvoting it. Having said that, this repo seems worthwhile and I appreciate the effort of OP. It's open source, which empowers people to fix such problems themselves and make a PR, or at least open an issue.
Once I got this feedback from GaggiX, I was actually looking on HN for a way to remove my post, but then discovered I could not! So I have tried to fix it as soon as possible.
For the best, IMO. That was a great interaction, and anybody running into issues related to this problem in the future actually stands a chance of finding this thread now.