
I think the math here should be right, although perhaps the confusion is because I wasn't clear about the role of the ordering:

We're ordering the six trials by speed and computing the probability of getting an AAABBB ordering. There are C(6,3) = 20 equally likely orderings under random chance, so the probability of AAABBB is 1/20. That's why we don't care about getting BBB first, and why getting AAA first does tell you that A is faster than B.
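
You can sanity-check the 1/20 by enumeration. A minimal Python sketch (the A/B labels are just illustrative):

    from itertools import permutations

    # All distinct orderings of three A trials and three B trials: C(6,3) = 20
    orderings = set(permutations("AAABBB"))
    print(len(orderings))                # 20
    print(tuple("AAABBB") in orderings)  # True: AAABBB is one of the 20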

You can also click the link in the OP and see other people explaining the logic of the rule. If there's an error in either my explanation or theirs, I'm interested in hearing it.



The problem is that you are equally likely to get a BBBAAA ordering, which would give you the same Type I error as AAABBB in the case where A and B have the same distribution.

The key thing to remember when deriving a p-value is that you are computing the probability of seeing an outcome at least as extreme as yours given the null hypothesis, not the probability of seeing your particular outcome.
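
Concretely, a rough Python sketch of "at least as extreme" for this setup, counting both directions:

    from itertools import permutations

    # Under the null, all orderings of AAABBB are equally likely.
    # Test statistic: number of B's among the three fastest trials.
    orderings = set(permutations("AAABBB"))

    # "At least as extreme" in either direction: 0 B's (AAABBB) or 3 B's (BBBAAA)
    extreme = [o for o in orderings if o[:3].count("B") in (0, 3)]
    print(len(extreme) / len(orderings))  # 0.1 -> the two-sided p-value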


I see your point. But isn't that only true for two-tailed tests? This sort of ordering problem seems more suited to a one-tailed test. Let's say, for example, that we represent the bins of the probability distribution as the number of B's among the three fastest trials (0, 1, 2, or 3).

In this case I am defining statistical significance for our observed ordering (AAABBB) as anything in the lowest 0.05 of that probability distribution, which corresponds to the 0-B's bin. That corresponds to a one-sided 95% confidence interval.
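
For reference, a quick enumeration of that distribution in Python (assuming all orderings are equally likely under the null):

    from itertools import permutations
    from collections import Counter

    # Distribution of "number of B's among the three fastest trials"
    # when all 20 orderings of AAABBB are equally likely.
    orderings = set(permutations("AAABBB"))
    bins = Counter(o[:3].count("B") for o in orderings)
    print(sorted(bins.items()))  # [(0, 1), (1, 9), (2, 9), (3, 1)]
    print(bins[0] / 20)          # 0.05 -> the 0-B's bin alone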

What am I doing wrong?


A one-sided test is only valid if you know that it's not possible to get a result in the other direction. In the general case you don't know for certain that B can't be faster than A. One-sided tests are almost always invalid because it's very difficult to know that the other direction is impossible.

If this doesn't make sense, I would recommend running simulations under the null hypothesis. You will see that 5% of the time you will falsely conclude that A < B, and in another 5% of the time you will falsely conclude that B < A, leading to an overall false positive rate of 10%.
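
Something like this rough Python sketch (under the null the actual speed distribution doesn't matter, so shuffling the labels stands in for sorting random speeds):

    import random

    # Simulate the null: A and B have identical speed distributions,
    # so the speed-sorted ordering of the six trials is uniformly random.
    runs = 100_000
    a_faster = b_faster = 0
    for _ in range(runs):
        labels = list("AAABBB")
        random.shuffle(labels)  # stands in for sorting six random speeds
        if labels[:3] == ["A", "A", "A"]:
            a_faster += 1       # false conclusion: A < B
        elif labels[:3] == ["B", "B", "B"]:
            b_faster += 1       # false conclusion: B < A
    print(a_faster / runs, b_faster / runs, (a_faster + b_faster) / runs)
    # ~0.05, ~0.05, ~0.10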


I think I'm finally starting to see the heart of the problem now.

> You will see that 5% of the time you will falsely conclude that A < B, and in another 5% of the time you will falsely conclude that B < A, leading to an overall false positive rate of 10%.

From my perspective, this makes perfect sense, since I know there is a 1/20 chance of getting three A's first, based on the combinations formula. Where I'm having trouble is understanding why the p-value needs to be derived to account for both of these extremes.

Earlier you wrote:

> The key thing to remember when deriving a p-value is that you are computing the probability of seeing an outcome at least as extreme as yours given the null hypothesis, not the probability of seeing your particular outcome.

This is where the confusion is coming from for me. Based on this definition of the p-value, everything you say makes sense. But I don't understand what the point of deriving a p-value this way is, or what it tells us. And to clarify, I'm not saying you're wrong, just that I need to read up on p-value derivation to understand this.



