Actually, because the birthday paradox has k^2 as a term, this is less true than you might think. Having a time component reduces the chance of collisions over the long run, albeit at the cost of reducing the number that can safely be generated in any given quantum.
If you consider a 128-bit random number, you effectively have 64 bits of allocation space before you are likely to get a collision.
If you devote 48 bits to time, which provides millisecond accuracy for around 9,000 years, you then have 80 bits of randomness, effectively giving 40 bits of allocation space per millisecond before you are likely to get a collision.
Instead of approx 2^64 allocations across all time before a collision, you instead get 2^40 (about 1 trillion) per millisecond. That sounds like a poor deal, until you realise the factor between the two is only 2^24: allocating at full rate, you pass the 2^64 total after 16,777,216 ms, which is under 280 minutes.
So in reality, shrinking the random space in exchange for bits that are guaranteed unique is actually a great trade.
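To put rough numbers on this, here's a quick Python sketch (mine, not from the article) of the square approximation p ~= k^2 / (2N) that these figures come from; it's accurate for k well below sqrt(N) and only an order-of-magnitude guide near p = 0.5:

    # Square approximation to the birthday bound: p ~= k^2 / (2N).
    def p_collision(k_ids: int, random_bits: int) -> float:
        n = 2 ** random_bits
        return k_ids ** 2 / (2 * n)

    print(p_collision(2 ** 64, 128))  # 128 random bits, 2^64 IDs ever: ~0.5
    print(p_collision(2 ** 40, 80))   # 80 random bits, 2^40 IDs per ms: ~0.5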
Surprised the author didn't even think about the logical conclusion of his closing paragraph: "128 bits is the ideal sweet spot, collision safety effectively forever, and it happens to match the size of a UUID, which means every database, every language, and every protocol already knows how to handle it."
UUIDs are already generated randomly for exactly the same reason. Rather than inventing something new, they should have just used a UUID.
It basically makes no odds, unless you consider applying a constant AND and a constant OR complicated - UUID v4 is just 122 random bits and 6 fixed bits.
UUID v7 is a 48-bit timestamp, 74 random bits, and 6 fixed bits. Sure, that's a little more complicated, but it's worth it for many applications because the result can be sorted, so keys will be approximately monotonically increasing.
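For illustration, here's a hand-rolled Python sketch of both layouts (in real code you'd use a library; the v7 field order below is my reading of RFC 9562, so treat it as a sketch rather than a reference implementation):

    import os, time, uuid

    def uuid4_like() -> uuid.UUID:
        # 128 random bits, then a constant AND/OR to pin the 6 fixed bits
        bits = int.from_bytes(os.urandom(16), "big")
        bits &= ~((0xF << 76) | (0x3 << 62))  # clear version + variant slots
        bits |= (0x4 << 76) | (0x2 << 62)     # version 4, variant 0b10
        return uuid.UUID(int=bits)

    def uuid7_like() -> uuid.UUID:
        ts_ms = time.time_ns() // 1_000_000                              # 48-bit ms timestamp
        rand_a = int.from_bytes(os.urandom(2), "big") & 0xFFF            # 12 random bits
        rand_b = int.from_bytes(os.urandom(8), "big") & ((1 << 62) - 1)  # 62 random bits
        bits = (ts_ms & ((1 << 48) - 1)) << 80  # timestamp in the top 48 bits
        bits |= (0x7 << 76) | (rand_a << 64)    # version 7, then rand_a
        bits |= (0x2 << 62) | rand_b            # variant 0b10, then rand_b
        return uuid.UUID(int=bits)

Because the timestamp sits in the most significant bits, sorting uuid7_like() values as integers or strings keeps them approximately time-ordered, which is the whole point of v7.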
I think UUIDv7 could make sense, but I suspect the recommendations in the spec predate UUIDv7. Also, if you want sortable IDs, there are slightly more efficient schemes than UUIDv7. With UUIDs you are always sacrificing some bits to distinguish between the UUID versions, which I guess does not really matter in practice, but it still seems unnecessary.
Yeah, in practice the 6 bits you lose aren't important. Redoing his calculations with 122 bits and a quadrillion generated IDs (that's a million billion), the probability of a collision is 9.4 x 10^-8 (about one in 10 million) using UUID v4.
In my opinion, UUID v7 is useful because, per millisecond, you still have 74 bits split between user-defined data (up to 12 bits) and randomness (minimum 62 bits). If you choose 64 bits of randomness, you can read the numbers straight from the article - 1 million UUIDs per millisecond with less than a one in a million chance of collision - and you still have 10 bits left to add additional data, such as which machine generated the ID.
If you stick with just time and keep the full 74 bits of randomness, you can generate a million (10^6) UUIDs per millisecond with less than a one in 40 billion chance of collision (2.6 x 10^-11) using UUID v7.
I think the fact that the formula is k^2 / (2N) shows that having a time component makes better use of the bits than a purely random space. In this example, we have a lower chance of collision with a million (10^6) UUIDs generated every millisecond than with a quadrillion (10^15) UUIDs across all time.
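If you want to check those figures, here's the same approximation again (my sketch, not the article's code):

    def p_collision(k_ids: int, random_bits: int) -> float:
        return k_ids ** 2 / (2 * 2 ** random_bits)

    print(p_collision(10 ** 15, 122))  # v4, a quadrillion IDs ever: ~9.4e-8
    print(p_collision(10 ** 6, 64))    # 64 random bits, 1M per ms: ~2.7e-8
    print(p_collision(10 ** 6, 74))    # v7, 74 random bits, 1M per ms: ~2.6e-11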
BTW, I realised I didn't address why those fixed bits are necessary. While it might seem that giving more bits to randomness reduces the risk of collisions, that's not necessarily true.
The old schemes generated numbers that weren't uniformly distributed across the 128-bit space, because they were intentionally biased in certain ways, such as embedding timestamps [0] and MAC addresses [1]. Most of the IDs generated under those schemes therefore had many bits in common [2]. So if you just used the whole 128 bits for random data, without reserving those fixed bits to keep clear of the previous schemes, random IDs that happened to look like valid IDs from a previous scheme would be more likely to collide.
Of course, this only matters if global uniqueness matters to you. For a closed system with a guaranteed scope, sure, who cares? But given that the extra randomness doesn't add any useful value beyond a certain threshold, you might as well use a UUID: you don't know what that identifier might end up being used for in the future, plus you can use off-the-shelf systems to generate them.
[0] Ironically, future-proofed time fields with many bits are more likely to be non-linearly distributed - e.g. the original version 0 UUID supported timestamps from 1582 AD to 5236 AD but was only used from 1987 for around a decade.
[1] With certain manufacturers of network cards massively more popular than others, their MAC address prefixes showed up significantly more frequently. There were also privacy concerns: you could correlate UUIDs generated on a single machine, and sometimes infer which machines might be on the same network, because similar MAC addresses meant the cards were probably all from the same manufacturing batch.
[2] Which is fine within the scope of UUIDs, as they are still very likely to be globally unique, so it doesn't really matter that bits are wasted in this scheme.
And there’s a good reason for that: UUIDs have additional properties. I don’t know whether versioning, partial ordering, or stable references are useful for traces, but with UUIDs those would at least have been possible.
I remember getting burned by this a bit over 20 years ago while trying to get on the housing ladder. At the time, the going rate for property where I was looking was around £80k, but I'd seen one wreck go for £60k and I was waiting around trying to get another for that kind of price. Unfortunately for me, I never did find that bargain; that year saw massive property price rises, and in the end I had to pay £120k for a house that was the same as those that had been selling for £80k a year earlier. My guess is that most of the bargains were bought by people who knew how to flip properties, as a lot of the properties were of a very high standard around then. Just to rub it in, after a 50% rise in a year, prices were then pretty stagnant for the next 12-13 years. At a guess, its value has approximately doubled in the decade since that stagnant period.
Why is "no software" so important? If you design your board well enough, you can route the programming ports somewhere you can program the chip in-situ, possibly alongside other components that also need programming.
But in terms of cost, a simple microcontroller is usually cheaper than a 555 nowadays, often doesn't require external components, and so even if all you wanted was a single function like an edge-triggered pulse, or generate a single frequency, it probably still makes sense to use a microcontroller from a board design perspective. As soon as you want anything slightly more complicated, odds are you can replace a ton of other circuitry on the board with that single chip and a small program.
Because nothing is faster and more responsive than direct hardware logic.
"a simple microcontroller is usually cheaper than a 555 nowadays, often doesn't require external components,"
Often? Every UC I've ever used has required a whole slew of caps and resistors just to get the thing to take in operative firmware through a programming port. Even the simple light flashers for vehicles that I've made using a UC and accelerometer need at least two caps and two resistors to make a proper circuit that allows for flashing info to the controller.
"so even if all you wanted was a single function like an edge-triggered pulse, or generate a single frequency, it probably still makes sense to use a microcontroller from a board design perspective."
Frequency generation? Inductor, capacitor, input voltage. Zero UC required and guaranteed to be cheaper.
"As soon as you want anything slightly more complicated, odds are you can replace a ton of other circuitry on the board with that single chip and a small program."
And accomplish things at a glacial pace compared to what a basic hardware-only solution would've achieved. As an example - BOSS pedals have basically zero latency because they're all analog. All these newer Line 6 and POD and other digital FX pedal makers have horrible latency, some I've measured past 50ms (almost as bad as trying to live-monitor a Windows Audio device). It has been this way for the 30-plus years I've been playing guitar.
Most times, raw hardware with zero software is THE way to go. Anything else is just a performance loss.
> "a simple microcontroller is usually cheaper than a 555 nowadays, often doesn't require external components,"
> Often? Every UC I've ever used has required a whole slew of caps and resistors just to get the thing to take in operative firmware through a programming port.
The ATtiny, for example. Many others require only an external capacitor, and complaining about a decoupling cap on a chip that's replacing a 555 - which also needs an RC network to function - seems rather petty.
> And accomplish things at a glacial pace compared to what a basic hardware-only solution would've achieved.
Most of these uCs run at 1 MHz or higher. The ATtiny85 can run at 8 MHz from the internal oscillator and has an interrupt latency of 4-6 cycles. For anything that's replacing a 555, you'd have to try incredibly hard to get latency as bad as you're describing. Perhaps they're actually doing something significantly more complicated than just replacing a 555?
Actually, this reminds me of an anti-pattern I often see on websites: after they've bombarded me with cookie banners and this, that and the other, you get to read about one paragraph of whatever is on the page, and a few seconds later a "why don't you subscribe" dialog pops up. I don't think there's been a single time I haven't immediately cancelled it and decided then and there that I will never be subscribing to whatever it is. I've not even been given a chance to read the article yet - how am I supposed to know if the quality is worth subscribing for? All I've learned so far is that the website author doesn't value my time.
> If someone opens my videoconferencing product, 98% of the time it's because they've got a scheduled call to join within the next 20 seconds. They're not going to be late for their meeting so they can read my release notes.
I'd go even further. If someone opens your product, they don't care about anything in your release notes as long as they are still able to join the call. Not only does nobody care about the new background effects etc. right then, they probably don't care about them at all. Maybe if someone discovers a feature they want to use, they'll hunt around for it before the next meeting, but by the time that meeting comes around they'll probably be busy then as well.
More generally, most people don't care about 90% of the features of a product, just that it lets them do the one thing they need it to do, as soon as possible. If it isn't obvious how to do that one thing, making that obvious is more important than a product tour explaining it.
Even more likely: if someone opens your product, your last update probably broke their workflow. They don't need to read your release notes to know this.
I think notifications after updates make more sense. You already know the software, and you are informed of some new feature that you may or may not care about. How else can a user find out about a new feature? Even if I do not care about the majority of features added, there is still occasionally one that I may want to use.
I often read such notes/product tours in software I already use/know, and conversely I find it a bit stupid when they add some feature and do not tell the users. It should not be obstructive, though. I would say the update itself breaks the workflow more than a pop-up window or something does.
I mean, the point the article makes - black plastic has different characteristics to brown and beige plastics, so they need to be developed separately - seems reasonable on the surface, but it doesn't explain why they do the "novelty" colours first. Especially from the way it's worded, it sounds like they might have to redo the moulds going brown -> black but potentially not the other way round. So overall, the whole article just seems like PR spin.
There's a good argument to be made that the data for reviews could be held in git repos just as easily as the source.
It could be done easily by having a branch per review with a known prefix (although these would rapidly clog up the default branch namespace), or via git namespaces to keep them distinct from the main namespace, or maybe just a special branch, e.g. ".reviews", that contains the commit IDs for the tip of each review branch.
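To give an idea of how little plumbing that needs, here's a minimal Python sketch shelling out to git (the refs/reviews/ prefix is hypothetical, not an existing convention):

    import subprocess

    def publish_review(review_id: str, tip_commit: str) -> None:
        # One ref per review, kept outside the normal branch namespace
        # so it doesn't clutter `git branch` output.
        subprocess.run(
            ["git", "update-ref", f"refs/reviews/{review_id}", tip_commit],
            check=True,
        )

    def list_reviews() -> list[str]:
        result = subprocess.run(
            ["git", "for-each-ref", "--format=%(refname)", "refs/reviews/"],
            check=True, capture_output=True, text=True,
        )
        return result.stdout.split()

Sharing them is then just a refspec away, e.g. git fetch origin "refs/reviews/*:refs/reviews/*".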
It just needs someone who's invested enough to specify it and make a viable implementation, after which people might start adopting it. I guess the reason github and the various forges didn't take this approach is that keeping the review metadata within their ecosystem is what gives their platform value. If anyone could use any local tool they like for reviewing other people's code, there wouldn't be as much vendor stickiness.
EDIT: actually, I guess there are other reasons why you might want your review metadata in a different repository, such as access control and/or cross-repo reviews.
There were a few efforts like that back in the day (when people still cared about offline and store-and-forward-style operation[1]), like Bugs Everywhere[3], git-appraise[4], which stored its data in Git's little-known "notes" namespace[5], and git-bug[6], which for some reason I've seen mentioned quite a bit in such threads recently, unlike the others - though I'm not complaining that at least one of them gets mentioned.
Also, as far as read-only access, Gerrit review data is actually accessible via Git[7] (for review ABCDE, pull refs/changes/DE/ABCDE/meta instead of one of the usual numbered refs under that prefix), and someone made the effort[8] to make it accessible via Git notes too (as mentioned in the post on Git notes that I linked above).
Also also, the Fossil SCM of SQLite fame somewhat famously does[9] do this kind of thing with its builtin bug tracker. It has been relegated to obscurity partly as an accident of history (Git won) and partly on the merits (it is aggressively hostile to the kind of history rewriting we are used to routinely—if not always wisely—performing in Git).
Going back to working on top of Git, though, I think that part of the problem is that you really want custom merge strategies when you’re trying to build a fancy datatype, and Git’s support for them requires a lot of wrapping to make it seamless (the location tracking stuff in git-annex[10] is the only success story I am aware of, and that’s a sizeable Haskell project). The existing porcelain is just too rigid.
[1] Can I have a viable replacement for PGP for that use case? Please stop telling me that I don’t exist and should screw off[2]? Please?..
Lots of good points. As for the last point, most review tools seem to be centered on tracking a branch ref over time. The actual merge strategy probably doesn't really matter as long as the tool can see that the watched reference now points to a new commit.
You can't physically do this yourself in the UK (AFAIK at least), but I've heard of people taking businesses to the small claims court in the UK, getting a default judgment because the company didn't bother showing up, and then, when the company refused to pay the judgment, getting the court to freeze its bank accounts and appoint a debt collector to recover the money.