Hacker News | gtrubetskoy's comments

Subinterpreters existed from the very early days in the C API and were key to the implementation of mod_python (which I wrote). So if you used mod_python, you used subinterpreters without realizing it.

http://modpython.org/live/current/doc-html/pythonapi.html#mu...

EDIT: And it looks like I had subinterpreters in the first released version in May 2000, so the initial git (formerly SVN) commit already had them https://github.com/grisha/mod_python/blob/9b211b7e8a65f1af4b...

EDIT2: Just noticed this comment:

  * Nov 1998 - support for multiple interpreters introduced.


How did you deal with C extensions, since apparently most don't support subinterpreters at all? (Which is a shame; it seems we messed up culturally here.)


I didn't :)


One problem with this article is the number of times the solution involves COUNT(DISTINCT).

One of the best SQL interview questions is "Explain what is wrong with DISTINCT and how to work around it".


What is wrong with DISTINCT?


DISTINCT generally requires the results to be sorted or hashed; a quicksort-style sort has O(n^2) worst-case performance, so DISTINCT can put a big performance hit on a query. It is best to structure your database so that queries only return distinct data, e.g. by disallowing duplicates with a unique constraint.
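As a rough sketch of the workaround described above, using SQLite from Python (the table and column names here are made up for illustration): if the schema already guarantees uniqueness, a plain COUNT(*) replaces COUNT(DISTINCT) and no dedup step runs at query time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A table that allows duplicate rows forces the query to deduplicate.
cur.execute("CREATE TABLE visits_dupes (user_id INTEGER)")
cur.executemany("INSERT INTO visits_dupes VALUES (?)", [(1,), (1,), (2,)])
distinct_users = cur.execute(
    "SELECT COUNT(DISTINCT user_id) FROM visits_dupes").fetchone()[0]

# Restructured: a UNIQUE constraint keeps the data distinct up front,
# so a plain COUNT(*) suffices and no dedup work happens per query.
cur.execute("CREATE TABLE visits (user_id INTEGER UNIQUE)")
cur.executemany("INSERT OR IGNORE INTO visits VALUES (?)", [(1,), (1,), (2,)])
plain_count = cur.execute("SELECT COUNT(*) FROM visits").fetchone()[0]

print(distinct_users, plain_count)  # 2 2
```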


If your sorting algorithm degrades to anywhere near O(n^2) in pathological cases, you're doing something wrong, even if the fix is just a timeout/operations limit that detects the pathological case and falls back to an in-place mergesort. Tail latency and containing pathological data are quite important if there's any interactivity.


Nearly every time, it's a symptom of bad data normalization.

But every time, it interferes badly with any kind of locking (that's DBMS dependent, of course), and imposes a high performance penalty (on every DBMS).


“Think before you DISTINCT”


In order to determine the distinct items, the items need to be deduplicated. Generally that's done in one of two ways: a hash table that skips items already seen, or a sort followed by a scan that skips over duplicates. The hash table does O(1) expected work per item (O(n) overall) versus O(n log n) for the sort, but the sort is easier to make parallel without sharing mutable state and has more established algorithms to use when spilling to disk.
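The two strategies can be sketched in a few lines of Python (this is an illustration of the idea, not any engine's actual implementation):

```python
def dedup_hash(rows):
    # Hash-based: O(1) expected work per row; preserves first-seen order.
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

def dedup_sort(rows):
    # Sort-based: O(n log n) to sort, then a scan skips equal neighbors.
    out = []
    for row in sorted(rows):
        if not out or row != out[-1]:
            out.append(row)
    return out

rows = [3, 1, 3, 2, 1]
print(dedup_hash(rows))  # [3, 1, 2]
print(dedup_sort(rows))  # [1, 2, 3]
```

Note the sorted variant also delivers the output in order, which is why engines often share the machinery between DISTINCT and ORDER BY.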


There is a third way: keep the data pre-sorted in the database (via an index).


It covers up bad queries, so you may not see an underlying data duplication problem.

Often better to group explicitly so you know what's actually going on.


> It covers up bad queries,

Bingo. I used to work with a guy who would see duplicate results and just throw a distinct on his query. I had to keep on him to fix his queries or explain why distinct was correct in this case. My default is that distinct is almost always not the solution.


Seriously... I'm building out some functionality using PL/pgSQL and have used it. This is going to haunt my dreams.


There is a possible performance hit [1]. Also, duplicate rows could mean that the data granularity has not been modeled well.

https://sqlperformance.com/2017/01/t-sql-queries/surprises-a...


Huh? If you have a table with attributeid, sampleid and value, how would you count how many samples have a value in any attribute? An EXISTS subquery?


In your example you must also have another table, 'sample', with all the samples. So yes, you would use an EXISTS or IN subquery with the table you suggested.
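A minimal sketch of that shape in SQLite via Python (schema and names invented to match the example above; sample 3 deliberately has no values):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sample (sampleid INTEGER PRIMARY KEY)")
cur.execute(
    "CREATE TABLE sample_value (attributeid INTEGER, sampleid INTEGER, value TEXT)")
cur.executemany("INSERT INTO sample VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO sample_value VALUES (?, ?, ?)",
                [(10, 1, "a"), (11, 1, "b"), (10, 2, "c")])

# EXISTS counts each sample once, with no COUNT(DISTINCT) over a join:
n = cur.execute("""
    SELECT COUNT(*) FROM sample s
    WHERE EXISTS (SELECT 1 FROM sample_value v WHERE v.sampleid = s.sampleid)
""").fetchone()[0]
print(n)  # 2
```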


GitHub was cool when Git was new, years back. But these days, especially given that Git is inherently not centralized, it is not very clear to me why we all cling to GitHub. With a little work, all that it offers can be done without the help of a centralized server/corporation.


I've spent a lot of time trying to understand PoW and came to the conclusion that it is a distributed clock of sorts, described here: https://grisha.org/blog/2018/01/23/explaining-proof-of-work/


Pretty much, yes. It's kind of spelled out in the Nakamoto paper. From the introduction:

" In this paper, we propose a solution to the double-spending problem using a peer-to-peer distributed timestamp server to generate computational proof of the chronological order of transactions."

Everything else in Bitcoin is just turning that timestamp server into a practical(ish) system.



Yes, that's why non-PoW distributed ledgers like the XRPL work essentially the same way: they sort transactions by time and then use a federated Byzantine agreement to "filter out" transactions that did not propagate through the network within a specific time and thus can't be put in the correct order. Those transactions are added to the next ledger (block) instead, which isn't a problem if block times are just seconds.


Well, it relies on a synchronized clock, so it can't provide a clock. PoW adjusts the difficulty based on the time to try to meet some difficulty target.

In fact, if the nodes' clocks are not synchronized, it can cause significant problems and vulnerabilities. If a clock runs too fast, the difficulty adjustment algorithm will think too few blocks were mined in the window and decrease the difficulty.
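For concreteness, a Bitcoin-style retarget rule can be sketched like this (real implementations operate on the full 256-bit target and have extra quirks; the constants are Bitcoin's nominal ones). A larger target means lower difficulty, so an inflated timespan, whether from genuinely slow blocks or fast clocks, lowers difficulty:

```python
TARGET_SPACING = 600    # desired seconds per block (10 minutes)
RETARGET_BLOCKS = 2016  # blocks per adjustment window

def retarget(old_target, actual_timespan):
    expected = TARGET_SPACING * RETARGET_BLOCKS
    # Clamp so one window can move difficulty by at most 4x either way.
    actual_timespan = max(expected // 4, min(actual_timespan, expected * 4))
    # Blocks looked slow -> timespan large -> target grows -> difficulty drops.
    return old_target * actual_timespan // expected

base = 1 << 220
assert retarget(base, TARGET_SPACING * RETARGET_BLOCKS) == base       # on schedule
assert retarget(base, 2 * TARGET_SPACING * RETARGET_BLOCKS) == 2 * base  # "slow": easier
```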


Those aren’t really “significant problems and vulnerabilities”: any given node can lie about what time it is, but you’re not trusting a particular node for more than the outcome of a single contiguous block—and block difficulty “velocity” is capped—so you’d need a Sybil attack to actually walk the difficulty down. Otherwise, even at 49% malicious nodes, consensus is just going to bounce between nodes that say the time was really short, and nodes that give “regular” timestamps, keeping the difficulty roughly constant within the network’s margin of error.

Really, the timestamp field in most PoW systems’ “block” structs (Bitcoin’s, Ethereum’s, etc.) is just defined as “a number that is higher than the one in the parent block, and not so high that when interpreted as a POSIX timestamp it would land 30+ seconds in the future relative to the local node’s time.” So you just need >50% of the nodes to have a ±30s clock sync in order to agree on which blocks are valid for consideration; and even if you don’t have that level of synch, those blocks will still become valid eventually, once they’re old enough that all the nodes do consider them to be in the past. (And most PoW systems keep around near-“future” blocks until they’re valid for just such a case.)
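The acceptance rule as stated above might be sketched like this (the 30-second window is the parent comment's figure, not a standard; Bitcoin, for instance, tolerates roughly two hours of future drift and compares against a median of recent block times):

```python
import time

MAX_FUTURE_DRIFT = 30  # seconds, per the parent comment; real networks vary

def timestamp_valid(ts, parent_ts, now=None):
    """Accept a block timestamp that is after its parent's and not too far
    in this node's future. Blocks rejected as "future" can be retried later."""
    now = time.time() if now is None else now
    return ts > parent_ts and ts <= now + MAX_FUTURE_DRIFT

assert timestamp_valid(1000, 990, now=1000)
assert not timestamp_valid(980, 990, now=1000)   # not after the parent block
assert not timestamp_valid(1100, 990, now=1000)  # too far in this node's future
assert timestamp_valid(1100, 990, now=1090)      # valid once local time catches up
```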


The timing aspect is an important part of PoW, but it's not the entire purpose. The Bitcoin whitepaper itself goes on to say, "The proof-of-work also solves the problem of determining representation in majority decision making." That is, PoW also solves the problem of deciding which consensus rules to enforce, not just when.


  RecursionError: maximum recursion depth exceeded


Must not have been tail call recursion.
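The quip holds either way: CPython performs no tail-call elimination, so even a function that is tail-recursive in form consumes a stack frame per call. A minimal illustration, with the iterative rewrite that avoids the error:

```python
import sys

def count_down(n):
    # Tail-recursive in form, but CPython does not eliminate tail calls,
    # so each call still pushes a stack frame.
    return 0 if n == 0 else count_down(n - 1)

def count_down_iter(n):
    # The iterative rewrite runs in constant stack space.
    while n:
        n -= 1
    return n

try:
    count_down(sys.getrecursionlimit() + 100)
except RecursionError as e:
    print(e)  # maximum recursion depth exceeded

print(count_down_iter(10 ** 6))  # 0
```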


Github goes down on Monday around 9am pacific time - must be totally random.


The funny thing about this is the idea that west coast engineers actually start work at or before 9am


I'm rarely in before 9AM. The 10 minute standup at 10:30 AM is the only daily requirement for most staff.


I bet a bunch of PRs had built up over the weekend that weren't deployed, and some guy who came in at 9 decided to deploy them and broke things. It's always scary to be the one to deploy if no one has deployed in a while.


Any compact disc works as a spectrometer.

https://www.cs.cmu.edu/~zhuxj/astro/html/spectrometer.html


I've never seen this. Thanks!


I once submitted a blog post [1] and later received an email from someone at HN saying that it was a great article but hadn't done so well, and that if I re-submitted it they would make sure it did better. So I did, and it went to the top.

[1] - https://news.ycombinator.com/item?id=16862077


Would be interesting to see the correspondence, but don't feel obligated.


They do it frequently. I've had it happen several times. The specific text isn't particularly exciting. They just give you a link to resubmit if you're interested. You also get an extra upvote upon submission (and I imagine there is more of a bump behind the scenes).


What does HN feel about this? Is it curation from the staff or is it selective manipulation? On first blush, I'm for it... but I'd be interested to see what others think.


>Is it curation from the staff or is it selective manipulation?

Curation is selective manipulation. Whether it's positive or negative correlates to whether you personally agree with it.


Yes. I generally find HN works, i.e. surfaces interesting stuff. I judge HN by the what, not the how. The "how" moan is often "they don't like my stuff", sniff sniff.


I like it. HN is still worth checking a few times every day. Whatever the staff is doing, it feels like they keep the quality relatively stable.


At first thought I'd be against it, because they essentially bump what they think is good, and that does not (necessarily) reflect the community. Sometimes things do slip through on HN, though, so giving a nudge to resubmit sounds like a good idea.

Tbh, I don't really mind either way, I've enjoyed most content on here.


We bump what we think the community might like and is aligned with the site guidelines. And only to the lower half of the front page, whence it soon falls away if we guessed wrong. The posts aren't necessarily what we ourselves like or think is good; mostly we don't have time to decide on that.


Fair enough. As I mentioned I am pretty happy with the content on here. Also happy with the moderations so thank you for that! :)


You have time to decide what to bump, but not to decide what you like? Makes no sense.


They explicitly said they bump what they think the community might like. It doesn’t take a thorough reading of every submission to make that guess.


It does, unless you want to create an echo chamber (which HN somewhat is)


What would make HN less of an echo chamber?


I don't think it can be solved by moderation, if that makes you feel better


It makes me feel worse. But I'd like to hear why you say that HN somewhat is an echo chamber, and what you think would make it less of one, or where you might look for examples of non-echo-chambers on the internet. You're welcome to reply here, or email hn@ycombinator.com.

It's possible that you've observed something that's not on our radar, and such information is important for us to be aware of, even if we can't solve it by moderation.


Do you have any answer to the question?


I'm aware. I'm skeptical of Dang's claim that it would take more time to decide what they themselves like.


I’ve upvoted opinion pieces that I disagree with, but I wanted to see what the community thought.


You have time to decide if the community might like it, but not if you yourselves do? A bit weird.


If you do something hundreds of times a day for years, you get fast at it. Also, don't forget I said "might". These are quick, approximate guesses.


I guess a good community needs constant upkeep, like anything else in life. The community was created over what a few people thought was good content, and if left alone will probably get dissolved in some "large reddit" effect.


An interesting thing to know would be whether HN staff prevent stories they disagree with from getting popular.


I've had that happen a couple of times. The email is fairly bland - it looked like the sort of copy you'd write for an automated system. It includes a link to this comment by way of explanation:

https://news.ycombinator.com/item?id=11662380


The way it works is described in this HN post: https://news.ycombinator.com/item?id=11662380 I've had about a dozen of them, I think probably because I'm in a time zone that gets buried more than others.

on edit: maybe it's not exactly the same thing because I don't have to resubmit when I get these, they are just put in the second chance pool.


The same happened to me just a couple of days ago. I thought it was a very nice thing for the quality of the content that goes to the top (not just because it was my submission, of course).


that has happened to me too


FWIW - my notes on how locking was done in Thredis: https://github.com/grisha/thredis/blob/master/README-THREDIS


Thanks! I did look at Thredis originally, but the GitHub repo seemed not to have been maintained at the time.


Heh, I did this back in 2012: http://thredis.org/ (Not only is it threaded, it also supports SQL).


And Alibaba did it a couple years ago with Pedis! https://github.com/fastio/pedis (looks like the project is still active, but I don't really know of anyone else who uses it.)


Good name too.

