Hacker News | gtrubetskoy's comments

Subinterpreters existed from the very early days in the C API and were key to the implementation of mod_python (which I wrote). So if you used mod_python, you used subinterpreters without realizing it.

http://modpython.org/live/current/doc-html/pythonapi.html#mu...

EDIT: And it looks like I had subinterpreters in the first released version in May 2000, so the initial git (formerly SVN) commit already had them https://github.com/grisha/mod_python/blob/9b211b7e8a65f1af4b...

EDIT2: Just noticed this comment:

  * Nov 1998 - support for multiple interpreters introduced.


How did you deal with C extensions, since apparently most don't support subinterpreters at all? (Which is a shame; it seems we messed up culturally here.)


I didn't :)


One problem with this article is the number of times the solution involves COUNT(DISTINCT).

One of the best SQL interview questions is "Explain what is wrong with DISTINCT and how to work around it".


What is wrong with DISTINCT?


DISTINCT generally requires the results to be sorted or hashed; a quicksort-style sort has O(n^2) worst-case performance, so DISTINCT can put a big performance hit on a query. It is best to structure your database so that queries only return distinct data, e.g. by disallowing duplicates with a unique constraint.
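As a rough sketch of the workaround described above, using SQLite from Python (the table and column names here are made up for illustration): if the schema already guarantees uniqueness, a plain COUNT(*) replaces COUNT(DISTINCT) and no dedup step runs at query time.

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# A table that allows duplicate rows forces the query to deduplicate.
cur.execute("CREATE TABLE visits_dupes (user_id INTEGER)")
cur.executemany("INSERT INTO visits_dupes VALUES (?)", [(1,), (1,), (2,)])
distinct_users = cur.execute(
    "SELECT COUNT(DISTINCT user_id) FROM visits_dupes").fetchone()[0]

# Restructured: a UNIQUE constraint keeps the data distinct up front,
# so a plain COUNT(*) suffices and no dedup work happens per query.
cur.execute("CREATE TABLE visits (user_id INTEGER UNIQUE)")
cur.executemany("INSERT OR IGNORE INTO visits VALUES (?)", [(1,), (1,), (2,)])
plain_count = cur.execute("SELECT COUNT(*) FROM visits").fetchone()[0]

print(distinct_users, plain_count)  # 2 2
```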


If your sorting algorithm degrades to anywhere near O(n^2) in pathological cases, you're doing something wrong, even if the fix is just a timeout/operations limit that detects the pathological case and falls back to an in-place mergesort. Tail latency and containing pathological data are quite important if there's any interactivity.


Nearly every time, it's a symptom of bad data normalization.

But every time, it interferes badly with any kind of locking (that's DBMS dependent, of course), and imposes a high performance penalty (on every DBMS).


“Think before you DISTINCT”


In order to determine the distinct items, the items need to be deduplicated. Generally that's done in one of two ways: a hash table that skips items already seen, or a sort followed by a scan that skips over duplicates. The hash table does O(1) expected work per item (O(n) overall) versus O(n log n) for the sort, but the sort is easier to make parallel without sharing mutable state and has more established algorithms to use when spilling to disk.
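The two strategies can be sketched in a few lines of Python (this is an illustration of the idea, not any engine's actual implementation):

```python
def dedup_hash(rows):
    # Hash-based: O(1) expected work per row; preserves first-seen order.
    seen = set()
    out = []
    for row in rows:
        if row not in seen:
            seen.add(row)
            out.append(row)
    return out

def dedup_sort(rows):
    # Sort-based: O(n log n) to sort, then a scan skips equal neighbors.
    out = []
    for row in sorted(rows):
        if not out or row != out[-1]:
            out.append(row)
    return out

rows = [3, 1, 3, 2, 1]
print(dedup_hash(rows))  # [3, 1, 2]
print(dedup_sort(rows))  # [1, 2, 3]
```

Note the sorted variant also delivers the output in order, which is why engines often share the machinery between DISTINCT and ORDER BY.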


There is a third way: keep the data pre-sorted in the database (via an index).


It covers up bad queries, so you may not see an underlying data duplication problem.

Often better to group explicitly so you know what's actually going on.


> It covers up bad queries,

Bingo. I used to work with a guy who would see duplicate results and just throw a distinct on his query. I had to keep on him to fix his queries or explain why distinct was correct in this case. My default is that distinct is almost always not the solution.


Seriously... I'm building out some functionality using PL/pgSQL and have used it. This is going to haunt my dreams.


There is a possible performance hit [1]. Also, duplicate rows could mean that the data granularity has not been modeled well.

https://sqlperformance.com/2017/01/t-sql-queries/surprises-a...


Huh? If you have a table with attributeid, sampleid and value, how would you count how many samples have a value in any attribute? An EXISTS subquery?


In your example you must also have another table, 'sample', with all the samples. So yes, you would use an EXISTS or IN subquery with the table you suggested.
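A minimal sketch of that shape in SQLite via Python (schema and names invented to match the example above; sample 3 deliberately has no values):

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()
cur.execute("CREATE TABLE sample (sampleid INTEGER PRIMARY KEY)")
cur.execute(
    "CREATE TABLE sample_value (attributeid INTEGER, sampleid INTEGER, value TEXT)")
cur.executemany("INSERT INTO sample VALUES (?)", [(1,), (2,), (3,)])
cur.executemany("INSERT INTO sample_value VALUES (?, ?, ?)",
                [(10, 1, "a"), (11, 1, "b"), (10, 2, "c")])

# EXISTS counts each sample once, with no COUNT(DISTINCT) over a join:
n = cur.execute("""
    SELECT COUNT(*) FROM sample s
    WHERE EXISTS (SELECT 1 FROM sample_value v WHERE v.sampleid = s.sampleid)
""").fetchone()[0]
print(n)  # 2
```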


GitHub was cool when Git was new, years back. But these days, especially given that Git is inherently not centralized, it is not very clear to me why we all cling to GitHub. With a little work, all that it offers can be done without the help of a centralized server/corporation.


I've spent a lot of time trying to understand PoW and came to the conclusion that it is a distributed clock of sorts, described here: https://grisha.org/blog/2018/01/23/explaining-proof-of-work/


Pretty much, yes. It's kind of spelled out in the Nakamoto paper. From the introduction:

" In this paper, we propose a solution to the double-spending problem using a peer-to-peer distributed timestamp server to generate computational proof of the chronological order of transactions."

Everything else in Bitcoin is just turning that timestamp server into a practical(ish) system.



Yes, that's why non-PoW distributed ledgers like the XRPL work essentially the same way: they sort transactions by time and then use a federated Byzantine agreement to "filter out" transactions that did not propagate through the network within a specific time and thus can't be put in the correct order. Those transactions are added to the next ledger (block) instead, which isn't a problem if block times are just seconds.


Well, it relies on a synchronized clock, so it can't provide a clock. PoW adjusts the difficulty based on the time to try to meet some difficulty target.

In fact, if the nodes' clocks are not synchronized, it can cause significant problems and vulnerabilities. If a clock runs too fast, the difficulty adjustment algorithm will think too few blocks were mined in the window and decrease the difficulty.
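For concreteness, a Bitcoin-style retarget rule can be sketched like this (real implementations operate on the full 256-bit target and have extra quirks; the constants are Bitcoin's nominal ones). A larger target means lower difficulty, so an inflated timespan, whether from genuinely slow blocks or fast clocks, lowers difficulty:

```python
TARGET_SPACING = 600    # desired seconds per block (10 minutes)
RETARGET_BLOCKS = 2016  # blocks per adjustment window

def retarget(old_target, actual_timespan):
    expected = TARGET_SPACING * RETARGET_BLOCKS
    # Clamp so one window can move difficulty by at most 4x either way.
    actual_timespan = max(expected // 4, min(actual_timespan, expected * 4))
    # Blocks looked slow -> timespan large -> target grows -> difficulty drops.
    return old_target * actual_timespan // expected

base = 1 << 220
assert retarget(base, TARGET_SPACING * RETARGET_BLOCKS) == base       # on schedule
assert retarget(base, 2 * TARGET_SPACING * RETARGET_BLOCKS) == 2 * base  # "slow": easier
```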


Those aren’t really “significant problems and vulnerabilities”: any given node can lie about what time it is, but you’re not trusting a particular node for more than the outcome of a single contiguous block—and block difficulty “velocity” is capped—so you’d need a Sybil attack to actually walk the difficulty down. Otherwise, even at 49% malicious nodes, consensus is just going to bounce between nodes that say the time was really short, and nodes that give “regular” timestamps, keeping the difficulty roughly constant within the network’s margin of error.

Really, the timestamp field in most PoW systems’ “block” structs (Bitcoin’s, Ethereum’s, etc.) is just defined as “a number that is higher than the one in the parent block, and not so high that when interpreted as a POSIX timestamp it would land 30+ seconds in the future relative to the local node’s time.” So you just need >50% of the nodes to have a ±30s clock sync in order to agree on which blocks are valid for consideration; and even if you don’t have that level of synch, those blocks will still become valid eventually, once they’re old enough that all the nodes do consider them to be in the past. (And most PoW systems keep around near-“future” blocks until they’re valid for just such a case.)
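The acceptance rule as stated above might be sketched like this (the 30-second window is the parent comment's figure, not a standard; Bitcoin, for instance, tolerates roughly two hours of future drift and compares against a median of recent block times):

```python
import time

MAX_FUTURE_DRIFT = 30  # seconds, per the parent comment; real networks vary

def timestamp_valid(ts, parent_ts, now=None):
    """Accept a block timestamp that is after its parent's and not too far
    in this node's future. Blocks rejected as "future" can be retried later."""
    now = time.time() if now is None else now
    return ts > parent_ts and ts <= now + MAX_FUTURE_DRIFT

assert timestamp_valid(1000, 990, now=1000)
assert not timestamp_valid(980, 990, now=1000)   # not after the parent block
assert not timestamp_valid(1100, 990, now=1000)  # too far in this node's future
assert timestamp_valid(1100, 990, now=1090)      # valid once local time catches up
```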


The timing aspect is an important part of PoW, but it's not the entire purpose. The Bitcoin whitepaper itself goes on to say, "The proof-of-work also solves the problem of determining representation in majority decision making." That is, PoW also solves the problem of deciding which consensus rules to enforce, not just when.


  RecursionError: maximum recursion depth exceeded


Must not have been tail call recursion.
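The quip holds either way: CPython performs no tail-call elimination, so even a function that is tail-recursive in form consumes a stack frame per call. A minimal illustration, with the iterative rewrite that avoids the error:

```python
import sys

def count_down(n):
    # Tail-recursive in form, but CPython does not eliminate tail calls,
    # so each call still pushes a stack frame.
    return 0 if n == 0 else count_down(n - 1)

def count_down_iter(n):
    # The iterative rewrite runs in constant stack space.
    while n:
        n -= 1
    return n

try:
    count_down(sys.getrecursionlimit() + 100)
except RecursionError as e:
    print(e)  # maximum recursion depth exceeded

print(count_down_iter(10 ** 6))  # 0
```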


Github goes down on Monday around 9am pacific time - must be totally random.


The funny thing about this is the idea that west coast engineers actually start work at or before 9am


I'm rarely in before 9AM. The 10 minute standup at 10:30 AM is the only daily requirement for most staff.


I bet a bunch of PRs had built up over the weekend that weren't deployed, and some guy who came in at 9 decided to deploy them and broke things. It's always scary to be the one to deploy if no one has deployed in a while.


Any compact disc works as a spectrometer.

https://www.cs.cmu.edu/~zhuxj/astro/html/spectrometer.html


I've never seen this. Thanks!


I once submitted a blog post [1] and later received an email from someone at HN saying that it was a great article but hadn't done so well, and that if I re-submitted it they would make sure it did better. So I did, and it went to the top.

[1] - https://news.ycombinator.com/item?id=16862077


Would be interesting to see the correspondence, but don't feel obligated.


They do it frequently. I've had it happen several times. The specific text isn't particularly exciting. They just give you a link to resubmit if you're interested. You also get an extra upvote upon submission (and I imagine there is more of a bump behind the scenes).


What does HN feel about this? Is it curation from the staff or is it selective manipulation? On first blush, I'm for it... but I'd be interested to see what others think.


>Is it curation from the staff or is it selective manipulation?

Curation is selective manipulation. Whether it's positive or negative correlates to whether you personally agree with it.


Yes. I generally find HN works, i.e. surfaces interesting stuff. I judge HN by the what, not the how. The "how" moan is often "they don't like my stuff", sniff sniff.


I like it. HN is still worth checking a few times every day. Whatever the staff is doing, it feels like they keep the quality relatively stable.


At first thought I'd be against it, because they essentially bump what they think is good, and that does not (necessarily) reflect the community. Sometimes things do slip through on HN, though, so giving a nudge to resubmit sounds like a good idea.

Tbh, I don't really mind either way, I've enjoyed most content on here.


We bump what we think the community might like and is aligned with the site guidelines. And only to the lower half of the front page, whence it soon falls away if we guessed wrong. The posts aren't necessarily what we ourselves like or think is good; mostly we don't have time to decide on that.


Fair enough. As I mentioned I am pretty happy with the content on here. Also happy with the moderations so thank you for that! :)


You have time to decide what to bump, but not to decide what you like? Makes no sense.


They explicitly said they bump what they think the community might like. It doesn’t take a thorough reading of every submission to make that guess.


It does, unless you want to create an echo chamber (which HN somewhat is)


What would make HN less of an echo chamber?


I don't think it can be solved by moderation, if that makes you feel better


It makes me feel worse. But I'd like to hear why you say that HN somewhat is an echo chamber, and what you think would make it less of one, or where you might look for examples of non-echo-chambers on the internet. You're welcome to reply here, or email hn@ycombinator.com.

It's possible that you've observed something that's not on our radar, and such information is important for us to be aware of, even if we can't solve it by moderation.


Do you have any answer to the question?


I'm aware. I'm skeptical of Dang's claim that it would take more time to decide what they themselves like.


I’ve upvoted opinion pieces that I disagree with, but I wanted to see what the community thought.


You have time to decide if the community might like it, but not if you yourselves do? A bit weird.


If you do something hundreds of times a day for years, you get fast at it. Also, don't forget I said "might". These are quick, approximate guesses.


I guess a good community needs constant upkeep, like anything else in life. The community was created over what a few people thought was good content, and if left alone will probably get dissolved in some "large reddit" effect.


An interesting thing to know would be whether HN staff prevent stories they disagree with from getting popular.


I've had that happen a couple of times. The email is fairly bland - it looked like the sort of copy you'd write for an automated system. It includes a link to this comment by way of explanation:

https://news.ycombinator.com/item?id=11662380


The way it works is described in this HN post: https://news.ycombinator.com/item?id=11662380 I've had about a dozen of them, I think probably because I'm in a time zone that gets buried more than others.

on edit: maybe it's not exactly the same thing because I don't have to resubmit when I get these, they are just put in the second chance pool.


The same happened to me just a couple of days ago. I thought it was a very nice thing for the quality of the content that goes to the top (not just because it was my submission, of course).


that has happened to me too


FWIW - my notes on how locking was done in Thredis: https://github.com/grisha/thredis/blob/master/README-THREDIS


Thanks! I did look at Thredis originally, but the GitHub repo seemed not to have been maintained at the time.


Heh, I did this back in 2012: http://thredis.org/ (Not only is it threaded, it also supports SQL).


And Alibaba did it a couple years ago with Pedis! https://github.com/fastio/pedis (looks like the project is still active, but I don't really know of anyone else who uses it.)


Good name too.

