Hacker News | new | past | comments | ask | show | jobs | submit | jasongill's comments | login

While in an SSH session, press enter, then type tilde and capital C (enter ~C) and you can add command line options to the current session. To add a port forward from your local 8080 to the remote port 80 without closing the connection, do:

  ~C
  ssh> -L 8080:localhost:80

That is a neat trick. Added to the list.

(Ultimately unhelpful though because I use mosh everywhere these days and that doesn't appear to have anything fancy like this.)


Thanks. This could really benefit from a TUI!

Maybe it's just too early in the morning, but what is the significance of hitting enter first?

SSH expects the escape sequence (tilde) to be the first character on a new line; since backspace is sent as a character, you can't just backspace over something you've started typing and then press tilde to have it recognized.

Technically, you don't have to press enter if you've not typed anything (try it in a new SSH session: as soon as you are logged in, type ~? to get the SSH help output). But since the comment was about doing this during an active session without ending it, I figured noting that pressing enter first, to be sure you're on a new line, wouldn't hurt.


yes, most of them look like USRobotics Courier modems. Note that not all the machines have one, and some have two.

Assuming that the parent commenter is right and that they are using internal line cards, I wonder if the external modems were being added to support higher speeds.

However, the fact that we can see at least two (but I think four) 66 blocks means they had 50 to 100 phone lines for the machines visible. That would suggest the external modems are the primary connection and no internal modems are in use, based on the number of modems visible and the fact that each 66 block can handle 25 lines.


I think you're right and that there were only two modems connected to the boxes, so those are just the built-in serial ports. Here is another copy of the same picture, posted by someone who apparently funded the board, with some details:

https://x.com/ScottApogee/status/1593729387106512896


This almost confirms it then: each machine has an external modem tied to one phone line per modem, and there are no internal modems in use. The picture shows 50 modems that I can see, and the original article indicates that it's around half of their total setup. Scott notes that they had a T3 (likely a frac T3) with 140 dial-in nodes, which aligns with the article's guess of 134 machines.

So I would say that, almost definitely, they are using one external modem per PC (or two, for some on the right side of the photo), connected to the 66 blocks, with those analog phone blocks tied back to the channel bank/multiplexer and the carrier's T3 tied in there.

No internal modems used at all.

And the person who posted the photo on twitter is none other than Scott Miller, founder of Apogee Software, publisher of some of the most revolutionary games of the late 80's and early 90's, and this BBS (Software Creations) was the cornerstone of distributing the shareware versions of those games. Very cool bit of history, I remember dialing in to Software Creations to download Commander Keen!


I'm surprised that Cloudflare hasn't started hosting a pre-scraped version of websites that use Cloudflare's proxy - something like https://www.example.com/cdn-cgi/cached-contents.json. They already have the website content in their cache, so why not cut out the middleman of scraping services and APIs like this and publish it?

Obviously there's good reasons NOT to, but I am surprised they haven't started offering it (as an "on-by-default" option, naturally) yet.


Well, the conversion process into the JSON representation is going to take CPU, and then you have to store the result, in essence doubling your cache footprint.

Doing it on demand still utilizes their cached version, so it saves a trip to the origin, but doesn’t require doubling the cache size. They can still cache the results if the same site is scraped multiple times, but this saves having to cache things that are never going to be requested.

Cache footprint management is a huge factor in the cost and performance of a CDN: you want to get the most out of your storage, and you want to serve as many pages from cache as possible.

I know from my experience working for a CDN that we were doing all sorts of things to try to maximize the hit rate for our cache. In fact, one of the easiest and most effective techniques for increasing cache hit rate is to do the OPPOSITE of what you are suggesting: instead of pre-caching content, you do 'second hit caching', where you only store a copy in the cache if a piece of content is requested a second time. The idea is that a lot of content is requested only once by one user, and then never again, so it is a waste to store it in the cache. If you wait until it is requested a second time before you cache it, you avoid those single-use pages going into your cache, and you don't hurt overall performance much, because the content that is most useful to cache is requested a lot, and you only have to make one extra origin request.
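The second-hit idea can be sketched in a few lines of Python (a toy illustration, not any real CDN's code): remember only the keys seen once, and admit a body into the cache on its second request.

```python
class SecondHitCache:
    """Admit an object into the cache only on its second request.

    One-hit wonders never enter the cache; they cost one extra
    origin fetch but never consume cache storage.
    """

    def __init__(self):
        self.seen_once = set()   # keys requested exactly once so far
        self.cache = {}          # bodies admitted after a second request

    def get(self, key, fetch_origin):
        if key in self.cache:
            return self.cache[key]          # cache hit
        body = fetch_origin(key)            # miss: go to origin
        if key in self.seen_once:
            self.seen_once.discard(key)
            self.cache[key] = body          # second request: admit to cache
        else:
            self.seen_once.add(key)         # first request: remember the key only
        return body
```

In practice the first-hit set is itself bounded (often a probabilistic structure such as a Bloom filter), so remembering keys stays far cheaper than caching bodies.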


> Doing it on demand still utilizes their cached version, so it saves a trip to the origin, but doesn’t require doubling the cache size. They can still cache the results if the same site is scraped multiple times, but this saves having to cache things that are never going to be requested.

Isn't this solving a subtly, but very significantly, different problem?

You could serve the very same data in two different ways: one to present to the users and one to hand over to scrapers. Of course, some sites would be too difficult or costly to transform into a common underlying cache format, but people who WANT their sites accessible to scrapers could easily help the process along a bit, or serve their site in the necessary format in the first place.

But the key is:

A tool using a "pre-scraped" version of a site very likely has very different requirements for how a CDN caches that site. And this could easily be customizable by those using this endpoint.

Want a free version? OK, give us the list of all the sites you want, then come back in 10 minutes and grab everything in one go; the data will be kept ready for 60s. Got an API token? 10 free near-real-time requests for you, and they'll recharge at a rate of 2 per hour. Want to play nice? Ask the CDN to have the requested content ready in 3 hours. Got deep pockets? Pay for just as many real-real-time requests as you need.

What makes this so different is that unless customers are willing to hand over a lot of money, you don't need to cache anything to serve requests at all - potentially not even later, if you've got enough capacity to serve the data for scheduled requests from the storage network directly.

You just generate an immediate promise response to the request telling them to come back later. And depending on what you put into that promise, you've got quite a lot of control over the schedule yourself.

- Got a "within 10min" request but your storage network has plenty of capacity in 30s? Just tell them to come back in 30s.

- A customer is pushing new data into your network around 10am and many bots are interested in getting their hands on it as soon as possible, making requests for 10am to 10:05? Just bundle their requests.

- Expected data still not around at 10:05? Unless the bots set an "immediate" flag (or whatever) indicating that they want whatever state the site is in right now, just reply with a second promise when they come back. And a third if necessary... and so on.
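The promise scheme above could be sketched like this (all names hypothetical; this is not any existing API): the server answers a scrape request immediately with a retry time chosen from its own capacity, and may simply issue another promise if the data still isn't ready.

```python
import time

def promise_response(client_deadline, server_ready_at, now=None):
    """Answer a scrape request immediately with a 'come back later' promise.

    client_deadline: epoch seconds by which the client wants the data
    server_ready_at: epoch seconds when the server expects to have it ready
    """
    if now is None:
        now = time.time()
    # The server may beat the client's deadline if it has spare capacity.
    retry_at = min(server_ready_at, client_deadline)
    if retry_at <= now:
        return {"status": "ready"}
    return {"status": "promise", "retry_after_s": int(retry_at - now)}
```

If the expected data still isn't there when the client returns, the server just hands out another promise - which is exactly how it keeps control of the schedule.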


Not the same thing, but they have something close (it's not on-by-default, yet) [1]:

> Cloudflare's network now supports real-time content conversion at the source, for enabled zones using content negotiation headers. Now when AI systems request pages from any website that uses Cloudflare and has Markdown for Agents enabled, they can express the preference for text/markdown in the request. Our network will automatically and efficiently convert the HTML to markdown, when possible, on the fly.

[1] https://blog.cloudflare.com/markdown-for-agents/
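Mechanically, what the quote describes is ordinary HTTP content negotiation on the Accept header. A toy server-side sketch (my illustration, not Cloudflare's actual implementation; the converter function is assumed):

```python
def negotiate(accept_header, html_body, to_markdown):
    """Serve markdown when the client asks for it via the Accept header,
    falling back to the original HTML otherwise.

    to_markdown: an HTML-to-markdown converter supplied by the caller.
    """
    # Accept values look like "text/markdown, text/html;q=0.8";
    # strip any quality parameters before comparing media types.
    wants_md = any(
        part.split(";")[0].strip() == "text/markdown"
        for part in accept_header.split(",")
    )
    if wants_md:
        return "text/markdown", to_markdown(html_body)
    return "text/html", html_body
```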


Interesting - it sounds like this could be combined with some creative cache parsing on their side to provide this feature to sites that want it.

so... we will get reader mode with one header set in a browser?

> I'm surprised that Cloudflare hasn't started hosting a pre-scraped version of websites that use Cloudflare's proxy

It's entirely possible that they're doing this under the hood for cases where they can clearly identify the content they have cached is public.


How would they know the content hasn’t changed without hitting the website?

They wouldn't - well, there's ETag and the like, but that's still a layer-7 round trip to the origin. However, the pattern generally is to declare in the response headers how long the content is good for, and cache for that duration. For example, a bitcoin pricing aggregator might say it's good for 60 seconds (with disclaimers on the page that this isn't market data), whilst My Little Town News might say that an article is good for an hour (to allow updates) and the homepage is good for 5 minutes so a breaking news article doesn't appear too far behind.
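That "good for N seconds" signal lives in the Cache-Control response header. A minimal freshness check, reading max-age out of Cache-Control (ignoring ETag revalidation and the many other directives):

```python
def max_age_seconds(cache_control):
    """Extract the max-age value from a Cache-Control header, or None."""
    for directive in cache_control.split(","):
        directive = directive.strip()
        if directive.startswith("max-age="):
            try:
                return int(directive[len("max-age="):])
            except ValueError:
                return None
    return None

def is_fresh(cache_control, age_seconds):
    """True while a cached copy is still within its max-age."""
    ttl = max_age_seconds(cache_control)
    return ttl is not None and age_seconds < ttl
```

So the bitcoin aggregator would send `Cache-Control: public, max-age=60` and the news homepage `max-age=300`, and the CDN never touches the origin while `is_fresh` holds.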

Keeping track of when content changes is literally the primary function of a CDN.

Caching headers?

(Which, on Akamai, are by default ignored!)


Based on the post, it seems likely that they'd just delay per the robots.txt policy no matter what, and do a full browser render of the cached page to get the content. Probably overkill for lots and lots of sites. An HTML fetch + readability is really cheap.

It's a bit more complicated than that. This is their Browser Rendering product, which runs a real browser that loads the page and executes JavaScript - more involved than simple curl scraping.

So does that mean it can replace serpapi or similar?

That would probably work for simple sites, but you still need a dedicated scraping service with a browser to render sites that are more complex (e.g. SPAs).

Offering wholesale cache dumps blows up every assumption about origin privacy and copyright. Suddenly you are one toggle away from someone else automatically harvesting and reselling your work with Cloudflare as the unwitting middle tier.

You could try to gate this behind access controls but at that point you have reinvented a clunky bespoke CDN API that no site owner asked for, plus a fresh legal mess. Static file caches work because they only ever respond to the original request, not because they claim to own or index your content.

It is a short path from "helpful pre-scraped JSON" to handing an entire site to an AI scraper-for-hire with zero friction. The incentives do not line up unless you think every domain on Cloudflare wants their content wholesale exported by default.


I think Common Crawl already offers this, although it's free: https://commoncrawl.org/

That was my first thought when I read the headline. It would make perfect sense, and would allow some websites to have best of both worlds: broadcasting content without being crushed by bots. (Not all sites want to broadcast, but many do).

This makes a lot of sense. Cloudflare already has the rendered content at the edge - serving a structured snapshot from cache would eliminate redundant crawling entirely.

What I'd love to see is site owners being able to opt in and control the format. Something like a /cdn-cgi/structured endpoint that respects your robots.txt directives but gives crawlers clean markdown or JSON instead of making them parse raw HTML. The site owner wins (less bot traffic), the crawler wins (structured data), and Cloudflare wins (less load on origin).


But think about poor phishers and malware devs protected by Cloudflare.

The parallel port (at least in its later implementations) actually supports DMA - I'm sure that data exfiltration via the parallel port is hard, but probably not impossible...

Nothing is safe, unfortunately!


The parallel port controller can DMA; that lets the driver tell the port "hey, send this buffer out to the port and let me know" or "read this many bytes into this buffer and let me know". It's not peripheral-controlled DMA like with FireWire or PCI.

You can absolutely exfiltrate data via the parallel port... that's why you attach printers or zip disks... it's just that it needs host participation.


It's almost certainly impossible on modern systems. The southbridge, which allowed DMA to the parallel port, was absorbed into the PCH and slowly stripped of legacy LPC support by chipset and motherboard manufacturers.

It supports Apple Intelligence; all 8GB iPhones and iPads support Apple Intelligence, and the promo materials for this MacBook Neo say it supports Apple Intelligence as well.


They were referring to their M1 not being able to support Apple Intelligence.


8 GB M1 MacBook Air does support Apple Intelligence.


Yes it does. I was clarifying what the commenter was saying; not making his statement myself.

akmarinov said their M1 doesn't support Apple Intelligence but they still think it's plenty usable; jasongill thought akmarinov was referring to the Neo and responded that the MacBook Neo does in fact support Apple Intelligence; and I clarified what I think akmarinov intended to say.


correct, I thought he meant that the Neo does not support it, since his M1 Macbook does support Apple Intelligence but perhaps he's not aware of that or hasn't updated yet.


You are correct, I indeed thought that you need at least 16 GB to support Apple Intelligence, and I thought the Neo doesn't support it.

I stand corrected.


WaPo is reporting that OpenAI and xAI already agreed to the Pentagon's "any lawful use" clause, aka, mass surveillance and fully autonomous killbots. From the WaPo article https://archive.is/yz6JA#selection-435.42-435.355

> Officials say other leading AI firms have gone along with the demand. OpenAI, the maker of ChatGPT, Google and Elon Musk’s xAI have agreed to allow the Pentagon to use their systems for “all lawful purposes” on unclassified networks, a Defense official said, and are working on agreements for classified networks.

The only difference is simply that Anthropic is already approved for use on classified networks, whereas Grok and OpenAI are not yet (but are being fast-tracked for approval, especially Grok). Edit: Note someone below pointed out that OpenAI may be approved for Secret level, so it's odd that Washington Post reports that they are working on it still.


OpenAI is usable through Azure for Government up to IL-6.

https://devblogs.microsoft.com/azuregov/azure-openai-authori...

Either Anthropic is seen as the clear leader (it certainly is for coding agents) or this is a political stunt to stamp out any opposition to the administration. Or both.


> fully autonomous killbots

I keep hearing this but it should be plainly obvious to everyone (at least here) that an LLM is not the right AI for this use case. That's like trying to use chatgpt for an airplane autopilot, it doesn't make sense. Other ML models may but not an LLM. Why does the "autonomous killbot" thing keep getting brought up when discussing Anthropic and other llm providers?

For reference, "autonomous killbots" are in use right now in the Ukraine/Russia war and they run on fpv drones, not acres of GPUs. Also, it should be obvious that there's a >90% probability every predator/reaper drone has had an autonomous kill mode for probably a decade now. Maybe it's never been used in warfare, that we know of, but to think it doesn't exist already is bonkers.


It wouldn't make sense to have the LLM try to do the target recognition, trajectory planning, or motor control. It might make sense to have the LLM at a higher level, handling monitoring of systems and coordination with other instances, to provide more flexibility to react to novel situations than rule-based systems.


It's almost a silly distinction, since ML has been used in weapons for quite a while. For example: Javelin missiles have automatic target recognition, cruise missiles have intelligent terrain following, and long-range drones use algos like SLAM for guidance.


I just searched "new hospital opened in CA" on Google and see that there were two new hospitals opened in Irvine in December, half of a new hospital complex in Santa Clara opened in October, more being built and slated to open this year or next...


Now look up when those projects were started...I will wait.


Hospitals always take a long time; both are non-profit and had to raise a ton of money. They are both large multi-building complexes. And I think the UCI one is a trauma center (even more complexity), to deal with the fact that the previous UCI trauma center no longer meets earthquake standards.


Isn't fewer than 1000 infected animals in an area that covers 6 countries pretty good? Obviously there's still work to do, but I would have expected hundreds of thousands or millions of animal cases if it was an epidemic


Few cases is good, but if there are any, the whole machinery of surveillance, treatment, and education has to be in place. As soon as we reach 0 cases for a certain amount of time, all those resources can be redirected to other neglected tropical diseases that haven't been wiped out, like onchocerciasis, loa loa, yaws, lymphatic filariasis, trachoma, and all the others. Yaws in particular is a good candidate for eradication.


Are they just the known infected animals or is it an estimation of the total count of infected animals?


I hate this hacker crap!


Why are people downvoting this? It's just a quote from the movie...


I have a collection of pop culture prop items and this is definitely going on my ebay alerts list, would be cool to have on the wall of the garage... thank you for posting!

