Hacker News | Gladdyu's comments

The difference is that software compilation is a fairly local optimization process - allowing you to change/inline one bit of generated code without significantly affecting the performance of the rest of the generated code - address space is cheap. On the other hand, especially when the FPGA approaches capacity, one change can force a significant portion of the design to be rerouted, since space is limited, causing significant performance differences or no longer meeting the timing requirements.


> The difference is that software compilation is a fairly local optimization process - allowing you to change/inline one bit of generated code without significantly affecting the performance of the rest of the generated code

Not really.

The uop cache and L1 instruction cache of modern chips are rather small. You can often grossly increase performance locally by loop unrolling, but if that causes the "hot path" to no longer fit in the uop cache (or L1 cache), then you've lost a chunk of global performance for a small local gain.

Global vs local optimization is just a tough subject in general, even on CPUs (which are probably easier than FPGAs).


OP wrote "a fairly local optimization". Compared to what a placement algorithm must do for an FPGA, your example is still exactly that, and one that's relatively easy to get right with a few heuristics.


Yep. But to me that only changes what the problem is that the toolchain solves. It doesn't affect the social and economic pressures on how the relevant toolchains evolve.

To a certain (admittedly limited) degree the "limited space" problem with FPGAs maps intriguingly to the "limited memory" conditions that early LISP and FORTH compilers spawned in.


More to the point, the market for FPGAs is fundamentally limited by a few structural factors.

If you are making 10,000+ units the economics are overwhelmingly in favor of ASIC over FPGA.

(In particular, if you are the first mover that proves the market with an FPGA, you should have started an ASIC design in parallel, because the second and third movers will have ASICs from day one and a cost structure 10x or 20x better than yours!)

This keeps FPGA a niche market because it serves niche markets.


I'm not sure whether their fees are really that low. For transfers between your own accounts, for anything over very small amounts, I found that their 0.35% variable fee is pretty large compared to adding the funds to a brokerage account, doing the conversion there (i.e. IB charges 0.002% on currency trades, with a min of $2), and transferring back out in the desired currency.


Well, for starters most transfers I and my friends have had to make are not to our own accounts. There's also the time issue - I have an IB account, and love it, but most of the time I'm using Transferwise because I'm in a different country and paying someone, or because I need to move money between my own accounts quickly. Transferwise is literally instant for about half the transfers they do, which is incredible. Meanwhile with IBKR I'm usually out at least three business days for any USD funds to show up in IBKR, plus whatever time to wait on the foreign currency side. Then there's also the UI - sometimes I need someone else to transfer the money, or IBKR's login system is locking me out, etc etc, and in those cases Transferwise always wins. Or if I'm transferring under 25k and I need to do it outside of market hours - in that case my transaction is an odd lot at IBKR, getting me (oftentimes) bad rates, and I might as well use Transferwise.

Don't get me wrong, I like IBKR and I do use it, but Transferwise is definitely more useful 90% of the time.


Yeah, you're definitely paying a premium for them handling smaller quantities, end-to-end, in one easy tool. IBKR requires knowledge of Forex trading, but is hands down the cheapest and fastest solution - for those that are able to work it. To be fair, that describes basically all of IBKR.

It's still mind-blowing to think IBKR charges $20 per million, when the airport Travelex will charge you $20 per hundred lol.


Could you explain how this works, or point me in the right direction on what to read? I'm interested in not using Transferwise as much if this is better.


https://www.interactivebrokers.com/en/index.php?f=759

You'd open a trading account and deposit funds from one country into it (generally free in the EU/UK, can be done with a simple bank transfer, though it can take 1-3 business days). Rather than the common use case of a brokerage account to buy stocks/invest, you place an order into the IdealPro order book they run to buy the currency you want (you might have to jump through 2 trades, since they don't run every single pair). This is generally tailored to larger investors, so anything under $25k is subject to some additional fees due to the small size (https://ibkr.info/node/1459), but still significantly less than the 0.35% TransferWise charges. The costs for you are the commission (<0.2bps), potentially the small-size fee (1-2 bps), and crossing the bid/ask spread (which is ~0 for currencies). When that order has been filled you have the desired currency in your account and can transfer it back out (potentially to a different country or different account).
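To make the cost comparison concrete, here's a rough sketch (the 0.35% TransferWise figure and IBKR's ~0.2 bps commission with a $2 minimum come from the thread; the 1.5 bps small-size fee is an assumption for illustration, not actual pricing):

```python
# Illustrative fee comparison; rates are examples from the thread, not
# current published pricing for either service.

def transferwise_fee(amount):
    return amount * 0.0035  # 0.35% variable fee

def ibkr_fee(amount):
    commission = max(amount * 0.00002, 2.0)  # ~0.2 bps, $2 minimum
    # Assumed ~1.5 bps odd-lot fee on orders under $25k
    small_size = amount * 0.00015 if amount < 25_000 else 0.0
    return commission + small_size

for amount in (1_000, 10_000, 100_000):
    print(amount, transferwise_fee(amount), ibkr_fee(amount))
```

Even with the small-size penalty, the brokerage route comes out an order of magnitude cheaper once amounts are non-trivial.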


In some countries you'd have to include it in your tax reports if you are using this method - an additional thing to take care of.


Your gain in delta (due to the stock moving) will be offset by your loss in vega (due to the volatility coming off after the impending crash). Option fair value is a function of both underlying price and volatility, which is currently at unprecedentedly high values.
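The effect can be sketched with a toy Black-Scholes calculation (illustrative numbers, zero rates; the strike, vols, and horizon are made up): a 10% rally combined with implied volatility halving can leave the call worth less than before.

```python
# Toy Black-Scholes call pricer (r = 0) showing spot gain offset by vol loss.
# All parameters below are invented for illustration.
from math import log, sqrt, erf

def norm_cdf(x):
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

def call_price(spot, strike, vol, t):
    d1 = (log(spot / strike) + 0.5 * vol * vol * t) / (vol * sqrt(t))
    d2 = d1 - vol * sqrt(t)
    return spot * norm_cdf(d1) - strike * norm_cdf(d2)

before = call_price(100, 100, 0.80, 0.25)  # elevated implied vol
after = call_price(110, 100, 0.40, 0.25)   # 10% rally, but vol halved
print(before, after)  # the option is worth less despite the rally
```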


It does load it - mmap doesn't copy the file content into a buffer, it merely allows you to operate on a file as if it were in memory. Memory reads correspond to file read operations.


Sort of. mmap absolutely copies the file contents into the kernel file system cache, which is a buffer; it just lets you map the filesystem cache into your address space so you can see it. And memory reads only turn into file reads when the data isn't already in that cache.


> mmap absolutely copies the file contents into the kernel file system cache which is a buffer

Isn't this a bit misleading? mmaping a file doesn't cause the kernel to start loading the whole thing into RAM, it just sets things up for the kernel to later transparently load pages of it on demand, possibly with some prefetching.
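That distinction is easy to see with the stdlib (Python's mmap module, assuming a POSIX-like OS): creating the mapping is essentially instant, and file data only needs to be faulted in when the mapped bytes are actually touched.

```python
# Demand paging in miniature: mmap a file, then touch bytes to fault pages in.
import mmap, os, tempfile

# Create a file spanning many pages.
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"x" * (1 << 20))  # 1 MiB
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)
    # The mapping exists, but pages are only loaded on access; these two
    # reads trigger page faults for the first and last page.
    first = mm[0]
    last = mm[-1]
    mm.close()

os.unlink(path)
print(first, last)  # 120 120 (ord('x'))
```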


The GP's comment on the other hand seemed to imply it was completely unbuffered.


"Completely unbuffered" is almost unattainable in practice, so I'm not sure that's a reasonable inference. About the best you can do in general is not do any buffering yourself, and usually explicitly bypass whatever buffering is going on at the next level of abstraction down. Ensuring you've cut out all buffering in the entire IO stack takes real effort.


> "Completely unbuffered" is almost unattainable in practice, so I'm not sure that's a reasonable inference.

While absolutely true, I've found that fact to be very surprising to a lot of engineers.


But I think the important part is that the file starts on disk and ends parsed. The rate of that was NVME limited (per article).


> The rate of that was NVME limited (per article).

The article shows that he's getting half the throughput of parsing a CSV that's already in RAM. But: he's using RAID0 of two SSDs and only getting a little more than half the throughput of one of those SSDs. As currently written, this program might not be giving the SSDs a high enough queue depth to hit their full read throughput. I'd like to see what throughput is like with an explicit attempt to prefetch data into RAM (either with a thread manually touching all the necessary pages, or maybe with a madvise call). That could drastically reduce the number of page faults and context switches affecting the OpenMP worker threads, and yield much better CPU utilization.


I thought queue depth related to supporting outstanding/pending reads. For a serial access such as csv parsing, what would you do other than having a readahead - somehow, see my other question - which would presumably maintain the queue depth at about 1.

Put another way, what would you do to read in the CSV serially to increase speed that would push the queue depth above 1?


For sequential accesses, it usually doesn't make a whole lot of difference whether the drive's queue is full of lots of medium-sized requests (eg. 128kB) or a few giant requests (multiple MB), so long as the queue always has outstanding requests for enough data to keep the drive(s) busy. Every operating system will have its own preferred IO sizes for prefetching, and if you're lucky you can also tune the size of the prefetch window (either in terms of bytes, or in terms of number of IOs). Different drives will also have different requirements here to achieve maximum throughput; an enterprise drive that stripes data across 16 channels will probably need a bigger/deeper queue than a consumer drive with just 4 channels, if the NAND page size is the same for both.

However, optimal utilization of the drive(s) will always require a queue depth of more than one request, because you don't want the drive to be idle after signalling completion of its only queued command and waiting for the CPU to produce a new read request. In a RAID0 setup like the author describes, you need to also ensure that you're generating enough IO to keep both drives busy, and the minimum prefetch window size that can accomplish this will usually be at least one full stripe.

As for how you accomplish the prefetching: the madvise system call sounds like a good choice, with the MADV_SEQUENTIAL or MADV_WILLNEED options. But how much prefetching that actually causes is up to the OS and the local system's settings. On my system, /sys/block/$DISK/queue/read_ahead_kb defaults to 128, which is definitely insufficient for at least some drives but might only apply to read-ahead triggered by the filesystem's heuristics rather than more explicitly requested by a madvise. So manually touching pages from a userspace thread is probably the safer way to guarantee the OS pages in data ahead of time—as long as it doesn't run so far ahead of the actual use of the data that it creates memory pressure that might get unused pages evicted.
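The manual page-touching idea can be sketched with the stdlib (illustrative: the file contents, stride, and the trivial "parser" are all invented for the example).

```python
# A userspace readahead sketch: one thread walks the mapping a page at a
# time, touching one byte per page so the OS faults data in ahead of the
# consumer. The "parser" here is just a newline count.
import mmap, os, tempfile, threading

PAGE = mmap.PAGESIZE

with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"a,b,c\n" * 100_000)
    path = f.name

with open(path, "rb") as f:
    mm = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

    def prefetch(m, step=PAGE):
        sink = 0
        for off in range(0, len(m), step):
            sink ^= m[off]  # touching the byte faults the page in
        return sink

    t = threading.Thread(target=prefetch, args=(mm,))
    t.start()
    lines = mm[:].count(b"\n")  # the "parser" consumes the data
    t.join()
    mm.close()

os.unlink(path)
print(lines)
```

Caveat: in CPython the GIL limits how much the prefetch thread truly overlaps with the consumer; in a systems language the same pattern genuinely overlaps page faults with parsing.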


WILLNEED really just reads the entire file asynchronously into the page cache, at least on Linux.


Is that still true if the size_t parameter to the madvise call is less than the entire file size? I would think that madvise hints could be issued at page granularity and not affect the entire mapping as originally allocated.


With no limit? What if the file is huge, will it evict other things in the cache?


Yes, probably. With hindsight, it was probably a mistake to use mmap. I could probably do better just reading the file myself, since I have to make a mirror buffer later for some data manipulation anyway.


That makes sense, thanks!


Well, it copies into the kernel buffer as you access it as a sort of demand paging that isn’t actually all that bad depending on what you’re doing. It’s dramatically different from a typical “read everything into a buffer” that most programs do.


General question: if mmap pulls in data as you ask it and not before, you're going to have CPU waits on the disk, followed by processing on the CPU but no disk activity, alternating back and forth. I'd assume that to be optimal is to have them both working at once, so to have some kind of readahead request for the disk. How is this done, if at all?

Edit: just seen this which kind of touches on the same https://news.ycombinator.com/item?id=24737186


Generally the OS should see if you’re doing a long sequential access and prefetch this data before you access it.


Not sure if you know how mmap works, but regardless you can't say that memory reads correspond to file reads.

There is literally no I/O being done on your data access paths. Synchronising mapped pages with file contents happens in background write-back threads.


That'd be special relativity for you - nothing ever exceeds the speed of light, hence a corrective factor needs to be applied.

https://en.wikipedia.org/wiki/Special_relativity
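For collinear motion, that corrective factor is the relativistic velocity-addition formula, u' = (u + v) / (1 + uv/c²); a quick check that combined speeds never exceed c:

```python
# Relativistic velocity addition: the combined speed stays below c.
def add_velocities(u, v, c=1.0):
    """Combine two collinear velocities (expressed in units of c by default)."""
    return (u + v) / (1.0 + u * v / (c * c))

print(add_velocities(0.9, 0.9))  # ~0.9945, still below c
```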


As described in the blog post, the Rust compiler generates a state machine for every async function and generates the appropriate poll methods for the state transitions. This is fundamentally at odds with preemption, which would have to introduce new intermediate states into the diagram - something it can't do at runtime.
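By analogy (illustrative Python, not the Rust compiler's actual output; the names are invented), the compiled result can be pictured as a fixed set of states with a poll method, with suspension only possible at the explicit await point:

```python
# A hand-rolled poll-based state machine, modeling roughly:
#   async fn add_one(x) { let y = fetch(x).await; y + 1 }
# The states are fixed when the machine is written; there is no state
# between "Start" and "Done" where a scheduler could preempt the task.
PENDING, READY = "pending", "ready"

class AsyncAddOne:
    def __init__(self, x):
        self.state = "Start"
        self.x = x

    def poll(self, fetch_result=None):
        if self.state == "Start":
            if fetch_result is None:   # the awaited future isn't ready yet
                return (PENDING, None)  # yield control back to the executor
            self.state = "Done"
            return (READY, fetch_result + 1)
        raise RuntimeError("polled after completion")

task = AsyncAddOne(41)
print(task.poll())                # first poll: still awaiting fetch
print(task.poll(fetch_result=41)) # second poll: fetch done, result ready
```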


But if Rust is the operating system/kernel, whenever it decides to schedule something, that is preemption for anything downstream, right?

I mean, you don't actually use preemption in the kernel right? Don't you have to handle all that yourself, since there's nothing higher level than you to handle it for you? In that respect, doesn't plugging in a Futures runtime that looks for device flags and stuff as appropriate and flags tasks to wake up/finish accomplish the same thing? (those are actual questions, not leading statements)


If you were to write a basic scheduler, at some point you'd have to await the userspace code, but you wouldn't have any way to force it to stop running: if the userspace code entered an infinite loop, it would hold the kernel thread forever. Within a constrained environment, e.g. the kernel itself (and even that's sufficiently complex with loadable drivers that you might end up with bad interactions), I could see some use for async/await, but you'd still need to be able to preempt untrusted code.


As far as I understand, the scheduler and userspace processes are going to be completely orthogonal features to this. In general, would it be complex[1] in this case to integrate both preemptive and cooperative multitasking?

[1] as in: http://journal.stuffwithstuff.com/2015/02/01/what-color-is-y...


Intel tried that with Itanium. It didn't go well.


Also keep the delay slots of MIPS in the back of your mind, which became a burden as MIPS evolved.


They’re a burden to work with regardless of whether they match how pipelines work.


The issue was that there wasn't a cheap riscv board that supported the privileged section of the ISA (so you can run an OS on it). This, being a microcontroller, doesn't either.


The Reference Manual claims "Machine (M) and User (U) Privilege levels support" on page 22.

Probably not the kind of privileges discussed yesterday.


Yup, for Linux you want M/S/U modes at least (the kernel runs in S) and page table support, of course.


I was under the impression you could get away with using Machine mode for both the SBI and the Kernel?

Sv39 paging is the major requirement as you point out.


Machine mode runs without the MMU, and Linux really needs an MMU to do kernel stuff too (including things like copyin/copyout).


There is actually quite a bit of work going on in Linux without an MMU. Check the Linux Plumbers Conference RISC-V track, for example.


Yeah, that's why I said "really needs" - there are cut-down versions that will run without an MMU, but it's not really mainline Linux. I've worked with MMU-less kernels in the past, and it's not a lot of fun (and I started out porting V6/V7 to base-and-bounds swapping machines).


Ah, I didn't actually know this, thanks for pointing it out :)


Not Google being good - Irish law has been changed such that it's no longer permitted.

"The legislation passed in Ireland in 2015 ends the use of the tax scheme for new tax plans. However, companies with established structures can continue to benefit from the old system until 2020."

https://www.investopedia.com/terms/d/double-irish-with-a-dut...


> Due largely to international pressure and the publicity surrounding the use of the double Irish with a Dutch sandwich, the Irish finance minister passed measures to close the loopholes in the 2015 budget.

The EU ordered them to collect taxes from Apple after it came out that Apple ran an optimized double Irish with special exceptions (private rulings) granted by the Irish tax office. This law change is the result of them getting caught violating European trade agreements that predate the EU itself.


> Not Google being good

That goes without saying. What does need to be said is that no anti-Google implications should be taken from this statement.

1. The prior and new behavior is neither good nor bad.

2. That only Google does this, and should be singled out for it, is a bit of pitchfork-ism (to be fair, from the article, not your comment). Lots of companies do this, and lots of companies will need to stop doing it.

https://www.nytimes.com/2017/11/06/world/apple-taxes-jersey....


> 1. The prior and new behavior is neither good nor bad.

That depends from which point of view you're looking at it. Technically, according to the law? All good. Ethically? Different people will have different opinions, but many would agree that it's bad behaviour. You're a law-abiding leech on society, but you're still a leech.


You are still arguing good or bad! I say, it's completely amoral.

Google (and the scores of others doing this) are smart, clever, and correct for taking advantage of loopholes like this. If it is in fact legal but unethical, it's easily addressed by Congress. Expecting companies to self-regulate their tax burden is foolish. Google, Apple and the rest are not "public benefit companies" with the declared benefit of maximizing their tax burden.

'Leech' as well is quite a strong word, most unjustified IMHO.

ps. It's not Google/Apple/etc themselves that find these loopholes; it's outsourced specialists.


If "being good" means intentionally paying more tax than legally obligated, I don't think any corporation fits the bill.


What if "being good" means "no legal arbitrage" and "no jurisdiction shopping"? What if it means "not really, really going out of your way to find ways to pay less tax, including by buying legislation"?


what if it means no more taxes to corrupt governments?


> means intentionally paying more tax than legally obligated

You have an unusual definition of “legally obligated” if you feel they were 100% following the law doing this. Sure, it used to be legal in Ireland, but intentionally moving IP to claim revenue in a different tax jurisdiction than where the money was earned was a legal grey area at best.


It was explicitly, intentionally legal. Ireland enacted these laws so companies like google would move IP and high paying jobs to Ireland. And the Irish economy benefitted from it.

You can’t have a fair discussion of the ethics of this scenario without considering the “tax havens” themselves. Shouldn’t a sovereign country like Ireland have the right to enact laws that benefit its economy? For small jurisdictions like the caymans, what looks like a “tax haven” to outsiders is actually a huge boon to their economy and objectively good for their citizens.


I didn’t claim that it was not legal in Ireland, as that’s not the point. Google (and others like Apple) are multi-national companies which means they might not be following the letter, let alone the intent, of the law in other jurisdictions in which they do business by using this loophole. It’s definitively a legal grey area and everyone skilled in international tax law I’ve read on the topic agree. So saying it’s flat out legal everywhere is glossing over important considerations, and at best comes off naive.


Unfortunately true exponential growth of a physical system cannot exist - it'll always be some sort of logistic model that looks exponential near x=0.
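A quick numeric illustration (the carrying capacity K and rate r are made-up parameters): the logistic curve tracks the exponential closely at first, then saturates at K.

```python
# Logistic growth looks exponential while far below carrying capacity.
from math import exp

def logistic(t, K=1e6, r=1.0, x0=1.0):
    return K / (1.0 + (K / x0 - 1.0) * exp(-r * t))

def exponential(t, r=1.0, x0=1.0):
    return x0 * exp(r * t)

for t in (0, 5, 10, 20):
    print(t, exponential(t), logistic(t))
# Early on the two agree closely; by t = 20 the exponential has blown far
# past the carrying capacity while the logistic has flattened out near K.
```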


I’m intrigued. Why can't true exponential growth of a physical system exist?

Do you mean it can’t be maintained forever? I can think of lots of physical processes that have exponential behavior for some amount of time.

