

I’m hoping Arc survives all this?
I know they want to focus, but no one’s going to want their future SoCs if the GPU part sucks or is nonexistent. Heck, it’s important for servers, eventually.
Battlemage is good!
It was selectively given to institutions and “major” celebrities before that.
Selling them dilutes any meaning of “verified,” because any Joe can just pay for extra engagement. It’s a perverse incentive: the people most interested in grabbing attention buy it and get amplified.
It really has little to do with Musk.
the whole concept is stupid.
+1
Being that algorithmic just makes any Twitter-like design too easy to abuse.
Again, Lemmy (and Reddit) is far from perfect, but fundamentally, grouping posts and feeds by niche is way better. It incentivizes little communities that care about their own health, whereas users shouting into the Twitter maw have zero control over any of that.
Not sure where you’re going with that, but it’s a perverse incentive, just like the engagement algorithm.
Elon is a problem because he can literally force himself into everyone’s feeds, but also because he always posts polarizing/enraging things these days.
Healthy social media design/UI is all about incentivizing good, healthy communities and posts. Lemmy is not perfect, but the simple fact that it doesn’t design for engagement/profit, because it’s “self hosted” instead of commercial, is massive.
I mean, “modest” may be too strong a word, but a 2080 Ti-ish workstation is not particularly exorbitant in the research space. Especially considering the insane dataset size (years of noisy, raw space telescope data) they’re processing here.
Also, that’s not always true. Some “AI” models, especially old-school ones, run fine on old CPUs. There are also efforts (like bitnet) to run larger ones fast and cheaply.
I have no idea if it has any impact on the actual results though.
Is it a PyTorch experiment? Other than maybe different default data types on CPU, the results should be the same.
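For what it’s worth, that sanity check is cheap to run: pin the seed, keep float32 on both devices, and diff the outputs. A hypothetical sketch (stand-in model and data, not theirs):

```python
# Hypothetical sanity check: same seed, same float32 dtype, CPU vs GPU outputs.
import torch

torch.manual_seed(0)
model = torch.nn.Linear(64, 8)   # stand-in for whatever model they actually used
x = torch.randn(16, 64)

with torch.no_grad():
    out_cpu = model(x)
    if torch.cuda.is_available():
        out_gpu = model.to("cuda")(x.to("cuda")).cpu()
        # Tiny float32 discrepancies are normal; anything big points to a
        # dtype/nondeterminism issue rather than the hardware itself.
        print(torch.abs(out_cpu - out_gpu).max())
```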
That’s even overkill. A 3090 is pretty standard in the sanely priced ML research space. It’s the same architecture as the A100, so very widely supported.
The 5090 is actually a mixed bag because it’s too new, and support for it is hit or miss. And also because it’s ridiculously priced for a 32 GB card.
And most CPUs with tons of RAM are fine, depending on the workload, but the constraint is usually “does my dataset fit in RAM” more than core speed (since just waiting 2X or 4X longer is not that big a deal).
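The “does it fit” check is basically one multiplication, e.g. (every number here is a made-up placeholder):

```python
# Back-of-envelope RAM check. Every number here is a made-up placeholder.
n_samples = 500_000_000       # rows after preprocessing (hypothetical)
features = 128                # columns per row (hypothetical)
bytes_per_value = 4           # float32

dataset_gb = n_samples * features * bytes_per_value / 1e9
ram_gb = 256                  # hypothetical workstation

print(f"~{dataset_gb:.0f} GB needed vs {ram_gb} GB of RAM")
# If it doesn't fit, you're into memory-mapping or chunked loading, and the
# "just wait 2x-4x longer" tradeoff starts to look pretty attractive.
```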
The model was run (and I think trained?) on very modest hardware:
The computer used for this paper contains an NVIDIA Quadro RTX 6000 with 22 GB of VRAM, 200 GB of RAM, and a 32-core Xeon CPU, courtesy of Caltech.
That’s a double-VRAM Nvidia RTX 2080 Ti + a Skylake Intel CPU, an aging circa-2018 setup. With room for a batch size of 4096, no less! Though they did run into a preprocessing bottleneck in CPU/RAM.
The primary concern is the clustering step. Given the sheer magnitude of data present in the catalog, without question the task will need to be spatially divided in some way, and parallelized over potentially several machines
It’s a little too plausible, heh.
I posit the central flaw is the engagement system, which was AI-driven long before LLMs/diffusion were public. The slop gets made because it sells in that screwed-up, toxic feed.
If engagement-driven social media doesn’t die in fire, the human race will.
Excess carrier inventory they can write off as a loss since the Plus didn’t sell as well, or so I was told.
TBH most of the cost is in the individual components. The core chip fab, the memory fab, the OLED screen fab, the battery, power regulation, the cameras: all massive, heavily automated operations. Not to speak of the software stack, or the chip R&D and tape-out costs.
The child labor is awful, but it’s not the most expensive part of a $1k+ iPhone.
They’re more often subsidized by carriers here (in the US), too. I didn’t really want an iPhone, but $400 new for a Plus, with a plan discount, just makes an Android phone not worth it.
OP’s being abrasive, but I sympathize with the sentiment. Bluesky is algorithmic just like Twitter.
…Dunno about Bluesky, but Lemmy feels like a political purity test to me. Like, I love Lemmy and the Fediverse, but at the same time, mega-upvoted posts/comments like “X person should kill themself,” the expulsion of nuance on specific issues, the way it leaks into every community, and so on are making me step back more and more.
Yes because ultimately, it just wasn’t good enough.
That’s what I was trying to argue below. Unified memory is great if it’s dense and fast enough, but that’s a massive if.
It’s not theoretical, it’s just math. Removing 1/3 of the bus paths, and also removing the need to constantly keep RAM powered…
And here’s the kicker.
You’re supposing it’s (given the no-refresh bonus) 1/3 as fast as DRAM, with similar latency, and cheap enough per gigabyte to replace most storage. That is a tall order, and it would be incredible if it hits all three of those. I find that highly improbable.
Even DRAM is starting to become a bottleneck for APUs specifically, because making the bus wide is so expensive. This applies at the very top (the MI300A) and the very bottom (smartphone and laptop APUs).
Optane, for reference, was a lot slower than DRAM and a lot more expensive/less dense than flash, even with all the work Intel put into it and the busses built into then-top-end CPUs for direct access. And they thought that was pretty good. It was good enough for a niche when used in conjunction with DRAM sticks.
You are talking theoreticals.
A big reason that supercomputers moved to a network-of-commodity-hardware architecture is that it’s cost effective.
How would one build a giant unified pool of this memory? CXL, sure, but how does that look physically? Maybe you get a lot of bandwidth in parallel, but how would it be even close to the latency of “local” DRAM busses on each node (rough numbers below)? Is that setup truly more power efficient than banks of DRAM backed by infrequently touched flash? If your particular workload needs fast random access to memory, even at scale the only advantage seems to be some fault tolerance, at a huge speed cost; and if you just need bulk, high-latency bandwidth, flash has you covered for cheaper.
…I really like the idea of a nonvolatile single pool backed by caches, especially at scale, but ultimately architectural decisions come down to economics.
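To put very rough numbers on the latency question (ballpark assumptions, not measurements of any real CXL setup):

```python
# Order-of-magnitude latency comparison for dependent random accesses.
# All figures are ballpark assumptions for illustration, not benchmarks.
latency_ns = {
    "local DRAM": 100,          # typical load-to-use on a server node
    "CXL-attached pool": 300,   # assume roughly a NUMA hop or worse
    "NVMe flash": 100_000,      # random read, ~100 microseconds
}

accesses = 1_000_000  # pointer-chasing style workload
for name, ns in latency_ns.items():
    print(f"{name:>18}: {accesses * ns / 1e6:,.0f} ms total")
# Latency-bound work eats a ~3x penalty on the pooled memory; streaming work
# doesn't care, but then cheap flash behind a DRAM cache already covers it.
```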
How is that any better than DRAM though? It would have to be much cheaper/GB, yet reasonably faster than the top-end SLC/MLC flash Samsung sells.
Another thing I don’t get… in all the training runs I see, dataset bandwidth needs are pretty small. Like, streaming images (much less something like 128K tokens of text) is a minuscule drop in the bucket compared to how long a step takes, especially with hardware decoders for decompression (back-of-envelope below).
Weights are an entirely different duck, and stuff like Cerebras clusters do stream them into system memory, but they need the speed of DRAM (or better).
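Back-of-envelope on the dataloader side (every number is an assumption, just to show the order of magnitude):

```python
# Rough dataloader bandwidth estimate. All numbers are assumptions.
batch_size = 256
kb_per_image = 150         # compressed JPEG on disk (assumed)
step_seconds = 0.4         # one optimizer step on the GPU (assumed)

mb_per_step = batch_size * kb_per_image / 1024
print(f"~{mb_per_step:.0f} MB per step -> ~{mb_per_step / step_seconds:.0f} MB/s sustained")
# Call it ~100 MB/s: a single NVMe drive pushing several GB/s barely notices,
# which is why the dataset stream is rarely the bottleneck.
```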
Yeah, it’s a solution in search of a problem.
If it ends up much cheaper than DRAM? Great! But until then, even if it’s lower power from not needing refresh, flash is just so cheap that it scales up much better (rough cost math below).
And I dunno what they mean by “AI workloads.” How would non-volatility help at all, unless it’s starting to approach SRAM performance?
Some embedded stuff could use it, but that’s not a huge margin market.
Optane was sorta interesting because it was ostensibly cheaper and higher capacity than DRAM, albeit not enough.
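Rough cost math behind the “flash is just so cheap” point (prices are loose assumptions, not quotes):

```python
# Rough $/GB comparison for a 4 TB pool. Prices are loose assumptions.
price_per_gb = {
    "DRAM (DDR5)": 3.00,
    "NAND flash (TLC SSD)": 0.08,
    "hypothetical new NVM": 1.00,   # pure guess, somewhere in between
}

capacity_gb = 4096
for name, dollars in price_per_gb.items():
    print(f"{name:>22}: ${dollars * capacity_gb:,.0f}")
# Unless the new part lands far closer to flash pricing, saving DRAM refresh
# power alone doesn't move the needle at scale.
```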
Good!