• 0 Posts
  • 56 Comments
Joined 1 year ago
Cake day: June 10th, 2023



  • I’d be really interested to know what it’s made out of. Had a coworker who used to work in forgings and did some stuff that got sent to nuclear plants. They said there were really strict requirements on material composition, specifically that the material (think it was steel, may have been something else) had basically no traces of cobalt in it, because the cobalt would become radioactive over the service life.


  • If you’re OK with some bulk, go for an NVMe enclosure. I have a Sabrent one with a 256 GB Crucial Gen 3 drive in it. It’s a slow, cheap drive, but still substantially better than any USB key, and you can put one together for under $100 CAD including a longer high-speed cable.

    I just did a fresh install off of my USB key and wow, it was super slow compared to any time I’ve done one off my enclosure.







  • You could use Polars; afaik it supports streaming from CSVs too, and frankly the syntax is so much nicer than pandas coming from Spark land.

    Do you need to persist them? What are you doing with them? A really common pattern for analytics is landing those in something like Parquet or Delta (less frequently Avro or ORC) and then working right off that. If they don’t change, it’s an option. 100 gigs of CSVs will take some time to write to a database depending on resources, tools, and DB flavour; to be fair, writing into a compressed format takes time too, but it saves you managing a database (unless you want to, just presenting some alternatives). There’s a rough sketch of that pattern below.

    You could also look at a document DB. Again, it will take time to ingest and index, but it’s definitely another tool. I’ve touched Elastic and stood up Mongo before, but Solr is also around and built on top of Lucene, which I knew Elastic was, but apparently so is Mongo.

    Edit: searchable? Then I’d look into a document DB, it’s quite literally what they’re meant for, and all of those I mentioned are used for enterprise search (small ingest sketch at the end too).
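
    A minimal sketch of the Polars CSV-to-Parquet idea, assuming Python and made-up paths/column names (data/*.csv, user_id, event_date, payload are placeholders); exact method names vary a bit between Polars versions:

    ```python
    import polars as pl

    # Lazily scan the whole CSV dump (glob pattern); nothing is read yet.
    lf = pl.scan_csv("data/*.csv")

    # Pick the columns you care about and filter early so the work gets
    # pushed down before anything is materialised.
    cleaned = lf.select(["user_id", "event_date", "payload"]).filter(
        pl.col("event_date") >= "2023-01-01"
    )

    # Stream the result straight into a compressed Parquet file instead of
    # loading ~100 GB of CSVs into memory.
    cleaned.sink_parquet("events.parquet")

    # Later queries work off the Parquet file, not the raw CSVs.
    top_users = (
        pl.scan_parquet("events.parquet")
        .group_by("user_id")
        .agg(pl.len().alias("n"))
        .sort("n", descending=True)
        .collect()
    )
    print(top_users.head())
    ```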
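
    And a rough sketch of the document-DB route using MongoDB/pymongo, again with hypothetical database, collection, and field names; Elastic or Solr would follow the same load-then-index shape:

    ```python
    import csv
    from pymongo import MongoClient, TEXT

    client = MongoClient("mongodb://localhost:27017")
    coll = client["analytics"]["events"]  # hypothetical db/collection names

    # Ingest one CSV in batches; this is the slow part for ~100 GB of data.
    with open("data/events_000.csv", newline="") as f:
        batch = []
        for row in csv.DictReader(f):
            batch.append(row)
            if len(batch) == 10_000:
                coll.insert_many(batch)
                batch = []
        if batch:
            coll.insert_many(batch)

    # A text index is what makes the "searchable" part cheap afterwards.
    coll.create_index([("payload", TEXT)])

    # Full-text query over the indexed field.
    for doc in coll.find({"$text": {"$search": "timeout error"}}).limit(5):
        print(doc["user_id"], doc["event_date"])
    ```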











  • I gave it a fair shake after my team members were raving about it saving time last year. I tried an SFTP function and some Terraform modules, and man, both of them just didn’t work. It did, however, do a really solid job of explaining some data operation functions I wrote, which I was really happy to see. I do try to add a detail block to my functions and be explicit with typing where appropriate (something like the sketch below), so that probably helped some, but yeah, I was actually impressed by that. For generation though, maybe it’s better now, but I still prefer to pull up the documentation, as I spent more time debugging the crap it gave me than I would have piecing it together myself.

    I’d use an LLM tool for interactive documentation and as a reverse-engineering aid though; I personally think that’s where it shines. Otherwise I’m not sold on the “gen AI will somehow fix all your problems” hype train.
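
    For what it’s worth, the “detail block plus typing” habit looks roughly like this (a made-up example, not one of my actual functions), which seems to give the model enough context to explain the function well:

    ```python
    from datetime import date

    def filter_events_by_window(
        events: list[dict],
        start: date,
        end: date,
        date_key: str = "event_date",
    ) -> list[dict]:
        """Return only the events whose date falls inside [start, end].

        Args:
            events: Raw event records, each holding an ISO date string under date_key.
            start: Inclusive lower bound of the window.
            end: Inclusive upper bound of the window.
            date_key: Field name that holds the event's date.

        Returns:
            The matching events, in their original order.
        """
        return [
            e for e in events
            if start <= date.fromisoformat(e[date_key]) <= end
        ]
    ```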