• 0 Posts
  • 7 Comments
Joined 2 years ago
Cake day: July 3rd, 2023

  • As a software engineer who started programming when he was 11, I get what you mean about “ladder climbers” feeling alien (my elitist term for them is “9-to-5ers” or “pedestrians”).

    However, I think this question is dumb, at least insofar as it won’t weed out the people you think it will. I don’t read fiction often, and the only scifi books I remember reading are Dune and Prey, but that’s very out of character for me. It’s pretty much luck that I read those, and more a factor of me just being an old fart (I’m almost 30, and that’s a lot of time to stumble upon at least one scifi book). Ask me this question a few years earlier and I’d have drawn a blank.

    Both were good books, but nothing that I would consider a “favorite”. Dune is memorable to me just because it very clearly was based on Lawrence of Arabia, which I found neat. As for Prey, I only vaguely remember something about killer nanomachines, and that it was a fun read.

    But if you’re specifically looking to hire someone you can talk scifi novels with, then it’s a very good question (as long as you’re mature enough to hire someone who says their favorite book is one that you hate).


  • gamer@lemm.ee to Asklemmy@lemmy.ml: Why would'nt this work?
    3 days ago

    This doesn’t account for blinking.

    If your friend blinks, they won’t see the light, and thus would be unable to verify whether the method works or not.

    But how does he know when to open his eyes? He can’t keep them open forever. Say you flash the light once, and that’s his signal to keep his eyes open. Okay, but how long do you wait before starting the experiment? If you do it immediately, he may not have enough time to react. If you wait too long, his eyes will dry out and he’ll blink.

    This is just not going to work. There are too many dependent variables.


  • gamer@lemm.ee to Asklemmy@lemmy.ml: Superbowl sadness
    3 days ago

    I’m seeing people say that the broadcaster (Fox Sports, of course) injected cheers into the broadcast for Trump, and boos for Taylor Swift. I don’t want to spread misinfo though, so does anyone know if it’s true, or if there’s a way to validate it (e.g. by analyzing the audio)?



  • 96 GB+ of RAM is relatively easy, but for LLM inference you want VRAM. You can achieve that on a consumer PC by using multiple GPUs, although performance will not be as good as having a single GPU with 96GB of VRAM. Swapping out to RAM during inference slows it down a lot.

    On archs with unified memory (like Apple’s latest machines), the CPU and GPU share memory, so you could actually find a system with very high memory directly accessible to the GPU. Mac Pros can be configured with up to 192GB of memory, although I doubt it’d be worth it as the GPU probably isn’t powerful enough.

    Also, the 83GB number I gave was with a hypothetical 1 bit quantization of Deepseek R1, which (if it’s even possible) would probably be really shitty, maybe even shittier than Llama 7B.

    “but how can one enter TB zone?”

    Data centers use NVLink to connect multiple Nvidia GPUs. Idk what the limits are, but you use it to combine multiple GPUs to pool resources much more efficiently and at a much larger scale than would be possible on consumer hardware. A single Nvidia H200 GPU has 141 GB of VRAM, so you could link them up to build some monster data centers.

    Nvidia also sells prebuilt machines like the HGX B200, which can have 1.4TB of memory in a single system. That’s less than the 2.6TB for unquantized Deepseek, but for inference-only applications, you could definitely quantize it enough to fit within that limit with little to no quality loss… so if you’re really interested and really rich, you could probably buy one of those for your home lab.
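
    To put rough numbers on that, here’s a quick back-of-the-envelope sketch. It only counts the weights (no KV cache, activations, or framework overhead), uses decimal units (1 GB = 10^9 bytes), and takes the 141 GB and 1.4TB figures straight from above. The function names are made up for illustration, and the bit widths are just example settings, not specific quantization formats.

```python
import math

H200_VRAM_GB = 141          # per-GPU VRAM figure quoted above
HGX_B200_GB = 1400          # the "1.4TB" system figure quoted above, in decimal GB
DEEPSEEK_R1_PARAMS = 671e9  # parameter count quoted for Deepseek R1


def weights_gb(num_params, bits_per_param):
    """Memory for the weights alone, in decimal GB (assumption: no overhead)."""
    return num_params * bits_per_param / 8 / 1e9


def gpus_needed(model_gb, vram_per_gpu_gb):
    """Minimum GPU count whose pooled VRAM holds the weights."""
    return math.ceil(model_gb / vram_per_gpu_gb)


for bits in (32, 16, 8):
    size = weights_gb(DEEPSEEK_R1_PARAMS, bits)
    print(
        f"{bits:>2}-bit: {size:.0f} GB of weights, "
        f"{gpus_needed(size, H200_VRAM_GB)} x H200, "
        f"fits in 1.4TB: {size <= HGX_B200_GB}"
    )
```

    Under those assumptions, 32-bit weights don’t fit in 1.4TB but 16-bit and 8-bit do, which is the whole reason quantization matters here. Real deployments need headroom on top of this, so treat the GPU counts as lower bounds.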


  • If all you care about is response times, you can easily do that by just using a smaller model. The quality of responses will be poor though, and it’s not feasible to self-host a model like ChatGPT on consumer hardware.

    For some quick math, a small Llama model is 7 billion parameters. Unquantized, that’s 4 bytes per parameter (32-bit floats), meaning it requires 28 billion bytes (28 GB) of memory. You can get that to fit in less memory with quantization, which basically trades quality for lower memory usage: you use fewer than 32 bits per param, reducing both precision and memory footprint.
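
    If you want to play with the numbers yourself, the math is just parameters × bits per parameter ÷ 8. A minimal sketch, assuming decimal GB (1 GB = 10^9 bytes) and counting only the weights; the function name and the list of bit widths are made up for illustration, and real quantization formats add a bit of overhead on top:

```python
def model_memory_gb(num_params, bits_per_param):
    """Approximate memory for the weights alone, in decimal GB."""
    return num_params * bits_per_param / 8 / 1e9


LLAMA_7B_PARAMS = 7e9  # "7 billion parameters" from the comment

# Example bit widths only; not tied to any particular quantization scheme.
for bits in (32, 16, 8, 4):
    print(f"{bits:>2} bits/param: {model_memory_gb(LLAMA_7B_PARAMS, bits):.1f} GB")
```

    Halving the bits per parameter halves the memory, so the 28 GB model drops to 14 GB at 16 bits and 3.5 GB at 4 bits.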

    Inference performance will still vary a lot depending on your hardware, even if you manage to fit it all in VRAM. A 5090 will be faster than an iPhone, obviously.

    … But with a model competitive with ChatGPT, like Deepseek R1, we’re talking about 671 billion parameters. Even if you quantize down to a useless 1 bit per param, that’d be over 83 GB just to fit the model in memory (unquantized it’s ~2.6TB). Running inference over that many parameters would require serious compute too, much more than a 5090 could handle. This gets into specialized high-end architectures to achieve that performance, and it’s not something a typical prosumer would be able to build (or afford).

    So the TL;DR is no.