• 1 Post
  • 60 Comments
Joined 1 year ago
Cake day: June 15th, 2023


  • To actually read how they did it, here is their model page: https://huggingface.co/gradientai/Llama-3-8B-Instruct-Gradient-1048k

    Approach:

    • meta-llama/Meta-Llama-3-8B-Instruct as the base
    • NTK-aware interpolation [1] to initialize an optimal schedule for RoPE theta, followed by empirical RoPE theta optimization
    • Progressive training on increasing context lengths, similar to Large World Model [2] (See details below)
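
    The NTK-aware theta scaling mentioned in the second bullet can be sketched roughly like this (a minimal illustration of the published NTK-aware formula, not Gradient's actual schedule; the function names and example numbers are my own, and the empirical optimization step they mention on top of it is not shown):

    ```python
    def ntk_scaled_theta(base_theta: float, head_dim: int, scale: float) -> float:
        # NTK-aware interpolation raises the RoPE base so that low-frequency
        # (long-range) dimensions get stretched to the new context length
        # while high-frequency (local) dimensions are barely changed.
        return base_theta * scale ** (head_dim / (head_dim - 2))

    def rope_inv_freqs(theta: float, head_dim: int) -> list[float]:
        # Standard RoPE inverse frequencies, one per rotated dimension pair.
        return [theta ** (-2 * i / head_dim) for i in range(head_dim // 2)]

    # Hypothetical example: stretching an 8k-token context to 1048k (scale ≈ 128),
    # starting from Llama 3's base theta of 500,000.
    theta_long = ntk_scaled_theta(500_000.0, head_dim=128, scale=1_048_576 / 8192)
    ```

    Note that the i = 0 inverse frequency is theta**0 = 1 regardless of theta, which is exactly the "high frequencies stay intact" property the method relies on.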

    Infra

    We build on top of the EasyContext Blockwise RingAttention library [3] to scalably and efficiently train on contexts up to 1048k tokens on a Crusoe Energy high-performance L40S cluster.

    Notably, we layered parallelism on top of Ring Attention with a custom network topology to better leverage large GPU clusters in the face of network bottlenecks from passing many KV blocks between devices. This gave us a 33x speedup in model training (compare 524k and 1048k to 65k and 262k in the table below).
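
    The KV-block passing that the paragraph describes can be illustrated with a toy, single-process version of blockwise ring attention (plain numpy, no real devices, networking, or the custom topology they mention; the online-softmax bookkeeping is the standard trick, but this sketch is my own, not Gradient's code):

    ```python
    import numpy as np

    def ring_attention(q_blocks, k_blocks, v_blocks):
        """Toy blockwise ring attention: q_blocks[i] stays on "device" i,
        KV blocks rotate around the ring; each step folds one arriving KV
        block into the running result via the online-softmax update."""
        n = len(q_blocks)
        outs = []
        for i in range(n):                     # "device" i owns query block i
            q = q_blocks[i]
            m = np.full(q.shape[0], -np.inf)   # running row-wise score max
            l = np.zeros(q.shape[0])           # running softmax denominator
            acc = np.zeros_like(q)             # running unnormalized output
            for step in range(n):
                j = (i + step) % n             # KV block arriving at this ring step
                s = q @ k_blocks[j].T / np.sqrt(q.shape[1])
                m_new = np.maximum(m, s.max(axis=1))
                p = np.exp(s - m_new[:, None])
                rescale = np.exp(m - m_new)    # correct the earlier partial sums
                l = l * rescale + p.sum(axis=1)
                acc = acc * rescale[:, None] + p @ v_blocks[j]
                m = m_new
            outs.append(acc / l[:, None])
        return np.concatenate(outs)
    ```

    The point of the ring is that each device only ever holds one remote KV block at a time, so memory stays flat while the context grows; the network cost of rotating the blocks is what their custom topology work was attacking.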

    Data

    For training data, we generate long contexts by augmenting SlimPajama. We also fine-tune on a chat dataset based on UltraChat [4], following a similar recipe for data augmentation to [2].

  • I use it almost daily.

    It does produce good code. It does not reliably produce good code. I am a programmer; it makes my job 10x faster, and I usually just have to fix a few bugs in the code it generates. Over time, I learned what it is good at (UI code, converting things, boilerplate) and what it struggles with (anything involving newer tech, algorithmic understanding, etc.).

    I often refer to it as my intern: It acts like an academically trained, not particularly competent, but very motivated, fast typing intern.

    But then, I also work in the field. Prompting it correctly is too often dismissed as a skill (I used to dismiss it too). It requires more understanding than people give it credit for.

    I think that, like much IT tech, it will gradually go from being a dev tool to an everyday tool.

    All the pieces of the puzzle needed to control a computer by voice using only natural language are there. People don’t realize how big that is. Companies haven’t assembled them yet because it is actually harder to monetize than to build. Apple is probably in the best position to do it. Microsoft will attempt it and fail as usual, and Google will probably make a half-assed attempt. I’ll personally go for the open source version.

  • I’d have a slightly different take: managing things in-house is cheaper if you have a competent team to do it. The cloud became crucial infrastructure because competent IT and sysadmin people are hard to find; it’s a seller’s market now. IT staff could help a company save money on AWS hosting, but they can also be put on more crucial and profitable endeavours, and that is what is happening.

    I see it at the two organizations I work at. One is a startup with a single, overworked “hardware guy” who sets up the company’s critical infra. His highest priority is maintaining the machine with private information that we want to host internally for strategic reasons. We calculated that having him install a few machines to host our dev team’s data was the cheapest option, but after 3 months of waiting, we opted for a more expensive, but immediately available, cloud option. We could have hired a second person, but our HR department is already having a hard time finding candidates for our crucial missions.

    At the non-profits I work with, there is a strong openness/open-hardware spirit. Yet I am basically the only IT guy there. I often joke that they should ditch their Microsoft, Office, and Google-based tools, and I could help them do it, but I prefer to work on the actual open-hardware research projects they are funding. And I think I am right in my priorities.

    So yes, the cloud is overpriced, but it is a convenience. Know what you pay for and that you could save money there; at some point it may be reasonable to do so. In the end, it’s a resource allocation problem: human time vs. money.


  • Heh, when you can’t compete on performance anymore, compete on openness.

    I don’t like calling it “NSFW”; the correct term is “uncensored”. The problem is not that it can’t generate titties; the problem is that constraining it to “socially acceptable” answers seriously limits its intelligence. That was one of the points of Fahrenheit 451: censorship is a one-way street. You can’t go back once you start squashing “fringe” opinions purely on the grounds that they may shock people.


  • What do you guys think about this?

    Not enough information to know if this is a good or a bad idea.

    A tool that automatically turns bodycam footage into written reports may improve things. One that simply fakes a lot of details so that the result looks like a well-fleshed-out report is a terrible idea.

    Generally speaking, the less human subjectivity intervenes in law enforcement, the better off we are. Yet companies and police always somehow find a way to turn good ideas into terrible implementations. I do hope it is for the best, but it could just as easily mass-manufacture lies as increase accountability.