• Digit@lemmy.wtf · 4 days ago

    AI-generated code contains more bugs and errors than human output

    Yeah. No shit. I used an LLM’s “help” to make fin.

    It got me reading and debugging more than 10 times as much [bad] code per day as I had in the entire prior 10 years of using fish. [And reading the documentation way more too, learning a lot.]

    … more bugs and errors than human output

    However, it’s not necessarily a bad thing, with AI improving efficiency across the initial stages of code generation.

    Oh, but it’s so effortless. HA! Debugging takes a lot more effort. And then you still have to re-write it all yourself anyway.

    Still, it’s a good learning experience.

    Dear AI,

    Thanks for being so shit.

    Taught me a lot.

  • termaxima@slrpnk.net · 7 days ago

    ChatGPT is great at generating a one-line example use of a function. I would never trust its output any further than that.

    • diabetic_porcupine@lemmy.world · 7 days ago

      So much this. People who say AI can’t write code are just using it wrong. You need to break things down into bite-size problems and just let it autocomplete a few lines at a time. It increases your productivity by like 200%. And don’t get me started on not having to search through a bunch of garbage Google results to find the documentation I’m actually looking for.

      • termaxima@slrpnk.net · 2 days ago

        Personally, I only do the “not search through garbage Google results” part (especially now that search is clogged up with AI articles that don’t even answer the question).

        ChatGPT is great for that; I never have to spend 15 minutes searching for the name of the function that does X.

        I really recommend setting the answers to be as brief and terse as possible. The base settings of a sycophant that generates a full article for every question are super annoying when you’re doing actual work.

      • Lifter@discuss.tchncs.de · 6 days ago

        Not 200%. Maybe 5-10%. You still have to read all of it to check for mistakes, which may sometimes take longer than if you had just written it yourself (with a good autocomplete). Every time it makes a mistake, you have lost time by using it.

        It’s even worse when it just doesn’t work. I cannot even describe how frustrating it is to wait for an autocomplete that never comes. Erase the line, try again aaaand… nothing. After a few tries you opt to write the code manually instead, having wasted time just fiddling with buggy software.

        • termaxima@slrpnk.net · 2 days ago

          Agree with this. I personally don’t use any sort of autocomplete whatsoever. When I have a question for the AI, I ask it, then I type the code from what I learnt.

          Don’t make the mistake of delegating work. Make the AI teach you what it knows.

        • toddestan@lemmy.world · 6 days ago

          I don’t know about ChatGPT, but GitHub Copilot can act like an autocomplete, or you can think of it as a fancier IntelliSense. You still have to watch its output, as it can make mistakes or hallucinate library function calls and the like, but it can also be quite good at anticipating what I was going to write, and it saves me some keystrokes. I’ve also found I can prompt it by writing a comment, and it’ll follow up with an attempt to fill in code based upon that comment. I’ve certainly found it to be a net time saver.
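
          For instance, something like this (a made-up sketch in TypeScript, not a real Copilot transcript): the comment is the prompt, and the function body is the kind of completion it offers, which I still read over before accepting.

              // Parse a "key=value;key2=value2" string into a map,
              // ignoring empty segments and trimming whitespace.
              function parsePairs(input: string): Map<string, string> {
                const result = new Map<string, string>();
                for (const segment of input.split(";")) {
                  const trimmed = segment.trim();
                  if (trimmed === "") continue; // skip empty segments
                  const eq = trimmed.indexOf("=");
                  if (eq === -1) continue; // skip malformed segments
                  result.set(trimmed.slice(0, eq).trim(), trimmed.slice(eq + 1).trim());
                }
                return result;
              }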

        • diabetic_porcupine@lemmy.world · 7 days ago

          Well, not quite. I use ChatGPT more to brainstorm ideas; sometimes I’ll paste a whole file or two into the prompt, describe the issue I’m seeing, and ask what’s wrong. It usually gives me the correct answer right away, or after clarifying once or twice.

          I use Copilot for tab completion. Sometimes it finishes a line or two, sometimes more. It’s usually good code if it’s able to read your existing codebase as a reference. Bonus points for using an MCP.

          I use the Warp terminal for intensive workflows. It’s integrated into your machine and can do just about anything: implementing CI/CD scripts, executing commands, SSHing into remote servers, setting up your infrastructure, etc… I’ll use this when I really need the AI to understand my codebase as a whole before providing any code or executing commands.

  • nutsack@lemmy.dbzer0.com · 7 days ago

    This is expected, isn’t it? You shit-fart code from your ass, doing it as fast as you can, and then whoever buys out the company has to rewrite it. Or they fire everyone to increase the theoretical margins and sell it again immediately.

    • 🍉 Albert 🍉@lemmy.world · 7 days ago

      As a computer science experiment, making a program that can beat the Turing test is a monumental step forward.

      However, as a productivity tool it is useless in practically everything it is implemented in. It is incapable of performing the very basic “sanity check” that is important in programming.

      • robobrain@programming.dev · 7 days ago

        The Turing test says more about the side administering the test than about the side trying to pass it.

        Just because something can mimic text well enough to trick someone else doesn’t mean it is capable of anything more than that.

        • 🍉 Albert 🍉@lemmy.world · 7 days ago

          We can argue about its nuances, same as with the Chinese room thought experiment.

          However, we can’t deny that the Turing test is no longer a thought exercise but a real test, one that has been passed under parameters most people would consider fair.

          I thought a computer passing the Turing test would come with more fanfare about the morality of that problem, because the usual conclusion of the thought experiment was “if you can’t tell the difference, is there one?”. Instead it has become “shove it everywhere!!!”.

          • M0oP0o@mander.xyz · 7 days ago

            Oh, I just realized that the whole AI bubble is just “everything is a dildo if you are brave enough.”

            • 🍉 Albert 🍉@lemmy.world · 7 days ago

              Yeah, and “everything is a nail if all you’ve got is a hammer”.

              There are some uses for that kind of AI, but they’re very limited: less robotic voice assistants, content moderation, data analysis, quantification of text. The closest thing to a generative use should be improving autocomplete and spell checking (maybe; I’m still not sure about those ones).

                • 🍉 Albert 🍉@lemmy.world · edited · 7 days ago

                  In theory, I can imagine an LLM fine-tuned on whatever you type, which might be slightly better than the current ones.

                  Emphasis on the might.

        • 🍉 Albert 🍉@lemmy.world · 7 days ago

          Time for a Turing 2.0?

          If you spent a lifetime with a bot wife and were unable to tell that she was AI, is there a difference?

      • iglou@programming.dev · 7 days ago

        The Turing test becomes absolutely useless when the product is developed with the goal of beating the Turing test.

        • 🍉 Albert 🍉@lemmy.world · 7 days ago

          It was also meant as a philosophical test, but a practical one too, because now I have absolutely no way to know whether you are a human or not.

          But it did pass the test, and that raised the bar. Yet these models are still useless at any generative task.

  • Tigeroovy@lemmy.ca · 7 days ago

    And then it takes human coders way longer to figure out what’s wrong and fix it than it would have if they had just written it themselves.

  • HugeNerd@lemmy.ca · 7 days ago

    Hey don’t worry, just get a faster CPU with even more cores and maybe a terabyte or three of RAM to hold all the new layers of abstraction and cruft to fix all that!

  • BilSabab@lemmy.world · 6 days ago

    What’s funny is that this was predicted even before AI-generated code became an option. Hell, I remember doing an assessment back in early 2023, and literally every domain expert I talked with said the same thing: it has its uses, but purely supplemental ones, and you won’t use it for anything fundamental because the clean-up will take more time than was saved. Counterproductive is the word.

  • antihumanitarian@lemmy.world · 7 days ago

    So this article is basically a puff piece for CodeRabbit, a company that sells AI code review tooling/services. They studied 470 merge/pull requests: 320 AI-generated and 150 human-written as a control. They don’t specify which projects, which model, or when, at least without signing up to get their full “white paper”. For all that’s said, this could be GPT-4 from 2024.

    I’m a professional developer, and currently, by volume, I’m confident the latest models (Claude 4.5 Opus, GPT 5.2, Gemini 3 Pro) are able to write better, cleaner code than me. They still need high-level and architectural guidance, and sometimes overt intervention, but on average they can do it better, faster, and cheaper than me.

    A lot of articles and forum posts like this feel like cope. I’m not happy about it, but pretending it’s not happening isn’t gonna keep me employed.

    Source of the article: https://www.coderabbit.ai/blog/state-of-ai-vs-human-code-generation-report

    • iglou@programming.dev · 7 days ago

      I am a professional software engineer, and my experience is the complete opposite. It does the work faster and cheaper, yes, but also noticeably worse, and having to proofread the output, fix it, and refactor ends up taking more time than writing it myself would have.

      • antihumanitarian@lemmy.world · 6 days ago

        A later commenter mentioned an AI version of TDD, and I lean heavily into that. I structure the process so it’s explicit what observable outcomes need to work before the model returns, and it needs to actually run tests to validate that they work. Because otherwise, yeah, I’ve had them fail so hard they report total success when the program can’t even compile.
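
        As a rough sketch of what I mean (a toy TypeScript example; the slugify module is hypothetical, not from a real project): the tests pin down the observable outcomes up front, and the model isn’t done until they run green.

            import { test } from "node:test";
            import assert from "node:assert/strict";
            // Hypothetical module the model is asked to implement.
            import { slugify } from "./slugify.js";

            // The observable outcomes that must pass before the task counts as done.
            test("slugify lowercases, trims, and hyphenates", () => {
              assert.equal(slugify("  Hello World  "), "hello-world");
            });

            test("slugify drops characters that aren't alphanumeric", () => {
              assert.equal(slugify("C++ & Rust!"), "c-rust");
            });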

        The setup that’s helped with a lot of the shortcomings: thorough design, development, and technical docs; Claude Code with Claude 4.5 Sonnet, then Opus; and search and other web tools. Brownfield designs and off-the-shelf components help a lot, keeping in mind that quality depends on tasks being in distribution.

      • GenosseFlosse@feddit.org · 7 days ago

        In web development it’s impossible to remember all the functions, parameters, syntax, and quirks of PHP, HTML, JavaScript, jQuery, vue.js, CSS, and whatever other code exists in this legacy project. AI really helps when you can divide your tasks into smaller steps and functions, describe exactly what you need, and have a rough idea of how the resulting code should work. If something looks funky, I can ask it to explain, or to do the same thing some other way.
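
        For example (hypothetical, just to show the size of ask I mean), instead of “fix the form” I’d request exactly one small helper whose behaviour I can state precisely and verify by reading, in TypeScript here:

            // Asked for: "a debounce helper that delays a handler by ms
            // and cancels the previous pending call" - small enough to check by eye.
            function debounce<T extends unknown[]>(fn: (...args: T) => void, ms: number) {
              let timer: ReturnType<typeof setTimeout> | undefined;
              return (...args: T) => {
                if (timer !== undefined) clearTimeout(timer); // drop the pending call
                timer = setTimeout(() => fn(...args), ms);    // schedule the new one
              };
            }

            // Usage: only fires search() once typing pauses for 300 ms
            // (searchBox and search are made-up names):
            // searchBox.addEventListener("input", debounce(search, 300));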

        • iglou@programming.dev · 6 days ago

          And now, instead of understanding the functions, parameters, syntax, and quirks yourself in order to produce quality code, which is the job of a software engineer, you ask an LLM to spit out code that seems to work, do that again, and again, and again, and call it a day.

          And then I’ll be hired to fix it.

    • hark@lemmy.world · 6 days ago

      I’m a professional developer, and currently by volume I’m confident latest models, Claude 4.5 Opus, GPT 5.2, Gemini 3 Pro, are able to write better, cleaner code than me.

      I have also used the latest models and found that I’ve had to make extensive changes to clean up the mess they produce. Even when the code functions correctly, it’s often inefficient, poorly laid out, and inconsistent and sloppy in style. Am I just bad at prompting, or is your code just that terrible?

      • antihumanitarian@lemmy.world · 6 days ago

        The vast majority of my experience was Claude Code with Sonnet 4.5, now Opus 4.5. I usually have detailed design documents going in, have it follow TDD, and use very brownfield designs and/or off-the-shelf components. Some of them I call glue apps, since they mostly connect very well-covered patterns. Giving the models access to search engines, webpage-to-markdown tools, and in general the ability to do everything within their Docker sandbox is also critical, especially with newer libraries.

        So on further reflection, I’ve tuned the process to avoid what they’re bad at and lean into what they’re good at.

    • 🍉 Albert 🍉@lemmy.world · 7 days ago

      Do not ask a corpse for advice. The question is: what are we going to do?

      A boycott is a good first step, although I am not sure whether it is better to boycott them or to use their free tier for the most deranged BS conversations, which will consume their resources, eat at their scarce cash reserves, and, when used in training, poison their data.

  • Katzelle3@lemmy.world · 8 days ago

    Almost as if it was made to simulate human output but without the ability to scrutinize itself.

    • mushroommunk@lemmy.today · 8 days ago

      To be fair, most humans don’t scrutinize themselves either.

      (Fuck AI though. Planet-burning trash.)

          • Sophienomenal@lemmy.blahaj.zone · 8 days ago

            I do this with texts/DMs, but I’d never do that with an email. I double- or triple-check everything, make sure my formatting is good, and that the email itself is complete. I’ll DM someone 4 or 5 times in 30 seconds, though; it feels like a completely different medium ¯\_(ツ)_/¯

      • FauxLiving@lemmy.world · 8 days ago

        (Fuck AI though. Planet burning trash)

        It’s humans burning the planet, not the spicy Linear Algebra.

        Blaming AI for burning the planet is like blaming crack for robbing your house.

        • BassTurd@lemmy.world · 8 days ago

          Blaming AI is, in general, criticising everything encompassing it, which includes how bad data centers are for the environment. It’s like recognizing that the crack the crackhead smoked before robbing your house is also bad.

        • Rhoeri@lemmy.world · 8 days ago

          How about I blame the humans who use and promote AI? The humans who defend it in arguments using stupid analogies to soften the damage it causes?

          Would that make more sense?

        • KubeRoot@discuss.tchncs.de · 7 days ago

          Blaming AI for burning the planet is like blaming guns for killing children in schools; it’s people we should be banning!