• Renegade@infosec.pub
    8 months ago

    Nothing in the article corroborated the claim in the title that human intervention made things worse, just that the problem goes deeper.

    • keepthepace@slrpnk.net
      8 months ago

      The study they link, though, lists that among its conclusions:

      Finally, we show that existing methods for alleviating racial bias in language models such as human feedback training do not mitigate the dialect prejudice, but can exacerbate the discrepancy between covert and overt stereotypes, by teaching language models to superficially conceal the racism that they maintain on a deeper level.

      It feels like the same problem as hallucinations: the model learns its core knowledge during base training and is then taught to ignore some of it or invent more, but it does not acquire new knowledge.