Their initial testing reveals that slight changes in the wording of queries can result in significantly different answers, undermining the reliability of the models.

  • davel [he/him]
    37 months ago

    Such an unfortunate title. No one who knows what they’re talking about would say that LLMs can—or ever will—reason. It’s not a flaw when something can’t do what it wasn’t designed to do.