• Zrybew@lemmy.world
    6 months ago

    LLM performance is approaching a plateau because easily available data is running out. OpenAI is going around trying to license some data, but it won’t be enough.

    The company with the most touch points with users is best positioned to turn them into data probes. Microsoft has Windows, Apple has iOS, and Google… Well, Google is fucked because the other two have OS-level access and can restrict what Google collects.

    Now that LLM foundation models are out, the game will be “who can get the most data” to retrain, optimise and ultimately monetise these models. And there’s a whole other “can of worms” around the legality of training models on unlicensed data collected through “system snapshots”. E.g., collecting NY Times data through Windows snapshots of users who visit the site.