Role of Open Source in AI-Human Collaboration

Published June 22, 2025

What role does open source play in AI safety? Is open source special?

There is one position which holds that the main risk comes from strong autonomous AI systems, and that open source therefore does not really have much to do with AI safety; on the contrary, open source, open weights, and the like might even be a net negative.

This might be very speculative, but I want to explore the idea that the risks of AI are more complicated than unaligned superintelligent systems going rogue. No doubt the idea I will try to explore here is only a small part of a bigger picture.

Let me paint a picture. Over the coming years the Pareto frontiers of models are going to steadily expand: smaller models are becoming more capable, bigger models are becoming more capable and easier to integrate into existing systems, and so on. As that happens, their ability to do what humans want will be limited by the understanding they have of the context of organisations, of systems, of human preferences, and much more. People have talked about a capabilities overhang, where, as available compute scales and systems get very good, society cannot adapt quickly enough to the improvements coming from scaling. But the idea of an overhang could be extended or generalized. That is, it is imprecise to speak of a capabilities overhang without saying that it is with respect to compute scaling. There can be a capability overhang with respect to preferences, or, I think more insightfully, an overhang with respect to collaboration.

Again, I believe this is only a smaller part of a broader picture on safety, but part of the AI safety problem is the ability of human and AI systems to collaborate. And I think open source plays an important role here. In my opinion, it is obvious that incentive pressures are already pushing AI companies to optimize for a certain kind of human-AI interaction; I am especially thinking of OpenAI's ChatGPT Advanced Voice Mode. Just in the past weeks they made an update to its "personality", one that most likely optimizes some reward signal whose basis only OpenAI knows. Applying the full force of optimization toward human reward signals is obviously risky, as social media technologies have shown. Optimizing for something that is truly collaborative might be extremely difficult, and I think open source will play a big role here.

Meanwhile, the French open-source lab Kyutai has this month released Unmute.sh, a voice assistant based on low-latency streaming text-to-speech and speech-to-text models that allow plugging in any LLM as the "brain" of the assistant. While its voice might be less seductive than OpenAI's latest voices, their audio models are and will remain open source; there is no changing of the personality, no unknown optimization, and so on. It feels refreshing and safe to talk to it, even though their demo only uses a 12B-parameter Gemma model.
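To make that modularity concrete, here is a minimal sketch of the kind of pipeline this architecture enables. The function names and stand-in components are made up for illustration and are not Kyutai's actual API; the point is simply that the speech-to-text front end, the LLM "brain", and the text-to-speech back end are separate, swappable pieces.

    # Illustrative sketch only: hypothetical interfaces, not Kyutai's API.
    # Any LLM can be plugged into the "think" slot of the pipeline.
    from typing import Callable

    def voice_assistant_turn(
        transcribe: Callable[[bytes], str],   # speech-to-text model
        think: Callable[[str], str],          # any LLM as the "brain"
        synthesize: Callable[[str], bytes],   # text-to-speech model
        audio_in: bytes,
    ) -> bytes:
        """One conversational turn: audio in, audio out."""
        user_text = transcribe(audio_in)
        reply_text = think(user_text)
        return synthesize(reply_text)

    if __name__ == "__main__":
        # Trivial stand-ins so the sketch runs end to end.
        fake_stt = lambda audio: "hello there"
        fake_llm = lambda text: f"You said: {text}"   # swap in Gemma or any other model
        fake_tts = lambda text: text.encode("utf-8")  # pretend this is audio
        print(voice_assistant_turn(fake_stt, fake_llm, fake_tts, b"..."))

Because the "brain" is just a function from text to text, the personality of the assistant is whatever model you choose to plug in, rather than something a single company tunes behind the scenes.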

How does this aid human-AI collaboration? Trust is fundamental to humans; emotions are fundamental to humans. There might not be a direct way to connect human-AI collaboration to improvements on the most radical aspects of AI safety, but one could imagine a path where, as systems become more and more integrated, the main factor influencing the outcome once systems surpass human understanding will be how much of the core of our preferences we can load onto the rocket as it fires off. This is not done by applying simple-minded optimization, but there might be a chance it can at least be aided by focusing on building trustworthy collaborations between humans and AIs.

What does that mean, trustworthy collaborations between humans and AIs? It at least means that trust goes both ways. For now, and for the foreseeable future, humans will be the most capable in the trust department, since trust is central to human psychology and sociality. But it is an empirical question, and social scientists should actually focus on it and take it seriously. I think the social sciences have tools and methods for this, and it seems clear that such work could have some impact.

I can start by introspecting. I trust a system like Kyutai's Unmute.sh more because the voice I selected for it was relatively bland, and because, while it was flattering at times, it did not feel like it was manipulating me emotionally. Some of us have met people who seem to be trying to get something out of you, who are somehow too sweet; this sets off alarms in our internal trust systems. It also felt refreshing to talk to a system that I knew was just 12B parameters. Though it is impressive what 12B-parameter models can do these days, there is something reassuring about interacting with a smaller model that is smart enough to know some things you can learn from, but not much more. Maybe it is a feeling that you can "fathom" the depths of the model. Maybe we can think of it like this: when we interact with another person, if we sense that they have a lot of knowledge "relevant" to our situation but are not telling it, we feel something is off, we feel unsafe, and we cannot trust them.

Transparency is strongly "correlated" with trustworthiness, even if they are distinct concepts and the relation between them is more complex. So transparency as a tool for increasing trustworthy collaboration might operate at multiple levels. For a technical user, there is the feeling arising from the knowledge that the text-to-speech model is trained on open data, that the underlying LLM is available, and that there might be people out there advancing open mechanistic interpretability of the model (Gemma Scope). It would be interesting if this notion of the fathomability of a model is something that can be investigated experimentally.

What kind of interactions do people want with AI systems, and how dependent are they on capability? Most businesses want workers that are very capable, but they might not necessarily want them to be unlimited in their knowledge and understanding of situations. It is not obvious what the answer is: misunderstanding of situations, lack of context, and so on can cause problems. But are there cases where it is a balance between trust and capability? I am not prescribing anything about how I want human-AI collaboration to be, but if we are facing a scenario where more and more of the population feel that spending time with voice assistants is worthwhile, then what characteristics do those assistants have? Perhaps we as humans are not always trying to spend time only with the most intelligent and emotionally advanced people; among humans there might be (statistical) trade-offs, where some people might be funnier or more relaxed precisely because they are not as intelligent, goal-oriented, and emotionally aware as they could be. I might be wrong about that; it is just an idea.