How did they make an 8B model better than GPT-4o? MiniCPM-o deep dive

MiniCPM-o is a very interesting model: although it has only 8B parameters, it surpasses GPT-4o on multiple multimodal benchmarks covering audio, images, and video. It is also fully open-source.
In this video we take a closer look at the benchmarks and discuss the model's internal components: image encoder, text encoder, LLM, and voice synthesizer.
Follow us:
https://www.linkedin.com/company/thelionai
https://www.linkedin.com/in/aleksander-obuchowski/
[00:00:00] Intro
[00:01:41] Benchmarks
[00:09:20] Architecture
[00:12:09] Vision Encoder
[00:19:56] Audio Encoder
[00:24:00] LLM
[00:28:16] Voice Decoder
[00:31:51] Training