The AI Newsroom
Posts
✅ Mistral's Pixtral 12B, Explained

✅ Mistral's Pixtral 12B, Explained

French AI startup Mistral has released its first multi-modal model.

Douglas
September 20, 2024 • Reading Time: ~5 minutes

Welcome to this new (special) edition!

Hello readers, Douglas here!

Last week, OpenAI dropped a bombshell with their o1 model. This week, it's Mistral's turn with their new multi-modal model: Pixtral 12B.

What is a Multi-Modal Model?
How does it work?
Why now?
Can Pixtral 12B unlock new capabilities?
& much more…!

All these questions deserve a special edition. I’ve broken down everything to answer the questions you might have.

I hope this helps clarify things for you.

Enjoy! 💚

Pixtral 12B from Mistral AI

In partnership with 1440 Media

Have to thank 1440 Media for trusting The AI Newsroom for the second time.

We share the same values and ambitions, both in the quality of the content we try to create and in our desire to make our readers smarter. 🤓

By supporting them, you support me! 💚

About 1440:

Fact-based news without bias awaits. Make 1440 your choice today.

Overwhelmed by biased news? Cut through the clutter and get straight facts with your daily 1440 digest. From politics to sports, join millions who start their day informed.

What is a Multi-Modal Model?

The word “modal” refers to types of data or information. So, when we say “multi-modal,” it means using more than one type of data at the same time.

But what kinds of data we’re talking about?

Imagine an AI that can understand words (text) and pictures (images). That’s a simple example of multi-modal. Some AIs can even understand sounds, like speech or music, along with text and images.

Why is the multimodal model important?

The world is full of different kinds of information—words, pictures, sounds—and they often come together. For example, when you look at a photo of a cat and the word “cat” next to it, it’s easier to understand what the picture is about.
A multi-modal AI can look at both the picture and the word “cat” to fully understand that it’s seeing a cat.

How do AIs usually work?

Most AIs only understand one type of data at a time. For example, a text-only AI can read articles or messages, but it doesn’t “see” pictures. A picture-only AI can recognize objects in images, but it can’t read the words in a photo.

So what makes the multimodal model different is its ability to both read words and understand images.

Multi-Modal Model Structure. Credit: LeewayHertz

All about Pixtral 12B

What’s Pixtral 12B?

Pixtral 12B is a new AI multimodal model created by Mistral AI.
With 12 billion parameters (the building blocks of the model), Pixtral 12B is designed to be more powerful and efficient than previous models.

How does Pixtral 12B work?

Pixtral 12B works by “looking” at images and “reading” text at the same time. It takes in both kinds of data, connects the information, and gives back an answer that combines everything it has learned.

For example, if you upload a picture of a city skyline and ask, “Which city is this?” Pixtral 12B uses both the image (the buildings, landmarks) and the text (your question) to figure it out.

Why is this new model so special?

Because it’s not just about understanding one thing. Pixtral 12B can handle complex tasks because it processes information in a way that mimics how we humans use all our senses together. This makes it smarter at solving real-world problems. And it’s also very different from Mistral AI’s previous models. Older models were often limited to either text or images. Pixtral 12B combines both, making it more versatile and efficient than those older models.

Where can you get Pixtral 12B?

Pixtral 12B is available on Hugging Face, a platform known for AI and machine learning development.

You can download it directly from Hugging Face and use it in your projects. It’s ready for you to fine-tune, customize, and apply in various tasks involving text and images.

Is it free to use?

Yes! Pixtral 12B is offered under an Apache 2.0 license, meaning you can use it freely without any restrictions. Whether for research, experimentation, or product development, you have full flexibility to integrate Pixtral 12B into your workflow.

/pixtral-12b-240910 - Hugging Face

Thank you for your time!

See you on Tuesday at 9:12 am!

Hey readers, Doug here!

I'd like to sincerely thank you for taking the time to read The AI Newsroom every week.

I make every effort to send you valuable emails every week.

Please let me know how you found this edition by replying to this email or by answering the questionnaire below. 👇

♻️ Please also feel free to share as much of the newsletter as you can with your friends, colleagues, or AI girlfriend. It helps me enormously!

Keep Learning!

I’d Love Your Feedback!

Did you enjoy this edition? Let me know: