Marin

Marin is an open lab for building foundation models, together. We're training powerful models from scratch, and sharing and programmatically documenting every step: the code, the data, the experiments, and the mistakes, all in real time. We invite anyone who shares our vision of open science and open source to join and contribute. Whether you want to try out a new architecture, training algorithm, dataset, or evaluation, there is a lot to do!

Find us here:

Want to jump in? Install the Marin code and run your first experiment!

Experiments

Building a foundation model requires many experiments that try out variants of algorithms and datasets. All of our experiments are captured as GitHub issues. Here is a summary to give you the lay of the land.

Speedrun 🏃

Have a new architecture or training procedure that's more efficient? Participate in the Marin speedrun competition (inspired by the nanogpt speedrun) and create the fastest method to train a model to a target quality! Get started here.

Datashop 🛠️

Want to add new capabilities to the Marin models? Visit our datashop, where you can upload a dataset or craft a prompt to curate a relevant dataset for your task.

Models 🌐

We have trained a strong 8B-parameter model. Try it out here!

Acknowledgements

Marin wouldn't be possible without the generous support of the Google TPU Research Cloud program. We also benefit immensely from the broader open ecosystem, whose members, including AI2, Hugging Face, and NVIDIA, have released numerous tools and datasets.