We are releasing the base model weights and network architecture of Cotton-1, our large language model. Cotton-1 is a 314 billion parameter Mixture-of-Experts model, trained from scratch by D-AI.

This is the raw base model checkpoint from the Cotton-1 pre-training phase, which concluded in October 2023. As such, the model has not been fine-tuned for any specific applications, such as dialogue.

We are releasing the weights and the architecture under the Apache 2.0 license.

Model Details

  • Base model trained on a large amount of text data, not fine-tuned for any particular task.
  • 314B-parameter Mixture-of-Experts model with 25% of its weights active for any given token.
  • Trained from scratch by D-AI using a custom training stack on top of JAX and Rust in October 2023.
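
To illustrate how a Mixture-of-Experts layer activates only a fraction of its weights per token, here is a minimal top-k routing sketch in Python with NumPy. The expert count (8) and top-k value (2) are illustrative assumptions, not confirmed Cotton-1 hyperparameters; they are chosen so that 2 of 8 expert FFNs active matches the 25% figure above. This is not Cotton-1's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

d_model, d_ff = 16, 32
n_experts, top_k = 8, 2  # assumed values: 2 of 8 experts active = 25% of expert weights

# Router and expert FFN weights (random, for illustration only).
W_router = rng.normal(size=(d_model, n_experts))
W_in = rng.normal(size=(n_experts, d_model, d_ff))
W_out = rng.normal(size=(n_experts, d_ff, d_model))

def moe_layer(x):
    """Route a single token vector x through its top-k experts."""
    logits = x @ W_router                # router scores, shape (n_experts,)
    top = np.argsort(logits)[-top_k:]    # indices of the top-k experts
    # Renormalize the gate weights over the selected experts.
    gate = np.exp(logits[top]) / np.exp(logits[top]).sum()
    out = np.zeros(d_model)
    for g, e in zip(gate, top):
        h = np.maximum(x @ W_in[e], 0.0)  # expert FFN with ReLU
        out += g * (h @ W_out[e])         # gated combination of expert outputs
    return out

token = rng.normal(size=d_model)
y = moe_layer(token)
print(y.shape)  # only top_k of n_experts expert FFNs were evaluated
```

Because only `top_k` of the `n_experts` feed-forward blocks are evaluated per token, the compute cost per token is a fraction of what a dense model with the same total parameter count would require.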