A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
Modalities
Input Price
$0.11per 1M
Output Price
$0.19per 1M
Context
4K
Weekly Tokens
265M
Released
Sep 28, 2023
A 7.3B parameter model that outperforms Llama 2 13B on all benchmarks, with optimizations for speed and context length.
Modalities
Input Price
$0.11per 1M
Output Price
$0.19per 1M
Context
4K
Weekly Tokens
265M
Released
Sep 28, 2023
Create an API key from your OpenRouter dashboard and set it as an environment variable:
Use mistralai/mistral-7b-instruct-v0.1 with the OpenRouter API:
OpenRouter provides an OpenAI-compatible completion API to 400+ models & providers that you can call directly, or using the OpenAI SDK. Additionally, some third-party SDKs are available.
In the examples below, the OpenRouter-specific headers are optional. Setting them allows your app to appear on the OpenRouter leaderboards.
For information about using third-party SDKs and frameworks with OpenRouter, please see our frameworks documentation.
Add "stream": true to your request body to receive responses as server-sent events:
https://openrouter.ai/api/v1/chat/completionsBearer $OPENROUTER_API_KEYapplication/jsonoptional — your site URL, for rankingsoptional — your site name, for rankingsmistralai/mistral-7b-instruct-v0.1| Name | Type | Default | Description |
|---|---|---|---|
max_tokens | integer | — | This sets the upper limit for the number of tokens the model can generate in response. |
temperature | float | 1 | This setting influences the variety in the model's responses. |
top_p | float | 1 | This setting limits the model's choices to a percentage of likely tokens: only the top tokens whose probabilities add up to P. |
top_k | integer | 0 | This limits the model's choice of tokens at each step, making it choose from a smaller set. |
seed | integer | — | If specified, the inferencing will sample deterministically, such that repeated requests with the same seed and parameters should return the same result. |
repetition_penalty | float | 1 | Helps to reduce the repetition of tokens from the input. |
frequency_penalty | float | 0 | This setting aims to control the repetition of tokens based on how often they appear in the input. |
presence_penalty | float | 0 | Adjusts how often the model repeats specific tokens already used in the input. |
min_p | float | 0 | Represents the minimum probability for a token to be considered, relative to the probability of the most likely token. |
stop | array | — | Stop generation immediately if the model encounter any token specified in the stop array. |
logit_bias | map | — | Accepts a JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100. |