This is a simplified guide to an AI model called Codellama-34b-Instruct-Gguf maintained by Andreasjansson. If you like this kind of analysis, you should join AImodels.fyi or follow us on Twitter.
Model overview
The codellama-34b-instruct-gguf model is a large language model developed by andreasjansson at Replicate. It is based on the Llama 2 architecture and includes support for grammar-based decoding and JSON schema validation. This allows the model to generate outputs that adhere to specific structural and semantic constraints, making it well-suited for tasks requiring structured responses.
The model is part of a broader family of Llama 2 models created by andreasjansson, including the codellama-7b-instruct-gguf, llama-2-13b-chat-gguf, llama-2-70b-chat-gguf, llama-2-7b-embeddings, and llama-2-13b-embeddings models, all of which offer various capabilities and architectures tailored for different use cases.
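To make the JSON-schema-constrained mode concrete, here is a minimal sketch using the Replicate Python client. The snake_case input names (prompt, jsonschema, max_tokens) are assumptions inferred from the input list below, and the schema is assumed to be passed as a JSON string; check the model's page on Replicate for the exact input schema before running this.

```python
# A minimal sketch, not official usage: input field names (prompt,
# jsonschema, max_tokens) are assumed from the model's input list and
# may differ -- check the model page on Replicate for the exact schema.
import json
import replicate

schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

output = replicate.run(
    "andreasjansson/codellama-34b-instruct-gguf",
    input={
        "prompt": "Describe a user named Alice who is 30 years old.",
        "jsonschema": json.dumps(schema),  # assumed to be passed as a JSON string
        "max_tokens": 256,
    },
)

# Text models on Replicate typically stream output as an iterator of strings.
print("".join(output))
```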
Model inputs and outputs
The codellama-34b-instruct-gguf model takes a prompt as input and generates a sequence of text outputs. Alongside the prompt, you can supply a grammar in GBNF format or a JSON schema, which the model will use to constrain the generated output to specific structural and semantic requirements.
Inputs
- Prompt: The input text that the model will use to generate output.
- Grammar: A grammar in GBNF format that the model will use to constrain the generated output (see the sketch after this list).
- Jsonschema: A JSON schema that the model will use to constrain the generated output.
- Max Tokens: The maximum number of tokens the model should generate.
- Temperature: A value between 0 and 1 that controls the model's creativity and randomness.
- Top K: The number of most likely tokens to consider at each step of the generation process.
- Top P: The cumulative probability threshold to use for sampling tokens.
- Frequency Penalty: A value between 0 and 2 that penalizes tokens in proportion to how often they have already appeared in the output.
- Presence Penalty: A value between 0 and 2 that penalizes any token that has already appeared in the output at least once, regardless of how often.
- Repeat Penalty: A value between 0 and 2 that penalizes the model for generating repetitive output.
- Mirostat Mode: The mode to use for Mirostat sampling, which can be "Disabled", "Mirostat", or "Mirostat 2.0".
- Mirostat Entropy: The target entropy for Mirostat sampling.
- Mirostat Learning Rate: The learning rate for Mirostat sampling.
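The grammar input is easiest to understand by example. Below is a hedged sketch of grammar-constrained generation; the grammar uses llama.cpp-style GBNF syntax (which GGUF-based grammar decoding is typically built on), and the snake_case input names are, again, assumptions based on the list above.

```python
# A sketch of grammar-constrained generation, assuming llama.cpp-style
# GBNF grammars and snake_case input names (grammar, temperature, ...).
import replicate

# GBNF grammar restricting output to a comma-separated list of colors.
grammar = r"""
root  ::= color (", " color)*
color ::= "red" | "green" | "blue" | "yellow"
"""

output = replicate.run(
    "andreasjansson/codellama-34b-instruct-gguf",
    input={
        "prompt": "List the colors in a traffic light.",
        "grammar": grammar,
        "max_tokens": 32,
        "temperature": 0.2,  # low temperature for more deterministic output
        "top_k": 40,
        "top_p": 0.95,
    },
)
print("".join(output))
```

Because every decoding step is filtered through the grammar, the output can only ever be a string that the grammar accepts, no matter what the prompt says.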
Outputs
- Output: A sequence of text that adheres to the specified grammar or JSON schema (a parsing sketch follows below).
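Since decoding is constrained to match the supplied schema, the result should be directly parseable. A small sketch, continuing from the jsonschema example above:

```python
import json

# "output" is the iterator returned by the jsonschema-constrained call above.
text = "".join(output)
user = json.loads(text)  # should parse cleanly: decoding was schema-constrained
print(user["name"], user["age"])
```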
Capabilities
The codellama-34b-instruct-gguf model can generate code and natural-language text while constraining its output with a GBNF grammar or JSON schema, making it well-suited for producing structured, machine-readable responses such as valid JSON.