{ "cells": [ { "cell_type": "code", "execution_count": 1, "metadata": {}, "outputs": [], "source": [ "%load_ext autoreload\n", "%autoreload 2" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "# API\n", "\n", "Install a Python wrapper for the API: `pip install ollama`\n", "\n", "We then pick a model, store it in a variable, and inspect its metadata:" ] }, { "cell_type": "code", "execution_count": 2, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "['llama_3_1_8b', 'llama_3_2_1b', 'llama_3_2_3b', 'mistral_sm_24b']\n" ] } ], "source": [ "import ollama\n", "from src.models import Models, get_models, load_model\n", "\n", "available = get_models()\n", "print(available)" ] }, { "cell_type": "code", "execution_count": 3, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Selected model: hf.co/unsloth/Llama-3.2-3B-Instruct-GGUF:Q6_K\n" ] } ], "source": [ "model = available[2]\n", "print(f\"Selected model: {Models.model_fields.get(model).default}\")\n", "model = load_model(model)" ] }, { "cell_type": "code", "execution_count": 4, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "general.architecture: llama\n", "general.basename: Llama-3.2\n", "general.file_type: 18\n", "general.finetune: Instruct\n", "general.organization: Meta Llama\n", "general.parameter_count: 3212749888\n", "general.quantization_version: 2\n", "general.size_label: 3B\n", "general.type: model\n", "llama.attention.head_count: 24\n", "llama.attention.head_count_kv: 8\n", "llama.attention.key_length: 128\n", "llama.attention.layer_norm_rms_epsilon: 1e-05\n", "llama.attention.value_length: 128\n", "llama.block_count: 28\n", "llama.context_length: 131072\n", "llama.embedding_length: 3072\n", "llama.feed_forward_length: 8192\n", "llama.rope.dimension_count: 128\n", "llama.rope.freq_base: 500000\n", "llama.vocab_size: 128256\n", "tokenizer.ggml.bos_token_id: 128000\n", 
"tokenizer.ggml.eos_token_id: 128009\n", "tokenizer.ggml.merges: None\n", "tokenizer.ggml.model: gpt2\n", "tokenizer.ggml.padding_token_id: 128004\n", "tokenizer.ggml.pre: llama-bpe\n", "tokenizer.ggml.token_type: None\n", "tokenizer.ggml.tokens: None\n" ] } ], "source": [ "info = ollama.show(model).modelinfo\n", "for k, v in info.items():\n", " print(f\"{k}: {v}\")" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Generating responses" ] }, { "cell_type": "code", "execution_count": 7, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "Here's a simple recipe for a delicious citrusy cake that combines the brightness of lemon and orange with the warmth of brown sugar:\n", "\n", "**Lemon-Orange Sunshine Cake**\n", "\n", "Ingredients:\n", "\n", "For the cake:\n", "\n", "* 250g all-purpose flour\n", "* 200g granulated sugar\n", "* 100g unsalted butter, softened\n", "* 4 large eggs\n", "* 120g freshly squeezed lemon juice (about 2 lemons)\n", "* 60g freshly squeezed orange juice (about 1 orange)\n", "* 1 teaspoon grated lemon zest\n", "* 0.5 teaspoon baking powder\n", "* Salt to taste\n", "\n", "For the glaze:\n", "\n", "* 150g powdered sugar\n", "* 30g freshly squeezed lemon juice\n", "\n", "Instructions:\n", "\n", "1. Preheat your oven to 180°C (350°F). Grease two 20cm (8-inch) round cake pans and line the bottoms with parchment paper.\n", "2. In a medium bowl, whisk together flour, sugar, baking powder, and salt. Set aside.\n", "3. In a large mixing bowl, use an electric mixer to beat the butter until creamy. Add eggs one at a time, beating well after each addition.\n", "4. Gradually add the dry ingredients to the butter mixture, alternating with lemon and orange juice, beginning and ending with the dry ingredients. Beat just until combined.\n", "5. Stir in grated lemon zest.\n", "6. Divide the batter evenly between the prepared pans and smooth the tops.\n", "7. 
Bake for 25-30 minutes or until a toothpick inserted into the center of each cake comes out clean.\n", "8. Allow the cakes to cool in the pans for 5 minutes, then transfer them to a wire rack to cool completely.\n", "\n", "For the glaze:\n", "\n", "1. Whisk together powdered sugar and lemon juice until smooth.\n", "2. Drizzle the glaze over the cooled cake.\n", "\n", "**Tips:**\n", "\n", "* Make sure to use fresh citrus juices for the best flavor.\n", "* Don't overmix the batter, as this can lead to a dense cake.\n", "* If you want a more pronounced lemon flavor, you can add another 10-15g of lemon juice to the batter or glaze.\n", "\n", "Enjoy your delicious and moist Lemon-Orange Sunshine Cake!\n" ] } ], "source": [ "from ollama import chat\n", "\n", "prompt = \"Give me a simple recipe for a delicious citrusy cake. Make sure units are in grams when it makes sense. Temperatures should be in C.\"\n", "\n", "response = chat(\n", " model=model,\n", " messages=[\n", " {\"role\": \"user\", \"content\": prompt},\n", " ],\n", ")\n", "print(response.message.content)" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Controlling the responses...\n", "\n", "Can only do so much with raw text... 
Let's increase the controllability!\n", "\n", "First off, there are a few common parameters that can be used to tune the outputs.\n", "Some are related to \"creativity\", whereas others control the predictability and determinism of the outputs.\n", "\n", "```python\n", "option_descriptions = {\n", "    \"num_ctx\": \"Size of the context window in tokens (prompt plus generated output).\",\n", "    \"seed\": \"Random seed; a fixed seed makes sampling reproducible.\",\n", "    \"num_predict\": \"Maximum number of tokens to generate in the output.\",\n", "    \"top_k\": \"Limits sampling to the top K most probable tokens.\",\n", "    \"top_p\": \"Limits sampling to the smallest set of tokens with cumulative probability >= top_p.\",\n", "    \"temperature\": \"Controls randomness in generation; higher values increase randomness.\",\n", "    \"repeat_penalty\": \"Penalty for repeated tokens to reduce repetition in the output.\",\n", "}\n", "```\n", "\n", "We can also add a system prompt (see the `role` in the messages below) to guide the model in the right direction. Being explicit about the model's role can improve both its behavior and its output format." 
] }, { "cell_type": "code", "execution_count": 8, "metadata": {}, "outputs": [ { "name": "stdout", "output_type": "stream", "text": [ "```\n", "{\n", " \"recipe\": {\n", " \"name\": \"Citrusy Cake\",\n", " \"servings\": 8-10,\n", " \"ingredients\": [\n", " {\n", " \"name\": \"all-purpose flour\",\n", " \"quantity\": \"250g\"\n", " },\n", " {\n", " \"name\": \"granulated sugar\",\n", " \"quantity\": \"200g\"\n", " },\n", " {\n", " \"name\": \"unsalted butter, softened\",\n", " \"quantity\": \"150g\"\n", " },\n", " {\n", " \"name\": \"large eggs\",\n", " \"quantity\": \"3\"\n", " },\n", " {\n", " \"name\": \"freshly squeezed orange juice\",\n", " \"quantity\": \"120ml\"\n", " },\n", " {\n", " \"name\": \"zest of 1 orange\",\n", " \"quantity\": \"30g\"\n", " },\n", " {\n", " \"name\": \"zest of 1 lemon\",\n", " \"quantity\": \"30g\"\n", " },\n", " {\n", " \"name\": \"baking powder\",\n", " \"quantity\": \"10g\"\n", " },\n", " {\n", " \"name\": \"salt\",\n", " \"quantity\": \"5g\"\n", " }\n", " ],\n", " \"instructions\": [\n", " {\n", " \"step\": \"Preheat oven to 180°C. Grease two 20cm round cake pans and line the bottoms with parchment paper.\"\n", " },\n", " {\n", " \"step\": \"In a medium bowl, whisk together flour, sugar, baking powder, and salt.\"\n", " },\n", " {\n", " \"step\": \"In a large bowl, using an electric mixer, beat the butter until creamy. Add eggs one at a time, beating well after each addition.\"\n", " },\n", " {\n", " \"step\": \"With the mixer on low speed, gradually add the flour mixture to the butter mixture in three parts, alternating with the orange juice, beginning and ending with the flour mixture. 
Beat just until combined.\"\n", " },\n", " {\n", " \"step\": \"Divide the batter evenly between the prepared pans and smooth the tops.\"\n", " },\n", " {\n", " \"step\": \"Bake for 25-30 minutes or until a toothpick inserted in the center comes out clean.\"\n", " },\n", " {\n", " \"step\": \"Let the cakes cool in the pans for 5 minutes, then transfer to a wire rack to cool completely.\"\n", " }\n", " ]\n", " }\n", "}\n", "```\n" ] } ], "source": [ "from ollama import ChatResponse, chat\n", "from pydantic.json_schema import JsonSchemaValue\n", "from typing import Optional, List\n", "\n", "system_prompt = \"You are a helpful assistant that provides clear and concise answers to the user's needs. Always answer in a JSON format.\"\n", "\n", "def generate(\n", "    prompt: str,\n", "    json_format: Optional[JsonSchemaValue] = None,\n", "    model=model,\n", "    system_prompt=system_prompt,\n", ") -> str:\n", "    # Send the system and user messages with fixed sampling options.\n", "    response: ChatResponse = chat(\n", "        model=model,\n", "        messages=[\n", "            {\"role\": \"system\", \"content\": system_prompt},\n", "            {\"role\": \"user\", \"content\": prompt},\n", "        ],\n", "        options={\n", "            \"num_ctx\": 4096,\n", "            \"num_predict\": 1024,\n", "            \"top_k\": 50,\n", "            \"top_p\": 0.95,\n", "            \"temperature\": 0.0,\n", "            \"seed\": 0,  # not needed when temperature is 0 (greedy decoding)\n", "            \"repeat_penalty\": 1.0,  # 1.0 applies no penalty; from experience this works best for JSON outputs\n", "        },\n", "        format=json_format,  # optional JSON schema that constrains the output\n", "        stream=False,\n", "    )\n", "    return response.message.content\n", "\n", "print(generate(prompt))" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "## Even more control\n", "The output already follows a JSON format thanks to the system prompt, but we cannot know beforehand which fields it will contain. 
Let's fix this by introducing a **schema**, a structured definition of our output.\n", "\n", "We build our schema from a typed `BaseModel` in pydantic. The schema is then converted to a grammar-like format called GBNF, which constrains what the model is allowed to generate.\n", "\n", "If you are not using ollama, you can pass a JSON schema directly, which again will be converted to GBNF.\n", "\n", "Here is an example of a schema that forces the output to contain three fields: \"questions\", \"score\", and \"summary\", which are useful when extracting information from a larger document. Note how you can specify the types, and even constrain the \"score\" to specific values through the `enum` keyword, along with `minItems`/`maxItems` for arrays.\n", "\n", "```python\n", "schema = {\n", "    \"type\": \"object\",\n", "    \"properties\": {\n", "        \"questions\": {\n", "            \"type\": \"array\",\n", "            \"minItems\": 1,\n", "            \"maxItems\": 3,\n", "            \"items\": {\n", "                \"type\": \"object\",\n", "                \"properties\": {\"question\": {\"type\": \"string\"}},\n", "                \"required\": [\"question\"],\n", "            },\n", "        },\n", "        \"score\": {\"type\": \"integer\", \"enum\": [0, 1, 2, 3]},\n", "        \"summary\": {\"type\": \"string\"},\n", "    },\n", "    \"required\": [\"questions\", \"score\", \"summary\"],\n", "}\n", "```\n", "\n", "However, the easiest and most programmatic way of handling this is to define interfaces (pydantic models) that are automatically converted to a schema before being sent to the llama.cpp API. We continue with the recipe data format!" 
] }, { "cell_type": "code", "execution_count": 9, "metadata": {}, "outputs": [ { "data": { "text/plain": [ "{'title': 'Citrusy Cake Recipe',\n", " 'ingredients': [{'name': 'all-purpose flour', 'quantity': 250, 'unit': 'g'},\n", " {'name': 'granulated sugar', 'quantity': 200, 'unit': 'g'},\n", " {'name': 'unsalted butter, softened', 'quantity': 150, 'unit': 'g'},\n", " {'name': 'large eggs', 'quantity': 3, 'unit': ''},\n", " {'name': 'freshly squeezed orange juice', 'quantity': 120, 'unit': 'ml'},\n", " {'name': 'freshly squeezed lemon juice', 'quantity': 60, 'unit': 'ml'},\n", " {'name': 'zest of 1 orange', 'quantity': 20, 'unit': 'g'},\n", " {'name': 'zest of 1 lemon', 'quantity': 10, 'unit': 'g'},\n", " {'name': 'baking powder', 'quantity': 5, 'unit': 'g'},\n", " {'name': 'salt', 'quantity': 2, 'unit': 'g'}],\n", " 'instructions': [{'step': 1,\n", " 'description': 'Preheat oven to 180°C. Grease two 20cm round cake pans and line the bottoms with parchment paper.'},\n", " {'step': 2,\n", " 'description': 'In a medium bowl, whisk together flour, sugar, baking powder, and salt.'},\n", " {'step': 3,\n", " 'description': 'In a large bowl, using an electric mixer, beat the butter until creamy. Add eggs one at a time, beating well after each addition.'},\n", " {'step': 4,\n", " 'description': 'With the mixer on low speed, gradually add the flour mixture to the butter mixture in three parts, alternating with the orange and lemon juices, beginning and ending with the flour mixture. Beat just until combined.'},\n", " {'step': 5,\n", " 'description': 'Divide the batter evenly between the prepared pans and smooth the tops.'},\n", " {'step': 6,\n", " 'description': 'Bake for 25-30 minutes or until a toothpick inserted in the center comes out clean.'},\n", " {'step': 7,\n", " 'description': 'Remove from the oven and let cool in the pans for 5 minutes. 
Then, transfer to a wire rack to cool completely.'}],\n", " 'tools': ['electric mixer',\n", " 'whisk',\n", " 'measuring cups',\n", " 'measuring spoons',\n", " 'parchment paper',\n", " 'wire rack']}" ] }, "execution_count": 9, "metadata": {}, "output_type": "execute_result" } ], "source": [ "import json\n", "from typing import List\n", "\n", "from pydantic import BaseModel\n", "\n", "class Ingredient(BaseModel):\n", "    name: str\n", "    quantity: float\n", "    unit: str\n", "\n", "class RecipeInstruction(BaseModel):\n", "    step: int\n", "    description: str\n", "\n", "class Recipe(BaseModel):\n", "    title: str\n", "    ingredients: List[Ingredient]\n", "    instructions: List[RecipeInstruction]\n", "    tools: List[str]\n", "\n", "# Parse the model's JSON output into a Python dict.\n", "# `json.loads` is used instead of `eval`: `eval` executes arbitrary code and fails on JSON literals such as `true` and `null`.\n", "json.loads(generate(prompt, json_format=Recipe.model_json_schema()))" ] } ], "metadata": { "kernelspec": { "display_name": "local-llm", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.12.8" } }, "nbformat": 4, "nbformat_minor": 2 }