<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Gláucio</title>
    <description>The latest articles on DEV Community by Gláucio (@glaucioripol).</description>
    <link>https://dev.to/glaucioripol</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F438002%2F568d20e3-b717-47d0-b6ca-29ccc428b774.jpg</url>
      <title>DEV Community: Gláucio</title>
      <link>https://dev.to/glaucioripol</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/glaucioripol"/>
    <language>en</language>
    <item>
      <title>Stop Playing with Local LLMs: Take Open Source GenAI to Production on Magalu Cloud</title>
      <dc:creator>Gláucio</dc:creator>
      <pubDate>Thu, 05 Feb 2026 10:30:00 +0000</pubDate>
      <link>https://dev.to/magalucloud/pare-de-brincar-com-llms-locais-leve-a-iag-open-source-para-a-producao-na-magalu-cloud-4p6c</link>
      <guid>https://dev.to/magalucloud/pare-de-brincar-com-llms-locais-leve-a-iag-open-source-para-a-producao-na-magalu-cloud-4p6c</guid>
      <description>&lt;p&gt;In recent years, natural language processing (NLP) has taken a leap forward. The rise of generative AI has not only refined existing tasks but also opened up many possibilities: working with text and unstructured data, improving the experience of existing applications, simplifying classification, and more. For many organizations, however, it has also brought a critical dilemma: how do you scale the use of these technologies while keeping full control over your data?&lt;/p&gt;

&lt;p&gt;For sectors that handle sensitive information or operate under strict privacy regulations, sending data to public LLMs through external APIs is not just a risk; it is often legally impossible. This is where data sovereignty becomes the central pillar of technology strategy. Running open models on controlled, in-country infrastructure such as &lt;a href="https://magalu.cloud/?utm_source=site&amp;amp;utm_medium=organico&amp;amp;utm_campaign=artigos+glaucio&amp;amp;utm_id=artigos+glaucio" rel="noopener noreferrer"&gt;Magalu Cloud&lt;/a&gt; is no longer just a cost alternative; it is a security and compliance requirement.&lt;/p&gt;

&lt;p&gt;In this tutorial, we will look at how to combine state-of-the-art models from &lt;a href="https://huggingface.co/" rel="noopener noreferrer"&gt;Hugging Face&lt;/a&gt; with &lt;a href="https://docs.vllm.ai/en/stable/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; to get production-grade performance without giving up privacy.&lt;/p&gt;

&lt;h2&gt;
  
  
  What is vLLM?
&lt;/h2&gt;

&lt;p&gt;vLLM is not just an inference library; it behaves like a virtual memory manager for the GPU.&lt;/p&gt;

&lt;p&gt;The main bottleneck in LLM inference is usually not compute (FLOPs) but memory. Traditional engines allocate contiguous VRAM for the key-value (KV) cache, which leads to heavy fragmentation and over-reservation (60-80% of that memory wasted) and limits how many requests can be batched together.&lt;/p&gt;

&lt;p&gt;vLLM addresses this with PagedAttention: an attention algorithm that stores the KV cache in non-contiguous memory blocks. This decouples logical memory (the token sequence) from physical memory (VRAM), which enables:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Near-zero waste: Almost no memory fragmentation (less than 4%).&lt;/li&gt;
&lt;li&gt;Continuous batching: The ability to inject new requests into the GPU token by token, without waiting for the previous batch to finish.&lt;/li&gt;
&lt;li&gt;Copy-on-write: An efficient mechanism for parallel decoding (as in beam search), where multiple "paths" share the same physical memory until they diverge.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can learn more about vLLM in this &lt;a href="https://blog.vllm.ai/2023/06/20/vllm.html" rel="noopener noreferrer"&gt;blog post&lt;/a&gt;.&lt;/p&gt;
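
&lt;p&gt;To make the fragmentation argument concrete, here is a rough sketch of the arithmetic behind it. It assumes blocks of 16 tokens and a hypothetical engine that reserves room for 2048 tokens per request up front; the function names and numbers are illustrative and are not taken from vLLM's code.&lt;/p&gt;

```shell
#!/bin/bash
# Illustrative arithmetic only: BLOCK_SIZE and the 2048-slot reservation are assumptions.
BLOCK_SIZE=16

# Number of fixed-size blocks needed to hold a sequence of $1 tokens (ceiling division).
blocks_needed() {
    local tokens=$1
    echo $(( (tokens + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

# Token slots left unused with paged allocation: only the tail of the last block.
paged_waste() {
    local tokens=$1
    echo $(( $(blocks_needed "$tokens") * BLOCK_SIZE - tokens ))
}

# Token slots left unused when a contiguous region of $2 slots is reserved up front.
contiguous_waste() {
    local tokens=$1 reserved=$2
    echo $(( reserved - tokens ))
}

tokens=100
echo "blocks needed:    $(blocks_needed "$tokens")"
echo "paged waste:      $(paged_waste "$tokens") slots"
echo "contiguous waste: $(contiguous_waste "$tokens" 2048) slots"
```

&lt;p&gt;With paged allocation the unused space is bounded by one block per sequence, whereas an up-front contiguous reservation wastes everything the request never generates.&lt;/p&gt;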

&lt;h2&gt;
  
  
  Step-by-step environment setup
&lt;/h2&gt;

&lt;p&gt;For this tutorial, we provisioned a virtual machine with an &lt;a href="https://www.nvidia.com/en-us/data-center/l40s/" rel="noopener noreferrer"&gt;NVIDIA L40S GPU&lt;/a&gt;. The choice is deliberate: the L40S is based on the Ada Lovelace architecture and includes the Transformer Engine with native FP8 support. This lets the &lt;a href="https://huggingface.co/Qwen/Qwen3-30B-A3B-FP8" rel="noopener noreferrer"&gt;Qwen3-30B-A3B-FP8&lt;/a&gt; model run well, taking advantage of the 18,176 CUDA cores and the generous amount of memory. With 48 GB of GDDR6 VRAM (with ECC) and 864 GB/s of memory bandwidth, we reduce transfer bottlenecks and get stable, high-throughput inference.&lt;/p&gt;

&lt;p&gt;With that context in place, let's get hands-on.&lt;/p&gt;

&lt;h3&gt;
  
  
  Creating the machine
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://console.magalu.cloud/virtual-machine/new" rel="noopener noreferrer"&gt;Link to create virtual machines&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;h4&gt;
  
  
  Steps to create the machine
&lt;/h4&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;1 - Getting started and selecting the service&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;In the Magalu Cloud Console, find the card labeled "Virtual Machine".&lt;/li&gt;
&lt;li&gt;Click the blue "Criar instância" (Create instance) button.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;2 - Region and operating system settings&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Availability zone: Select the zone you want (I chose &lt;code&gt;br-se1-a&lt;/code&gt;, but pick whichever you prefer).&lt;/li&gt;
&lt;li&gt;Image: Select the Ubuntu distribution.&lt;/li&gt;
&lt;li&gt;Version: In the dropdown that appears, choose Ubuntu 24.04 LTS.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;3 - Hardware&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Enable GPU (very important): Check the "Habilitar GPU" box to show the high-performance instances.&lt;/li&gt;
&lt;li&gt;Instance type:
&lt;ul&gt;
&lt;li&gt;Scroll to the GPU section (NVIDIA L40S GPU, recommended for generative AI and LLMs).&lt;/li&gt;
&lt;li&gt;Select the configuration you want. For this tutorial, choose the L40S-1x-DP8-64-100 instance (8 vCPUs, 64 GB of RAM, and 100 GB of disk).&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;4 - Connectivity and access&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Public access: Keep the "Atribuir IPv4 público para essa instância" option checked (it is the default).&lt;/li&gt;
&lt;li&gt;SSH key (very important, since you need it to access the machine):
&lt;ul&gt;
&lt;li&gt;Choose "Selecionar chave já utilizada anteriormente" if you already have a key registered, or register a new one by clicking "Inserir uma chave própria nova".&lt;/li&gt;
&lt;li&gt;If you are adding a new key, print it on Linux/macOS with &lt;code&gt;cat ~/.ssh/id_ed25519.pub&lt;/code&gt; (your key file may have a different name than &lt;code&gt;id_ed25519.pub&lt;/code&gt;, but it must end in &lt;code&gt;.pub&lt;/code&gt;). Copy the value shown in your terminal, paste it into the "Insira sua chave SSH" field, and then give the SSH key a name.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;5 - Naming the machine&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Instance name: Choose a recognizable name for the server. I used &lt;code&gt;llm_server-vllm&lt;/code&gt;.&lt;/li&gt;
&lt;li&gt;Click the "Criar instância" button in the bottom right corner.&lt;/li&gt;
&lt;li&gt;You will be redirected to the instance list, where your new server appears with the status "Criando" (Creating).&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
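
&lt;p&gt;Before pasting a key into the console in step 4, it can help to confirm that the file holds the public half of the key pair. The sketch below is only an illustration: &lt;code&gt;looks_like_public_key&lt;/code&gt; is a hypothetical helper, and the default path is an assumption that you should adjust to your own key name.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: returns 0 when the argument looks like an OpenSSH public key line.
looks_like_public_key() {
    case "$1" in
        "ssh-ed25519 "*|"ssh-rsa "*|"ecdsa-sha2-"*) return 0 ;;
        *) return 1 ;;
    esac
}

# Assumed default path; set KEY_FILE if your key has another name.
key_file="${KEY_FILE:-$HOME/.ssh/id_ed25519.pub}"

if [ -r "$key_file" ]; then
    if looks_like_public_key "$(cat "$key_file")"; then
        echo "OK: $key_file looks like a public key"
    else
        echo "Warning: $key_file does not look like a public key"
    fi
else
    echo "No key found at $key_file"
fi
```

&lt;p&gt;If no key exists yet, &lt;code&gt;ssh-keygen -t ed25519&lt;/code&gt; creates a new pair; only the &lt;code&gt;.pub&lt;/code&gt; file should ever be pasted into the console.&lt;/p&gt;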

&lt;h3&gt;
  
  
  Accessing the virtual machine to configure it
&lt;/h3&gt;

&lt;p&gt;Now that the virtual machine is created, we need to access it and install everything required to run our LLM in production behind a reverse proxy with SSL.&lt;/p&gt;

&lt;p&gt;Mandatory requirement:&lt;br&gt;
For the reverse proxy to work correctly, you need to point your domain at your machine's IP address, which you can find on the VM's details page.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkuhdi1n57undqhkwpx9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fjkuhdi1n57undqhkwpx9.png" alt="VM configuration on Magalu Cloud" width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;In my case I use Cloudflare to manage DNS, and you can follow &lt;a href="https://developers.cloudflare.com/dns/get-started/" rel="noopener noreferrer"&gt;their documentation to configure your DNS&lt;/a&gt;. If you manage DNS elsewhere, look up the equivalent steps and apply them. For this tutorial I will use the domain &lt;code&gt;vllm-mgc.glaucio.tec.br&lt;/code&gt;. With that said, let's move on to accessing the machine.&lt;/p&gt;
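
&lt;p&gt;Once the record is created, a quick check like the one below can confirm that the domain already resolves to the VM. This is a minimal sketch under a few assumptions: &lt;code&gt;resolves_to&lt;/code&gt; is a hypothetical helper, &lt;code&gt;VM_IP&lt;/code&gt; is a placeholder you replace with your own public IPv4, and &lt;code&gt;getent&lt;/code&gt; is assumed to be available (it is on most Linux systems, but not on macOS).&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: succeeds when the expected IP ($1) appears among
# the resolved addresses ($2, one address per line).
resolves_to() {
    local expected=$1 resolved=$2
    printf '%s\n' "$resolved" | grep -qxF -- "$expected"
}

DOMAIN="${DOMAIN:-vllm-mgc.glaucio.tec.br}"   # domain used in this tutorial
VM_IP="${VM_IP:-201.23.79.32}"                # placeholder: use your VM's public IPv4

# getent queries the system resolver; it prints nothing if the name does not resolve.
resolved=$(getent ahostsv4 "$DOMAIN" | awk '{ print $1 }' | sort -u)

if resolves_to "$VM_IP" "$resolved"; then
    echo "$DOMAIN points to $VM_IP"
else
    echo "$DOMAIN does not resolve to $VM_IP yet"
fi
```

&lt;p&gt;If the record is proxied through Cloudflare, the resolved address will be Cloudflare's rather than the VM's, so this check only applies to DNS-only records.&lt;/p&gt;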
&lt;h4&gt;
  
  
  First access to the VM via SSH
&lt;/h4&gt;

&lt;p&gt;On the virtual machine's page, as in the screenshot above, there is a blue icon for copying its public IPv4 address and username.&lt;/p&gt;

&lt;p&gt;Then run &lt;code&gt;ssh ubuntu@201.23.79.32&lt;/code&gt; in your terminal (using your own VM's IP), as in the screenshot below:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe41rjdm49wpa88tvrv1q.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fe41rjdm49wpa88tvrv1q.png" alt="Accessing the machine via SSH." width="800" height="163"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the first connection, your terminal will show the message in the image above. Type &lt;code&gt;yes&lt;/code&gt; and press Enter. With access to the server, we can start installing the dependencies.&lt;/p&gt;
&lt;h4&gt;
  
  
  Installing Docker
&lt;/h4&gt;

&lt;p&gt;To install Docker quickly, we will automate this step with a script. Open a terminal editor such as Vim or nano; I will use nano because it is simpler. Create the Docker installation script with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano install_docker.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;You will see the following screen: nano open with an empty, unsaved file.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm7oj1anncrsb3rzjjg9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fkm7oj1anncrsb3rzjjg9.png" alt="The nano editor open with an empty, unsaved file." width="800" height="489"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To make the Docker installation easier, copy and paste the shell script below.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="c"&gt;# 0. Before running this script, remember to run `chmod +x install_docker.sh`&lt;/span&gt;

&lt;span class="c"&gt;# 1. Clean up earlier attempts (fixes the 'Malformed entry' error)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🧹 Cleaning up old configuration..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /etc/apt/sources.list.d/docker.list
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update

&lt;span class="c"&gt;# 2. Install essential dependencies&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"📦 Installing dependencies..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; ca-certificates curl gnupg lsb-release

&lt;span class="c"&gt;# 3. Download the GPG key&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔑 Setting up the GPG key..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo install&lt;/span&gt; &lt;span class="nt"&gt;-m&lt;/span&gt; 0755 &lt;span class="nt"&gt;-d&lt;/span&gt; /etc/apt/keyrings
&lt;span class="c"&gt;# If the key already exists, remove it so the latest one is downloaded&lt;/span&gt;
&lt;span class="nb"&gt;sudo rm&lt;/span&gt; &lt;span class="nt"&gt;-f&lt;/span&gt; /etc/apt/keyrings/docker.asc
&lt;span class="nb"&gt;sudo &lt;/span&gt;curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://download.docker.com/linux/ubuntu/gpg &lt;span class="nt"&gt;-o&lt;/span&gt; /etc/apt/keyrings/docker.asc
&lt;span class="nb"&gt;sudo chmod &lt;/span&gt;a+r /etc/apt/keyrings/docker.asc

&lt;span class="c"&gt;# 4. Add the repository (pinned to 'noble', Ubuntu 24.04)&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"📂 Adding the 'noble' repository..."&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"deb [arch=&lt;/span&gt;&lt;span class="si"&gt;$(&lt;/span&gt;dpkg &lt;span class="nt"&gt;--print-architecture&lt;/span&gt;&lt;span class="si"&gt;)&lt;/span&gt;&lt;span class="s2"&gt; signed-by=/etc/apt/keyrings/docker.asc] https://download.docker.com/linux/ubuntu noble stable"&lt;/span&gt; | &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/apt/sources.list.d/docker.list &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null

&lt;span class="c"&gt;# 5. Install Docker&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⬇️  Installing Docker packages..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin

&lt;span class="c"&gt;# 6. Configure user permissions&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"👤 Adding user '&lt;/span&gt;&lt;span class="nv"&gt;$USER&lt;/span&gt;&lt;span class="s2"&gt;' to the docker group..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;usermod &lt;span class="nt"&gt;-aG&lt;/span&gt; docker &lt;span class="nv"&gt;$USER&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"---"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"✅ Installation complete"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⚠️  To use Docker right away, run: newgrp docker"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Validate the installation with: docker run hello-world"&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and it should look like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuckol2636vejaxg2jkqa.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fuckol2636vejaxg2jkqa.png" alt="The nano editor with the Docker installation script pasted in." width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now press &lt;code&gt;Ctrl&lt;/code&gt; and &lt;code&gt;x&lt;/code&gt; to exit; when nano asks whether to save, confirm with &lt;code&gt;y&lt;/code&gt; and press Enter to write the content to the file.&lt;/p&gt;

&lt;p&gt;Next, confirm that the script was saved and matches the script above by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;cat &lt;/span&gt;install_docker.sh 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;and check that everything looks right. Next, we need to make the script executable by running:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x install_docker.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;With that done, run the Docker installation script:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./install_docker.sh 
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If everything goes well, you will see the following messages in your terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b8igzau9o7s11fth6ho.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1b8igzau9o7s11fth6ho.png" alt="Output of the Docker installation script." width="800" height="177"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then follow the instructions that were printed: run &lt;code&gt;newgrp docker&lt;/code&gt; to switch your terminal session's group to "docker" without logging out and back in, and run a hello-world container (for good luck, haha) to validate that the installation went well.&lt;/p&gt;

&lt;p&gt;After running the suggested commands, you should see the following result in your terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf36wca0peafguw3odlp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fyf36wca0peafguw3odlp.png" alt="Validating the Docker installation with docker run hello-world" width="800" height="603"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your results match the screenshots above, we are ready to install the dependencies that let Docker communicate with our GPU.&lt;/p&gt;
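
&lt;p&gt;If the hello-world container fails with a permission error instead, the session is probably not in the docker group yet. A small check along these lines can confirm it; &lt;code&gt;in_group&lt;/code&gt; is a hypothetical helper written for this sketch, not part of Docker.&lt;/p&gt;

```shell
#!/bin/bash
# Hypothetical helper: succeeds when group $1 appears in the space-separated list $2.
in_group() {
    local wanted=$1 groups_list=$2
    case " $groups_list " in
        *" $wanted "*) return 0 ;;
        *) return 1 ;;
    esac
}

# id -nG prints the names of the groups active in the current session.
if in_group docker "$(id -nG)"; then
    echo "Current session is in the docker group"
else
    echo "Not in the docker group yet: run 'newgrp docker' or log in again"
fi
```

&lt;p&gt;The padding with spaces makes the match exact, so a group named, say, &lt;code&gt;dockerx&lt;/code&gt; would not be mistaken for &lt;code&gt;docker&lt;/code&gt;.&lt;/p&gt;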

&lt;h4&gt;
  
  
  Installing the drivers and nvidia-container-toolkit
&lt;/h4&gt;

&lt;p&gt;As with the Docker installation, we will create a script to speed up the setup and make it reproducible.&lt;/p&gt;

&lt;p&gt;Create the machine setup file with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano setup_nvidia.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;then paste the script below, as we did in the Docker step:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;#!/bin/bash&lt;/span&gt;
&lt;span class="nb"&gt;set&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt;

&lt;span class="c"&gt;# Configuration - change as needed&lt;/span&gt;
&lt;span class="nv"&gt;NVIDIA_DRIVER_VERSION&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NVIDIA_DRIVER_VERSION&lt;/span&gt;&lt;span class="k"&gt;:-&lt;/span&gt;&lt;span class="nv"&gt;570&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# Colors for output&lt;/span&gt;
&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0;31m'&lt;/span&gt;
&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0;32m'&lt;/span&gt;
&lt;span class="nv"&gt;YELLOW&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[1;33m'&lt;/span&gt;
&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="s1"&gt;'\033[0m'&lt;/span&gt; &lt;span class="c"&gt;# No Color&lt;/span&gt;

&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;╔════════════════════════════════════════════════════════╗&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;║  NVIDIA Driver + Container Toolkit Installation        ║&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;║  Driver version: &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NVIDIA_DRIVER_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;                                   ║&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;╚════════════════════════════════════════════════════════╝&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;

&lt;span class="c"&gt;# 0. Make sure the script is not being run as root&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="nv"&gt;$EUID&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt; &lt;span class="nt"&gt;-eq&lt;/span&gt; 0 &lt;span class="o"&gt;]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;❌ Do not run this script as root. Use your regular user.&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi&lt;/span&gt;

&lt;span class="c"&gt;# 1. Check whether the driver is already installed&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔍 Checking for an existing installation..."&lt;/span&gt;
&lt;span class="k"&gt;if &lt;/span&gt;&lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; nvidia-smi &amp;amp;&amp;gt; /dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;YELLOW&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;⚠️  NVIDIA driver is already installed:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    nvidia-smi &lt;span class="nt"&gt;--query-gpu&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;driver_version,name &lt;span class="nt"&gt;--format&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;csv,noheader
    &lt;span class="nb"&gt;read&lt;/span&gt; &lt;span class="nt"&gt;-p&lt;/span&gt; &lt;span class="s2"&gt;"Continue anyway? (y/N): "&lt;/span&gt; &lt;span class="nt"&gt;-n&lt;/span&gt; 1 &lt;span class="nt"&gt;-r&lt;/span&gt;
    &lt;span class="nb"&gt;echo
    &lt;/span&gt;&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;[[&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nv"&gt;$REPLY&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt;~ ^[Yy]&lt;span class="nv"&gt;$ &lt;/span&gt;&lt;span class="o"&gt;]]&lt;/span&gt;&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
        &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"Installation cancelled."&lt;/span&gt;
        &lt;span class="nb"&gt;exit &lt;/span&gt;0
    &lt;span class="k"&gt;fi
fi&lt;/span&gt;

&lt;span class="c"&gt;# 2. Check whether Docker is installed&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🐳 Checking Docker..."&lt;/span&gt;
&lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="o"&gt;!&lt;/span&gt; &lt;span class="nb"&gt;command&lt;/span&gt; &lt;span class="nt"&gt;-v&lt;/span&gt; docker &amp;amp;&amp;gt; /dev/null&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="k"&gt;then
    &lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;RED&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;❌ Docker not found. Run the install_docker.sh script first&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
    &lt;span class="nb"&gt;exit &lt;/span&gt;1
&lt;span class="k"&gt;fi
&lt;/span&gt;&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;✓ Docker found&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;

&lt;span class="c"&gt;# 3. Update the package list&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"📦 Updating the package list..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update

&lt;span class="c"&gt;# 4. Install the NVIDIA driver&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🎮 Installing NVIDIA driver version &lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NVIDIA_DRIVER_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nvidia-driver-&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NVIDIA_DRIVER_VERSION&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;

&lt;span class="c"&gt;# 5. Install the NVIDIA Container Toolkit&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"📦 Setting up the NVIDIA Container Toolkit repository..."&lt;/span&gt;

&lt;span class="c"&gt;# Add the GPG key&lt;/span&gt;
curl &lt;span class="nt"&gt;-fsSL&lt;/span&gt; https://nvidia.github.io/libnvidia-container/gpgkey | &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nb"&gt;sudo &lt;/span&gt;gpg &lt;span class="nt"&gt;--dearmor&lt;/span&gt; &lt;span class="nt"&gt;-o&lt;/span&gt; /usr/share/keyrings/nvidia-container-toolkit-keyring.gpg &lt;span class="nt"&gt;--yes&lt;/span&gt;

&lt;span class="c"&gt;# Add the repository&lt;/span&gt;
curl &lt;span class="nt"&gt;-s&lt;/span&gt; &lt;span class="nt"&gt;-L&lt;/span&gt; https://nvidia.github.io/libnvidia-container/stable/deb/nvidia-container-toolkit.list | &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nb"&gt;sed&lt;/span&gt; &lt;span class="s1"&gt;'s#deb https://#deb [signed-by=/usr/share/keyrings/nvidia-container-toolkit-keyring.gpg] https://#g'&lt;/span&gt; | &lt;span class="se"&gt;\&lt;/span&gt;
    &lt;span class="nb"&gt;sudo tee&lt;/span&gt; /etc/apt/sources.list.d/nvidia-container-toolkit.list &lt;span class="o"&gt;&amp;gt;&lt;/span&gt; /dev/null

&lt;span class="c"&gt;# Install the toolkit&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"⬇️  Installing nvidia-container-toolkit..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get update
&lt;span class="nb"&gt;sudo &lt;/span&gt;apt-get &lt;span class="nb"&gt;install&lt;/span&gt; &lt;span class="nt"&gt;-y&lt;/span&gt; nvidia-container-toolkit

&lt;span class="c"&gt;# 6. Configure Docker to use the NVIDIA runtime&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔧 Configuring Docker to use the NVIDIA GPU..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;nvidia-ctk runtime configure &lt;span class="nt"&gt;--runtime&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;docker

&lt;span class="c"&gt;# 7. Restart Docker&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"🔄 Restarting the Docker service..."&lt;/span&gt;
&lt;span class="nb"&gt;sudo &lt;/span&gt;systemctl restart docker

&lt;span class="c"&gt;# 8. Final summary&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;╔════════════════════════════════════════════════════════╗&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;║  ✅ Installation complete!                             ║&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;╚════════════════════════════════════════════════════════╝&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;YELLOW&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;⚠️  IMPORTANTE: Reinicie a máquina para carregar o driver:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"    sudo reboot"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="nt"&gt;-e&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;GREEN&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;Após reiniciar, valide a instalação com:&lt;/span&gt;&lt;span class="k"&gt;${&lt;/span&gt;&lt;span class="nv"&gt;NC&lt;/span&gt;&lt;span class="k"&gt;}&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"    # Verificar driver no host"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"    nvidia-smi"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"    # Testar GPU no Docker"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;"    docker run --rm --gpus all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi"&lt;/span&gt;
&lt;span class="nb"&gt;echo&lt;/span&gt; &lt;span class="s2"&gt;""&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save and exit with &lt;code&gt;Ctrl&lt;/code&gt; + &lt;code&gt;x&lt;/code&gt;, following the prompts to write the file, then make the script executable with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;chmod&lt;/span&gt; +x setup_nvidia.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then run it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;./setup_nvidia.sh
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If the installation goes well, you will see the following output in your terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70ycmik493lk3fs8qgz8.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F70ycmik493lk3fs8qgz8.png" alt="Success message after installing the drivers and the container toolkit." width="800" height="663"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Following those instructions, reboot the machine so the new driver is loaded:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;sudo &lt;/span&gt;reboot
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Wait a few seconds for the machine to reboot; it is a good moment to grab a coffee, a tea or some water.&lt;/p&gt;

&lt;p&gt;Then connect to the machine over SSH again:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;ssh ubuntu@ip-da-sua-maquina
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Before we get hands-on with the LLM server, let's check that the previous steps worked and everything was installed correctly:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker run &lt;span class="nt"&gt;--rm&lt;/span&gt; &lt;span class="nt"&gt;--gpus&lt;/span&gt; all nvidia/cuda:12.3.1-base-ubuntu22.04 nvidia-smi
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you get output like the following, the installation is correct and we can move on to bringing up our LLM server. 🎉&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53pqymbavzwurgkfekh2.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F53pqymbavzwurgkfekh2.png" alt="nvidia-smi running inside the Docker container" width="800" height="367"&gt;&lt;/a&gt;&lt;/p&gt;
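
&lt;p&gt;If you want a quick, scriptable summary of what the driver sees, the sketch below parses the CSV output of &lt;code&gt;nvidia-smi&lt;/code&gt;. The function name &lt;code&gt;summarize_gpus&lt;/code&gt; is hypothetical, and the sample line in the comment is only an assumption about what an L40S host might print; your values will differ.&lt;/p&gt;

```shell
# Hypothetical helper: summarizes GPUs from CSV lines read on stdin,
# for example "NVIDIA L40S, 46068 MiB" (sample line is an assumption).
summarize_gpus() {
  awk -F', ' '
    NF == 2 { count++; printf "GPU %d: %s (%s)\n", count - 1, $1, $2 }
    END     { printf "total: %d\n", count + 0; exit (count == 0) }
  '
}

# On the GPU host you would feed it the real driver output:
#   nvidia-smi --query-gpu=name,memory.total --format=csv,noheader | summarize_gpus
```

&lt;p&gt;The function returns a non-zero status when no GPU line is found, so it can be used as a guard in a setup script.&lt;/p&gt;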

&lt;h4&gt;
  
  
  Bringing up the application
&lt;/h4&gt;

&lt;p&gt;Before we get to configuration and deploy, it is worth going over how this application was designed: how it is served, and how we protect the vLLM routes that do not validate our authentication token and could be exploited by malicious actors.&lt;/p&gt;

&lt;h5&gt;
  
  
  Application architecture
&lt;/h5&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hcare8x9s8608bqiihi.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F1hcare8x9s8608bqiihi.png" alt="Diagram of how our application works, with Caddy as a reverse proxy" width="800" height="401"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;As the image above shows, the entry point to the model is the &lt;a href="https://caddyserver.com/" rel="noopener noreferrer"&gt;Caddy&lt;/a&gt; server. When you expose an LLM server to the internet, setting an API key and assuming everything is protected is not enough. By default, vLLM exposes endpoints such as &lt;code&gt;/health&lt;/code&gt;, &lt;code&gt;/metrics&lt;/code&gt; and &lt;code&gt;/docs&lt;/code&gt; that do not validate authentication, which creates a significant attack surface for automated scanners and potential exploits.&lt;/p&gt;

&lt;p&gt;Following vLLM's official security recommendations, we implement a "defense in depth" architecture: &lt;a href="https://docs.vllm.ai/en/stable/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt; runs on an internal Docker network, with no port published to the host, while Caddy is our single entry point. In the Caddyfile, a strict allowlist forwards only the routes &lt;code&gt;/v1/chat/completions&lt;/code&gt;, &lt;code&gt;/v1/completions&lt;/code&gt;, &lt;code&gt;/v1/models&lt;/code&gt; (optional, can be removed) and &lt;code&gt;/v1/embeddings&lt;/code&gt; (optional, can be removed) to vLLM. Everything else, including the critical &lt;code&gt;/metrics&lt;/code&gt; endpoint and the internal &lt;code&gt;/health&lt;/code&gt; of &lt;a href="https://docs.vllm.ai/en/stable/" rel="noopener noreferrer"&gt;vLLM&lt;/a&gt;, is blocked with &lt;code&gt;abort&lt;/code&gt;, which closes the connection without sending any response (not even a 403). This means that even if someone discovers the server's IP address, they cannot extract performance metrics, detailed model status or the API documentation.&lt;/p&gt;

&lt;p&gt;On top of that, Caddy enforces TLS 1.2/1.3 and the vLLM API key protects the OpenAI-compatible endpoints. The result is a server that follows the security recommendations in the &lt;a href="https://docs.vllm.ai/en/stable/usage/security/?h=secur#unprotected-endpoints-no-api-key-required" rel="noopener noreferrer"&gt;official vLLM documentation&lt;/a&gt;: we minimize the exposed endpoints, put everything behind a trusted reverse proxy, and keep the inference engine unreachable from outside, so your LLM runs securely, privately and ready for production.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;I did not configure rate limiting here; I leave it as homework if you are interested.

&lt;ul&gt;
&lt;li&gt;&lt;a href="https://github.com/mholt/caddy-ratelimit" rel="noopener noreferrer"&gt;caddy-ratelimit&lt;/a&gt;&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
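
&lt;p&gt;To make the allowlist concrete, here is a small shell sketch that mimics the routing decision described above. It is only an illustration: the function name &lt;code&gt;route_for_path&lt;/code&gt; is hypothetical, and the real filtering is done by Caddy with the Caddyfile shown later in this section, not by this script.&lt;/p&gt;

```shell
# Illustration only: mirrors the Caddyfile routing rules in plain shell.
route_for_path() {
  case "$1" in
    /health|/)
      echo "caddy" ;;   # answered by Caddy itself, never reaches vLLM
    /v1/chat/completions|/v1/completions|/v1/models|/v1/embeddings)
      echo "vllm" ;;    # allowlisted, proxied to vLLM
    *)
      echo "abort" ;;   # everything else: connection is dropped
  esac
}

for p in /v1/chat/completions /metrics /docs /health; do
  printf '%s: %s\n' "$p" "$(route_for_path "$p")"
done
```

&lt;p&gt;Under these rules, &lt;code&gt;/metrics&lt;/code&gt; and &lt;code&gt;/docs&lt;/code&gt; never reach vLLM, while &lt;code&gt;/health&lt;/code&gt; is answered by Caddy itself.&lt;/p&gt;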

&lt;p&gt;With that context in place, let's get hands-on.&lt;/p&gt;

&lt;h5&gt;
  
  
  Important configuration
&lt;/h5&gt;

&lt;p&gt;To bring up the application, we need to create the following files:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;.env&lt;/code&gt;: holds the environment variables used by Caddy and by vLLM.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;Caddyfile&lt;/code&gt;: configures our web server to act as a reverse proxy in front of vLLM.&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;compose.yml&lt;/code&gt;: the Docker Compose file that makes it easy to orchestrate our application.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;To keep things organized, we will put these files in their own folder. Create the folder and the files by running the following command in your terminal:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="nb"&gt;mkdir &lt;/span&gt;llm-server&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;cd &lt;/span&gt;llm-server&lt;span class="p"&gt;;&lt;/span&gt; &lt;span class="nb"&gt;touch&lt;/span&gt; .env Caddyfile compose.yml
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then check that everything went well; if it did, you will see a result like the following:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52evqhlk6hu06od9143m.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F52evqhlk6hu06od9143m.png" alt="Creating the folder and files for our application and listing the folder's contents." width="800" height="168"&gt;&lt;/a&gt;&lt;/p&gt;
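
&lt;p&gt;A slightly more defensive variant of that command, as a sketch: &lt;code&gt;mkdir -p&lt;/code&gt; does not fail if the folder already exists, and &lt;code&gt;chmod 600&lt;/code&gt; restricts &lt;code&gt;.env&lt;/code&gt;, which will hold the API key, to your own user. The function name &lt;code&gt;init_llm_server_dir&lt;/code&gt; is hypothetical, and the permission change is an extra precaution that is not part of the setup above.&lt;/p&gt;

```shell
# Hypothetical helper: creates the project folder and its three empty files.
init_llm_server_dir() {
  dir="${1:-llm-server}"
  mkdir -p "$dir"
  touch "$dir/.env" "$dir/Caddyfile" "$dir/compose.yml"
  chmod 600 "$dir/.env"   # secrets live here: owner read/write only
  ls -A "$dir"            # list the files, including the hidden .env
}
```

&lt;p&gt;Calling &lt;code&gt;init_llm_server_dir&lt;/code&gt; with no argument creates &lt;code&gt;llm-server&lt;/code&gt; in the current directory.&lt;/p&gt;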

&lt;p&gt;With that done, let's configure our &lt;code&gt;.env&lt;/code&gt; with the variables needed to run our containers. We will edit it with nano; the content to put in the file is shown after the description of each variable.&lt;/p&gt;

&lt;p&gt;What is each variable?&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;p&gt;vLLM&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;VLLM_API_KEY: the token the vLLM API checks before allowing inference. For our test you can generate one with &lt;code&gt;openssl rand -hex 32&lt;/code&gt;; a hex-only value avoids characters such as &lt;code&gt;$&lt;/code&gt;, which Docker Compose may try to interpolate when it reads the &lt;code&gt;.env&lt;/code&gt; file.
&lt;/li&gt;
&lt;li&gt;HF_TOKEN: the token used to download private models, for example if you have a fine-tuned model and want to serve it. To get one, follow the &lt;a href="https://huggingface.co/docs/hub/en/security-tokens" rel="noopener noreferrer"&gt;official documentation&lt;/a&gt;. Since we are using an open model here, a real token is not required; if the download is rejected with an authentication error, leave the value empty.&lt;/li&gt;
&lt;li&gt;MODEL_NAME: the name of the &lt;a href="https://huggingface.co/Qwen/Qwen3-30B-A3B-FP8" rel="noopener noreferrer"&gt;Hugging Face model&lt;/a&gt; we are going to use.&lt;/li&gt;
&lt;li&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fu9iec3dg50ixd2hb7jfz.png" alt="Where to copy the model name from" width="800" height="414"&gt;&lt;/li&gt;
&lt;li&gt;As a challenge, try serving another model, such as &lt;a href="https://huggingface.co/openai/gpt-oss-20b" rel="noopener noreferrer"&gt;gpt-oss-20b&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;li&gt;

&lt;p&gt;Caddy&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;DOMAIN: the domain where our application will be served.&lt;/li&gt;
&lt;li&gt;EMAIL_SSL: the email address Caddy uses so that CertMagic can issue and renew your certificates.
&lt;/li&gt;
&lt;/ul&gt;


&lt;/li&gt;

&lt;/ul&gt;
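
&lt;p&gt;&lt;code&gt;VLLM_API_KEY&lt;/code&gt; only needs to be a long, hard-to-guess string. Below is a minimal sketch for generating one, assuming &lt;code&gt;openssl&lt;/code&gt; or &lt;code&gt;/dev/urandom&lt;/code&gt; is available on the machine; the function name &lt;code&gt;generate_api_key&lt;/code&gt; is hypothetical. The result contains only hexadecimal characters, so it can be pasted into &lt;code&gt;.env&lt;/code&gt; without quoting.&lt;/p&gt;

```shell
# Hypothetical helper: prints a 64-character hexadecimal key (32 random bytes).
generate_api_key() {
  if [ -x "$(command -v openssl)" ]; then
    openssl rand -hex 32
  else
    # Fallback without openssl: read 32 random bytes and print them as hex
    head -c 32 /dev/urandom | od -An -v -tx1 | tr -d ' \n'
    echo
  fi
}

generate_api_key
```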

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;&lt;span class="c"&gt;# vLLM Configuration&lt;/span&gt;
&lt;span class="nv"&gt;VLLM_API_KEY&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;sua-chave-api-segura
&lt;span class="nv"&gt;HF_TOKEN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;hf_seu_token_huggingface
&lt;span class="nv"&gt;MODEL_NAME&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;Qwen/Qwen3-30B-A3B-FP8

&lt;span class="c"&gt;# Caddy/SSL Configuration&lt;/span&gt;
&lt;span class="nv"&gt;DOMAIN&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;vllm-mgc.glaucio.tec.br
&lt;span class="nv"&gt;EMAIL_SSL&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;meuemail+vllm-server@lorem.com
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Now that we know what each variable is, let's fill in our &lt;code&gt;.env&lt;/code&gt;. Open the file with:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;nano .env
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Paste the variables filled in with your own values and press &lt;code&gt;Ctrl + x&lt;/code&gt;. When "Save modified buffer?" appears, confirm with the &lt;code&gt;y&lt;/code&gt; key, and when "File Name to Write: .env" appears, press &lt;code&gt;Enter&lt;/code&gt;.&lt;/p&gt;
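
&lt;p&gt;Before moving on, it can help to confirm that no variable was left out of &lt;code&gt;.env&lt;/code&gt;. The sketch below is a hypothetical helper (&lt;code&gt;check_env_file&lt;/code&gt; is not part of Docker or vLLM) that only looks for non-empty &lt;code&gt;NAME=value&lt;/code&gt; lines; it does not validate the values themselves. &lt;code&gt;HF_TOKEN&lt;/code&gt; is left out of the check on the assumption that an open model is being served.&lt;/p&gt;

```shell
# Hypothetical helper: reports which required variables have a value in the file.
check_env_file() {
  file="${1:-.env}"
  missing=0
  for name in VLLM_API_KEY MODEL_NAME DOMAIN EMAIL_SSL; do
    # A variable counts as set when a line "NAME=value" has a non-empty value
    if grep -Eq "^${name}=.+" "$file"; then
      echo "ok      $name"
    else
      echo "missing $name"
      missing=1
    fi
  done
  return "$missing"
}
```

&lt;p&gt;Run &lt;code&gt;check_env_file .env&lt;/code&gt; inside the &lt;code&gt;llm-server&lt;/code&gt; folder; it returns a non-zero status if any of the four variables is missing or empty.&lt;/p&gt;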

&lt;p&gt;Next is our &lt;code&gt;Caddyfile&lt;/code&gt;. Open it with nano, as described in the &lt;code&gt;.env&lt;/code&gt; step, and paste the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;{$DOMAIN} {
    # Enable Gzip/Zstd compression for performance
    encode gzip zstd

    # Log configuration (optional, useful for auditing)
    log {
        output file /var/log/caddy/access.log
    }

    # SECURITY AND ROUTING RULES
    # Everything is blocked by default, except the 'handle' blocks below

    # Status page to check that Caddy is online
    handle /health {
        respond "OK - Caddy is running on {$DOMAIN}" 200
    }

    # Landing page with basic information
    handle / {
        header Content-Type text/html
        respond `&amp;lt;!DOCTYPE html&amp;gt;
&amp;lt;html&amp;gt;
&amp;lt;head&amp;gt;
    &amp;lt;title&amp;gt;vLLM API Server&amp;lt;/title&amp;gt;
    &amp;lt;style&amp;gt;
        body { font-family: system-ui, sans-serif; max-width: 600px; margin: 50px auto; padding: 20px; background: #1a1a2e; color: #eee; }
        h1 { color: #00d9ff; }
        .status { background: #16213e; padding: 20px; border-radius: 8px; margin: 20px 0; }
        .ok { color: #00ff88; }
        code { background: #0f3460; padding: 2px 6px; border-radius: 4px; }
    &amp;lt;/style&amp;gt;
&amp;lt;/head&amp;gt;
&amp;lt;body&amp;gt;
    &amp;lt;h1&amp;gt;vLLM API Server&amp;lt;/h1&amp;gt;
    &amp;lt;div class="status"&amp;gt;
        &amp;lt;p class="ok"&amp;gt;Proxy: Online&amp;lt;/p&amp;gt;
        &amp;lt;p&amp;gt;Domain: {$DOMAIN}&amp;lt;/p&amp;gt;
    &amp;lt;/div&amp;gt;
    &amp;lt;h3&amp;gt;API Endpoints:&amp;lt;/h3&amp;gt;
    &amp;lt;ul&amp;gt;
        &amp;lt;li&amp;gt;&amp;lt;code&amp;gt;/v1/chat/completions&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;&amp;lt;code&amp;gt;/v1/completions&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;&amp;lt;code&amp;gt;/v1/models&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
        &amp;lt;li&amp;gt;&amp;lt;code&amp;gt;/v1/embeddings&amp;lt;/code&amp;gt;&amp;lt;/li&amp;gt;
    &amp;lt;/ul&amp;gt;
    &amp;lt;p&amp;gt;&amp;lt;small&amp;gt;Health check: &amp;lt;a href="/health"&amp;gt;/health&amp;lt;/a&amp;gt;&amp;lt;/small&amp;gt;&amp;lt;/p&amp;gt;
&amp;lt;/body&amp;gt;
&amp;lt;/html&amp;gt;` 200
    }

    # Matcher for the OpenAI routes (Chat, Completions, Models, Embeddings)
    @openai_routes {
        path /v1/chat/completions
        path /v1/completions
        path /v1/models # can be removed if you do not want to expose it
        path /v1/embeddings # we only serve an LLM, so this is not very useful in our case
    }

    # Only the allowed routes are processed
    handle @openai_routes {
        reverse_proxy vllm:8000 {
            # Timeout settings for long streaming responses (LLMs)
            flush_interval -1
            transport http {
                response_header_timeout 300s
            }
        }
    }

    # Any other route (vLLM's /metrics, /docs, etc.) is aborted: Caddy closes the connection without sending a response
    handle {
        abort
    }

    # TLS configuration (automatic, but forcing secure protocols)
    tls {$EMAIL_SSL} {
        protocols tls1.2 tls1.3
    }
}
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Finally, our &lt;code&gt;compose.yml&lt;/code&gt; file, which brings up the application. Open it with &lt;code&gt;nano compose.yml&lt;/code&gt; and paste the following content:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight yaml"&gt;&lt;code&gt;&lt;span class="na"&gt;services&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
  &lt;span class="na"&gt;vllm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm/vllm-openai:latest&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;vllm_inference&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;oom_score_adj&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;-500&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;VLLM_API_KEY=${VLLM_API_KEY}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;HUGGING_FACE_HUB_TOKEN=${HF_TOKEN}&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./vllm_cache:/root/.cache/huggingface&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;6.0'&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;56G&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;4.0'&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;32G&lt;/span&gt;
          &lt;span class="na"&gt;devices&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
            &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="na"&gt;driver&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;nvidia&lt;/span&gt;
              &lt;span class="na"&gt;count&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;1&lt;/span&gt;
              &lt;span class="na"&gt;capabilities&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="nv"&gt;gpu&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
    &lt;span class="na"&gt;ipc&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;host&lt;/span&gt;
    &lt;span class="na"&gt;shm_size&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;16gb'&lt;/span&gt;
    &lt;span class="na"&gt;command&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;&amp;gt;&lt;/span&gt;
      &lt;span class="s"&gt;--model ${MODEL_NAME}&lt;/span&gt;
      &lt;span class="s"&gt;--served-model-name "qwen3-30b-a3b-fp8"&lt;/span&gt;
      &lt;span class="s"&gt;--host 0.0.0.0&lt;/span&gt;
      &lt;span class="s"&gt;--port 8000&lt;/span&gt;
      &lt;span class="s"&gt;--max-model-len 16384&lt;/span&gt;
      &lt;span class="s"&gt;--gpu-memory-utilization 0.95&lt;/span&gt;
      &lt;span class="s"&gt;--kv-cache-dtype fp8&lt;/span&gt;
      &lt;span class="s"&gt;--enable-prefix-caching&lt;/span&gt;
      &lt;span class="s"&gt;--max-num-seqs 8&lt;/span&gt;
      &lt;span class="s"&gt;--disable-log-requests&lt;/span&gt;
      &lt;span class="s"&gt;--reasoning-parser deepseek_r1&lt;/span&gt;
    &lt;span class="na"&gt;healthcheck&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;test&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="pi"&gt;[&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;CMD-SHELL"&lt;/span&gt;&lt;span class="pi"&gt;,&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;curl&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;-f&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;http://localhost:8000/health&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;||&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;exit&lt;/span&gt;&lt;span class="nv"&gt; &lt;/span&gt;&lt;span class="s"&gt;1"&lt;/span&gt;&lt;span class="pi"&gt;]&lt;/span&gt;
      &lt;span class="na"&gt;interval&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;15s&lt;/span&gt;
      &lt;span class="na"&gt;timeout&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;10s&lt;/span&gt;
      &lt;span class="na"&gt;retries&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="m"&gt;5&lt;/span&gt;
      &lt;span class="na"&gt;start_period&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;300s&lt;/span&gt;  &lt;span class="c1"&gt;# 5 minutes to load the model&lt;/span&gt;

  &lt;span class="na"&gt;proxy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
    &lt;span class="na"&gt;image&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;caddy:alpine&lt;/span&gt;
    &lt;span class="na"&gt;container_name&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;caddy_secure_proxy&lt;/span&gt;
    &lt;span class="na"&gt;restart&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;unless-stopped&lt;/span&gt;
    &lt;span class="na"&gt;deploy&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;resources&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;limits&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;1.0'&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;512M&lt;/span&gt;
        &lt;span class="na"&gt;reservations&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
          &lt;span class="na"&gt;cpus&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s1"&gt;'&lt;/span&gt;&lt;span class="s"&gt;0.5'&lt;/span&gt;
          &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;64M&lt;/span&gt;
    &lt;span class="na"&gt;extra_hosts&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;host.docker.internal:host-gateway"&lt;/span&gt;
    &lt;span class="na"&gt;ports&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;80:80"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;443:443"&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="s"&gt;443:443/udp"&lt;/span&gt;
    &lt;span class="na"&gt;environment&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;DOMAIN=${DOMAIN}&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;EMAIL_SSL=${EMAIL_SSL}&lt;/span&gt;
    &lt;span class="na"&gt;volumes&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./Caddyfile:/etc/caddy/Caddyfile&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./caddy_data:/data&lt;/span&gt;
      &lt;span class="pi"&gt;-&lt;/span&gt; &lt;span class="s"&gt;./caddy_config:/config&lt;/span&gt;
    &lt;span class="na"&gt;depends_on&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
      &lt;span class="na"&gt;vllm&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt;
        &lt;span class="na"&gt;condition&lt;/span&gt;&lt;span class="pi"&gt;:&lt;/span&gt; &lt;span class="s"&gt;service_healthy&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
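
&lt;p&gt;To get a feeling for what &lt;code&gt;--gpu-memory-utilization 0.95&lt;/code&gt; means, here is a rough back-of-the-envelope sketch. It assumes a 48 GB GPU (the L40S suggested by the name of the security group used later in this tutorial) and about 30.5 GB of weights (roughly 30.5 billion parameters at one byte each in FP8). Both figures are approximations: not every tensor is stored in FP8, and vLLM reports the real numbers in its startup logs. The function name &lt;code&gt;gpu_budget&lt;/code&gt; is hypothetical.&lt;/p&gt;

```shell
# Rough estimate only; all three inputs are assumptions, not measured values.
gpu_budget() {
  # $1 = GPU memory in GB, $2 = --gpu-memory-utilization, $3 = model weights in GB
  awk -v total="$1" -v util="$2" -v weights="$3" 'BEGIN {
    budget = total * util
    printf "vLLM budget: %.1f GB\n", budget
    printf "left for KV cache and activations: %.1f GB\n", budget - weights
  }'
}

gpu_budget 48 0.95 30.5
```

&lt;p&gt;Under these assumptions vLLM may use about 45.6 GB, leaving roughly 15 GB for the KV cache and activations. If vLLM fails to start for lack of memory, lowering &lt;code&gt;--max-model-len&lt;/code&gt; or &lt;code&gt;--max-num-seqs&lt;/code&gt; is a reasonable first adjustment.&lt;/p&gt;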



&lt;p&gt;With our files filled in, we can run the command that brings up our Docker Compose stack:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;--env-file&lt;/span&gt; .env up
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This starts downloading the images we are going to use:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6i3qqw5ftmakvo2pmlx.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fy6i3qqw5ftmakvo2pmlx.png" alt="Downloading the images for our containers" width="800" height="301"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Once the container images have been downloaded, you will see vLLM starting up on your screen:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F056m7z49sfa11bazhrw5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F056m7z49sfa11bazhrw5.png" alt="vLLM starting up" width="800" height="493"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After vLLM downloads the model weights and its health check passes, Compose starts Caddy. Wait a moment while the SSL configuration is completed:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xmez22xknnucb3x8ejw.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4xmez22xknnucb3x8ejw.png" alt="Application server starting" width="800" height="494"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To leave the containers running in the background, press the &lt;code&gt;d&lt;/code&gt; key, as suggested at the bottom of the terminal:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixlrt7e9ct571fvvri1f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fixlrt7e9ct571fvvri1f.png" alt="Showing how to free the terminal" width="800" height="871"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;To skip this step next time, add &lt;code&gt;-d&lt;/code&gt; to the command so the containers start detached and the terminal is freed right away:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose &lt;span class="nt"&gt;--env-file&lt;/span&gt; .env up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
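
&lt;p&gt;Because Caddy only starts after the vLLM health check passes, it can be handy to wait for that state in a script, for example right after &lt;code&gt;up -d&lt;/code&gt;. Below is a minimal sketch, assuming the container name &lt;code&gt;vllm_inference&lt;/code&gt; from the &lt;code&gt;compose.yml&lt;/code&gt; above; the helper names &lt;code&gt;docker_health&lt;/code&gt; and &lt;code&gt;wait_until_healthy&lt;/code&gt; are hypothetical, and the probe is passed as a parameter so the loop can be exercised without Docker.&lt;/p&gt;

```shell
# Default probe: asks Docker for the health status of a container.
docker_health() {
  docker inspect --format '{{.State.Health.Status}}' "$1"
}

# Usage: wait_until_healthy CONTAINER [TRIES] [DELAY_SECONDS] [PROBE_FUNCTION]
wait_until_healthy() {
  container="$1"
  tries="${2:-80}"
  delay="${3:-15}"
  probe="${4:-docker_health}"
  i=1
  while [ "$i" -le "$tries" ]; do
    status="$("$probe" "$container")"
    echo "attempt $i: ${status:-unknown}"
    if [ "$status" = "healthy" ]; then return 0; fi
    if [ "$status" = "unhealthy" ]; then return 1; fi
    i=$((i + 1))
    sleep "$delay"
  done
  return 1   # never became healthy within the allowed attempts
}
```

&lt;p&gt;With the defaults, &lt;code&gt;wait_until_healthy vllm_inference&lt;/code&gt; polls every 15 seconds and returns a non-zero status if the container becomes unhealthy or never reports healthy.&lt;/p&gt;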



&lt;p&gt;With all of that done, our application is running nicely, but it is not responding yet, because we still need to configure the network to open ports 80 and 443 on our server.&lt;/p&gt;

&lt;p&gt;The next step explains how to fix the server not responding:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2pvqjhrx1lqt0so0o95.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fn2pvqjhrx1lqt0so0o95.png" alt="Error shown because port 443 has not been opened for our VM." width="800" height="481"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h4&gt;
  
  
  Network configuration to allow external access through our DNS
&lt;/h4&gt;

&lt;p&gt;This is the last configuration step before we can use our LLM and make those long-awaited API calls.&lt;/p&gt;

&lt;p&gt;Go back to your &lt;a href="https://console.magalu.cloud/" rel="noopener noreferrer"&gt;Magalu Cloud console&lt;/a&gt;. On the home screen, click the hamburger menu in the upper right corner to open the side menu.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffue6zvanpcywwo1yf73j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffue6zvanpcywwo1yf73j.png" alt="How to get to the Security Group screen" width="800" height="777"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Then click the "Network" option, which opens a floating submenu to its left, and click "Grupo de Segurança" (Security Group). You will be redirected to the Security Group screen.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fony6174e0e66wjtm4m8n.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fony6174e0e66wjtm4m8n.png" alt="Tela Grupo de Segurança" width="800" height="479"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On the Security Group screen, click the "Criar grupo de segurança" (Create security group) button. A screen opens where you name and describe your new security group. Pick a name you will remember; in my case I used "tutorial-vllm-l40s", as shown in the image below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqd3b96oy1py61gbip18.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Foqd3b96oy1py61gbip18.png" alt="Criando um Grupo de Segurança" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After confirming the creation, we return to the Security Group home page. Now click the group we just created, in my case "tutorial-vllm-l40s".&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxpuzqi87f7n4hjb404l.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftxpuzqi87f7n4hjb404l.png" alt="Criando um Grupo de Segurança" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;This opens our Security Group's page. Click the "Adicionar regra" (Add rule) button and fill in the form so that our server can respond to the requests we send it. We need to open port 443 for inbound traffic, accepting requests from any origin; or, if you have a fixed IP that will always call the API, you can restrict the rule to that address. Mine looked like this:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsv4mgea7xcz0twblp5b.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fcsv4mgea7xcz0twblp5b.png" alt="Liberando porta 443" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Click "Adicionar regra". After the form closes, the table listing the rules should now contain the inbound rule for port 443:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7z3ttk5p1jj59gowa5zm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F7z3ttk5p1jj59gowa5zm.png" alt="Regra aplicada para entrada" width="800" height="41"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Now go to our virtual machine's page and click the network tab (outlined in red in the image). You will see a screen like the one in the screenshot below.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffd93dm596tlvn607y6g5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffd93dm596tlvn607y6g5.png" alt="Tela de rede da máquina virtual" width="800" height="480"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;On this page, click the blue "Adicionar grupo de segurança" (Add security group) button and, in the modal that opens, search for the security group we created and configured.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F63qo4o8tcair9p70xy5j.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F63qo4o8tcair9p70xy5j.png" alt="Modal para aplicar regras de grupo de segurança" width="800" height="617"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After adding it, we can check whether our domain is responding to requests.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x0kqyxw9w0051fem7bm.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F0x0kqyxw9w0051fem7bm.png" alt="Tela de rede da máquina virtual" width="800" height="478"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If your settings are correct, you will see a screen like the one above, but with your own domain.&lt;/p&gt;
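
&lt;p&gt;If the page does not load, a quick check from the terminal can help tell a blocked port from an authentication problem. The sketch below is only an illustration: &lt;code&gt;explain_status&lt;/code&gt; is a hypothetical helper, and it assumes the server exposes vLLM's OpenAI-compatible &lt;code&gt;/v1/models&lt;/code&gt; route behind the same domain and answers 401 when the API key is rejected. curl prints &lt;code&gt;000&lt;/code&gt; when it receives no HTTP response at all, which is what a closed port 443 usually looks like.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: turns the HTTP status printed by curl into a hint.
explain_status() {
  case "$1" in
    000) echo 'no HTTP response: check the port 443 rule and the DNS record' ;;
    200) echo 'ok: the API is reachable' ;;
    401) echo 'reachable, but the API key was rejected' ;;
    *)   echo "unexpected status: $1" ;;
  esac
}

# Usage (domain and key are the placeholders used in this post):
# status="$(curl -s -o /dev/null -w '%{http_code}' --max-time 10 \
#   --header 'Authorization: Bearer sua-chave-api-segura' \
#   'https://vllm-mgc.glaucio.tec.br/v1/models')"
# explain_status "$status"
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;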

&lt;p&gt;Now we can call our model with the following command:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;curl &lt;span class="nt"&gt;-X&lt;/span&gt; POST &lt;span class="s1"&gt;'https://vllm-mgc.glaucio.tec.br/v1/chat/completions'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Content-Type: application/json'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--header&lt;/span&gt; &lt;span class="s1"&gt;'Authorization: Bearer sua-chave-api-segura'&lt;/span&gt; &lt;span class="se"&gt;\&lt;/span&gt;
  &lt;span class="nt"&gt;--data&lt;/span&gt; &lt;span class="s1"&gt;'{
  "model": "qwen3-30b-a3b-fp8",
  "messages": [
    {
      "role": "system",
      "content": "Você é um assistente muito inteligente, que pensa antes de qualquer resposta, que segue a ciência, e deve ser muito dedicado a acertar."
    },
    {
      "role": "user",
      "content": "você conhece o acre? /no_think"
    }
  ]
}'&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Since we are serving a model with "reasoning" capability, we enable it by sending &lt;code&gt;/think&lt;/code&gt; in the message and disable it with &lt;code&gt;/no_think&lt;/code&gt;. Because we configured the parser with &lt;code&gt;--reasoning-parser deepseek_r1&lt;/code&gt;, the "reasoning" comes back in &lt;code&gt;choices[0].message.reasoning&lt;/code&gt; when it is enabled; when it is disabled, that key carries no reasoning text (in the &lt;code&gt;/no_think&lt;/code&gt; example below it holds only line breaks).&lt;/p&gt;
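
&lt;p&gt;To switch between the two modes without editing the JSON by hand, the request body can be generated by a small shell function. This is a minimal sketch and not part of the original setup: &lt;code&gt;build_payload&lt;/code&gt; is a hypothetical name, and it assumes the user message contains no double quotes or backslashes, since it does no JSON escaping. The commented usage additionally assumes &lt;code&gt;jq&lt;/code&gt; is installed to pull out the reasoning field.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;# Hypothetical helper: prints a chat completion body for the model served here.
# $1 = user message (assumed free of double quotes and backslashes)
# $2 = think or no_think
build_payload() {
  printf '{"model": "qwen3-30b-a3b-fp8", "messages": [{"role": "user", "content": "%s /%s"}]}\n' "$1" "$2"
}

# Usage (domain and key are the placeholders used in this post):
# build_payload 'você conhece o acre?' think |
#   curl -s 'https://vllm-mgc.glaucio.tec.br/v1/chat/completions' \
#     --header 'Content-Type: application/json' \
#     --header 'Authorization: Bearer sua-chave-api-segura' \
#     --data @- |
#   jq -r '.choices[0].message.reasoning'
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;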

&lt;p&gt;Example response with &lt;code&gt;/think&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-871a1a570fd3a0d9"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1769888901&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3-30b-a3b-fp8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Sim, conheço o **acre**! Ele pode se referir a **dois conceitos principais**:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;---&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;### 1. **Unidade de área (acre)**  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;O **acre** é uma unidade de medida de área do sistema imperial, amplamente usada em países como os Estados Unidos, Reino Unido e outros.  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Definição**: 1 acre = **4.046,86 metros quadrados** (m²).  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Equivalência**:  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  - 1 acre ≈ 0,4047 hectares (ha).  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  - 1 hectare ≈ 2,471 acres.  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Uso comum**: Medição de terrenos, campos de futebol, fazendas, etc.  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  - Por exemplo, um campo de futebol oficial tem cerca de **0,71 acre**.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;---&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;### 2. **Estado do Acre (Brasil)**  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;O **Acre** também é um **estado da Amazônia brasileira**, localizado no **norte do Brasil**, na região da **Amazônia Ocidental**.  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Capital**: **Rio Branco**.  
&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Características**:  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  - Faz parte da **Bacia do Rio Amazonas**.  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  - Conhecido por sua **floresta tropical**, riqueza em biodiversidade e culturas indígenas.  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  - Tem uma economia baseada em **extrativismo (como a seringueira)** e **agricultura**.  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;  - O **Parque Nacional do Acre** e a **Reserva da Biosfera do Acre** são áreas protegidas.  &lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **População**: Cerca de **900.000 habitantes** (estatísticas de 2023).  &lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;---&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Se você se referia a algo específico (como uma empresa, termo técnico ou outro contexto), me avise que posso aprofundar! 😊"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"refusal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"annotations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"audio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_calls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Okay, the user is asking if I know about Acre. First, I need to confirm what exactly they're referring to. Acre can mean a few things. The most common is the unit of area, which is 4,046.86 square meters. But there's also Acre as a place, like the Acre state in Brazil, or maybe even a company or a term in another context.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;I should start by acknowledging that Acre can refer to multiple things. Let me break it down. First, the unit of area. Explain that it's a unit used in the imperial system, commonly used in countries like the US, UK, and others. Mention its conversion to square meters and other units. Then, mention the Brazilian state, Acre, located in the northwest of Brazil, part of the Amazon region. Highlight its geography, maybe the capital, Rio Branco, and some key points about the state, like its natural resources or cultural aspects.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Wait, the user might be asking about the state, but I should cover both possibilities. Also, check if there's another context, like a company or a term in a different field. But I think the two main ones are the unit and the state. Make sure to present both clearly. Avoid any confusion by structuring the answer with headings or bullet points. Also, verify the accuracy of the information, like the exact area of an acre, the location of Acre state, and its capital. Maybe mention that the state is known for its rainforests and biodiversity. Double-check the conversion factors to ensure they're correct. 
Alright, that should cover the main points.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"reasoning_content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;Okay, the user is asking if I know about Acre. First, I need to confirm what exactly they're referring to. Acre can mean a few things. The most common is the unit of area, which is 4,046.86 square meters. But there's also Acre as a place, like the Acre state in Brazil, or maybe even a company or a term in another context.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;I should start by acknowledging that Acre can refer to multiple things. Let me break it down. First, the unit of area. Explain that it's a unit used in the imperial system, commonly used in countries like the US, UK, and others. Mention its conversion to square meters and other units. Then, mention the Brazilian state, Acre, located in the northwest of Brazil, part of the Amazon region. Highlight its geography, maybe the capital, Rio Branco, and some key points about the state, like its natural resources or cultural aspects.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Wait, the user might be asking about the state, but I should cover both possibilities. Also, check if there's another context, like a company or a term in a different field. But I think the two main ones are the unit and the state. Make sure to present both clearly. Avoid any confusion by structuring the answer with headings or bullet points. Also, verify the accuracy of the information, like the exact area of an acre, the location of Acre state, and its capital. Maybe mention that the state is known for its rainforests and biodiversity. Double-check the conversion factors to ensure they're correct. 
Alright, that should cover the main points.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"logprobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"stop_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"token_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"service_tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"system_fingerprint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;58&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;836&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;778&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens_details"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt_logprobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt_token_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"kv_transfer_params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Example response with &lt;code&gt;/no_think&lt;/code&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight json"&gt;&lt;code&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"id"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chatcmpl-a0710ba10d25e658"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"object"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"chat.completion"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"created"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;1769889464&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"model"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"qwen3-30b-a3b-fp8"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"choices"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"index"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;0&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"message"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"role"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"assistant"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Claro, eu conheço o Acre! O Acre é um estado da Amazônia brasileira, localizado no extremo oeste do Brasil, fazendo fronteira com a Bolívia e o Peru. É um dos estados mais extensos do Brasil e é conhecido por sua vasta floresta tropical, rica em biodiversidade e por ser um dos últimos grandes territórios preservados da Amazônia.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;Alguns pontos interessantes sobre o Acre:&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;- **Capital**: Rio Branco&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **População**: Cerca de 900 mil habitantes (dados aproximados)&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Cultura**: Mistura de influências indígenas, brasileiras e bolivianas.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Economia**: Baseada principalmente na agricultura (como a seringueira, o cauim, o açaí), na pecuária e na exploração de recursos naturais.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **Ecologia**: É uma das regiões mais preservadas da Amazônia, com muitas áreas de proteção ambiental.&lt;/span&gt;&lt;span class="se"&gt;\n&lt;/span&gt;&lt;span class="s2"&gt;- **História**: Foi cedido ao Brasil pela Bolívia em 1903, através do Tratado de Petrópolis.&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;O Acre também é famoso por sua paisagem única, com rios, florestas e uma cultura rica e diversificada. Se quiser, posso te contar mais sobre algum aspecto específico do Acre!"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"refusal"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"annotations"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"audio"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"function_call"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"tool_calls"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;[],&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"reasoning"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
        &lt;/span&gt;&lt;span class="nl"&gt;"reasoning_content"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="se"&gt;\n\n&lt;/span&gt;&lt;span class="s2"&gt;"&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"logprobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"finish_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="s2"&gt;"stop"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"stop_reason"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
      &lt;/span&gt;&lt;span class="nl"&gt;"token_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;],&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"service_tier"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"system_fingerprint"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"usage"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;60&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"total_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;380&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"completion_tokens"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="mi"&gt;320&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
    &lt;/span&gt;&lt;span class="nl"&gt;"prompt_tokens_details"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt_logprobs"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"prompt_token_ids"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;&lt;span class="w"&gt;
  &lt;/span&gt;&lt;span class="nl"&gt;"kv_transfer_params"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt;&lt;span class="w"&gt; &lt;/span&gt;&lt;span class="kc"&gt;null&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;&lt;span class="w"&gt;
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;From here on, just enjoy it.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8pcd9g9e2xxftkj9aex.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fa8pcd9g9e2xxftkj9aex.png" alt="Imagem de encerramento do looney tunes" width="400" height="300"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Soon I will publish another post that sends multiple requests to our model, with numbers on response time and on the quality of the model's answers.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>llm</category>
      <category>vllm</category>
      <category>docker</category>
    </item>
  </channel>
</rss>
