why i did this
i spend a lot of time analysing company reports — annual filings, quarterly earnings, financial statements. i was using chatgpt and claude for this but had a few problems:
- privacy — i didn't want to upload sensitive financial documents to third party AI services
- cost — the subscriptions add up, especially if you want the good models
- control — i wanted to customise the system prompt, use specific models, and share access with a friend who does similar analysis
so i decided to set up my own private chatgpt-like interface running on a rented GPU. the whole thing costs me less than £1 per session and my data never leaves a server i control.
here's exactly how i did it.
what you're building
by the end of this guide you'll have:
- a chatgpt-like web interface you can access from any browser
- the ability to upload PDFs and chat about them (company reports, annual filings, etc.)
- a vision model that can read charts, tables and images from reports
- multi-user access so you can share it with a colleague or friend
- a web research tool that searches the internet and analyses results using your local model
and it all runs on a rented GPU that you can stop and start whenever you want.
the stack
- Vast.ai — rent a GPU by the hour (way cheaper than buying one)
- Ollama — runs AI models on the GPU
- Open WebUI — the chatgpt-like interface with PDF upload, conversation history, user management
- Qwen2.5:32b — brilliant open source model for financial analysis
- Qwen2.5-VL:7b — vision model for reading charts and images
total cost: roughly $0.40/hour when running, basically nothing when stopped.
step 1: rent a GPU on vast.ai
go to cloud.vast.ai and create an account if you haven't already. add some credit — $5-10 is enough to get started.
picking the right GPU
click Search and look for instances with:
- GPU: RTX 5090 (32GB), RTX 4090 (24GB), RTX 3090 (24GB), or A6000 (48GB)
- VRAM: 24GB minimum, 32GB+ ideal
- Disk: 60GB+
- Reliability: 95%+
for company report analysis, the 32B parameter model gives the best results and needs around 20GB of VRAM. if you can only get a 24GB card, the 14B model still works well.
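as a rough rule of thumb, a Q4-quantised model needs about half a gigabyte of VRAM per billion parameters for the weights, plus a few GB of headroom for the KV cache and runtime buffers. a quick back-of-envelope check (the 4GB overhead figure is my own rough assumption, not an exact number):

```python
def estimate_vram_gb(params_billion, bits=4, overhead_gb=4.0):
    """Rough VRAM estimate for a quantised model.

    weights take params * bits/8 bytes; overhead_gb is a rough
    allowance for KV cache, activations and runtime buffers.
    """
    weights_gb = params_billion * bits / 8
    return weights_gb + overhead_gb

print(estimate_vram_gb(32))  # -> 20.0: fits a 24GB card snugly, 32GB comfortably
print(estimate_vram_gb(14))  # -> 11.0: fine on a 24GB card
print(estimate_vram_gb(72))  # -> 40.0: needs a 48GB card
```

this lines up with the ~20GB the 32B model actually uses in practice.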
selecting the template
this is the key bit that saves you loads of setup time.
- click Select Template on the left panel
- scroll down and find "Open Webui (Ollama)"
- select it
this template comes pre-configured with Ollama and Open WebUI already installed. no messing about with docker commands or manual installation.
before you click rent
- set Container Size to at least 80GB (models take up space)
- pick your instance from the list
- click RENT
a word of warning
i tried an RTX 5090 instance first and it failed with `Error: GPU error, unable to start instance`. this happens sometimes on vast.ai, especially with newer hardware. if it happens to you, just destroy the instance and rent a different one. my second attempt with a different host worked within seconds.
step 2: wait for it to boot
once you click rent, go to the Instances tab in the left sidebar. you'll see your instance with a status indicator.
it goes through a few stages:
- Loading — downloading the docker image
- Starting — preparing GPUs
- Running — ready to go
the first boot can take anywhere from 30 seconds to 20 minutes depending on whether the host has the docker image cached. if it's taking ages, check the status text — you should see progress like "Pull complete" and "Verifying Checksum".
once it shows a blue "Open" button, you're good.
step 3: download the AI models
click Open on your instance. you'll see the vast.ai applications dashboard with several options.
click "Launch Application" on Jupyter Terminal. this opens a command line in your browser.
run these commands:
```shell
ollama pull qwen2.5:32b
```

this downloads the main analysis model (~19GB). takes about 10 minutes depending on connection speed.
then download the vision model for reading charts and images:
```shell
ollama pull qwen2.5vl:7b
```

note: the tag is `qwen2.5vl`, not `qwen2.5-vl` — i got tripped up by this myself. the download is about 6GB.
you can verify both models are there by running:
```shell
ollama list
```
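if you prefer to script the check, ollama also exposes a REST API on the instance; `GET http://localhost:11434/api/tags` returns the installed models as JSON. a minimal sketch of checking that response (the sample payload below is illustrative, shaped like the real output, not copied from it):

```python
def missing_models(tags_json, required):
    """Return the required model names not present in an /api/tags payload."""
    installed = {m["name"] for m in tags_json.get("models", [])}
    return [name for name in required if name not in installed]

# illustrative payload in the shape /api/tags returns
sample = {"models": [{"name": "qwen2.5:32b"}, {"name": "qwen2.5vl:7b"}]}
print(missing_models(sample, ["qwen2.5:32b", "qwen2.5vl:7b"]))  # -> []
```

in practice you'd fetch the payload with `urllib.request.urlopen("http://localhost:11434/api/tags")` and `json.load` it before passing it in.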
step 4: open the chat interface
go back to the applications dashboard and click "Launch Application" on Open Webui.
this opens the chatgpt-like interface. the first thing you'll see is a signup page.
important: the first person to sign up becomes the admin. so make sure you create your account before sharing the link with anyone.
once you're in, you'll see a familiar chat interface with a model dropdown at the top. select qwen2.5:32b and start chatting.
step 5: set up the system prompt
this makes a massive difference to the quality of analysis you get.
in Open WebUI, go to Settings (the sliders icon at the top) and find the System Prompt section. paste this:
```
You are a senior financial analyst with 20 years of experience analysing
company reports, annual filings, quarterly earnings, and market data.

When analysing any document or question:
- Always cite specific numbers with their context (page, section, table)
- Flag inconsistencies between different sections of a report
- Compare metrics against industry benchmarks
- Identify both risks and opportunities
- Be precise — if you're unsure about a number, say so
- Never fabricate or hallucinate data points

Structure detailed analyses as:
1. EXECUTIVE SUMMARY
2. KEY FINANCIALS (revenue, profit, margins, growth)
3. RISKS & RED FLAGS
4. OPPORTUNITIES
5. OUTLOOK & RECOMMENDATION

For quick questions, respond concisely without this structure.
```
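if you'd rather bake the prompt into the model itself (so it applies anywhere the model is used, not just in Open WebUI), ollama supports a Modelfile with a `SYSTEM` directive. a minimal sketch; `analyst` is just a name i made up, and you'd paste the full prompt in place of the shortened one:

```shell
cat > Modelfile <<'EOF'
FROM qwen2.5:32b
SYSTEM """You are a senior financial analyst with 20 years of experience analysing company reports, annual filings, quarterly earnings, and market data. Cite specific numbers with context, flag inconsistencies, identify risks and opportunities, and never fabricate data points."""
PARAMETER temperature 0.3
EOF
ollama create analyst -f Modelfile
```

after this, `analyst` shows up in the Open WebUI model dropdown alongside the base models.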
the difference between a generic model response and one with a good system prompt is night and day. it stops the model from being wishy-washy and forces it to give you structured, actionable analysis.
step 6: upload and analyse PDFs
click the "+" button at the bottom-left of the chat input to upload files.
Open WebUI extracts text from PDFs automatically using its built-in RAG (retrieval augmented generation) pipeline. so you can upload an annual report and ask things like:
- "summarise the key financials from this report"
- "what are the main risk factors mentioned?"
- "compare the revenue growth year over year"
- "what does management say about future outlook?"
- "extract all the numbers from the balance sheet"
for charts and images
switch to the qwen2.5vl:7b model using the dropdown at the top. this model can see and understand images. upload a screenshot of a chart or table and ask:
- "what does this chart show?"
- "extract the data from this table"
- "what trend is this graph showing?"
step 7: share with your friend
this was one of my main requirements — being able to share access with someone else for joint analysis.
the URL in your browser when you have Open WebUI open is your access link. it looks something like:
https://something-something.trycloudflare.com/
send that to your friend. they can create their own account with their own username and password.
lock it down
once your friend has signed up, you don't want random people creating accounts. as the admin:
- click your profile icon (bottom-left)
- go to Admin Panel
- navigate to Settings → General
- toggle off "Enable New Sign Ups"
- save
now only you two can access it.
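the toggle above lives in Open WebUI's database, but if you ever rebuild the instance from scratch, Open WebUI also reads an `ENABLE_SIGNUP` environment variable. on vast.ai you'd set it in the template's environment settings rather than running docker yourself; a rough sketch of the plain docker equivalent:

```shell
# disable new signups at the container level so the toggle
# can't be forgotten after a rebuild (env var from Open WebUI's docs)
docker run -d -p 3000:8080 \
  -e ENABLE_SIGNUP=false \
  ghcr.io/open-webui/open-webui:main
```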
step 8: stop it when you're done
this is crucial for managing costs.
when you're done analysing for the day:
- go to your vast.ai instances page
- click the stop button (the power icon) — NOT destroy
- your instance goes to sleep
when stopped, you only pay a tiny storage fee (around $0.02/hr). all your models, conversations, and settings are preserved.
when you want to use it again, just click resume and wait a minute for it to start up. everything will be exactly as you left it.
destroy the instance only when you're completely done and don't need it anymore.
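if you stop and start often, vast.ai also has a CLI (`pip install vastai`) so you don't need the dashboard every time. roughly, with `1234567` standing in for your actual instance id:

```shell
pip install vastai
vastai set api-key YOUR_API_KEY   # key from your vast.ai account page
vastai show instances             # find your instance id
vastai stop instance 1234567      # same as the power icon; storage fee still accrues
vastai start instance 1234567     # resume later with everything preserved
```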
bonus: web research tool
i also set up a python script that searches the web and feeds the results to the local model for analysis. this is useful for enriching your PDF analysis with current market data, news, and competitor information.
SSH into your instance or use the Jupyter Terminal and create this file:
```python
# web-research.py
import sys

from duckduckgo_search import DDGS
from openai import OpenAI

# point the OpenAI client at Ollama's OpenAI-compatible endpoint
client = OpenAI(
    base_url="http://localhost:11434/v1",
    api_key="ollama",  # any non-empty string works; Ollama ignores it
)

def research(query):
    # search the web
    print(f"searching: {query}")
    with DDGS() as ddgs:
        results = list(ddgs.text(query, max_results=8))
    search_text = "\n".join(
        f"[{i+1}] {r['title']}: {r['body']}" for i, r in enumerate(results)
    )
    # analyse with the local model
    response = client.chat.completions.create(
        model="qwen2.5:32b",
        messages=[
            {"role": "system", "content": "You are a financial analyst. Cite sources, flag risks, be precise."},
            {"role": "user", "content": f"Research: {query}\n\nResults:\n{search_text}\n\nProvide analysis."},
        ],
        temperature=0.3,
    )
    print(response.choices[0].message.content)

if __name__ == "__main__":
    if len(sys.argv) > 1:
        research(" ".join(sys.argv[1:]))
```
install the dependencies:
```shell
pip install openai duckduckgo-search
```
run it:
```shell
python3 web-research.py "Tesla Q4 2025 earnings analysis"
python3 web-research.py "Compare Nvidia vs AMD data centre revenue"
```
costs breakdown
let's talk real numbers.
| what | cost |
|---|---|
| RTX 5090 (32GB) per hour | ~$0.40 |
| a typical 2-3 hour analysis session | ~$0.80-1.20 |
| storage while stopped per day | ~$0.50 |
| monthly usage (2hrs/day, 20 days) | ~$25-35 |
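the monthly figure in the table follows directly from the hourly rates; a quick sanity check using the rough numbers above (real prices vary by host):

```python
def monthly_cost(hours_per_day, days_used, gpu_rate=0.40, storage_per_day=0.50):
    """Estimate a month's spend: GPU time while running plus storage.

    Rates are the rough figures from the table above; storage is
    simplified to a flat daily fee across a 30-day month.
    """
    running = hours_per_day * days_used * gpu_rate
    storage = storage_per_day * 30
    return running + storage

print(monthly_cost(2, 20))  # -> 31.0, inside the $25-35 estimate
```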
compare that to chatgpt plus at $20/month or claude pro at $20/month — you're getting a much more powerful model with complete privacy and control for roughly the same price.
what i learned
a few things worth noting from going through this process:
template selection matters. using the pre-built "Open Webui (Ollama)" template on vast.ai saved me hours of setup time. trying to install everything manually from a bare ubuntu image is painful and error-prone.
host reliability varies. my first GPU instance failed with a GPU error. the second one worked instantly. if something fails, just destroy it and try a different host. don't waste time debugging someone else's hardware.
model names can be tricky. qwen2.5-vl:7b doesn't exist but qwen2.5vl:7b does. small details like hyphens matter. always check the ollama library for exact model names.
system prompts make a huge difference. the same model with a generic prompt gives you generic answers. with a targeted financial analysis prompt, it gives you structured, specific, actionable insights. invest time in getting your system prompt right.
the 32B model is the sweet spot. it's significantly better than 14B models at understanding context, catching nuances in financial documents, and giving structured analysis. if you can get a 32GB VRAM GPU, go for the 32B model. if not, 14B is still decent.
the model choice guide
depending on what GPU you can rent:
| VRAM available | best model | quality |
|---|---|---|
| 24GB | qwen2.5:14b | good for summaries and basic analysis |
| 32GB | qwen2.5:32b | excellent — catches nuances, structured output |
| 48GB+ | qwen2.5:72b | best possible — deep analysis, cross-referencing |
for the vision model (charts, images), qwen2.5vl:7b works great on any of these setups.
wrapping up
the whole setup takes about 30 minutes from zero to a working system. the first time i did it, it took longer because of the GPU error and the model name typo, but once you know the steps it's straightforward.
if you're doing any kind of company analysis, financial research, or document review, this is genuinely worth setting up. you get the power of a large language model with complete privacy and control over your data.
the best part? when you're not using it, you stop the instance and it costs almost nothing. when you need it again, it boots up in a minute with everything preserved.
if you've got questions or run into issues, drop a comment and i'll try to help.
tools used: vast.ai | ollama | open webui | qwen2.5