The Growing Shadow AI Problem
Over 14,000 Ollama server instances are publicly accessible on the internet right now.
A recent Cisco analysis found that 20% of these actively host models susceptible to unauthorized access. Separately, BankInfoSecurity reported discovering more than 10,000 Ollama servers with no authentication layer, the result of hurried AI deployments by developers under pressure.
This is the new shadow IT: developers spinning up local LLM servers for productivity, unaware they've exposed sensitive infrastructure to the internet. And Ollama is just one of dozens of AI serving platforms proliferating across enterprise networks.
The security question is no longer "are we running AI?" but "where is AI running that we don't know about?"
What is LLM Service Fingerprinting?
LLM service fingerprinting identifies the server software running on a network endpoint: not which AI model generated a piece of text, but which infrastructure is serving it.
The LLM security space spans multiple tool categories, each answering a different question:
"What ports are open?" → Nmap
"What service is on this port?" → Praetorian Nerva (will be open-sourced)
"Is this HTTP service an LLM?" → Praetorian Julius
"Which LLM wrote this text?" → Model fingerprinting
"Is this prompt malicious?" → Input guardrails
"Can this model be jailbroken?" → Nvidia Garak, Praetorian Augustus (will be open-sourced)
Julius answers the third question: during a penetration test or attack surface assessment, you've found an open port. Is it Ollama? vLLM? A Hugging Face deployment? Some enterprise AI gateway? Julius tells you in seconds.
Julius follows the Unix philosophy: do one thing and do it well. It doesn't port scan. It doesn't vulnerability scan. It identifies LLM services, nothing more, nothing less.
This design enables Julius to slot into existing security toolchains rather than replace them.
Why Existing Detection Methods Fall Short
Manual Detection is Slow and Error-Prone
Each LLM platform has different API signatures, default ports, and response patterns:
Ollama: port 11434, /api/tags returns {"models": […]}
vLLM: port 8000, OpenAI-compatible /v1/models
LiteLLM: port 4000, proxies to multiple backends
LocalAI: port 8080, /models endpoint
Manually checking each possibility during an assessment wastes time and risks missing services.
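To make this concrete, here is a minimal Go sketch of what a single hand-rolled check looks like, using Ollama's default port and /api/tags endpoint from the list above. The function itself is illustrative and not part of Julius; multiply it by every platform, port, and response format and the problem becomes obvious.
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"time"
)

// looksLikeOllama is a hand-rolled check for one platform only: Ollama's
// default /api/tags endpoint on port 11434. Every other platform (vLLM,
// LiteLLM, LocalAI, ...) needs its own port, path, and response format,
// which is exactly the repetition Julius automates.
func looksLikeOllama(host string) bool {
	client := &http.Client{Timeout: 5 * time.Second}

	resp, err := client.Get(fmt.Sprintf("http://%s:11434/api/tags", host))
	if err != nil {
		return false
	}
	defer resp.Body.Close()

	if resp.StatusCode != http.StatusOK {
		return false
	}

	// Ollama returns {"models": [...]} from /api/tags.
	var body struct {
		Models []struct {
			Name string `json:"name"`
		} `json:"models"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		return false
	}
	return body.Models != nil
}

func main() {
	fmt.Println(looksLikeOllama("192.168.1.100"))
}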
Shodan Queries Have Limitations
A Cisco study found roughly 1,100 Ollama instances indexed on Shodan. The research is valuable, but relying on Shodan queries for this kind of discovery has several limitations:
Ollama-only detection → Misses vLLM, LiteLLM, and 15+ other platforms
Passive database queries → Data lags behind real-time deployments
Requires Shodan subscription → Cost barrier for some teams
No model enumeration → Can't identify what's deployed
Introducing Julius
Julius is an open-source LLM service fingerprinting tool that detects 17+ AI platforms through active HTTP probing. Built in Go, it compiles to a single binary with no external dependencies.
# Installation
go install github.com/praetorian-inc/julius/cmd/julius@latest
# Basic usage
julius probe https://target.example.com:11434
Example output:
TARGET                       SERVICE   SPECIFICITY   CATEGORY      MODELS
https://target.example.com   ollama    100           self-hosted   llama2, mistral
Julius vs Alternatives
Capability               Julius        Shodan Queries            Manual Discovery
Services detected        17+           Comprehensive             Unreliable and varied
External dependencies    None          Shodan API and license    None
Offline operation        Yes           No                        Yes
Real-time detection      Yes           Delayed (index lag)       Yes
Model enumeration        Yes           No                        Manual
Custom probe extension   Yes (YAML)    No                        Not applicable
Time per target          Seconds       Seconds                   Minutes to days
How Julius Works
Julius uses a probe-and-match architecture optimized for speed and accuracy:
Target URL → Load Probes → HTTP Probes → Rule Match → Specificity Scoring → Report Service
Architectural Decisions
Julius is designed for performance in large-scale assessments (a minimal concurrency sketch follows this list):
Concurrent scanning with errgroup → Scan 50+ targets in parallel without race conditions
Response caching with singleflight → Multiple probes hitting /api/models trigger only one HTTP request
Embedded probes compiled into binary → True single-binary distribution, no external files
Port-based probe prioritization → Target on :11434 runs Ollama probes first for faster identification
MD5 response deduplication → Identical responses across targets are processed once
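As a rough illustration of the concurrency and request-deduplication points above, here is a minimal Go sketch built on golang.org/x/sync's errgroup and singleflight packages. It is not Julius's actual source; the function names and target handling are illustrative.
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"

	"golang.org/x/sync/errgroup"
	"golang.org/x/sync/singleflight"
)

// fetchGroup collapses duplicate in-flight requests: if several probes ask
// for the same URL at the same time, only one HTTP request is sent.
var fetchGroup singleflight.Group

func fetchOnce(ctx context.Context, url string) (string, error) {
	v, err, _ := fetchGroup.Do(url, func() (interface{}, error) {
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, url, nil)
		if err != nil {
			return nil, err
		}
		resp, err := http.DefaultClient.Do(req)
		if err != nil {
			return nil, err
		}
		defer resp.Body.Close()
		body, err := io.ReadAll(resp.Body)
		return string(body), err
	})
	if err != nil {
		return "", err
	}
	return v.(string), nil
}

// probeTargets scans many targets in parallel with a bounded worker pool.
func probeTargets(ctx context.Context, targets []string) error {
	g, ctx := errgroup.WithContext(ctx)
	g.SetLimit(50) // cap concurrency for large-scale assessments

	for _, target := range targets {
		target := target // capture loop variable (pre-Go 1.22 semantics)
		g.Go(func() error {
			body, err := fetchOnce(ctx, target+"/api/tags")
			if err != nil {
				return nil // an unreachable target is not fatal to the scan
			}
			fmt.Printf("%s -> %d bytes\n", target, len(body))
			return nil
		})
	}
	return g.Wait()
}

func main() {
	_ = probeTargets(context.Background(), []string{"http://192.168.1.100:11434"})
}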
Project Structure
cmd/julius/   CLI entrypoint
pkg/
  runner/     Command execution (probe, list, validate)
  scanner/    HTTP client, response caching, model extraction
  rules/      Match rule engine (status, body, header patterns)
  output/     Formatters (table, JSON, JSONL)
  probe/      Probe loader (embedded YAML + filesystem)
  types/      Core data structures
probes/       YAML probe definitions (one per service)
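The "embedded YAML + filesystem" probe loader design can be sketched with Go's embed package and a YAML decoder. This is an assumption-laden sketch, not the project's real loader: the Probe fields mirror a subset of the probe format shown later in this post, and the sketch assumes a probes/ directory sits next to the source file at build time.
package main

import (
	"embed"
	"fmt"
	"io/fs"

	"gopkg.in/yaml.v3"
)

//go:embed probes/*.yaml
var embeddedProbes embed.FS

// Probe mirrors a subset of the YAML probe definition format (fields illustrative).
type Probe struct {
	Name        string `yaml:"name"`
	Category    string `yaml:"category"`
	PortHint    int    `yaml:"port_hint"`
	Specificity int    `yaml:"specificity"`
}

// loadEmbedded parses every probe definition compiled into the binary,
// which is what makes a true single-binary distribution possible.
func loadEmbedded() ([]Probe, error) {
	var probes []Probe
	err := fs.WalkDir(embeddedProbes, "probes", func(path string, d fs.DirEntry, err error) error {
		if err != nil || d.IsDir() {
			return err
		}
		data, readErr := embeddedProbes.ReadFile(path)
		if readErr != nil {
			return readErr
		}
		var p Probe
		if unmarshalErr := yaml.Unmarshal(data, &p); unmarshalErr != nil {
			return unmarshalErr
		}
		probes = append(probes, p)
		return nil
	})
	return probes, err
}

func main() {
	probes, err := loadEmbedded()
	if err != nil {
		panic(err)
	}
	fmt.Printf("loaded %d embedded probes\n", len(probes))
}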
Detection Process
Target Normalization: Validates and normalizes input URLs
Probe Selection: Prioritizes probes matching the target's port (if :11434, Ollama probes run first)
HTTP Probing: Sends requests to service-specific endpoints
Rule Matching: Compares responses against signature patterns
Specificity Scoring: Ranks results 1-100 by most specific match
Model Extraction: Optionally retrieves deployed models via JQ expressions (see the sketch below)
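The model-extraction step boils down to running a JQ expression over the JSON response body. Here is a minimal sketch of that idea using the github.com/itchyny/gojq library; the library choice and function names are assumptions for illustration, since the post only states that Julius uses JQ expressions.
package main

import (
	"encoding/json"
	"fmt"

	"github.com/itchyny/gojq"
)

// extractModels applies a jq-style expression (e.g. ".models[].name")
// to a JSON response body and collects the resulting strings.
func extractModels(body []byte, expr string) ([]string, error) {
	query, err := gojq.Parse(expr)
	if err != nil {
		return nil, err
	}

	var doc interface{}
	if err := json.Unmarshal(body, &doc); err != nil {
		return nil, err
	}

	var models []string
	iter := query.Run(doc)
	for {
		v, ok := iter.Next()
		if !ok {
			break
		}
		if runErr, isErr := v.(error); isErr {
			return nil, runErr
		}
		if s, isStr := v.(string); isStr {
			models = append(models, s)
		}
	}
	return models, nil
}

func main() {
	body := []byte(`{"models":[{"name":"llama2"},{"name":"mistral"}]}`)
	models, _ := extractModels(body, ".models[].name")
	fmt.Println(models) // [llama2 mistral]
}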
Specificity Scoring: Eliminating False Positives
Many LLM platforms implement OpenAI-compatible APIs. If Julius detects both "OpenAI-compatible" (specificity: 30) and "LiteLLM" (specificity: 85) on the same endpoint, it reports LiteLLM first.
This prevents the generic "OpenAI-compatible" match from obscuring the actual service identity.
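Conceptually, the scoring step is just a descending sort over every successful match, so the most specific fingerprint wins. A minimal sketch follows; the Match type and field names are illustrative, not Julius's internal types.
package main

import (
	"fmt"
	"sort"
)

// Match is one successful fingerprint; Specificity ranges from 1 to 100.
type Match struct {
	Service     string
	Specificity int
}

// rankMatches orders matches so the most specific service is reported first,
// e.g. LiteLLM (85) ahead of a generic OpenAI-compatible match (30).
func rankMatches(matches []Match) []Match {
	sort.SliceStable(matches, func(i, j int) bool {
		return matches[i].Specificity > matches[j].Specificity
	})
	return matches
}

func main() {
	matches := []Match{
		{Service: "openai-compatible", Specificity: 30},
		{Service: "litellm", Specificity: 85},
	}
	fmt.Println(rankMatches(matches)[0].Service) // litellm
}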
Match Rule Engine
Julius uses six rule types for fingerprinting:
status → HTTP status code (example: 200 confirms endpoint exists)
body.contains → JSON structure detection (example: "models": identifies list responses)
body.prefix → Response format identification (example: {"object": matches OpenAI-style)
content-type → API vs HTML differentiation (example: application/json)
header.contains → Service-specific headers (example: X-Ollama-Version)
header.prefix → Server identification (example: uvicorn ASGI fingerprint)
All rules support negation with not: true, crucial for distinguishing similar services. For example: "has /api/tags endpoint" AND "does NOT contain LiteLLM" ensures Ollama detection doesn't match LiteLLM proxies.
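To show how negation fits in, here is a minimal sketch of evaluating a handful of rules, including a not: true rule, against a response. The Rule struct and evaluate function are illustrative rather than Julius's internal types.
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// Rule is one match condition from a probe definition.
type Rule struct {
	Type  string // "status", "body.contains", "body.prefix", "header.contains", ...
	Key   string // header name, used by header.* rules
	Value string
	Not   bool // invert the result, e.g. "does NOT contain LiteLLM"
}

// evaluate returns true if a single rule matches the response.
func evaluate(r Rule, status int, headers http.Header, body string) bool {
	var matched bool
	switch r.Type {
	case "status":
		matched = fmt.Sprintf("%d", status) == r.Value
	case "body.contains":
		matched = strings.Contains(body, r.Value)
	case "body.prefix":
		matched = strings.HasPrefix(body, r.Value)
	case "header.contains":
		matched = strings.Contains(headers.Get(r.Key), r.Value)
	}
	if r.Not {
		return !matched
	}
	return matched
}

func main() {
	// Ollama-style detection: /api/tags lists models AND the body does NOT mention LiteLLM.
	rules := []Rule{
		{Type: "status", Value: "200"},
		{Type: "body.contains", Value: `"models":`},
		{Type: "body.contains", Value: "litellm", Not: true},
	}
	body := `{"models":[{"name":"llama2"}]}`
	ok := true
	for _, r := range rules {
		ok = ok && evaluate(r, 200, http.Header{}, body)
	}
	fmt.Println(ok) // true
}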
Julius also caches HTTP responses during a scan, so multiple probes targeting the same endpoint don't result in duplicate requests. You can write 100 probes that check / for different signatures without overloading the target. Julius fetches the page once and evaluates all matching rules against the cached response.
Probes Included in Initial Release
Self-Hosted LLM Servers
Ollama (port 11434) → /api/tags JSON response + "Ollama is running" banner
vLLM (port 8000) → /v1/models with Server: uvicorn header + /version endpoint
LocalAI (port 8080) → /metrics endpoint containing "LocalAI" markers
llama.cpp (port 8080) → /v1/models with owned_by: llamacpp OR Server: llama.cpp header
Hugging Face TGI (port 3000) → /info endpoint with model_id field
LM Studio (port 1234) → /api/v0/models endpoint (LM Studio-specific)
Nvidia NIM (port 8000) → /v1/metadata with modelInfo + /v1/health/ready
Proxy & Gateway Services
LiteLLM (port 4000) → /health with healthy_endpoints or litellm_metadata JSON
Kong (port 8000) → Server: kong header + /status endpoint
Enterprise Cloud Platforms
Salesforce Einstein (port 443) → Messaging API auth endpoint error response
ML Demo Platforms
Gradio (port 7860) → /config with mode and components
RAG Platforms
AnythingLLM (port 3001) → HTML containing "AnythingLLM"
Chat Frontends
Open WebUI (port 3000) → /api/config with "name":"Open WebUI"
LibreChat (port 3080) → HTML containing "LibreChat"
SillyTavern (port 8000) → HTML containing "SillyTavern"
Better ChatGPT (port 3000) → HTML containing "Better ChatGPT"
Generic Detection
OpenAI-compatible (varied ports) → /v1/models with standard response structure
Extending Julius with Custom Probes
Adding support for a new LLM service requires ~20 lines of YAML, no code changes:
# probes/my-service.yaml
name: my-llm-service
description: Custom LLM service detection
category: self-hosted
port_hint: 8080
specificity: 75
api_docs: https://example.com/api-docs
requests:
  - type: http
    path: /health
    method: GET
    match:
      - type: status
        value: 200
      - type: body.contains
        value: '"service":"my-llm"'
  - type: http
    path: /api/version
    method: GET
    match:
      - type: status
        value: 200
      - type: content-type
        value: application/json
models:
  path: /api/models
  method: GET
  extract: ".models[].name"
Validate your probe:
julius validate ./probes
Real World Usage
Single Target Assessment
julius probe https://target.example.com
julius probe https://target.example.com:11434
julius probe 192.168.1.100:8080
Scan Multiple Targets From a File
julius probe -f targets.txt
Output Formats
# Table (default) - human readable
julius probe https://target.example.com
# JSON - structured for parsing
julius probe -o json https://target.example.com
# JSONL - streaming for large scans
julius probe -o jsonl -f targets.txt | jq '.service'
What's Next
Julius is the first release in our "The 12 Caesars" open-source campaign: we will be releasing one open-source tool per week for the next 12 weeks.
Julius focuses on HTTP-based fingerprinting of known LLM services. We're already working on expanding its capabilities while maintaining the lightweight, fast execution that makes it practical for large-scale reconnaissance.
On our roadmap: additional probes for cloud-hosted LLM services, smarter detection of custom integrations, and the ability to analyze HTTP traffic patterns to identify LLM usage that doesn't follow standard API conventions. We're also exploring how Julius can work alongside AI agents to autonomously discover LLM infrastructure across complex environments.
Contributing & Community
Julius is available now under the Apache 2.0 license at github.com/praetorian-inc/julius.
We welcome contributions from the community. Whether you're adding probes for services we haven't covered, reporting bugs, or suggesting new features, check the repository's CONTRIBUTING.md for guidance on probe definitions and development workflow.
Ready to start? Clone the repository, experiment with Julius in your environment, and join the discussion on GitHub.
Star the project if you find it useful, and let us know what LLM services you'd like to see supported next.
What shadow AI have you discovered lurking in your environment? Drop your stories in the comments.