DEV Community

soy

Patent lawyer turned AI engineer. Processed 4M patents with a local LLM on an RTX 5090. Building PatentLLM: AI-powered patent search. Also ranked #1 on Floodgate (shogi AI). Writing about local LLMs and related topics.

Critical CVEs, AI RCE, & Supply Chain Malware Hits HWMonitor

4 min read

Smriti: Hybrid Vector DB for AI Agents, Claude Code LSP Integration & Workflow Automation with LLMs

3 min read
PostgreSQL O(delta) MV Refreshes, pg_lake for Data Lakes, & ADBC for Columnar Data

3 min read
CUDA SGEMM Bug on RTX 5090, Kernel-Fusing for SGEMV, & Radeon RX 9070 XT Price Surge

4 min read
Claude AI Expands Enterprise Features, Developer Tools & CLI Automation

3 min read
Gemma4 Tool Calling Fixes in llama.cpp, RTX cuBLAS MatMul Bug, & Local Ollama + Whisper UI

3 min read
CUPS RCE-to-Root, AI Sandbox Escape, & LittleSnitch for Linux

3 min read
AI Agents: Cost-Optimized Orchestration & Robust Text-to-SQL with Python

4 min read
SQLite Join Benchmarks, PostgreSQL for AI Graphs with pgvector, & pGenie for SQL Validation

3 min read
LLM GPU Breakthroughs: RT Cores, Llama.cpp Parallelism, AMD Optimizations

3 min read
Cloud AI & Dev: Gemini 3D, Claude Agent Patterns, Embedding Compression

4 min read
Llama.cpp Tensor Parallelism, Gemma 4 Stability, & OmniVoice Local TTS

3 min read
LLM Code Vulnerabilities, GRU Router Exploits & `dnsight` CLI DNS Auditor

3 min read
Anthropic Launches Managed Agents, Optimize LLM Context, Python Memory Needed

3 min read
SQLite Internals, PostgreSQL Extensions & Performance Tuning Updates

3 min read
New AMD RX 9000 GPUs, DLSS/FSR Mod, & Deep Dive into CUDA LLVM Bitcode

3 min read
Anthropic Launches Managed Agents; Claude Opus 4.6 Reasoning Fluctuation, and Code Resurrections

4 min read
Gemma 4 GGUFs, CLI Coding Agent, & Pi 5 Ollama Benchmarks Lead Local AI

3 min read
Nissan PSR Simulation

9 min read
Why I’m Engineering My FIRE with Python — A Manifesto

5 min read
Why I’m Engineering My FIRE with Python — A Manifesto

6 min read
Cloud Supply Chain & AWS CodeBuild PrivEsc Exposed; GDDR6 Rowhammer to Root Shell

3 min read
Claude Code Powers AI Workflows: Ultraplan for Agent Orchestration & App Store Automation

3 min read
SQLite Internals: Memory Leak, Security Vuln; PostgREST Goes Edge

3 min read
CUDA Memory Hierarchy, Tile Programming, & DLSS 310.6 Driver Enhancements

3 min read
Claude Code Enhances Dev Workflows, Open-Source AI Outperforms Sonnet on Benchmarks

3 min read
Gemma 4 Benchmarks, iMac G3 Local LLM, and Ollama Android Client for On-Device Inference

3 min read
Zero-Days, Supply Chain & AI Self-Jailbreaks: Top Security Threats

3 min read
AI Agent Autonomy, Audio Transcription Models, & LLM Token Optimization

3 min read
PostgreSQL Performance in the Spotlight: Linux 7.0 Impact, Benchmarking & Vacuum Tuning

3 min read
Hopper/Blackwell Tensor Core Optimization, llama.cpp VRAM Fix & 4W NPU Inference

3 min read
Claude Ultraplan & API Access Changes for Developers; Cadenza Boosts AI Agent Research

4 min read
Gemma 4 Local Inference: Ollama Benchmarks, llama.cpp KV Cache Fix, NPU Deployments

2 min read
Self-Hosting Docker Mastery, Rust/WASM Browser Engines, & Gesture-Controlled Web

4 min read
PostgreSQL Performance Crisis, Cloud-Native DB Deployments, & Collation Deep Dive

4 min read
Agent Frameworks & Local VLM Tuning: Boosting Dev Productivity

4 min read
GPU Power Tools & CUDA Deep Dives for Local LLM Builders

3 min read
Gemma 4 & LLM Ops: Fine-Tuning, Local Inference, and VRAM Management

3 min read
Self-Host Like a Pro: From Security Tools to 100x Faster AI Agent Sandboxing

3 min read
SQLite, Go/Postgres, & Petabytes: Database Patterns for Builders

4 min read
Local LLMs & Agents: Build Real-time Deepfakes, Agent Frameworks & AI Scientists

3 min read
Boost Local LLMs: TurboQuant KV Cache, Fast Cold Starts, & Rust GPU Dev

4 min read
Local LLM Efficiency & Security: TurboQuant Innovations and Supply Chain Alerts

3 min read
Self-Host Strong, AI Agents Fast, & Master Your JSON Tools

3 min read
Building High-Performance Data Stacks: Vector Search, SQLite Ops, & Open-Source Monitoring

4 min read
Local AI Agents, Voice Models & Self-Hosted Research Tools

4 min read
Local LLM Power-Ups: Voxtral TTS, TurboQuant, & Sub-Second Cold Starts

3 min read
GPU-Accelerated LLMs: Serving at 1M Tok/s, Voxtral TTS, & 4-bit Weight Quantization

3 min read
DIY Compute: Tesla Hacking, RAG Systems, and Blazing Fast AI Agents

4 min read
Building & Monitoring Data Backends: Tools, Architecture, and Observability

4 min read
Cutting-Edge AI Agents: Building Multi-Agent Workflows Locally

3 min read
Local LLM Unleashed: Faster Inference, Instant Starts, & Open TTS

4 min read
Local LLM Acceleration: Quantization, TTS, and 1M Tokens/Sec

4 min read
vLLM On-Demand Gateway: Zero-VRAM Standby for Local LLMs on Consumer GPUs

4 min read
Databases Are the New AI Moat: Why DB-First Architecture Changes Everything

5 min read
Local LLM Apps, Persistent Certs & K8s Storage Mastery

4 min read
DIY Data Stacks: Building, Optimizing, and Self-Hosting Your Data Infrastructure

3 min read
Next-Gen LLM Dev: APIs, Agents, and Accessible Local Inference

4 min read
New Arc GPUs, Supply Chain Security, and Deep CUDA Optimization

3 min read
Local LLMs & Edge AI: Hardware Boost, Security Fixes, and Extreme Compression

4 min read