
Tomer Goldstein

Posted on • Originally published at ship-safe.co

Is Cursor Safe? I Scanned 100 Apps. 67% Had Critical Vulns.

so I've been building ShipSafe — a security scanner for AI-generated code — and a few weeks ago I got curious. like, actually curious. not "I wonder if AI code has bugs" curious, more like "how bad is it really and am I just being paranoid" curious.

I grabbed 100 Cursor-built repos off GitHub. not tutorials, not demo apps. real production stuff — SaaS tools, internal dashboards, a couple of e-commerce stores, a bunch of API backends. found them by searching for .cursorrules files and Cursor-style commit patterns.

then I scanned all of them with ShipSafe.

67%. sixty-seven percent had at least one critical vulnerability. the worst app had 14 separate issues. fourteen. average was 3.2 per app.

ngl I expected some problems but not... that.

finding                         % of apps
at least one critical vuln      67%
IDOR                            43%
inverted auth                   31%
frontend-only admin checks      28%
hardcoded secrets               22%

this tracks with Stanford research that found ~45% of AI-assisted code has vulns. our numbers are worse, probably bc we only looked at shipped production apps vs their lab setup.

anyway. let me show you what I kept finding.


the IDOR thing (43%)

this was by far the most common. and it's so dumb that it's almost funny? Cursor generates an API route, takes an ID from the URL, fetches from the database, returns it. no check on who's asking.

// /api/invoices/[id]
export async function GET(req, { params }) {
  const invoice = await db.invoice.findUnique({
    where: { id: params.id },
  });
  return Response.json(invoice);
}

change /api/invoices/42 to /api/invoices/43. congrats, you're reading someone else's invoice now. their name, the amount, payment status, all of it. this is textbook IDOR — and Broken Access Control, the category it falls under, sits at #1 on the OWASP Top 10.

the fix btw:

where: {
  id: params.id,
  userId: session.user.id, // this. this is the whole fix.
},

one line. Cursor never adds it. and I get why — the AI is optimizing for "does it work" not "who's allowed to see this." but still. 43%.
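if you want to see the difference in isolation, here's a tiny framework-free sketch of the same idea — an in-memory array standing in for the db, and all the names are mine, not from any repo I scanned:

```javascript
// in-memory stand-in for the invoices table (illustrative data only)
const invoices = [
  { id: "42", userId: "alice", amount: 120 },
  { id: "43", userId: "bob", amount: 990 },
];

// vulnerable shape: fetch by id alone, so any caller reads any row
function getInvoiceInsecure(id) {
  return invoices.find((inv) => inv.id === id) ?? null;
}

// fixed shape: the lookup is scoped to the requesting user, so a
// foreign id behaves exactly like a nonexistent one
function getInvoice(id, userId) {
  return invoices.find((inv) => inv.id === id && inv.userId === userId) ?? null;
}
```

the nice side effect of scoping in the query itself (instead of fetching then checking) is that "not found" and "not yours" look identical to an attacker, so they can't even enumerate which ids exist.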


the backwards auth thing was the wildest tho

okay 31% of these apps had their auth middleware inverted. and I know that sounds fake but look at this:

import { NextResponse } from "next/server";

export function middleware(req) {
  const token = req.cookies.get("session");

  if (token) {
    return NextResponse.redirect("/login");
  }
  return NextResponse.next();
}

if (token) → kick to login. so logged-in users get redirected away. and if you DON'T have a token? NextResponse.next() — come right in. every protected route, wide open to anyone without a session.

one missing !. that's it. if (!token) vs if (token).

and here's what makes it so nasty — you will never find this by testing your own app. you're logged in while you test. the redirect fires on you and maybe you think "huh that's weird" and hack around it or don't even notice bc your session is cached. meanwhile an attacker shows up with zero cookies and gets full access to everything. no cap the first time I saw this in a real production app I just stared at it for like 30 seconds.
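since this bug survives manual QA by design, the real lesson for me is to pull the decision out into something unit-testable — so the logged-out "attacker" case gets exercised even though you never are one. a sketch (pure function, names are my own; real middleware would wrap this with NextResponse):

```javascript
// the decision Cursor inverted, isolated as a pure function so the
// zero-cookie case — the one manual testing never hits — can be asserted
function routeDecision(token) {
  if (!token) {
    // no session cookie -> bounce to login
    return "redirect:/login";
  }
  // valid session -> let the request through
  return "next";
}
```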


frontend admin checks that do nothing

28% of apps had this pattern where Cursor puts the role check in React but not on the server. and I gotta admit this one annoys me the most bc it's so close to being right.

// this part is fine honestly
if (user.role !== "admin") return <Redirect to="/" />;

// this part is NOT fine
export async function DELETE(req, { params }) {
  await db.user.delete({ where: { id: params.id } });
  return Response.json({ success: true });
}

the admin panel is hidden in the UI. great. but curl -X DELETE /api/users/123 works for literally anyone. the API doesn't check anything. Cursor generated the visual gate and forgot the actual gate.

I started telling people: frontend is what you see, backend is what you can do. Cursor gets the see part right every time. the do part? apparently optional.
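here's the server-side half that was missing, sketched as a reusable guard — requireAdmin and the session shape are my invention for illustration, not Cursor output:

```javascript
// server-side role gate: the check that must run inside the API handler,
// because the React redirect is only cosmetic
function requireAdmin(session) {
  if (!session || session.user?.role !== "admin") {
    return { ok: false, status: 403 };
  }
  return { ok: true };
}

// in the DELETE handler, you'd call it before touching the db:
// const gate = requireAdmin(session);
// if (!gate.ok) return new Response("forbidden", { status: gate.status });
```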


hardcoded secrets + a personal L

22% had keys in the source. and tbh this is the one I'm least judgmental about bc I almost did this myself early on.

you're setting up Stripe or whatever. you paste your API key into Cursor's chat for context. Cursor writes the integration and puts the key right there in the file:

export const stripe = new Stripe("sk_live_51N8x...");

you commit, push, and now that key is in your git history permanently. deleting the file doesn't help — git log remembers. GitGuardian says 12.8 million secrets were exposed on GitHub last year and I bet a huge chunk started exactly like this.

I almost shipped a Supabase service role key this way. caught it in a scan literally the day before pushing to prod. that's actually one of the reasons I started building ShipSafe — I needed it for myself first.
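the boring fix is env vars, and it's worth failing loudly when one is missing so a bad deploy dies at boot instead of at the first payment call. a small helper I use (my own sketch, key names are illustrative):

```javascript
// read a secret from the environment, failing fast if it's absent —
// a misconfigured deploy crashes at startup, not mid-request
function getSecret(name) {
  const value = process.env[name];
  if (!value) {
    throw new Error(`missing required env var: ${name}`);
  }
  return value;
}

// usage:
// const stripe = new Stripe(getSecret("STRIPE_SECRET_KEY"));
```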


why tho

okay so why does Cursor keep doing this? I've thought about it a lot and I think there are two things going on.

first — LLMs learn from GitHub. and most code on GitHub is focused on making things work. security is an afterthought in like 90% of open source repos. so the model generates happy-path code. auth checks, ownership verification, input validation — that stuff doesn't make the app function, it prevents exploitation. the model literally doesn't prioritize it unless you specifically ask.

second thing — and this is the one that's harder to fix — Cursor thinks about one file at a time. it'll generate a totally reasonable API route without knowing that your middleware in some other directory is supposed to handle auth. or that it doesn't. the context window sees the file you're working on, not the security architecture of your whole app.

tbh the second one bugs me more than the first bc you can kinda solve the first one with good .cursorrules but the file-at-a-time thing is structural.


what actually changed my workflow

I'm not gonna stop using Cursor over this. the speed boost is too real. but I did change a few things and it's made a huge difference.

biggest one — I added security-specific rules to my .cursorrules file. things like "always add userId to database where clauses" and "never hardcode secrets, always use process.env." sounds simple but it legit changed the output quality. night and day.
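for reference, the security chunk of my rules looks roughly like this (paraphrased, not the literal file):

```
# .cursorrules — security excerpt (illustrative)
- always scope database queries by the authenticated user's id
- never hardcode secrets; read them from process.env
- every API route must verify the session server-side before returning data
- admin-only actions must check role on the server, not just in the React UI
```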

the other thing is I just manually review auth-related code now. everything else, Cursor handles and I trust it. but middleware files, API route guards, anything with role checks — I actually read those line by line. takes like 5 minutes per PR and it's caught issues multiple times.

and obviously I run ShipSafe before deploying. that's the whole reason I built it lol. paste the GitHub URL, get a report, fix the flagged stuff, ship. couple minutes.

oh and I stopped pasting real API keys into Cursor chat. use fakes, swap in real ones through env vars. learned that one the hard way with the Supabase key thing I mentioned.


Cursor makes you stupid fast. I'm not going back to writing everything manually. but "fast" and "secure" aren't the same thing and 67% of production apps having critical vulns is a pretty loud signal.

just add a scan to your deploy flow. like two minutes. saves you the 2am incident response call.


full data + methodology: ship-safe.co/blog/is-cursor-code-secure

free scan: ship-safe.co

Top comments (1)

Julian Oczkowski

The inverted auth middleware example is a perfect illustration of why AI-generated code needs security review as a first-class concern, not an afterthought. The if (token) vs if (!token) bug is particularly dangerous because it passes basic functional testing — the developer is always logged in during manual QA, so the redirect behavior seems correct from their perspective.

What stands out in your data is that these aren't exotic vulnerabilities. IDOR, missing server-side auth, hardcoded secrets — these are well-known patterns that any senior engineer would catch in code review. The issue is that AI coding assistants optimize for "does it work" rather than "is it secure," and developers trusting the output skip the scrutiny they'd apply to code written by a junior teammate.

This makes a strong case for integrating automated security scanning directly into the AI-assisted development workflow — not as a CI gate after the PR is opened, but as real-time feedback while the code is being generated.