<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Ashutosh Rana</title>
    <description>The latest articles on DEV Community by Ashutosh Rana (@ashutoshrana).</description>
    <link>https://dev.to/ashutoshrana</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3874099%2F4e0f4a1a-9aed-4405-81da-2e162de258db.png</url>
      <title>DEV Community: Ashutosh Rana</title>
      <link>https://dev.to/ashutoshrana</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/ashutoshrana"/>
    <language>en</language>
    <item>
      <title>FERPA Compliance in RAG Pipelines: Five Rules Your Enterprise System Probably Breaks</title>
      <dc:creator>Ashutosh Rana</dc:creator>
      <pubDate>Sat, 11 Apr 2026 20:50:23 +0000</pubDate>
      <link>https://dev.to/ashutoshrana/ferpa-compliance-in-rag-pipelines-five-rules-your-enterprise-system-probably-breaks-5762</link>
      <guid>https://dev.to/ashutoshrana/ferpa-compliance-in-rag-pipelines-five-rules-your-enterprise-system-probably-breaks-5762</guid>
      <description>&lt;p&gt;If you are building a retrieval-augmented generation (RAG) system for a higher-education institution, your pipeline is probably violating FERPA. Not because you meant to — but because the standard RAG tutorial pattern and the regulated record-access pattern are fundamentally different, and most documentation does not explain where they diverge.&lt;/p&gt;

&lt;p&gt;This post covers &lt;strong&gt;five rules&lt;/strong&gt; that most enterprise RAG implementations break, and what the correct pattern looks like for each.&lt;/p&gt;




&lt;h2&gt;What FERPA requires from a retrieval system&lt;/h2&gt;

&lt;p&gt;FERPA (Family Educational Rights and Privacy Act, 20 U.S.C. § 1232g; implementing regulations at 34 CFR Part 99) governs access to education records at institutions that receive federal funding.&lt;/p&gt;

&lt;p&gt;The relevant requirement for a RAG pipeline is simple: &lt;strong&gt;a student's education records must not be accessible to another student or to an unauthorized third party.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;In a vector store-backed system, "accessible" means more than whether the LLM produces the record in its response. It means whether the record &lt;strong&gt;enters the retrieval pipeline at all&lt;/strong&gt;. A document that is retrieved, ranked, and then discarded by a post-filter has still been surfaced to a process that handles data for a different user.&lt;/p&gt;

&lt;p&gt;Given FERPA's general prohibition on disclosing personally identifiable information from education records without consent, and under any reasonable security posture, that is not acceptable.&lt;/p&gt;




&lt;h2&gt;Rule 1: Filter before ranking, not after&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What most systems do:&lt;/strong&gt; Retrieve the top-k documents from the vector store based on semantic similarity, then apply a metadata filter to remove documents that belong to the wrong student.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this breaks FERPA:&lt;/strong&gt; The unauthorized documents are scored, ranked, and processed by the retrieval pipeline before being discarded. If the post-filter has a defect — a misconfigured field name, a missing metadata key, a swallowed exception — the unauthorized content reaches the LLM context window. The failure mode is silent and the blast radius is wide.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The correct pattern:&lt;/strong&gt; Apply the identity constraint as a metadata &lt;strong&gt;pre-filter&lt;/strong&gt; on the vector store query. Unauthorized documents should not exist in the candidate set.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong — retrieve all, then filter
&lt;/span&gt;&lt;span class="n"&gt;all_docs&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="n"&gt;authorized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;all_docs&lt;/span&gt; &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;student_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="o"&gt;==&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct — filter at query time
&lt;/span&gt;&lt;span class="n"&gt;authorized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;vector_store&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;similarity_search&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;query&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;k&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="mi"&gt;20&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="nb"&gt;filter&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;student_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
        &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;institution_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;institution_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="p"&gt;}&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



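&lt;p&gt;Most vector stores support metadata filtering natively: Pinecone, Weaviate, Qdrant, and Chroma accept filter expressions on the query, and pgvector uses an ordinary SQL &lt;code&gt;WHERE&lt;/code&gt; clause. The filter syntax differs between stores, so check that yours applies the filter as part of the search rather than to its results. Use it.&lt;/p&gt;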
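&lt;p&gt;To make the difference concrete, here is a minimal sketch using a toy in-memory store. The &lt;code&gt;Doc&lt;/code&gt; class, the &lt;code&gt;search&lt;/code&gt; function, and its &lt;code&gt;where&lt;/code&gt; parameter are illustrative assumptions, not any vector store's real API. It shows that a post-filter lets other students' documents be scored first, and can then return nothing at all, while a pre-filter only ever scores the authorized student's documents.&lt;/p&gt;

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Doc:
    text: str
    embedding: tuple
    metadata: dict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def search(docs, query_vec, k, where=None):
    """Rank only the documents whose metadata matches every key in `where`."""
    where = where or {}
    candidates = [
        d for d in docs
        if all(d.metadata.get(key) == value for key, value in where.items())
    ]
    ranked = sorted(candidates, key=lambda d: cosine(d.embedding, query_vec), reverse=True)
    return ranked[:k]


DOCS = [
    Doc("Transcript for s1", (1.0, 0.0), {"student_id": "s1", "institution_id": "u1"}),
    Doc("Transcript for s2", (0.99, 0.1), {"student_id": "s2", "institution_id": "u1"}),
    Doc("Transcript for s3", (0.98, 0.2), {"student_id": "s3", "institution_id": "u1"}),
    Doc("Aid letter for s4", (0.0, 1.0), {"student_id": "s4", "institution_id": "u1"}),
]
query_vec = (1.0, 0.05)

# Post-filter: other students' documents are scored and ranked, then discarded.
top_k = search(DOCS, query_vec, k=2)
post_filtered = [d for d in top_k if d.metadata["student_id"] == "s4"]

# Pre-filter: only s4's documents are ever scored.
pre_filtered = search(
    DOCS, query_vec, k=2, where={"student_id": "s4", "institution_id": "u1"}
)

print(len(post_filtered), [d.text for d in pre_filtered])
```

&lt;p&gt;In this toy data the post-filter comes back empty because the top-k slots were taken by other students' transcripts, so the pre-filter also tends to improve recall, not just isolation.&lt;/p&gt;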




&lt;h2&gt;Rule 2: Filter on &lt;code&gt;institution_id&lt;/code&gt;, not just &lt;code&gt;student_id&lt;/code&gt;&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What most systems do:&lt;/strong&gt; Filter by &lt;code&gt;student_id&lt;/code&gt; only.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this breaks FERPA:&lt;/strong&gt; In a multi-tenant deployment, a &lt;code&gt;student_id&lt;/code&gt; that is unique within Institution A may collide with a record at Institution B. More fundamentally, a student authorized to access their own records at Institution A should never retrieve records from Institution B — even if their &lt;code&gt;student_id&lt;/code&gt; matches.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The correct pattern:&lt;/strong&gt; Apply a &lt;strong&gt;compound &lt;code&gt;AND&lt;/code&gt; filter&lt;/strong&gt;: &lt;code&gt;student_id == X AND institution_id == Y&lt;/code&gt;. Both conditions must be satisfied.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong — student_id alone
&lt;/span&gt;&lt;span class="nb"&gt;filter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;student_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct — compound identity predicate
&lt;/span&gt;&lt;span class="nb"&gt;filter&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;
    &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$and&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;student_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$eq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
        &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;institution_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;$eq&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;institution_id&lt;/span&gt;&lt;span class="p"&gt;}},&lt;/span&gt;
    &lt;span class="p"&gt;]&lt;/span&gt;
&lt;span class="p"&gt;}&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Never query on &lt;code&gt;student_id&lt;/code&gt; alone in a multi-institution deployment.&lt;/p&gt;
&lt;/blockquote&gt;
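&lt;p&gt;One way to make the single-field filter hard to write by accident is to build the filter in one place and refuse to build it from missing values. The sketch below is an assumption-laden illustration: &lt;code&gt;identity_filter&lt;/code&gt; and &lt;code&gt;matches&lt;/code&gt; are hypothetical helpers, and &lt;code&gt;matches&lt;/code&gt; only evaluates the small &lt;code&gt;$and&lt;/code&gt;/&lt;code&gt;$eq&lt;/code&gt; subset shown above so the collision case can be demonstrated without a real store.&lt;/p&gt;

```python
def identity_filter(student_id, institution_id):
    """Build a compound AND filter; refuse to build one from missing values."""
    if not student_id or not institution_id:
        raise ValueError("student_id and institution_id are both required")
    return {
        "$and": [
            {"student_id": {"$eq": student_id}},
            {"institution_id": {"$eq": institution_id}},
        ]
    }


def matches(metadata, flt):
    """Evaluate the $and/$eq subset used above against one metadata dict."""
    return all(
        metadata.get(field) == cond["$eq"]
        for clause in flt["$and"]
        for field, cond in clause.items()
    )


records = [
    {"student_id": "1001", "institution_id": "inst_a", "doc": "transcript"},
    # Same student_id at a different institution: the collision case.
    {"student_id": "1001", "institution_id": "inst_b", "doc": "transcript"},
]

flt = identity_filter("1001", "inst_a")
visible = [r for r in records if matches(r, flt)]
print([r["institution_id"] for r in visible])
```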




&lt;h2&gt;Rule 3: Enforce document categories as a second layer&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What most systems do:&lt;/strong&gt; Once the identity filter passes, all of the student's documents are fair game.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this breaks FERPA:&lt;/strong&gt; Not all of a student's records are equally accessible. Counseling records, health records, disciplinary files, and financial aid records each have different access rules. Even if the current retrieval is authorized for identity, the &lt;strong&gt;category&lt;/strong&gt; of document being retrieved matters.&lt;/p&gt;

&lt;p&gt;A financial aid query that incidentally surfaces a counseling note is retrieving the right student's record — but the wrong type of record.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The correct pattern:&lt;/strong&gt; After the identity pre-filter, apply a &lt;strong&gt;category authorization check&lt;/strong&gt;. The authenticated session carries a set of permitted document categories. Documents outside that set are excluded.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# Session carries permitted categories (set by auth layer)
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;allowed_categories&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;academic_record&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;financial_record&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;}&lt;/span&gt;

&lt;span class="c1"&gt;# Second enforcement layer — category filter
&lt;/span&gt;&lt;span class="n"&gt;authorized&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="p"&gt;[&lt;/span&gt;
    &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;identity_filtered_docs&lt;/span&gt;
    &lt;span class="k"&gt;if&lt;/span&gt; &lt;span class="n"&gt;doc&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;get&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;allowed_categories&lt;/span&gt;
&lt;span class="p"&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is the &lt;strong&gt;two-layer enforcement model&lt;/strong&gt;:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Layer 1&lt;/strong&gt; — Identity boundary: who owns this document?&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Layer 2&lt;/strong&gt; — Category authorization: what type of document is this, and is the session permitted to retrieve it?&lt;/li&gt;
&lt;/ul&gt;
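&lt;p&gt;The second layer should fail closed: a document with no category, or an unrecognized one, is treated as denied. The following is a minimal sketch under that assumption. &lt;code&gt;authorize_categories&lt;/code&gt; is a hypothetical helper, and documents are plain dictionaries here rather than any framework's document type.&lt;/p&gt;

```python
def authorize_categories(docs, allowed_categories):
    """Split docs into (allowed, denied); a missing category counts as denied."""
    allowed, denied = [], []
    for doc in docs:
        category = doc.get("metadata", {}).get("category")
        (allowed if category in allowed_categories else denied).append(doc)
    return allowed, denied


identity_filtered_docs = [
    {"id": "d1", "metadata": {"category": "financial_record"}},
    {"id": "d2", "metadata": {"category": "counseling_note"}},
    {"id": "d3", "metadata": {}},  # category missing: denied, not allowed
]

allowed, denied = authorize_categories(
    identity_filtered_docs, frozenset({"academic_record", "financial_record"})
)
print([d["id"] for d in allowed], [d["id"] for d in denied])
```

&lt;p&gt;Returning the denied list as well as the allowed one makes it easy to record how many documents the category layer excluded.&lt;/p&gt;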




&lt;h2&gt;Rule 4: Every retrieval event must produce an audit record&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What most systems do:&lt;/strong&gt; Log at the application level — a timestamped entry that a user made a query.&lt;/p&gt;

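&lt;p&gt;&lt;strong&gt;Why this breaks FERPA:&lt;/strong&gt; 34 CFR § 99.32 requires institutions to maintain a record of &lt;strong&gt;each request for access to and each disclosure&lt;/strong&gt; of personally identifiable information from education records. "Disclosure" includes permitting access to records, and retrieval by an AI pipeline on a requester's behalf is a form of access. Section 99.32(d) exempts certain cases, including disclosures to the eligible student, but logging every retrieval is the conservative default. The audit record must capture:&lt;/p&gt;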

&lt;ul&gt;
&lt;li&gt;Who made the request&lt;/li&gt;
&lt;li&gt;What was disclosed&lt;/li&gt;
&lt;li&gt;The basis for disclosure&lt;/li&gt;
&lt;li&gt;The date&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;An application log that records &lt;code&gt;"user X made a query"&lt;/code&gt; does not satisfy this requirement.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The correct pattern:&lt;/strong&gt; Produce a &lt;strong&gt;typed audit record&lt;/strong&gt; for each retrieval event, containing the count of documents retrieved, the categories accessed, the policy version in effect, and the timestamp. Route it to a durable, student-accessible store — not just an application log.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="n"&gt;audit_record&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nc"&gt;AuditRecord&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;
    &lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;institution_id&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;institution_id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;documents_retrieved&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;raw_docs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;documents_filtered&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;len&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;authorized_docs&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
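    &lt;span class="n"&gt;categories_accessed&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="nf"&gt;sorted&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;&lt;span class="n"&gt;d&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;metadata&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;category&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt; &lt;span class="k"&gt;for&lt;/span&gt; &lt;span class="n"&gt;d&lt;/span&gt; &lt;span class="ow"&gt;in&lt;/span&gt; &lt;span class="n"&gt;authorized_docs&lt;/span&gt;&lt;span class="p"&gt;}),&lt;/span&gt;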
    &lt;span class="n"&gt;policy_version&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;v1.2&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
    &lt;span class="n"&gt;timestamp&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="n"&gt;datetime&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nf"&gt;now&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;timezone&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;utc&lt;/span&gt;&lt;span class="p"&gt;),&lt;/span&gt;
    &lt;span class="n"&gt;requester_context&lt;/span&gt;&lt;span class="o"&gt;=&lt;/span&gt;&lt;span class="p"&gt;{&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;session_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nb"&gt;id&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;channel&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;channel&lt;/span&gt;&lt;span class="p"&gt;},&lt;/span&gt;
&lt;span class="p"&gt;)&lt;/span&gt;
&lt;span class="nf"&gt;audit_sink&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;audit_record&lt;/span&gt;&lt;span class="p"&gt;)&lt;/span&gt;  &lt;span class="c1"&gt;# write to compliance database — not application log
&lt;/span&gt;&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;blockquote&gt;
&lt;p&gt;Application logs rotate. FERPA compliance audit trails must be retained for as long as the education records themselves are retained.&lt;/p&gt;
&lt;/blockquote&gt;
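&lt;p&gt;As a rough illustration of what a typed record and an append-only sink could look like, here is a standard-library sketch. &lt;code&gt;DisclosureRecord&lt;/code&gt; and &lt;code&gt;append_record&lt;/code&gt; are hypothetical names chosen to avoid confusion with the library's own &lt;code&gt;AuditRecord&lt;/code&gt;, the field set is an assumption based on the list above, and a local JSON Lines file stands in for a real compliance store only so the example runs anywhere.&lt;/p&gt;

```python
import json
import tempfile
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from pathlib import Path


@dataclass(frozen=True)
class DisclosureRecord:
    requester_id: str      # who made the request
    student_id: str        # whose records were touched
    institution_id: str
    document_ids: tuple    # what was disclosed
    categories: tuple      # categories actually returned
    basis: str             # why the disclosure was permitted
    policy_version: str
    timestamp: str = field(
        default_factory=lambda: datetime.now(timezone.utc).isoformat()
    )


def append_record(path, record):
    """Append one JSON line; existing lines are never rewritten."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(asdict(record), sort_keys=True) + "\n")


# Assumption: a temp file stands in for the durable compliance store.
log_path = Path(tempfile.mkdtemp()) / "disclosures.jsonl"
append_record(log_path, DisclosureRecord(
    requester_id="advisor-17",
    student_id="s4",
    institution_id="u1",
    document_ids=("doc-9",),
    categories=("financial_record",),
    basis="placeholder: cite the applicable consent or exception",
    policy_version="v1.2",
))
lines = log_path.read_text(encoding="utf-8").splitlines()
print(len(lines))
```

&lt;p&gt;Recording document identifiers rather than only a count keeps the record usable when someone later asks exactly which records were disclosed.&lt;/p&gt;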




&lt;h2&gt;Rule 5: Identity values must come from the session, not the query&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;What most systems do:&lt;/strong&gt; Accept &lt;code&gt;student_id&lt;/code&gt; and &lt;code&gt;institution_id&lt;/code&gt; as parameters in the API request, or extract them from user-supplied query text.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Why this breaks FERPA:&lt;/strong&gt; If the filter values come from the request, an attacker, or a misconfigured agent, can supply a different student's ID and retrieve their records. This is a classic broken-access-control flaw (an insecure direct object reference), and it is easy to introduce in multi-tenant educational systems.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;The correct pattern:&lt;/strong&gt; The &lt;code&gt;student_id&lt;/code&gt; and &lt;code&gt;institution_id&lt;/code&gt; used for filtering must come from the &lt;strong&gt;authenticated session token&lt;/strong&gt; — not from the request body, not from the query, not from user input.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;&lt;span class="c1"&gt;# ❌ Wrong — accept from request body
&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;params&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;student_id&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;]&lt;/span&gt;

&lt;span class="c1"&gt;# ✅ Correct — extract from verified session token
&lt;/span&gt;&lt;span class="n"&gt;session&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;verify_token&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="n"&gt;request&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;headers&lt;/span&gt;&lt;span class="p"&gt;[&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="s"&gt;Authorization&lt;/span&gt;&lt;span class="sh"&gt;"&lt;/span&gt;&lt;span class="p"&gt;])&lt;/span&gt;
&lt;span class="n"&gt;student_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;student_id&lt;/span&gt;      &lt;span class="c1"&gt;# set by auth layer, not by user
&lt;/span&gt;&lt;span class="n"&gt;institution_id&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="n"&gt;session&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="n"&gt;institution_id&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This is not FERPA-specific — it is a basic authorization principle. In RAG systems it is easy to miss because most tutorials treat the retrieval query as the only input and ignore the access control context entirely.&lt;/p&gt;
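&lt;p&gt;For readers who want to see the idea end to end, the sketch below signs and verifies a toy session token with HMAC-SHA256 from the standard library, then builds the filter from the verified claims only. The token format, &lt;code&gt;sign&lt;/code&gt;, &lt;code&gt;build_filter&lt;/code&gt;, and the hard-coded secret are all assumptions for illustration. A production system would normally use an established token library and would also check expiry, which this example omits.&lt;/p&gt;

```python
import base64
import hashlib
import hmac
import json

SECRET = b"demo-only-secret"  # assumption: in practice, load from a secret manager


def sign(claims):
    """Issue a token: base64url(payload) + "." + hex HMAC-SHA256 of the payload."""
    payload = base64.urlsafe_b64encode(json.dumps(claims, sort_keys=True).encode())
    sig = hmac.new(SECRET, payload, hashlib.sha256).hexdigest()
    return payload.decode() + "." + sig


def verify_token(token):
    """Return the claims if the signature checks out, else raise PermissionError."""
    payload, _, sig = token.rpartition(".")
    expected = hmac.new(SECRET, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(sig, expected):
        raise PermissionError("invalid session token")
    return json.loads(base64.urlsafe_b64decode(payload))


def build_filter(token, request_params):
    """Identity comes from verified claims; request_params is never read for identity."""
    claims = verify_token(token)
    return {
        "student_id": claims["student_id"],
        "institution_id": claims["institution_id"],
    }


token = sign({"student_id": "s4", "institution_id": "u1"})
# A spoofed student_id in the request parameters has no effect on the filter.
flt = build_filter(token, {"student_id": "s1"})
print(flt)
```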




&lt;h2&gt;What a compliant pipeline looks like&lt;/h2&gt;



&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;Authenticated session
(student_id + institution_id + allowed_categories — from verified token)
         │
         ▼
Vector store pre-filter query
(metadata filter: student_id AND institution_id — applied at query time)
         │
         ▼
Semantic ranking
(only authorized documents are candidates)
         │
         ▼
Category authorization check
(second enforcement layer — removes out-of-scope document types)
         │
         ▼
Context assembly → LLM call
         │
         ▼
Audit record (34 CFR § 99.32)
(student_id, institution_id, documents retrieved, categories, timestamp)
→ written to durable compliance store
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Access is enforced in &lt;strong&gt;two layers&lt;/strong&gt;, the identity boundary at the vector store and category authorization on the results, before any document enters the LLM context window. The audit record is produced for every retrieval event, regardless of whether the LLM produces a response.&lt;/p&gt;




&lt;h2&gt;Reference implementation&lt;/h2&gt;

&lt;p&gt;The patterns described here are implemented in &lt;a href="https://github.com/ashutoshrana/enterprise-rag-patterns" rel="noopener noreferrer"&gt;&lt;code&gt;enterprise-rag-patterns&lt;/code&gt;&lt;/a&gt;, an MIT-licensed Python library:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;pip &lt;span class="nb"&gt;install &lt;/span&gt;enterprise-rag-patterns
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;It provides:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;StudentIdentityScope&lt;/code&gt;&lt;/strong&gt; — defines the retrieval boundary per student and institution&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;FERPAContextPolicy&lt;/code&gt;&lt;/strong&gt; — two-layer enforcement (pre-filter + category authorization)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;AuditRecord&lt;/code&gt;&lt;/strong&gt; — structured 34 CFR § 99.32 disclosure logging with a typed sink interface&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;&lt;code&gt;make_enrollment_advisor_policy&lt;/code&gt;&lt;/strong&gt; — factory for the most common higher-education RAG use case&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The design is platform-agnostic (any vector store, any LLM provider) and cloud-agnostic (AWS, GCP, Azure, OCI, or on-premises). The same two-layer pattern applies to HIPAA's minimum-necessary standard and GLBA's safeguards rule.&lt;/p&gt;

&lt;p&gt;A companion library &lt;a href="https://github.com/ashutoshrana/regulated-ai-governance" rel="noopener noreferrer"&gt;&lt;code&gt;regulated-ai-governance&lt;/code&gt;&lt;/a&gt; provides policy enforcement and audit for AI agents across FERPA, HIPAA, GDPR, CCPA, GLBA, and SOC 2.&lt;/p&gt;




&lt;h2&gt;Summary&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Rule&lt;/th&gt;
&lt;th&gt;What breaks&lt;/th&gt;
&lt;th&gt;The fix&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;1. Filter before ranking&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Post-retrieval filter leaves unauthorized docs in pipeline&lt;/td&gt;
&lt;td&gt;Metadata pre-filter at vector store query time&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;2. Filter on &lt;code&gt;institution_id&lt;/code&gt;&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;student_id&lt;/code&gt; alone allows cross-institution leakage&lt;/td&gt;
&lt;td&gt;Compound &lt;code&gt;AND&lt;/code&gt; filter: &lt;code&gt;student_id&lt;/code&gt; + &lt;code&gt;institution_id&lt;/code&gt;
&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;3. Enforce document categories&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;All of student's records are accessible regardless of type&lt;/td&gt;
&lt;td&gt;Category authorization as second enforcement layer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;4. Audit every retrieval event&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Application-level logs don't satisfy 34 CFR § 99.32&lt;/td&gt;
&lt;td&gt;Typed &lt;code&gt;AuditRecord&lt;/code&gt; per retrieval, routed to durable store&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;5. Identity from session&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;User-supplied filter values enable unauthorized access&lt;/td&gt;
&lt;td&gt;Filter constructed from verified session token only&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;These are not edge cases. They are the &lt;strong&gt;default failure modes&lt;/strong&gt; of standard RAG architectures when applied to regulated record-access environments. The fix for each is straightforward once you know where to look.&lt;/p&gt;




&lt;p&gt;&lt;em&gt;Reference implementation: &lt;a href="https://github.com/ashutoshrana/enterprise-rag-patterns" rel="noopener noreferrer"&gt;github.com/ashutoshrana/enterprise-rag-patterns&lt;/a&gt;&lt;/em&gt;&lt;/p&gt;

</description>
      <category>rag</category>
      <category>python</category>
      <category>compliance</category>
      <category>enterpriseai</category>
    </item>
  </channel>
</rss>
