Hikikomori Neko

Posted on Nov 18

A Modern Data Governance Framework for Google Cloud: Implementing Just-Enough and Just-in-Time Access

#googlecloud #security #architecture #cloudsecurity

The Risk of Standing Permissions and the Principle of Just-Enough Access (JEA)
Managing Privileged Access with Just-in-Time (JIT) Elevation
- Automating JIT Approvals for Operational Efficiency
- Securing Machine Identities with Just-in-Time Access
Balancing Security and Utility: A Modern Data Governance Strategy

Recently, I encountered a question that illustrates the fundamental trade-off between security and utility inherent in most system designs. As data has become a critical and integrated part of business operations, a robust data platform is designed to ensure reliability, accuracy, and effective data governance. This immediately raises a critical question: Since individuals working with data products often require access to potentially sensitive information to perform their duties effectively, should we default to granting team members permanent, broad privileges? Furthermore, if we restrict access for individual user accounts to critical storage buckets and tables, how do we ensure operational efficiency when the need for data access inevitably arises?

Modern security principles, particularly the Zero Trust philosophy, are grounded in the core assumption that a breach is inevitable. Crucially, effective access control under this framework requires explicit verification, and the assignment of privileged permissions relies on the principles of Just-Enough Access (JEA) and Just-in-Time (JIT) access. To move beyond the abstract definitions of these access models, we will explore solutions within Google Cloud to better understand how we can accommodate the dual requirements of security and operational efficiency.

The Risk of Standing Permissions and the Principle of Just-Enough Access (JEA)

Highly privileged access, especially when it is long-standing, is a significant security liability in the face of modern attack vectors. Since individuals working on data products often hold privileged access by default, their accounts offer a high Return on Investment (ROI) for attackers. Threat actors frequently leverage public information, such as job titles and organizational associations found on social media, to craft highly targeted spear phishing campaigns. This combination of high-value targets and broad permissions inherently widens the attack surface. The risk is further amplified if additional access is assigned based on organizational seniority or tenure. Should such a highly privileged account be compromised, an attacker could gain immediate, broad access to sensitive data and critical systems.

However, even with the inherent risk of targeted attacks, locking down user access and network connectivity as tightly as we do for production applications often proves impractical. Data products are fundamentally different: the team's core requirement is to routinely access and work with the data, which turns broad restrictions into a significant hurdle to operational efficiency. While advanced techniques like Synthetic Data Generation, designed to closely mirror real data with essential statistical properties for development, offer an intriguing path forward, establishing a robust synthetic process is a significant technical challenge in itself. Given this constraint, most data processing lifecycles rely on layered security controls, such as identity management and dynamic data masking, to enforce the Just-Enough Access (JEA) principle of granting only the necessary permissions. To illustrate how this layered security model is enforced in practice, we will walk through specific Google Cloud solutions that address these data governance challenges. For simplicity, we will focus on assigning IAM roles to individual principals. However, using group principals and defining policies via Infrastructure as Code (IaC) are essential best practices for establishing scalable and auditable access control.

Foundation for JEA: Automated Data Discovery and Classification

The essential first step for a Just-Enough Access (JEA) design is categorizing data based on organizational requirements. The primary challenge at this stage is that the sheer volume of data makes manual discovery and classification infeasible. To address this scalability issue, automation becomes an indispensable element of data governance. In Google Cloud, the Sensitive Data Protection service offers a managed solution that can automatically discover, classify, and profile data. The resulting sensitivity insights can then be used to programmatically apply classification tags to data resources according to predefined rules. These tags then become the basis for enforcing access control in subsequent steps.
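To make the "predefined rules" step concrete, here is a minimal sketch of how detected infoTypes might be mapped to a classification tag value. The infoType names are examples of what a discovery scan can report; the sensitivity levels, the mapping, and the function name are all assumptions for illustration, not part of the Sensitive Data Protection API.

```python
# Hypothetical rule set: maps detected infoTypes to a sensitivity level.
# The levels and the mapping are assumptions made for this sketch.
SENSITIVITY_RANK = {"low": 0, "moderate": 1, "high": 2}

INFO_TYPE_SENSITIVITY = {
    "US_SOCIAL_SECURITY_NUMBER": "high",
    "CREDIT_CARD_NUMBER": "high",
    "EMAIL_ADDRESS": "moderate",
    "PHONE_NUMBER": "moderate",
}


def classify_table(detected_info_types: list[str]) -> str:
    """Return the highest sensitivity level implied by the detected infoTypes.

    InfoTypes missing from the rule set are treated as "moderate" so that
    an unreviewed finding never silently lands in the "low" bucket.
    """
    level = "low"
    for info_type in detected_info_types:
        candidate = INFO_TYPE_SENSITIVITY.get(info_type, "moderate")
        if SENSITIVITY_RANK[candidate] > SENSITIVITY_RANK[level]:
            level = candidate
    return level


tag_value = classify_table(["EMAIL_ADDRESS", "CREDIT_CARD_NUMBER"])
```

The returned value would then be attached to the table or bucket as a tag value by whatever automation applies the tags. Defaulting unknown infoTypes to a middle level is a design choice that favors caution over convenience.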

Applying Tag-Based Access Control with IAM Conditions

The next logical step after data classification is to define granular access controls. Leveraging the classification results from data discovery, we can tailor conditional access policies to align with the specific job functions of user principals. In Google Cloud, for example, IAM role bindings support an optional configuration known as an IAM Condition, which makes the role binding effective only when the specified condition is met. Using resource tags as the basis for the IAM Condition restricts a principal's access, making the role effective only for resources with matching tags. Crucially, this same conditional process should be applied to Service Accounts used for automation. This practice minimizes standing permissions for machine identities, ensuring that production workloads stay secured without being interrupted.
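As a rough illustration of a tag-based condition, the sketch below assembles the JSON shape of a conditional role binding in plain Python. The `resource.matchTag` expression is the condition function IAM provides for tags; the principal, organization ID, tag key, tag value, and the helper name are hypothetical, and the sketch only builds the binding rather than applying it to a policy.

```python
import json


def tag_conditioned_binding(role: str, member: str, tag_key: str, tag_value: str) -> dict:
    """Build a role binding that is effective only for resources with the given tag."""
    return {
        "role": role,
        "members": [member],
        "condition": {
            "title": f"only-{tag_value}-resources",
            "description": "Role applies only to resources with the matching tag",
            # Tag keys are referenced by their namespaced name: <org id>/<key short name>.
            "expression": f"resource.matchTag('{tag_key}', '{tag_value}')",
        },
    }


binding = tag_conditioned_binding(
    role="roles/bigquery.dataViewer",
    member="user:analyst@example.com",   # hypothetical principal
    tag_key="123456789012/sensitivity",  # hypothetical organization ID and tag key
    tag_value="low",                     # hypothetical tag value
)
print(json.dumps(binding, indent=2))
```

In practice a binding like this would be managed through IaC and granted to a group, in line with the best practices mentioned earlier.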

Applying Column-Level Controls with Dynamic Data Masking

While broad resource restriction based on data discovery provides a necessary foundation for data governance, relying solely on conditional access applied to entire resources often proves insufficient and overly restrictive, particularly when data teams perform exploratory analysis. For example, a marketing analyst may need to query non-sensitive data, such as bucketed age and city, while being simultaneously restricted from accessing highly sensitive data like government IDs within the same table. Another common scenario involves users needing to join multiple tables using sensitive keys, such as user IDs, without the ability to view the underlying sensitive data itself. To effectively address these granular access concerns, a more targeted approach is Dynamic Data Masking (DDM).

In BigQuery, the implementation of Dynamic Data Masking (DDM) involves assigning policy tags to specific columns and defining corresponding masking rules for each tag. By default, users without appropriate permissions will receive a permission error when querying the masked column, while their access to other columns remains unrestricted. To enable data analysis using sensitive columns as join keys, we can leverage a deterministic masking rule like Hash (SHA256). Since this rule generates consistent hashes for identical values, it allows analysts with the specialized roles/bigquerydatapolicy.maskedReader role to query and join tables effectively without revealing the underlying sensitive information, satisfying the need for both security and efficiency.
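The reason a deterministic hash still supports joins can be shown outside BigQuery with a few lines of Python. This is only a sketch of the property: the sample rows are made up, and BigQuery's exact output encoding for the masking rule is not reproduced here.

```python
import hashlib


def mask_sha256(value: str) -> str:
    """Deterministically mask a value: identical inputs yield identical digests."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


# Made-up sample rows; user_id plays the role of the sensitive join key.
orders = [{"user_id": "u-1001", "amount": 40}, {"user_id": "u-1002", "amount": 15}]
profiles = [{"user_id": "u-1001", "city": "Osaka"}, {"user_id": "u-1002", "city": "Taipei"}]

# What an analyst with masked access would see: digests instead of raw IDs.
masked_orders = [{**row, "user_id": mask_sha256(row["user_id"])} for row in orders]
masked_profiles = [{**row, "user_id": mask_sha256(row["user_id"])} for row in profiles]

# The join still lines up because both tables were masked with the same function.
city_by_masked_id = {row["user_id"]: row["city"] for row in masked_profiles}
joined = [
    {"city": city_by_masked_id[row["user_id"]], "amount": row["amount"]}
    for row in masked_orders
]
```

The analyst can aggregate order amounts by city without ever seeing a raw user ID.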

The process outlined above covers use cases for daily routine access. With careful design of data tags, access policies, and role assignments, these tools significantly enhance our security posture. However, requests for unmasked data access are not uncommon, ranging from pipeline troubleshooting to critical customer inquiries. To streamline the process of granting temporary, unmasked access while retaining essential audit trails, organizations can implement a Privileged Access Management (PAM) workflow to achieve Just-in-Time (JIT) access control.

Managing Privileged Access with Just-in-Time (JIT) Elevation

The core concept of Just-in-Time (JIT) privilege elevation is to grant users the access they need, for a specific period of time, with a complete audit trail. The process typically relies on a predefined role, scope, and duration that target users are authorized to assume. When users request elevation, they are usually required to provide a detailed justification. Approvers then review the requests, adding their own justification if necessary, and grant temporary access for the defined duration. After completing their task, users can manually withdraw the elevated privilege, or the system will automatically revoke it once the duration expires. This time-bound control significantly reduces security risks and prevents privilege sprawl, where unnecessary standing permissions accumulate over time.
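The lifecycle described above (approve, use, then withdraw or expire) can be modeled in a few lines. The following is a conceptual sketch only: the `Grant` class and its fields are hypothetical and do not correspond to any Google Cloud API.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Grant:
    """Hypothetical model of a time-bound privilege elevation."""
    principal: str
    role: str
    justification: str
    approved_at: datetime
    duration: timedelta
    withdrawn_at: Optional[datetime] = None  # set when the user gives the privilege back early

    def is_active(self, now: datetime) -> bool:
        """A grant is active from approval until withdrawal or expiry, whichever comes first."""
        if self.withdrawn_at is not None and now >= self.withdrawn_at:
            return False
        return self.approved_at <= now < self.approved_at + self.duration


approved = datetime(2025, 1, 1, 9, 0, tzinfo=timezone.utc)
grant = Grant(
    principal="user:analyst@example.com",  # hypothetical requester
    role="roles/datacatalog.categoryFineGrainedReader",
    justification="Troubleshooting a failed pipeline run",
    approved_at=approved,
    duration=timedelta(hours=1),
)
```

The important property is that nothing needs to happen for access to end: once the duration has passed, `is_active` is false without any manual cleanup.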

In Google Cloud, Privileged Access Manager (PAM) is the managed solution for implementing Just-in-Time (JIT) access and privilege elevation. This managed service seamlessly integrates the entitlement, request, notification, and approval workflows, providing comprehensive audit trails. Returning to the BigQuery example, we can define an entitlement that allows data team members to elevate their permissions to the highly privileged roles/datacatalog.categoryFineGrainedReader role, which grants access to the underlying, unmasked data. When team members require unmasked data access, they initiate a time-bound privilege elevation request that includes a clear duration and justification for subsequent review and approval. This integrated process ensures efficient handling of privileged tasks while guaranteeing that access to sensitive, unmasked data remains just-in-time and fully auditable.
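For orientation, an entitlement for this scenario might look roughly like the structure built below. Treat it as a sketch under assumptions: the field names follow my recollection of the PAM entitlement resource and should be verified against the current API reference, and the project ID, principals, and helper name are hypothetical.

```python
def build_entitlement(project_id: str, requesters: list[str], approvers: list[str],
                      max_seconds: int = 3600) -> dict:
    """Assemble an entitlement body for temporary unmasked read access.

    Field names are an assumption based on the PAM entitlement resource;
    check them against the official reference before use.
    """
    return {
        "privilegedAccess": {
            "gcpIamAccess": {
                "resourceType": "cloudresourcemanager.googleapis.com/Project",
                "resource": f"//cloudresourcemanager.googleapis.com/projects/{project_id}",
                "roleBindings": [
                    {"role": "roles/datacatalog.categoryFineGrainedReader"}
                ],
            }
        },
        # Upper bound on how long a single grant may last.
        "maxRequestDuration": f"{max_seconds}s",
        "eligibleUsers": [{"principals": requesters}],
        # Requesters must supply a free-text justification.
        "requesterJustificationConfig": {"unstructured": {}},
        "approvalWorkflow": {
            "manualApprovals": {
                "requireApproverJustification": True,
                "steps": [
                    {"approvalsNeeded": 1, "approvers": [{"principals": approvers}]}
                ],
            }
        },
    }


entitlement = build_entitlement(
    project_id="example-data-project",        # hypothetical project
    requesters=["group:data-team@example.com"],  # hypothetical requester group
    approvers=["group:data-owners@example.com"],  # hypothetical approver group
)
```

Keeping the maximum duration short and scoping eligible users to a group keeps the entitlement consistent with both the JEA and JIT principles.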

Automating JIT Approvals for Operational Efficiency

The human approval process provides a necessary layer of defense against unauthorized access. However, working with sensitive data can be a frequent requirement in some operational scenarios. Even in these cases, enforcing a privilege elevation process with a complete audit trail remains a critical security control. The key strategy is to automate approvals for common, predictable scenarios using custom logic as a safeguard. To accommodate this, we can assign a service account as the approver principal. This configuration utilizes the elevation request to trigger an automated workflow, such as a Cloud Function. This workflow performs custom validation, for instance, checking ticket identifiers and justifications, and automatically grants the request. This approach dramatically reduces the burden of human review and eliminates potential operational bottlenecks.
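The custom validation inside such a workflow could be as simple as the check sketched here. Everything in it is an assumption for illustration: the ticket ID format, the one-hour limit, and the idea that the set of open tickets has already been fetched from a ticketing system. The call that actually approves the grant is deliberately left out.

```python
import re
from datetime import timedelta

# Hypothetical ticket ID format, e.g. "OPS-1234".
TICKET_PATTERN = re.compile(r"\b[A-Z]{2,10}-\d+\b")
# Hypothetical policy: longer requests always go to a human approver.
MAX_AUTO_APPROVE_DURATION = timedelta(hours=1)


def should_auto_approve(justification: str, requested_duration: timedelta,
                        open_tickets: set[str]) -> bool:
    """Approve automatically only for short requests that cite an open ticket."""
    if requested_duration > MAX_AUTO_APPROVE_DURATION:
        return False
    referenced = set(TICKET_PATTERN.findall(justification))
    return bool(referenced & open_tickets)


decision = should_auto_approve(
    justification="Investigating OPS-1234 pipeline failure",
    requested_duration=timedelta(minutes=30),
    open_tickets={"OPS-1234"},
)
```

Requests that fail the check should not be denied outright; routing them to a human approver preserves the safeguard while keeping the common path fast.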

Securing Machine Identities with Just-in-Time Access

As previously noted, minimizing standing permissions for machine identities is a security best practice. Granting persistent access to sensitive data significantly expands the attack surface, heightening the risk posed by software supply chain attacks or compromises within the underlying infrastructure. While network controls are the primary defense against data exfiltration in these scenarios, Just-in-Time (JIT) access control serves a vital complementary role by shrinking an attacker's window of opportunity. Instead of relying on standing permissions, automated workloads should request elevated privileges through the elevation process only when necessary. Since these workloads are often predictable, the audit trails from their elevation requests create high-fidelity telemetry, ideal for machine learning-based security monitoring to detect any unusual elevation pattern. By combining this audit capability with custom logic within the approval workflow, we establish layered defenses to safeguard sensitive data from compromised machine identities.
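To show why predictable elevation patterns make good telemetry, here is a deliberately simple statistical baseline, used as a stand-in for the machine learning-based monitoring mentioned above rather than an example of it. The daily counts are made up, and the threshold of three standard deviations is an arbitrary assumption.

```python
from statistics import mean, pstdev


def is_unusual(daily_counts: list[int], today: int, threshold: float = 3.0) -> bool:
    """Flag today's elevation count if it deviates strongly from the historical baseline.

    Expects a non-empty history of daily elevation counts for one service account.
    """
    baseline = mean(daily_counts)
    spread = pstdev(daily_counts)
    if spread == 0:
        # A perfectly constant history: any change at all is worth a look.
        return today != baseline
    return abs(today - baseline) / spread > threshold


# Made-up history: a scheduled pipeline that elevates four to six times a day.
history = [4, 5, 4, 6, 5, 4, 5]
normal_day = is_unusual(history, today=5)
suspicious_day = is_unusual(history, today=20)
```

A workload that normally elevates a handful of times per day and suddenly does so twenty times stands out immediately, which is much harder to spot when the same identity holds the permission permanently.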

Balancing Security and Utility: A Modern Data Governance Strategy

Big data governance is a challenging yet fascinating discipline. Our exploration began by highlighting the inherent conflict between security and utility, underscoring that effective system design fundamentally relies on trade-offs. While the business value offered by big data analysis is a critical asset, a secure foundation that protects the organization from compliance and legal risks is equally essential. This tension becomes even more pronounced with the continued rise of AI and agentic applications, as both the speed of data generation and the demand for data consumption will keep increasing.

In this article, we focused on applying security principles to data governance through the lens of identity access controls. However, access controls are only one layer of defense against sophisticated threats. To continue my study of security and effective system design, in a future article I will explore how VPC Service Controls offer a crucial additional layer of defense against data exfiltration, complementing the identity-based controls discussed here.

Thank you for reading!
