GKE Platform Engineer

This job has expired.
C2C

Job title: GKE Platform Engineer

Location: NC

Job Description: OCP GenAI

Role Summary

The role focuses on GKE platform engineering, infrastructure automation, security, and reliability, with working knowledge of GenAI services and guardrails to support GenAI workloads hosted on the platform. This is not a GenAI model-building role; instead, the engineer ensures that GenAI and non-GenAI workloads run securely, reliably, and compliantly on GKE.

Key Responsibilities

GKE Platform Engineering

Design, deploy, and manage Google Kubernetes Engine (GKE) clusters for enterprise workloads.
Build and maintain shared Kubernetes platforms supporting multiple application teams.
Implement cluster-level capabilities such as:
Networking and ingress
Autoscaling and capacity planning
High availability and disaster recovery
Standardize GKE configurations following enterprise and security best practices.

Infrastructure as Code (IaC)

Provision and manage GCP infrastructure using Terraform.
Automate creation of:
GKE clusters
Networking, IAM, and service accounts
Supporting platform services
Develop reusable Terraform modules and enforce IaC standards.

Cloud & Platform Operations

Operate and support production-grade GCP environments.
Implement monitoring, logging, and alerting for GKE clusters and workloads.
Troubleshoot cluster, networking, and workload-level issues.
Optimize platform reliability, performance, and cost.

Security & Guardrails (GenAI-Aware Platform)

Implement and enforce GCP security guardrails, including:
Model Armor
Sensitive Data Protection (SDP)
Ensure platform compliance with:
Enterprise security standards
Data privacy and access controls
Support secure hosting of GenAI workloads on GKE, without owning model development.

GenAI Platform Enablement (Awareness-Level)

Maintain working knowledge of GCP GenAI services (e.g., Vertex AI) from a platform perspective.
Enable teams to deploy GenAI-enabled applications on GKE securely.
Understand GenAI concepts such as:
Inference workflows
Data sensitivity risks
Responsible AI constraints
Partner with application and AI teams to ensure GenAI workloads meet platform, security, and compliance requirements.

Automation & Scripting

Use Python for:
Platform automation
Operational tooling
Integration scripts
Support CI/CD pipelines for platform and application deployments.

Required Skills & Experience

Core Platform Skills

Strong hands-on experience with GCP, Azure, or OCP (OpenShift) platforms.
Deep experience with Google Kubernetes Engine (GKE).
Solid working knowledge of:
Kubernetes concepts (pods, services, ingress, autoscaling)
Cluster operations and troubleshooting
Experience supporting large-scale, multi-team Kubernetes environments.

Infrastructure & Automation

Proven experience using Terraform for IaC on GCP, Azure, or OCP (OpenShift) platforms.
Proficiency in Python for automation and scripting.
Experience with CI/CD and Git-based workflows.

Security & Governance

Experience implementing GCP security and governance controls.
Working knowledge of GCP Guardrails, including:
Model Armor
Sensitive Data Protection (SDP)
Strong understanding of IAM, networking, and least-privilege access.

GenAI Conceptual Understanding (Platform-Level)

Good understanding of Generative AI concepts, including:
LLM lifecycle basics
Inference vs. training
Data privacy and security considerations
Ability to support GenAI workloads from a platform and governance standpoint, not application logic.
