Job Title: Site Reliability Engineer
Location: Reston, VA (Hybrid)
Interview: Video then Face-to-Face (Required)
Work Type: Hybrid (1x onsite per week, some weeks more)
Job Description
Seeking a Site Reliability Engineer in the Washington, DC, Maryland, and/or Virginia area for a long-term contract position with our customer in Reston, VA (hybrid position, 1x onsite per week in Reston, VA).
Roles & Responsibilities
- Communicate architectural decisions, plans, goals, and strategies, while highlighting short-term trade-offs vs. long-term commitments and costs
- Engage in and improve the end-to-end lifecycle of services, from inception and design through deployment and operations
- Establish automation capabilities leveraging cloud-native solutions to improve the developer experience
- Support activities including system design consulting, developing software platforms and frameworks, capacity planning, and launch reviews
- Troubleshoot difficult issues and engage the customer
- Learn new AWS services and other technologies as required
- Ensure system scalability and sustainability through automation, and continuously improve reliability and velocity
- Apply experience with enterprise cloud transformation and migration efforts
- Guide customers on cloud-native design and architecture patterns
- Provide consultation on technology infrastructure planning and engineering for assigned systems
- Assess implications of technology strategies on infrastructure capabilities
- Establish strategies to migrate legacy applications into microservices and host them on the AWS cloud platform
- Leverage cloud-native architecture components including containers, immutable infrastructure, microservices, service mesh, etc.
- Conduct research on global technology trends and their applicability
- Promote modern application design and engineering best practices
- Monitor and manage stability, availability, and performance across IT domains (systems, network, storage, security)
- Analyze systems to identify problems, trends, and improvement opportunities
- Automate end-to-end processes for patching and upgrades in the AWS cloud ecosystem
- Make data-driven recommendations to improve software delivery efficiency
- Mentor peers and collaborate across teams
Required Skills
- Minimum of one AWS certification is required
- Minimum of 10 years of IT experience, with at least 5 years in AWS cloud
- Platform engineering and administration experience
- Strong leadership experience driving transformation initiatives
- 3–5 years of experience in a Site Reliability Engineering role
- Experience with SRE principles and transformation
- 3+ years of experience with:
  - Containerization (Kubernetes)
  - Cloud technologies (AWS, Azure, etc.)
  - DevOps toolchain (Ansible, Jenkins, Artifactory, Bitbucket, etc.)
  - Technical patterns (IaC, automated provisioning/release, CI/CD)
- Solid understanding of software coding techniques and the full software engineering lifecycle (build, integration, testing, release, deployment), using Python
- Experience developing and/or challenging engineering solutions and collaborating with teams and customers
- Experience as a Platform Engineering Lead, with hands-on experience building middleware environments
- Linux system administration experience required
AWS & Technical Requirements
- Strong hands-on experience with AWS services including but not limited to:
  - VPC, Networking, Direct Connect
  - Subnets, NACLs, Security Groups
  - EC2, S3, IAM
  - ELB, Lambda
  - CloudWatch, CloudTrail
  - EKS
- Must have hands-on implementation and production-level AWS experience
- Hands-on experience with automation and infrastructure provisioning is required
- Experience with Infrastructure as Code and Policy as Code
- Must be familiar with:
  - Terraform automation
  - Ansible playbooks
  - Python
- Experience with AWS CloudFormation and CDK
- Experience writing Lambda functions (preferably Python/Boto3)
- Strong Linux Bash scripting skills
- Hands-on experience with containerization and Amazon EKS (big plus)
- Experience with DevOps tools such as Git, Crucible, Jenkins
- Strong understanding of CI/CD toolchain
