JOB TITLE: Senior DevOps Engineer

DEPARTMENT: Software Development

REPORTS TO: Project Manager

PURPOSE:

We are seeking a skilled and detail-oriented Senior DevOps Engineer to join our team. In this role, you will be responsible for designing, implementing, and maintaining scalable and secure infrastructure to support our development and deployment pipelines. The ideal candidate will have extensive experience in cloud services, automation, and CI/CD practices, with a strong focus on enhancing operational efficiency and ensuring system reliability. You will play a critical role in driving continuous improvement, ensuring seamless collaboration between development and operations, and delivering robust, user-focused solutions.

KEY RESPONSIBILITIES:

  • Kubernetes Management: Deploy, manage, and scale applications using Kubernetes. Ensure reliability and performance of Kubernetes clusters.
  • Cloud Functions Development: Design, deploy, and manage cloud functions to support diverse applications and services.
  • Infrastructure Resilience: Maintain and enhance the resilience and uptime of production infrastructure, ensuring high availability through robust failover and replication strategies.
  • Load Balancer Configuration: Configure and manage load balancers to efficiently distribute network and application traffic for high availability and optimal performance.
  • Replica and Cluster Management: Administer Citus clusters and manage replicas to ensure data redundancy and resource optimization.
  • CI/CD Pipeline Management: Design, implement, and maintain CI/CD pipelines using tools like GitHub to streamline development processes and automate deployments.
  • Monitoring and Alerting: Set up and manage monitoring systems (e.g., Grafana) to ensure real-time insights into system health, performance, and resource utilization.
  • Incident Response & Troubleshooting: Lead incident management, root cause analysis, and implementation of preventive measures to ensure continuous operational stability.
  • Collaboration with Cross-Functional Teams: Work closely with development, QA, and product teams to support seamless application integration and deployment.
  • Distributed Tracing & Performance Monitoring: Implement distributed tracing solutions to track and troubleshoot service requests across distributed systems.
  • Security & DevSecOps: Manage security initiatives such as vulnerability scanning and remediation using tools like GCP's Security Command Center.

QUALIFICATIONS, SKILLS AND EXPERIENCE:

  • Bachelor’s degree in Computer Science, Information Technology, or any engineering discipline.
  • Minimum of 6 years of experience in DevOps or Site Reliability Engineering (SRE).
  • Proven experience with Google Cloud Platform (GCP) infrastructure.
  • Strong expertise in containerization technologies, particularly Docker and Kubernetes.
  • Hands-on experience in managing CI/CD pipelines and automation tools.
  • Proficiency in scripting, with a focus on Bash, and familiarity with additional languages (e.g., Python, Node.js, Go).
  • Availability for on-call rotation for site reliability.
  • Proficiency with monitoring systems (e.g., Grafana) and distributed tracing.
  • Ability to collaborate across teams to ensure seamless deployment and integration.
  • Strong incident management and root cause analysis expertise.