Senior Cloud / DevOps / AI/ML Engineer (AWS Platform & MLOps)

Location : Kochi

Employment Type : Full Time

Work Mode : Hybrid

Experience : 7-14 yrs

Job Code : BEO-5202

Posted Date : 12/02/2026

Job Description

Responsibilities

Role Summary
Own our secure, multi-account AWS foundation and the MLOps/GenAI platform that powers
clinician matching, document processing, and safety tooling. You blend SRE discipline with ML
platform pragmatism to deliver compliant, observable, and cost-efficient infrastructure.


Key Responsibilities
• Build and operate a secure AWS landing zone (Organizations, Control Tower), VPC
architecture, private networking, and multi-account guardrails.
• Design CI/CD and IaC at scale (GitHub Actions/CodeBuild/CodePipeline, Terraform and/or
AWS CDK); policy-as-code (Open Policy Agent, AWS SCPs).
• Run compute fabrics for services and data: Amazon EKS (preferred) and ECS Fargate;
autoscaling (HPA/Karpenter), cluster security (IRSA, PodSecurity).
• Observability platform: AWS Distro for OpenTelemetry, CloudWatch, Prometheus/Grafana,
X-Ray; golden signals, SLOs, incident response and on-call.
• Security-by-default: IAM least-privilege, KMS envelope encryption, Secrets
Manager/Parameter Store, AWS WAF/Shield, artifact signing, SBOM/SLSA.
• Resiliency engineering: multi-AZ baselines, chaos testing, backup/DR (AWS Backup), game
days; cost management with CUR/Budgets/rightsizing.
• MLOps: SageMaker projects/pipelines, model registry, feature store, inference endpoints;
safe deployment patterns (shadow/canary/A/B testing) and data drift monitoring.
• GenAI: Amazon Bedrock integration (guardrails, content filters, PII redaction), retrieval with
vector indexes (pgvector on Aurora or OpenSearch k-NN).
• Data platform enablement with S3/Lake Formation/Glue/Athena/EMR; secure data paths for
training/serving; governance and auditability.
• Champion DevSecOps: threat modeling, SBOM scanning, container/image hardening, and
secure software supply chain.

Desired Candidate Profile

Required Qualifications
• 7+ years building/operating cloud platforms; deep hands-on experience with AWS (networking,
IAM, compute, storage, security).
• Strong Terraform and/or AWS CDK skills; GitOps and CI/CD at scale; Linux, containers,
Kubernetes (EKS) in production.
• Operational excellence: SRE practices, SLO/error budgets, incident management, on-call, and
postmortem culture.
• MLOps experience with SageMaker or equivalent; data pipelines for feature engineering;
real-time/batch inference and monitoring.
• Experience with Bedrock/OpenSearch/pgvector for RAG and vector search; understanding of
prompt/response safety and audit trails.
• Security/compliance literacy (GDPR, logging/retention, key management, network isolation).


Nice to Have

• AWS certifications (Solutions Architect Professional, Security, Data/ML).
• Experience with FHIR/HL7 integrations and healthcare-grade identity (OIDC, SMART on
FHIR).
• Background in cost optimization, FinOps, and incident response leadership.


How We Work & Benefits
• Influence the platform architecture end-to-end; work with a small, senior team.
• Remote-friendly; pairing and design reviews; continuous improvement culture.
• Mission with impact: your reliability and ML tooling improve access to care daily.


Compliance & Notes
• All workloads run in EU regions (e.g., eu-central-1); strict data residency and encryption
baselines.
• GenAI usage must be privacy-preserving with opt-in consent and redaction for PHI/PII;
comprehensive audit logs maintained.
