Senior Big Data Engineer (Python | AWS)

Location : Kochi

Employment Type : Full Time

Work Mode : Hybrid

Experience : 6-12 yrs

Job Code : BEO-5173

Posted Date : 11/02/2026

Job Description

Role Summary
Own the data platform that powers clinician discovery, credential unification, and care-quality analytics. Design resilient, low-latency ingestion and transformation at scale (batch + streaming) with GDPR-by-design. Your work underpins search, matching, and ML features in our telemedicine platform across Germany.


Key Responsibilities
• Design and operate an AWS-native data lakehouse: Amazon S3 + Lake Formation (governance),
Glue/Athena (ELT), and, optionally, Amazon Redshift for warehousing.
• Build high-throughput ingestion and CDC pipelines from partner APIs, files, and databases
using EventBridge, SQS/SNS, Kinesis/MSK, AWS DMS, and Lambda/ECS Fargate.
• Implement idempotent upserts, deduplication, and delta detection (see the upsert sketch
after this list); define source-of-truth governance and survivorship rules across
authorities/insurers/partners.
• Model healthcare provider data (DDD) and normalize structured/semi-structured payloads
(JSON/CSV/XML, FHIR/HL7 if present) into curated zones.
• Engineer vector-aware datasets for clinician/patient matching; operate pgvector on Amazon
Aurora PostgreSQL or use OpenSearch k-NN for hybrid search (see the pgvector sketch after
this list).
• Establish data quality (freshness, accuracy, coverage, cost-per-item) with automated
checks (e.g., Great Expectations/Deequ) and publish KPIs/dashboards (see the quality-check
sketch after this list).
• Harden security & privacy: IAM least-privilege, KMS encryption, Secrets Manager, VPC
endpoints, audit logs, pseudonymised telemetry; enforce GDPR and right-to-erasure.
• Build observability-first pipelines using OpenTelemetry (ADOT), CloudWatch, and X-Ray;
provide DLQ handling, replay tooling, and resiliency/chaos tests; define SLOs and runbooks.
• Tune performance for Aurora PostgreSQL (incl. indexing, partitioning, vacuum/analyze)
and keep Spark (EMR/Glue) jobs cost-aware.
• Run CI/CD for data (Terraform/CDK, GitHub Actions/CodeBuild/CodePipeline); automate
tests (pytest/DBT) and use blue/green or canary deployments for critical jobs.
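
To give candidates a flavor of the level we mean, here is a minimal sketch of the idempotent
upsert / delta-detection pattern, assuming a PostgreSQL target; the table and column names
(provider_stage, provider_id, payload_hash) are hypothetical, not our schema.

```python
# Minimal sketch: replay-safe upsert with content-hash delta detection.
# `conn` is an open psycopg2 connection; all table/column names are hypothetical.
import hashlib
import json

UPSERT_SQL = """
INSERT INTO provider_stage (provider_id, payload, payload_hash, updated_at)
VALUES (%(provider_id)s, %(payload)s, %(payload_hash)s, now())
ON CONFLICT (provider_id) DO UPDATE
SET payload      = EXCLUDED.payload,
    payload_hash = EXCLUDED.payload_hash,
    updated_at   = now()
-- delta detection: skip the write entirely when the content hash is unchanged
WHERE provider_stage.payload_hash IS DISTINCT FROM EXCLUDED.payload_hash;
"""

def upsert_provider(conn, record: dict) -> None:
    """Idempotent by construction: replaying the same record is a no-op."""
    payload = json.dumps(record, sort_keys=True)  # canonical form -> stable hash
    with conn, conn.cursor() as cur:  # commits on success, rolls back on error
        cur.execute(UPSERT_SQL, {
            "provider_id": record["provider_id"],
            "payload": payload,
            "payload_hash": hashlib.sha256(payload.encode()).hexdigest(),
        })
```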
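
In the same spirit, a sketch of the pgvector side: a k-NN lookup with a structured
pre-filter, assuming the vector extension is enabled on Aurora PostgreSQL; the
clinician_embeddings table and its columns are hypothetical.

```python
# Minimal sketch: k-NN search with pgvector (cosine distance, operator <=>).
# Assumes a hypothetical table:
#   clinician_embeddings(clinician_id bigint, specialty text, embedding vector(768))
# `conn` is an open psycopg2 connection.

KNN_SQL = """
SELECT clinician_id,
       embedding <=> %(q)s::vector AS cosine_distance
FROM clinician_embeddings
WHERE specialty = %(specialty)s        -- cheap structured pre-filter
ORDER BY embedding <=> %(q)s::vector   -- can be served by an HNSW/IVFFlat index
LIMIT %(k)s;
"""

def nearest_clinicians(conn, query_vec: list[float], specialty: str, k: int = 10):
    vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"  # pgvector input format
    with conn.cursor() as cur:
        cur.execute(KNN_SQL, {"q": vec_literal, "specialty": specialty, "k": k})
        return cur.fetchall()  # [(clinician_id, cosine_distance), ...]
```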
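
And a deliberately framework-agnostic sketch of the freshness/coverage checks that Great
Expectations or Deequ would codify in practice; the 24-hour SLO and column names are
illustrative assumptions.

```python
# Minimal sketch: data-quality gate over a curated dataset.
# Assumes `updated_at` is a timezone-aware timestamp column; checks and
# thresholds are illustrative, not our production suite.
from datetime import datetime, timedelta, timezone

import pandas as pd

def check_curated_providers(df: pd.DataFrame) -> dict:
    now = datetime.now(timezone.utc)
    results = {
        # coverage: every row carries its natural key
        "provider_id_complete": bool(df["provider_id"].notna().all()),
        # accuracy proxy: the key is unique within the curated zone
        "provider_id_unique": bool(df["provider_id"].is_unique),
        # freshness: newest record landed within the last 24 h (illustrative SLO)
        "fresh_within_24h": bool(df["updated_at"].max() >= now - timedelta(hours=24)),
    }
    failed = [name for name, ok in results.items() if not ok]
    if failed:
        raise AssertionError(f"data-quality checks failed: {failed}")
    return results
```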

Desired Candidate Profile

• 6+ years in data engineering at scale; proven delivery in production systems (regulated
domains a plus).
• Expertise in Python and SQL; hands-on with Spark (EMR/Glue) and stream processing
(Kinesis/MSK/Flink/Spark Streaming).
• Deep AWS experience across S3, Glue, Athena, Redshift or Aurora PostgreSQL, Lake
Formation, DMS, Lambda/ECS, Step Functions, EventBridge, SQS/SNS.
• PostgreSQL mastery incl. query planning, indexing, and performance tuning; familiarity with
pgvector or OpenSearch vector search.
• Strong grasp of idempotency, deduplication, CDC, schema evolution, SCDs, and contract
testing for data products.
• Observability (OpenTelemetry), CI/CD, and IaC (Terraform/CDK) best practices; strong
incident response and on-call hygiene.
• Security-by-design mindset: data minimization, encryption, secrets, PII-safe logging; working knowledge of GDPR and auditability.
• Effective communicator across Product, Platform, Data Science, and Compliance; pragmatic,
metrics-driven delivery.


Nice to Have
• Experience with FHIR/HL7, German TI/ePrescription/ePA integrations.
• DBT for transformations; OpenMetadata/Amundsen for catalog/lineage.
• Go for high-throughput services; experience with Bedrock or SageMaker for embedding
generation.


How We Work & Benefits
• API-first, clean architecture, and pairing culture; mandatory code reviews.
• Remote-friendly with defined core hours; mission-led, patient-safety-first.
• Ownership mindset: you build it, you run it (with sensible SLOs and error budgets).


Compliance & Notes
• All PHI/PII processed within EU regions (e.g., eu-central-1); strict key management via AWS
KMS and Secrets Manager.
• Right-to-erasure and lawful-basis handling embedded in the data lifecycle (tombstones,
purge workflows, and immutable audit trails), as sketched below.
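
As a flavor of how this looks in practice, a minimal sketch of a tombstone-driven purge
cycle against a PostgreSQL curated zone; every table name here (erasure_tombstones,
provider_curated, erasure_audit) is hypothetical.

```python
# Minimal sketch: right-to-erasure purge driven by tombstones, with an
# append-only audit trail. `conn` is an open psycopg2 connection; all table
# names are hypothetical. Idempotent: tombstones with purged_at set are skipped.

DUE_SQL = """
SELECT subject_id FROM erasure_tombstones
WHERE purge_after <= now() AND purged_at IS NULL;
"""

def run_purge_cycle(conn) -> int:
    purged = 0
    with conn, conn.cursor() as cur:  # one transaction per purge cycle
        cur.execute(DUE_SQL)
        for (subject_id,) in cur.fetchall():
            cur.execute("DELETE FROM provider_curated WHERE subject_id = %s",
                        (subject_id,))
            cur.execute("UPDATE erasure_tombstones SET purged_at = now() "
                        "WHERE subject_id = %s", (subject_id,))
            # the audit trail is insert-only; we never update or delete its rows
            cur.execute("INSERT INTO erasure_audit (subject_id, action, at) "
                        "VALUES (%s, 'purge', now())", (subject_id,))
            purged += 1
    return purged
```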
