Backend Data & Integration Engineer
Location: Kochi
Employment Type: Full Time
Work Mode: Hybrid
Experience: 4-8 yrs
Job Code: BEO-5090
Posted Date: 02/09/2025

Job Description
Role objective: Build data pipelines (crawling/parsing, deduplication/delta, embeddings) and connect external systems and interfaces.

Responsibilities
• Development of crawling/fetch pipelines (API-first; Playwright/requests only where permitted)
• Parsing/normalization of job postings & CVs; deduplication/delta logic (seen hash, repost heuristics; see the sketch after this list)
• Embeddings/similarity search (calling Azure OpenAI, vector persistence in pgvector)
• Integrations: HR4YOU (API/webhooks/CSV import), SerpAPI, BA job board, email/SMTP
• Batch/stream processing (Azure Functions/container jobs), retry/backoff, dead-letter queues
• Telemetry for data quality (freshness, duplicate rate, coverage, cost per 1,000 items)
• Collaboration with the frontend team on exports (CSV/Excel, presigned URLs) and admin configuration
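
To make the seen-hash deduplication mentioned above concrete, here is a minimal Python sketch; the field names, hashing choice, and in-memory `seen` set are illustrative assumptions, not this team's actual implementation:

```python
import hashlib
import json

def seen_hash(posting: dict) -> str:
    # Stable content hash over the fields that define "the same" posting;
    # the field choice here is an assumption, not prescriptive.
    key_fields = {k: (posting.get(k) or "").strip().lower()
                  for k in ("title", "company", "location", "description")}
    canonical = json.dumps(key_fields, sort_keys=True, ensure_ascii=False)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

def delta(batch: list[dict], seen: set[str]) -> list[dict]:
    # Emit only postings whose hash has not been seen before; the caller
    # persists `seen` between pipeline runs.
    fresh = []
    for posting in batch:
        h = seen_hash(posting)
        if h not in seen:
            seen.add(h)
            fresh.append(posting)
    return fresh
```

In production the seen set would live in PostgreSQL or blob storage rather than in memory, so delta detection survives restarts.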

Desired Candidate Profile
• 4+ years of backend/data engineering experience
• Python (FastAPI, pydantic, httpx/requests, Playwright/Selenium), solid TypeScript for smaller services/SDKs
• Azure: Functions/Container Apps or AKS jobs, Storage/Blob, Key Vault, Monitor/Log Analytics
• Messaging: Service Bus/queues; a pragmatic approach to idempotence and exactly-once semantics
• Databases: PostgreSQL, pgvector, query design & performance tuning (see the sketch after this list)
• Clean ETL/ELT patterns, testability (pytest), observability (OpenTelemetry)
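
As one plausible shape of the embeddings/pgvector work described above, the sketch below embeds a query via Azure OpenAI and runs a cosine-distance search; the deployment name, the `postings` table, and its `embedding vector(1536)` column are assumptions:

```python
import os
import psycopg
from openai import AzureOpenAI

# Endpoint, key, and API version come from the environment; all names
# below are placeholders.
client = AzureOpenAI(
    azure_endpoint=os.environ["AZURE_OPENAI_ENDPOINT"],
    api_key=os.environ["AZURE_OPENAI_API_KEY"],
    api_version="2024-02-01",
)

def embed(text: str) -> list[float]:
    # One embedding per call; real pipelines batch inputs to control cost.
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return resp.data[0].embedding

def similar_postings(query: str, k: int = 10) -> list[tuple]:
    # `<=>` is pgvector's cosine-distance operator; assumes a `postings`
    # table with an `embedding vector(1536)` column.
    vec = embed(query)
    with psycopg.connect(os.environ["DATABASE_URL"]) as conn:
        return conn.execute(
            "SELECT id, title FROM postings ORDER BY embedding <=> %s::vector LIMIT %s",
            (str(vec), k),
        ).fetchall()
```
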
Nice-to-haves
• NLP/IE experience (spaCy/regex/rapidfuzz; see the repost sketch after this list), document parsing (pdfminer/textract)
• Experience with license/ToS-compliant data retrieval and legally compliant captcha/anti-bot strategies
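
The repost heuristics from the responsibilities pair naturally with rapidfuzz. A minimal sketch, assuming normalized posting dicts with `company` and `title` keys and a threshold that would need tuning against labelled data:

```python
from rapidfuzz import fuzz

def looks_like_repost(a: dict, b: dict, threshold: float = 92.0) -> bool:
    # Treat two postings from the same company as a likely repost when their
    # titles are near-identical under token-set fuzzy matching.
    if a.get("company", "").lower() != b.get("company", "").lower():
        return False
    return fuzz.token_set_ratio(a.get("title", ""), b.get("title", "")) >= threshold
```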

How we work
• Working method: API-first, clean code, trunk-based development, mandatory code reviews
• Tools/stack: GitHub, GitHub Actions/Azure DevOps, Docker, pnpm/Turborepo (monorepo), Jira/Linear, Notion/Confluence
• On-call/support: rotating, "you build it, you run it"