Overview
About Docuvera
Our mission is to lead the digital transformation of the Life Science industry, drastically improving time to market, productivity, patient safety, and patient outcomes.
Docuvera is a global SaaS leader in component-based structured content authoring, transforming how pharmaceutical companies create, manage, and reuse product-related content.
Our platform empowers life sciences organizations to drive digital transformation through efficient, compliant, and multiformat content creation.
Headquartered in Wellington, New Zealand, our distributed team works across New Zealand, Asia, and the United States, with customers throughout the world.
The part you'll play
You'll design, build, and run the AI infrastructure that powers Docuvera's enterprise AI and our AI-driven content management platform.
Day to day, you'll turn machine-learning models into reliable products, build scalable data pipelines, and provide the foundation for AI-powered workflows.
You'll lead MLOps practices, stand up vector databases and knowledge graphs, and work closely with data scientists and engineers to deploy and monitor models in production.
Your work enables key programs like company-wide knowledge assistants, AI-assisted bug triage, and intelligent content curation.
You'll automate ML workflows, keep AI services highly available, and ensure everything meets life-sciences regulations (FDA 21 CFR Part 11, GxP).
What you'll focus on
AI infrastructure & MLOps platforms
Design, deploy, and operate AI/ML infrastructure on AWS (e.g., Bedrock, SageMaker, Lambda, S3, Aurora PostgreSQL, DynamoDB, EventBridge, SQS, EKS, ECS, CloudFormation/CDK).
Use the right data store for the job: relational workloads on Aurora PostgreSQL (read replicas, point-in-time recovery, RDS Proxy) and low-latency key-value workloads on DynamoDB (with DAX when helpful).
Set up and manage vector databases (OpenSearch, Pinecone, or Milvus) and a graph database (Neptune) to support semantic search and enterprise knowledge.
ML pipeline development
Build end-to-end MLOps pipelines with SageMaker Pipelines, MLflow, or Kubeflow for training, validation, versioning, and deployment.
Orchestrate event-driven ML lifecycles with EventBridge; trigger retraining on data-quality signals or upstream changes; decouple components with SQS (FIFO/Standard) and DLQs.
Automate safe releases with A/B tests, canary deployments, and fast rollbacks.
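To make the canary-release idea concrete, here is a minimal, self-contained sketch of the promote-or-rollback decision a release pipeline might automate. The function name, thresholds, and the "wait / promote / rollback" states are illustrative assumptions, not Docuvera's actual release logic.

```python
def canary_decision(baseline, canary, max_ratio=1.5, min_samples=100):
    """Decide whether to promote a canary model version or roll back.

    baseline / canary: (error_count, request_count) tuples for each fleet.
    Promote only when the canary has seen enough traffic and its error
    rate stays within max_ratio of the baseline's.
    """
    b_err, b_req = baseline
    c_err, c_req = canary
    if c_req < min_samples:
        return "wait"  # not enough canary traffic to judge yet
    b_rate = b_err / b_req if b_req else 0.0
    c_rate = c_err / c_req if c_req else 0.0
    if b_rate == 0.0:
        # Baseline is error-free: any canary error is a regression.
        return "promote" if c_rate == 0.0 else "rollback"
    return "promote" if c_rate <= b_rate * max_ratio else "rollback"
```

In practice a pipeline would feed this from live metrics (e.g., CloudWatch) and wire "rollback" to an automated traffic shift back to the previous model version.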
Data engineering for AI
Ingest and prepare data from tools like Confluence, Jira, GitLab, Slack, and SharePoint.
Build real-time sync using AWS Glue, Lambda, EventBridge (buses/schemas), and SQS (including SNS-to-SQS fan-out and back-pressure handling).
Design reliable data flows: idempotent processing, outbox/CDC patterns, and exactly-once semantics where needed (Aurora logical decoding/CDC, DynamoDB Streams into EventBridge/SQS).
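The idempotent-processing pattern above can be sketched in a few lines. This toy consumer deduplicates by message id before applying the handler; the in-memory set stands in for a durable idempotency store (such as a DynamoDB conditional put), and the class and field names are illustrative assumptions.

```python
class IdempotentConsumer:
    """Toy SQS-style worker that processes each message id at most once,
    turning at-least-once delivery into effectively-once processing."""

    def __init__(self, handler):
        self.handler = handler
        self._seen = set()  # stands in for a durable idempotency table

    def consume(self, message):
        msg_id = message["id"]
        if msg_id in self._seen:
            return "skipped"  # duplicate redelivery: side effect already applied
        self.handler(message["body"])
        self._seen.add(msg_id)  # production: record id atomically with the side effect
        return "processed"
```

The key design point is that the "seen" record and the side effect must be committed together (the outbox pattern), otherwise a crash between the two reintroduces duplicates.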
Vector search & retrieval
Deploy and tune vector databases and embedding pipelines for semantic search.
Implement retrieval-augmented generation (RAG) with Bedrock Knowledge Bases, using metadata-based access control and near-real-time content updates via EventBridge and SQS.
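A tiny sketch of the retrieval side of RAG with metadata-based access control: documents carry an allowed-groups attribute, the access filter is applied before similarity ranking, and cosine similarity stands in for the vector database's nearest-neighbour search. All names and the document shape are assumptions for illustration, not the Bedrock Knowledge Bases API.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query_vec, docs, user_groups, top_k=2):
    """Return the top_k most similar documents the user may see.

    docs: list of {"text", "embedding", "allowed_groups"} dicts.
    Access control runs *before* ranking, so restricted content never
    enters the candidate set handed to the LLM."""
    visible = [d for d in docs if d["allowed_groups"] & user_groups]
    ranked = sorted(visible,
                    key=lambda d: cosine(query_vec, d["embedding"]),
                    reverse=True)
    return [d["text"] for d in ranked[:top_k]]
```

Filtering before ranking (rather than redacting afterwards) is what keeps unauthorized content out of the prompt entirely.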
Model deployment & monitoring
Containerize and ship models with Docker; run on Kubernetes (EKS), ECS, and serverless where it fits.
Monitor model performance, drift, and quality with CloudWatch and New Relic.
Track SQS health (oldest message age, queue depth, visibility timeout), EventBridge failures, and Aurora performance (Performance Insights, enhanced monitoring).
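The queue-health checks above reduce to threshold rules over a few metrics. Here is a minimal sketch of such an evaluator; the metric keys mirror SQS concepts (oldest message age, queue depth, DLQ depth), but the thresholds and alarm names are illustrative assumptions, not recommendations.

```python
def queue_alerts(metrics, max_oldest_age_s=900, max_depth=10_000):
    """Evaluate SQS-style health metrics and return the alarms that fire.

    metrics: {"oldest_message_age_s": ..., "queue_depth": ..., "dlq_depth": ...}
    """
    alarms = []
    if metrics.get("oldest_message_age_s", 0) > max_oldest_age_s:
        alarms.append("stale-messages")  # consumers are falling behind
    if metrics.get("queue_depth", 0) > max_depth:
        alarms.append("backlog")         # sustained back-pressure
    if metrics.get("dlq_depth", 0) > 0:
        alarms.append("dead-letters")    # poison messages need triage
    return alarms
```

In production these rules would live as CloudWatch alarms rather than application code, but the logic is the same.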
Knowledge graph operations
Build and maintain Neptune graphs linking documents, code, tickets, and business entities.
Optimize graph queries and relationships to enable cross-domain reasoning.
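Cross-domain reasoning over such a graph is, at its core, multi-hop traversal: from a ticket to the commits that closed it to the documents those commits touched. This sketch uses a plain adjacency dict and breadth-first search; a Neptune Gremlin traversal would play the same role in production, and the entity names are made up for illustration.

```python
from collections import deque

def related_entities(graph, start, max_hops=2):
    """Breadth-first walk over a toy knowledge graph (adjacency dict),
    returning everything reachable within max_hops of `start`."""
    seen = {start}
    frontier = deque([(start, 0)])
    found = []
    while frontier:
        node, depth = frontier.popleft()
        if depth == max_hops:
            continue  # don't expand past the hop budget
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                frontier.append((neighbour, depth + 1))
    return found
```

Bounding the hop count is the simplest lever for keeping graph queries fast as the graph grows.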
Compliance & security for AI
Implement AI-specific controls: prompt/response logging, audit trails, and data classification for regulated workflows.
Enforce encryption in transit and at rest (KMS) across Aurora, DynamoDB, SQS, and EventBridge.
Apply least-privilege IAM, resource policies for event buses, VPC endpoints/PrivateLink, and database auditing (e.g., pgaudit).
Define data residency, retention, backups, and PITR across databases, object storage, and queues in line with FDA 21 CFR Part 11 and GxP.
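One way to make prompt/response audit trails tamper-evident is hash chaining: each record includes the previous record's hash, so any later edit breaks the chain. The sketch below illustrates the idea only; the record fields are assumptions, and nothing here constitutes a 21 CFR Part 11 compliance claim.

```python
import hashlib
import json
from datetime import datetime, timezone

def audit_record(prev_hash, user, prompt, response, classification):
    """Build one tamper-evident audit entry for an AI interaction."""
    entry = {
        "ts": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "prompt": prompt,
        "response": response,
        "classification": classification,
        "prev": prev_hash,  # chains this record to its predecessor
    }
    # Hash the canonical (sorted-key) JSON form of the entry itself.
    entry["hash"] = hashlib.sha256(
        json.dumps(entry, sort_keys=True).encode()
    ).hexdigest()
    return entry
```

Verification is then a linear scan: recompute each hash and check it matches the next record's "prev" field.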
CI/CD for models & data platforms
Extend GitLab CI/CD for MLOps: automated tests, validation gates, and deployment orchestration.
Manage infrastructure as code with AWS CDK, CloudFormation, or Terraform (Aurora clusters/subnets/proxies, DynamoDB tables, EventBridge buses/rules/targets, SQS queues/DLQs) with environment guardrails.
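"Environment guardrails" usually means policy checks run against synthesised IaC output before deployment. This toy validator shows the shape of such a check; the resource schema and the three rules are illustrative assumptions, not Docuvera's actual policy set.

```python
def check_guardrails(env, resources):
    """Flag IaC resource configs that violate simple environment rules.

    resources: list of dicts roughly as a template synthesiser might emit,
    e.g. {"name": "q1", "type": "sqs_queue", "encrypted": True, "dlq": "q1-dlq"}.
    Returns a list of human-readable violation strings (empty = pass).
    """
    violations = []
    for r in resources:
        if not r.get("encrypted", False):
            violations.append(f"{r['name']}: encryption at rest required")
        if r["type"] == "sqs_queue" and "dlq" not in r:
            violations.append(f"{r['name']}: queue needs a dead-letter queue")
        if env == "prod" and r["type"] == "aurora_cluster" and not r.get("deletion_protection"):
            violations.append(f"{r['name']}: prod clusters need deletion protection")
    return violations
```

Hooked into a GitLab CI validation gate, a non-empty return value would fail the pipeline before anything reaches production.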
What you'll bring to the role
Some or all of the following technical skills, experience, and knowledge
AWS AI/ML & data: Bedrock, SageMaker, Lambda, S3, Aurora PostgreSQL (Aurora/RDS), DynamoDB, EventBridge, SQS, EKS, ECS, Neptune, OpenSearch; hands-on architecture and cost tuning.
Relational data & PostgreSQL: Strong SQL, schema design, indexing/partitioning, query tuning, connection management (RDS Proxy), HA/DR (Multi-AZ, read replicas, PITR), CDC/outbox patterns.
MLOps platforms: MLflow, Kubeflow, SageMaker Pipelines for lifecycle management, experiment tracking, and automated deployments.
Event-driven systems: EventBridge (rules, schedules, schema registry) and SQS (FIFO/Standard, DLQs, ordering/deduplication) for loosely coupled services at scale.
Vector search & RAG: Implementing and tuning Pinecone/Milvus/Weaviate/OpenSearch and embedding workflows in production RAG systems.
Data pipelines: Real-time ingestion with Glue, Kinesis, Lambda; integrating enterprise APIs/webhooks; EventBridge buses and SQS workers for reliable, idempotent processing.
Containers & Kubernetes: Docker, EKS, and serverless model serving; autoscaling for AI workloads.
Graph databases: Neptune or Neo4j with Gremlin/Cypher/SPARQL.
Programming & automation: Python/Bash and IaC (CDK, Terraform, CloudFormation).
Model operations: Deploying and monitoring LLMs, embeddings, and custom ML models with performance optimization.
Enterprise integration: Model Context Protocol (MCP), API gateways, and connectors for systems like Confluence, Jira, and SharePoint.
Observability & resilience: CloudWatch/New Relic dashboards, SLOs/SLIs, synthetic checks; queue latency/depth alerts; EventBridge failure handling; DB health and slow-query monitoring.
AI governance: Model risk management, validation frameworks, and compliance logging for regulated AI apps.
Brings a positive, hands-on approach to complex AI infrastructure challenges.
Is innovative and resourceful, and proactive about improving MLOps.
Is outcome-focused and adaptable to changing AI needs and timelines.
Is a strong communicator who partners well with research, engineering, platform, and business teams.
These core qualities and attributes
Focuses on what matters most – prioritises work that drives real results, aiming for progress not perfection, and using time wisely to make smart, impactful choices.
Proactive and curious – shows genuine curiosity and interest, takes initiative, and invests discretionary effort to deliver.
Agency and ownership – pragmatic and independent, but knows when to seek help or share information early for the benefit of the team.
Considered and constructive – listens and responds thoughtfully, is honest about challenges, and looks for ways to improve.
Resilient and adaptable – willing to adjust and tackle change with positive intent. Change is constant; how we respond defines us.
Collaborative yet accountable – unites as a team, but takes notice when things aren't working, acts on it, and holds self and others responsible.
Strong camaraderie – builds others up, is respectful and encouraging, debates ideas not people, and recognises effort, giving credit where it's due.
Positive and professional outlook – maintains optimism and commitment, and takes pride in delivering quality work.
These certifications or qualifications are sought after (desired but not required)
AWS Certified Machine Learning – Specialty
AWS Certified DevOps Engineer – Professional
AWS Certified Solutions Architect – Professional
AWS Certified Security – Specialty
MLOps / AI Engineering certifications (e.g., MLOps Institute, Coursera)
What we can offer you
We operate in a high trust environment, and we really walk the talk.
We aspire for everyone to be themselves and be comfortable at work, so we put great emphasis on ensuring our people have what they need to be at their best.
This includes offering:
a digital-first, fully flexible working style; we've embraced asynchronous, hybrid working as the norm
modern tools and systems, with a big focus on our use of AI
a focus on personal growth with career, learning, and development tools available, plus some dedicated 'tools down' personal development time
in NZ, an additional week of paid leave, staff appreciation leave at Christmas, plus a day off for your birthday
in the USA, unlimited PTO and fully funded health benefits, including medical and dental
all within a tight-knit, supportive, and inclusive global community.
Key information about our recruitment process
We'll request a minimum of two references and require a clean background check from your home country.