- Build and optimize ETL/ELT processes leveraging Databricks' native capabilities to handle large volumes of structured and unstructured data from various sources
- Implement data quality frameworks and monitoring solutions using Databricks data quality features to ensure data accuracy and reliability across all data products
- Establish best practices for data governance, security, and compliance within the Databricks ecosystem and integrate with enterprise systems
Operational Responsibilities :
- Monitor and maintain production data pipelines to ensure 99.9% uptime and optimal performance across all Databricks workloads and clusters
- Implement comprehensive logging, alerting, and monitoring systems using Databricks monitoring capabilities and integration with enterprise monitoring tools
- Perform regular health checks on Databricks cluster performance, job execution times, and resource utilization to identify and resolve bottlenecks proactively
- Manage incident response procedures for Databricks pipeline failures, including root cause analysis, resolution, and post-incident reviews
- Establish and maintain disaster recovery procedures and backup strategies for critical data assets within the Databricks environment
- Conduct regular performance tuning of Spark jobs and Databricks cluster configurations to optimize cost and execution efficiency
- Implement automated testing frameworks for Databricks-based data pipelines, including unit tests, integration tests, and data validation checks
- Maintain comprehensive documentation for all Databricks operational procedures, runbooks, and troubleshooting guides
- Coordinate scheduled maintenance windows and Databricks system upgrades with minimal business impact
- Manage user access controls, workspace configurations, and security policies within Databricks environments
- Monitor data lineage using Databricks Unity Catalog and maintain metadata management systems to support operational transparency and compliance requirements
- Establish capacity planning processes to forecast Databricks infrastructure needs and manage cloud costs effectively
- Provide technical guidance and mentorship to junior team members on Databricks best practices and data engineering principles
- Participate in the on-call rotation for critical production systems, with a focus on Databricks platform stability
- Lead operational reviews and contribute to continuous improvement initiatives for Databricks platform reliability and efficiency
- Coordinate with infrastructure teams on Databricks cluster provisioning, network configurations, and security implementations
Requirements / Qualifications :
Education & Experience :
- Degree in Computer Science or Computer Engineering
- 8-10 years of working experience in system operations, compliance, and management
- Hands-on project experience with the Databricks platform (primary requirement)
- Project experience in cloud operations or cloud architecture
- Must hold an AWS cloud certification
Core Technical Skills :
- Expert-level proficiency in the Databricks platform, including workspace management, cluster configuration, and job orchestration
- Strong expertise in Apache Spark within the Databricks environment, including Spark SQL, DataFrames, and RDDs
- Extensive experience with Delta Lake, including data versioning, time travel, and ACID transactions
- Proficiency in Databricks Unity Catalog for data governance and metadata management
- In-depth understanding of data warehouse concepts, data profiling, data verification, and advanced analytics techniques
- Strong knowledge of monitoring, incident management, and cloud cost control
- Databricks (primary and most critical skill)
- AWS cloud services and architecture
- IDMC (Informatica Intelligent Data Management Cloud)
- MLOps practices within the Databricks environment (Good to have)
- STATA for statistical analysis (Good to have)
- Amazon SageMaker integration with Databricks (Good to have)
- DataRobot platform integration (Good to have)
Soft Skills & Stakeholder Management :
- Good interpersonal skills with the ability to work with different groups of stakeholders
- Strong problem-solving skills and ability to work independently in a fast-paced environment with minimal supervision
- Excellent communication skills for technical documentation and cross-team collaboration
- Databricks certification (Associate or Professional level) - highly preferred
- Exposure to hospital information / clinical systems is an added advantage
- Understanding of DevOps practices and CI/CD pipelines for Databricks-based data engineering projects
- Knowledge of ITIL frameworks and operational best practices