Hi,
My name is Rohit Chauhan, and I am a Staffing Specialist at Novia Infotech LLC. I am reaching out to you regarding an exciting job opportunity with one of our clients.
Job Title: Observability Solution Architect / Tech Lead
Location: Irvine, CA (Onsite)
Job Summary
We are seeking an experienced Observability Solution Architect / Tech Lead to lead the design and implementation of a next-generation Observability and Anomaly Detection Platform leveraging AI/ML technologies. This platform will support middle-office operations by enabling intelligent monitoring, anomaly detection, data validation, visualization, root cause analysis, and automated remediation capabilities.
The ideal candidate will have deep expertise in observability platforms, telemetry architecture, AI-driven analytics, and enterprise solution architecture. This role requires strong leadership skills, cross-functional collaboration, and hands-on experience designing scalable, secure, and highly available enterprise monitoring ecosystems.
Key Responsibilities
Solution Architecture & Platform Design
- Lead the end-to-end architecture and implementation of an enterprise observability platform.
- Design scalable, secure, and cost-effective observability solutions aligned with business SLAs, SLOs, and reliability goals.
- Define architecture standards, governance frameworks, and best practices for observability and monitoring.
- Develop high-level and low-level architecture designs for telemetry ingestion, processing, storage, and visualization.
Observability & Monitoring
- Architect centralized monitoring and observability solutions using tools such as:
  - Splunk
  - Grafana
  - Datadog
- Build telemetry pipelines for logs, metrics, traces, and events across distributed systems.
- Design proactive monitoring strategies to improve operational visibility and system reliability.
- Enable root cause analysis, performance optimization, and service health monitoring.
AI/ML & Anomaly Detection
- Design and implement AI-driven anomaly detection frameworks.
- Integrate machine learning models for predictive analytics, anomaly detection, and automated remediation.
- Support intelligent alerting, event correlation, and automated operational insights.
- Collaborate with AI/ML teams to integrate advanced analytics capabilities into observability workflows.
Platform Security & Governance
- Ensure platform security, compliance, and governance standards are implemented across observability systems.
- Define access controls, data retention policies, and security best practices.
- Ensure observability architecture aligns with enterprise risk and compliance requirements.
Operationalization & Reliability Engineering
- Support operational readiness, platform scalability, and production deployment activities.
- Define monitoring KPIs, reliability metrics, and operational dashboards.
- Enable observability-driven Site Reliability Engineering (SRE) practices.
- Drive automation and self-healing capabilities across the monitoring ecosystem.
Leadership & Collaboration
- Lead technical architecture discussions and provide guidance to engineering teams.
- Collaborate with infrastructure, DevOps, SRE, AI/ML, and business stakeholders.
- Mentor engineering teams and establish technical standards and best practices.
- Drive stakeholder alignment across cross-functional teams and enterprise initiatives.
Required Skills
- 10+ years of experience in Solution Architecture, Observability, or Platform Engineering.
- Strong expertise in observability platforms, including:
  - Splunk
  - Grafana
  - Datadog
- Experience designing enterprise telemetry and monitoring architectures.
- Strong understanding of logs, metrics, traces, event correlation, and distributed observability.
- Experience building AI/ML-powered anomaly detection and monitoring solutions.
- Strong knowledge of:
  - Root Cause Analysis (RCA)
  - Reliability Engineering
  - SLA/SLO-based monitoring
  - Incident Management
- Experience with telemetry pipeline engineering and large-scale data processing.
- Strong understanding of cloud-native architectures and distributed systems.
- Knowledge of DevOps, SRE, and automation practices.
- Experience with platform security, governance, and compliance.
- Strong leadership, stakeholder management, and communication skills.
Preferred Skills
- Experience with GenAI-driven operational intelligence solutions.
- Experience in financial services or middle-office environments.
- Knowledge of AIOps and intelligent monitoring frameworks.
- Experience with cloud platforms such as AWS, Azure, or GCP.
- Familiarity with Kubernetes, Docker, and microservices environments.
- Experience with automation and infrastructure-as-code practices.
Rohit Chauhan, IT Recruiter
Address: 4421 Avenida Ln, McKinney, TX 75070
You received this message because you are subscribed to the Google Groups "NoviaJobs" group.