Remote Principal Site Reliability Engineer (SRE)

S&P Global Remote

November 17, 2022

S&P Global

Cheyenne, WY

Position Summary

We are looking for an adept, action-oriented Principal Site Reliability Engineer to design and build out automated processes focused on service and infrastructure stability. This work will enable our soon-to-be-launched digital transformation product, which uses advanced NLP, knowledge engineering, and ML to accelerate innovation in engineering, manufacturing, and scientific operations. The ideal candidate will have cloud infrastructure, IaC, and monitoring/instrumentation skills; a proven track record of collaborating on and iteratively implementing data-intensive solutions; strong operational skills to drive efficiency and speed; strong project leadership; and a strong vision for how the SRE discipline can proactively create positive impact for companies. You will be part of an early-stage team. You will educate stakeholders, mentor team members, and have a significant stake in defining the future of the SRE function for the product.

Job Responsibilities

Contribute to a set of best patterns and practices for deploying cloud-based infrastructure as code in a secure, reliable and efficient manner

Support services before they are Generally Available (GA) through activities such as system design consulting, developing software platforms and frameworks, capacity planning, and readiness reviews.

Define and manage SLIs, SLOs, and SLAs for services, infrastructure, and processes running in production

Eliminate toil through automation across all layers: infrastructure provisioning, configuration management, deployment, testing, and operation

Work closely with data scientists, micro-service developers, and security experts to build out a platform incrementally and securely

Maintain an excellent understanding of the business's long-term goals and strategy, and ensure that the design, architecture, and availability are aligned with them

Design for disaster recovery, balancing availability and consistency in multi-region scenarios

Research and experiment with emerging technologies and tools related to availability, monitoring, HA, capacity planning, etc.

Establish and reinforce disciplined production software engineering processes and best practices

Ideal Qualifications

Experience operating high-availability, fault-tolerant, scalable, distributed software/infrastructure in production utilizing GitOps practices (Terraform preferred)

Strong programming background in Scala (Java), Go, or Python

Comfort and ideally substantial experience operating big data infrastructure in a cloud-based ecosystem (AWS preferred)

Solid understanding of traffic management and networking concepts

Experience with stream-processing systems (ksqlDB, Spark Streaming, Apache Beam/Flink, etc.)

Experience with software engineering standard methodologies (unit testing, code reviews, design documents, continuous delivery)

Experience developing and deploying production-grade services, SDKs, and data infrastructure emphasizing performance, scalability, and self-service

Ability to conceptualize and articulate ideas clearly and concisely

Entrepreneurial or intrapreneurial experience where you helped lead the creation of a new product & organization

Nice to Haves

Proficiency in modern big data architectural approaches (Kappa/Lambda architectures, Data Lake Zones, etc.)

Mastery of operating and designing stream-based data systems (Kafka, AWS Kinesis, GCP Pub/Sub, etc.), particularly under varying load

Deep understanding of the theoretical and practical tradeoffs of various NoSQL stores (Cassandra, Elasticsearch, DynamoDB, etc.) with respect to different read/write patterns and availability/consistency requirements

Experience working with knowledge graph stores (Stardog, TigerGraph, Ontotext GraphDB, Neo4j) and surrounding semantic technology (OWL, RDF, SWRL, SPARQL, JSON-LD)

Experience working with Snowflake data warehouses or Databricks Lakehouses and dimensional modeling practices

BA/BS or Master's in Computer Science, Math, Physics, or another technical field

Experience with datasets of at least 10 terabytes, ideally up to multiple petabytes

What We Offer

Competitive base salary and bonus

A comprehensive benefits package that includes medical, dental, vision, and life insurance plans, paid time off, a generous 401k match with no vesting period, parental leave, and 3 volunteering days each year. For more information on benefits, please access the benefits page on our careers site: ~~~ .

For work locations in the state of Colorado, the anticipated minimum base salary for this role would be $150,000 - $210,000. Compensation will be determined by the education, experience, knowledge, and abilities of the applicant.

We're building a software solution that connects data in revolutionary ways, illuminating answers that were previously impossible to find and empowering our clients to envision the future so they can determine the best course of action in the present. Join us!

Equal Opportunity Employer:

S&P Global is an equal opportunity employer and all qualified candidates will receive consideration for employment without regard to race/ethnicity, color, religion, sex, sexual orientation, gender identity, national origin, age, disability, marital status, military veteran status, unemployment status, or any other status protected by law. Only electronic job submissions will be considered for employment.

If you need an accommodation during the application process due to a disability, please send an email to: ~~~ and your request will be forwarded to the appropriate person.

US Candidates Only: The EEO is the Law Poster ~~~ describes discrimination protections under federal law.

We're working hard to integrate our businesses. In the meantime, be sure to also visit ~~~/careers to discover all our combined company has to offer.

S&P Global delivers essential intelligence that powers decision making. We provide the world's leading organizations with the right data, connected technologies and expertise they need to move ahead. As part of our team, you'll help solve complex challenges that equip businesses, governments and individuals with the knowledge to adapt to a changing economic landscape.

Our people come prepared every day to creatively approach and solve new problems. They are instrumental in helping our customers see things differently and take on tomorrow's challenges, today.
