Affirm

Staff Software Engineer

at Affirm
Technology & Programming Full-Time USA
578 days ago

Description

Affirm’s Infrastructure Platform team is building a large-scale, massively distributed, fault-tolerant global infrastructure shared across multiple financial products, merchants and vendors. Ensuring that our infrastructure is openly available to engineers is a critical part of Affirm’s success story. We pride ourselves on our culture of engineering design and architecture: we write detailed tech specs and gather feedback before making large changes to systems.

We are looking for a Staff Site Reliability Engineer to join our Site Reliability Engineering team: someone with deep technical knowledge who is passionate about Linux, networking, microservices and distributed architectures, and who has experience running large-scale services. Our goal is to make Affirm's global, service-oriented product and infrastructure stack observable, highly resilient, scalable and fault tolerant while meeting our high SLA uptime expectations. You will excel if you have a passion for digging deep and a flair for sharp technical communication, prioritization, and organization. You will work directly with our Platform / Infrastructure and Product Development teams to build our next generation “always up” cloud-based platform.

Our work ranges from Observability/Telemetry Engineering, Reliability and Scalability Engineering, Chaos Engineering, Performance Engineering, Capacity Engineering and Disaster Recovery Engineering to working closely with the security team on managing application-level security.

Site Reliability Engineers are hybrid System, Software, Data and Network Engineers who are responsible and accountable for building and scaling reliable systems that impress our customers.

What you'll do

  • Own the end-to-end availability, reliability and performance of mission-critical services

  • Troubleshoot issues around reliability, resiliency, scalability and availability

  • Define and measure SLIs, SLOs and SLAs

  • Augment instrumentation to build a cohesive dependency mapping with special attention to points of failure

  • Build command-and-control automation to fail away quickly, reducing TTR and eliminating manual work and toil

  • Assist with on-call and triage rotations

What we look for

  • Linux, Networking and AWS experience

  • Experience with containerization and container platforms (e.g., Docker, Kubernetes)

  • Familiarity with Elasticsearch, Kibana/Grafana, Logstash and Kafka, and with ways to scale these systems

  • Experience with automation systems (Ansible, Puppet, Terraform) is a plus; SaltStack preferred

  • Experience with open source systems is a plus

  • Software development experience in Python/Kotlin/Go is a plus

  • Experience with high-performance networking (QUIC, network-layer optimization) or real-time transaction protocols/methods (HTTP/2, Server-Sent Events, MQTT, WebSockets)

  • Ability to recommend or help architect an entire system; expertise in capturing and interpreting traffic with tcpdump, snoop, and other network sniffers; working knowledge of most protocols (TCP/IP, HTTP, UDP, etc.)

USA Pacific base pay range (CA, WA, NY, NJ, CT): $190,000-$284,900

Sapphire base pay range (all other U.S. states): $171,000-$256,500

