Senior Site Reliability Engineer (Remote)
About the team
Engineers on our team work on a global infrastructure that has a great impact on millions of players. To guarantee the best experience possible, we manage and maintain tens of interconnected Kubernetes clusters spread around the world. We are on the cutting edge of open-source infrastructure technology: we adopted Kubernetes in production shortly after the project was launched, and today we use technologies such as Cilium in our network stack.
We handle billions of logs daily and run hundreds of nodes and thousands of containers to serve more than a million requests per minute. We know these numbers will only grow, and we're looking for engineers who can help with the challenges of provisioning and operating infrastructure at a large scale.
About the role
Wildlife Studios is searching for infrastructure engineers to join our team. We seek an engineer with solid programming, networking, and operating systems knowledge. Since we are always looking for new tools and technologies that better solve our problems, we value professionals who like to learn new things and are autonomous and proactive in implementing their ideas.
We'll need you to understand our system flows, diagnose problems in the production environment, identify opportunities for improvement and automation, and guarantee that we have the necessary infrastructure to create the best games in the world.
More about you
- Player focused. We are player oriented, and infrastructure has a great impact on their experience. You have empathy for our players and focus on ensuring they have an outstanding experience. You seek to guarantee the highest availability possible.
- Automation is key to scaling. We look for engineers who have a history of planning and executing automation projects to eliminate manual and repetitive tasks.
- Calm and pragmatic. When everything seems to be falling apart around you, you have a plan and keep calm.
- Bleeding edge. You are curious and like to study new technologies, test new solutions, and measure the impact of changes. We want to ensure we are using the best stack possible.
What you’ll do
- Build, monitor, and optimize infrastructure clusters (Kubernetes, Elasticsearch, MongoDB, Kafka).
- Define monitoring and observability patterns.
- Troubleshoot and manage incidents in production.
- Automate and improve infrastructure provisioning (Infrastructure as Code).
What you'll need
- Bachelor's degree in Computer Science or Computer Engineering, or equivalent experience.
- Linux knowledge. You should be able to discuss in detail what happens under the hood (operating system: kernel + shared libraries + userland, network stack).
- Solid knowledge of at least one programming language. We work mostly with Go and Python.
- Experience with large-scale production systems and technologies.
- Experience with Kubernetes.
- Experience with monitoring systems (e.g., Datadog, StatsD, Grafana).
- Experience with infrastructure as code tools (e.g., Ansible, Terraform).
- Experience with messaging systems such as Kafka and NATS.
- Experience with database management (Postgres, MongoDB, Cassandra, Redis, Elasticsearch).
- Experience with CI/CD pipelines (e.g., Jenkins, Travis).
We welcome people from all backgrounds who seek the opportunity to help build the best gaming company, where everyone thrives.