Senior Observability Engineer – Team Lead

Permanent employee, Full-time · Maria01 (Helsinki), Remote (EU)

About:

We’re an ambitious, mission-driven group focused on making the world a better place by delivering affordable, environmentally sustainable AI compute for training and deploying machine learning models at scale.

Responsibilities:

Lead the design, deployment, and scaling of a 360-degree unified observability stack across infrastructure assets (network, storage, cloud, power, servers, VMs, services, security, compliance, customer-facing dashboards, Kubernetes, etc.).
Use Grafana, Loki, and ELK stacks to build advanced monitoring, logging, and alerting solutions.
Identify and resolve critical flaws and issues, drawing on a proven track record of saving organisations significant time or cost.
Orchestrate observability data to detect trends, forecast issues, and move from reactive to proactive monitoring.
Partner with engineering, SRE, and operations teams to create dashboards, alerts, and visualisations that enable actionable insights.
Manage end-to-end workflows at a senior level, ensuring observability practices are embedded across projects and aligned with business goals.
Define best practices, set standards for log/metadata organisation, and maintain clear documentation.

Qualifications:

Deep experience with the Grafana stack (Grafana, Loki, Mimir, Alloy).
Strong familiarity with the ELK/OpenSearch stack (Elasticsearch, Logstash, Kibana, Fluentd, Filebeat, Metricbeat).
Solid understanding of Prometheus and related tooling (Prometheus, Thanos, Cortex, exporters).
Strong background working across Linux environments at scale.
Knowledge of network observability protocols and tools such as NetFlow and syslog.
Experience with automation/configuration management (e.g., Ansible or similar).
Excellent written and spoken English communication skills, with the ability to influence both technical and non-technical stakeholders.

Nice-to-haves:

Leadership experience in observability or infrastructure teams.
Experience monitoring Kubernetes environments.
Exposure to the Influx stack (Telegraf, InfluxDB).
Familiarity with OpenStack environments.

What we offer:

Company equity - a true stake in our journey.
Competitive salary and benefits, including health insurance, lunch benefit, and an annual personal budget (for sport, transport, wellness, or culture).
Flexible working environment.
Opportunity to work with cutting-edge AI technologies.
Career growth within a mission-driven company.

Assessment Process:

1. Introductory chat (45 mins) - Meet with our Talent Partner to learn more about DataCrunch and share your career goals.
2. Technical interview (60 mins) - A deeper discussion of your expertise and technical experience with future colleagues.
3. Final interview (60 mins) - Meet with our CEO, CTO, and wider team.

We are looking forward to hearing from you!

Thank you for your interest in DataCrunch. Please fill out the following short form. If you have difficulty uploading your data, please email lena@datacrunch.io.