Machine Learning Engineer – Cloud & Edge AI

Our client builds software for frontline workers operating in highly variable connectivity environments. They are building an architecture that relies on cloud-hosted LLMs for online inference and local SLMs for strictly offline execution across iOS, Android, and Windows. You will engineer the pipelines to make this a reality.

Form of cooperation

Permanent employment (TPP)

Deployment

Hybrid

Location

Bratislava (SK)

Salary

from 2500

Job description

Deploy and maintain LLMs on Azure and other cloud infrastructures using inference servers like vLLM or TGI to serve connected mobile clients.
Build the initial Retrieval-Augmented Generation (RAG) pipelines in the cloud, integrating with Resco’s backend data synchronization systems.
Transition cloud-proven capabilities to the edge by fine-tuning and quantizing open-weight models (like Llama 3, Phi, or Qwen) for local mobile execution.
Implement routing logic that switches between cloud AI endpoints and local SLM inference based on network availability and task complexity.
Benchmark performance, latency, and hardware utilization across both cloud GPU instances and constrained mobile chips.
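
To give a sense of the routing work mentioned above, here is a minimal sketch in Python. It is only an illustration under stated assumptions: the names `Route`, `RequestContext`, and `choose_route`, the 0-to-1 complexity score, and the 0.5 threshold are all hypothetical and do not describe the client's actual implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Route(Enum):
    CLOUD_LLM = "cloud_llm"
    LOCAL_SLM = "local_slm"


@dataclass(frozen=True)
class RequestContext:
    online: bool            # hypothetical: outcome of a connectivity check
    task_complexity: float  # hypothetical score in [0, 1]


def choose_route(ctx: RequestContext, complexity_threshold: float = 0.5) -> Route:
    """Pick an inference target from connectivity and task complexity."""
    if not ctx.online:
        # Offline: the local SLM is the only option, whatever the task.
        return Route.LOCAL_SLM
    if ctx.task_complexity >= complexity_threshold:
        # Online and demanding: send the request to the cloud-hosted LLM.
        return Route.CLOUD_LLM
    # Online but simple: stay on-device to save latency and bandwidth.
    return Route.LOCAL_SLM
```

In practice the decision would likely also weigh factors such as battery level, device memory, and data-privacy rules, which this sketch leaves out.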

Prerequisites and skills

3 years of software engineering or machine learning experience.
Experience deploying LLMs to cloud environments. Familiarity with Azure is a strong plus. You should understand model hosting and API routing.
A working understanding of model quantization (GGUF, AWQ) and the mechanics of shrinking models for local execution.
Strong proficiency in Python for ML pipelines, plus familiarity with containerization (Docker). Exposure to mobile development, particularly within the .NET ecosystem (C#), is a significant advantage for the edge integration phase.
An engineering mindset focused on system architecture. You need to und

You will own the lifecycle of AI features, starting from high-capacity cloud deployments down to heavily constrained edge devices. This role requires solving concrete architectural problems regarding offline synchronization, state management, and memory limits across fundamentally different hardware profiles.