craftworks GmbH

Our Journey towards productionizing AI

Description

In his devjobs.at TechTalk, Ciarán Baselmans of craftworks describes the company's journey and how its AI products have evolved over time.


Video Summary

In "Our Journey towards productionizing AI," Ciarán Baselmans traces craftworks' shift from web app development (Java/Spring, Angular, PostgreSQL, Jenkins/Docker) to industrial AI using Python, Spark/Pandas, TensorFlow/PyTorch, and MLflow, with examples like virtual sensors for pellet weight and rail cargo ETA predictions at 98% confidence. He outlines why 90% of models never reach production—infrastructure, access control/least‑privilege, out‑of‑distribution and drift monitoring, deployment simplicity, and integration—and presents Navio, a unified MLOps platform with one‑click deployment, UI‑based management, on‑prem/cloud options, and deployments to TTTech Industrial Nerve devices on the shop floor. Viewers can apply practical patterns to bridge PoCs to production: package models with MLflow, enforce least‑privilege access, monitor drift and outliers, prioritize simple deployments and integrations, and empower domain experts via a UI‑centric workflow.

From Web Apps to MLOps: Technical Takeaways from “Our Journey towards productionizing AI” by Ciarán Baselmans (craftworks GmbH)

Why this session matters for engineers

In “Our Journey towards productionizing AI,” Ciarán Baselmans, Product Owner and Software Developer at craftworks GmbH, walks through how a traditional software shop evolved into a team that not only builds machine‑learning solutions but also puts them into reliable production. The talk is intentionally not overly technical, and that is precisely why it’s useful to engineering teams: it surfaces the operational backbone, the architectural choices, and the day‑to‑day practices that decide whether an ML proof of concept (PoC) becomes business value—or gets abandoned.

Below is our DevJobs.at editorial recap focused on the technical narrative, the toolchain, and the lessons we think engineers can apply directly, grounded strictly in what the speaker covered.

The beginning: 2014, values, and a pragmatic software stack

Ciarán anchors the story in 2014—the year of the ice‑bucket challenge and the sunset of a beloved operating system. In Vienna, friends Simon and Jakob, both computer science graduates, realized their passion for IT wasn’t matched by their employers. Bureaucracy stifled creativity, and the relationship felt “purely transactional.”

Together with a friend, they founded craftworks as a traditional software company with a clear vision: build "a friendly and professional workplace for developers, by developers." Work started in an apartment, and the first customer project was a Transport Management System (TMS) handling API‑based data exchange, customer management, and logistics.

The baseline software stack: Java, Spring Boot, Angular, PostgreSQL, Jenkins, Docker

That early project effectively set a stack that still underpins their software work:

  • Backend: Java (today, Spring/Spring Boot)
  • Frontend: Angular
  • Database: PostgreSQL
  • CI and pipelines: Jenkins
  • Containerization: Docker

As word spread, demand grew. The team expanded with sales support and more frontend and backend engineers. But growth also triggered a deeper shift: a willingness to change.

“To improve is to change; to be perfect is to change often.” Ciarán quoted Churchill not as a claim of perfection, but as a reminder that software demands continual adaptation.

A turn toward AI: augmenting software with machine learning

Innovation and exploration were already part of the culture. With in‑house ML experience, craftworks hypothesized that customers could benefit from going beyond conventional software and applying AI/ML to improve processes and systems. They invested in data science capabilities and hired a dedicated team.

The data science toolchain: Python, Spark, Pandas, TensorFlow, PyTorch, MLflow

For data‑centric work, they standardized on:

  • Python for development
  • Apache Spark and Pandas for analysis and exploration
  • TensorFlow and PyTorch as deep learning frameworks
  • MLflow for experiment tracking and packaging

Industrial use cases—what it looks like in practice

The focus spans several industrial scenarios:

  • Visual inspection
  • Predictive maintenance
  • Predictive quality

One predictive quality example: predicting the weight of plastic pellets going through an extruder. The goal was a “virtual sensor”—not a physical scale, but a model that estimates weight to avoid the high cost of industrial sensors when rolling out at scale.
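The virtual‑sensor idea reduces to regression: infer the costly measurement from signals the line already records. A minimal sketch with one invented process signal (screw speed) and synthetic data; the talk does not disclose the actual inputs or model:

```python
# Minimal "virtual sensor" sketch: estimate pellet weight from a process
# signal instead of a physical scale. The feature and data are invented
# for illustration; the talk does not disclose the real model or inputs.

def fit_linear(xs, ys):
    """Ordinary least squares for y = a + b * x (closed form)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    b = cov / var
    a = mean_y - b * mean_x
    return a, b

# Synthetic training data: extruder screw speed (rpm) vs. measured weight (g).
speeds = [80.0, 90.0, 100.0, 110.0, 120.0]
weights = [40.2, 45.1, 49.8, 55.0, 60.1]

a, b = fit_linear(speeds, weights)

def predict_weight(speed_rpm):
    """The 'virtual sensor': a model standing in for a physical scale."""
    return a + b * speed_rpm

estimate = predict_weight(105.0)
```

Once calibrated against a reference scale, the model replaces per‑machine sensor hardware at rollout, which is exactly the cost argument the talk makes.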

Returning to the Transport Management context, the team used its ML know‑how to go beyond consuming scheduled arrival times. They predicted Estimated Time of Arrival (ETA) themselves. For rail cargo shipments, craftworks trained a model that, as Ciarán described it, could “accurately predict the estimated time of arrival with 98% degree of confidence.”

The PoC‑to‑production gap: a widespread pain point

After running successful experiments, the hard part often begins: getting models into production. Ciarán pointed out the frustration when efforts don’t make it past the PoC. The business doesn’t see results being used, and data scientists don’t see their work deliver value. As he put it: “In fact, 90% of machine learning models never make it into production.”

The takeaway is clear: the bottleneck isn’t just modeling. It’s everything around the model.

Five hurdles to productionizing AI—what they mean technically

Ciarán laid out five recurring challenges. We map them to practical engineering implications.

1) Infrastructure

Many industrial customers aren’t large software players. They may lack the infrastructure and the people to maintain it. Yet production systems must be robust and highly available. Implications include:

  • Clear operating targets (on‑premises, cloud, or mixed)
  • Containerization to standardize runtime environments (Docker appears early in their history)
  • Repeatable builds and pipelines (Jenkins underpins CI)

2) Access management

Production ML handles sensitive data. Separation of roles and least‑privilege access are essential, both for data security and to prevent accidental, large‑scale failures. Practically, this means:

  • Well‑defined roles (data scientists, domain experts, operations)
  • Isolated environments and controlled interfaces
  • Auditable changes and access
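The role separation above can be enforced with a deny‑by‑default permission check at every controlled interface. A minimal sketch; the roles and permission names are illustrative, not craftworks' actual scheme:

```python
# Minimal least-privilege check: every action is denied unless the caller's
# role explicitly grants it. Role and permission names are illustrative only.

ROLE_PERMISSIONS = {
    "data_scientist": {"model:upload", "experiment:read"},
    "domain_expert": {"model:deploy", "model:monitor"},
    "operations": {"model:deploy", "model:monitor", "infra:configure"},
}

def is_allowed(role: str, action: str) -> bool:
    """Deny by default; grant only what the role explicitly lists."""
    return action in ROLE_PERMISSIONS.get(role, set())

def require(role: str, action: str) -> None:
    """Guard to call at every controlled interface before acting."""
    if not is_allowed(role, action):
        raise PermissionError(f"role {role!r} may not perform {action!r}")
```

Logging every `require` call also yields the audit trail mentioned above as a side effect.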

3) Monitoring

Running models must be monitored functionally, not just for uptime:

  • Out‑of‑distribution (OOD) detection: incoming data differs from the training data. Root causes can be obvious (a malfunctioning sensor) or subtle.
  • Data drift: Over time, distributions shift; performance degrades; retraining becomes necessary. As Ciarán highlighted: a model is only as good as the data it was trained on.

Engineering takeaway: define signals for data and model behavior, trigger alerts, and prepare a retraining path.
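A minimal version of such signals can be built from training statistics alone. The 3‑sigma outlier rule and the drift threshold below are arbitrary placeholders; production systems would use proper tests (e.g., KS or PSI) with tuned thresholds:

```python
# Minimal drift/outlier monitor: compare live data to training statistics.
# The 3-sigma rule and the drift threshold are arbitrary placeholders, not
# values from the talk; real systems would use tuned statistical tests.
import statistics

class FeatureMonitor:
    def __init__(self, training_values, drift_threshold=0.5):
        self.mean = statistics.fmean(training_values)
        self.stdev = statistics.stdev(training_values)
        self.drift_threshold = drift_threshold

    def is_outlier(self, value):
        """Point-wise OOD signal, e.g. a malfunctioning sensor."""
        return abs(value - self.mean) > 3 * self.stdev

    def drift_score(self, recent_values):
        """Standardized shift of the recent mean vs. the training mean."""
        recent_mean = statistics.fmean(recent_values)
        return abs(recent_mean - self.mean) / self.stdev

    def needs_retraining(self, recent_values):
        """Alert hook: wire this into the retraining path."""
        return self.drift_score(recent_values) > self.drift_threshold
```

The key design point is that both checks need only statistics captured at training time, so they can ship alongside the model artifact.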

4) Deployment

“It’s got to be as easy to deploy as possible.” Most solutions are custom; there’s no universal one‑stop shop. Yet handover may go to people without deep technical expertise. Consequence: consistent packaging of models, automated rollout mechanics, and a simple user interface.

5) Integration

Each new model must plug into existing systems without re‑inventing integration every time. That means aligning with current protocols and shaping inputs/outputs so that downstream systems can consume the predictions reliably.
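One way to keep integration cheap is a stable, self‑describing prediction contract that every downstream consumer shares. A sketch with hypothetical field names; the talk does not specify Navio's actual wire format:

```python
# Sketch of a shared prediction contract so downstream systems (a TMS, a
# machine, a mobile app) can consume results without per-model integration
# work. Field names are hypothetical, not Navio's actual API.
import json
from dataclasses import dataclass, asdict

@dataclass
class PredictionResponse:
    model_name: str
    model_version: str
    prediction: float
    unit: str

def to_wire(resp: PredictionResponse) -> str:
    """Serialize in the agreed wire format (JSON here)."""
    return json.dumps(asdict(resp))

def from_wire(payload: str) -> PredictionResponse:
    """Parse a payload back into the typed contract."""
    return PredictionResponse(**json.loads(payload))
```

Versioning the model in the payload lets consumers detect when a retrained model starts serving, which ties back to the monitoring and retraining loop.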

craftworks’ platform response: Navio for MLOps

These hurdles led to Navio: “a unified MLOps platform that simplifies the serving, monitoring, and management of machine learning models.” MLOps here is explicitly the fusion of ML and DevOps to support operations.

Navio aims to bridge the gap between experiments and real business value. Key points Ciarán emphasized:

Principles and features

  • One‑click deployment: “Very, very easy” to spin up containers, with a UI for everything so non‑super‑technical users can manage trained and uploaded models.
  • Scalability: Rollouts across many shop floors.
  • Simplicity: Domain experts are not necessarily tech‑savvy; the interface must be obvious.
  • User‑centric operations: Offered on‑premises and in the cloud. Plus, a partnership with TTTech Industrial and their Nerve hardware devices enables deployment directly onto shop‑floor hardware—predictions where data is produced.

Architectural flow: training to serving

Ciarán described the workflow as follows:

1) Data scientists train models in their preferred frameworks.

2) Models are packaged with MLflow to keep supporting different deep learning libraries.

3) Upload to Navio.

4) A domain expert—often on the customer side—takes over to manage and monitor the model.

5) Integration from Navio into a third‑party application, a machine, or a mobile phone.

The architectural idea is a clean separation of concerns: data scientists work in their favored tooling; MLflow standardizes the artifact; Navio handles serving, monitoring, and integration; domain experts operate the model without having to be the ones who trained it.
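This handover works because the artifact hides the training framework behind one interface, which is the idea behind MLflow's pyfunc flavor. The plain‑Python sketch below illustrates the pattern without depending on MLflow itself:

```python
# Framework-agnostic wrapper, mirroring the idea behind MLflow's pyfunc
# flavor: the serving layer only ever sees `predict(inputs)`, regardless of
# whether the underlying model came from TensorFlow, PyTorch, or elsewhere.
# This is a plain-Python illustration, not MLflow's actual API.

class WrappedModel:
    def __init__(self, raw_model, preprocess=None):
        self.raw_model = raw_model
        self.preprocess = preprocess or (lambda x: x)

    def predict(self, inputs):
        """The single entry point the serving layer depends on."""
        return [self.raw_model(self.preprocess(x)) for x in inputs]

# Two "trained models" standing in for different frameworks:
doubling_model = WrappedModel(lambda x: 2 * x)
offset_model = WrappedModel(lambda x: x + 1, preprocess=abs)

# The serving layer treats both identically:
results = [m.predict([1, 2, 3]) for m in (doubling_model, offset_model)]
```

Because every artifact exposes the same surface, Navio‑style one‑click deployment and domain‑expert operation do not need to know how a given model was trained.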

Practical engineering patterns you can apply

Grounded in the session, several patterns stand out for teams moving toward productionized ML.

1) Standardize your artifacts

A consistent packaging format is a prerequisite for automation. MLflow, as used here, lets teams track experiments and produce a deployment‑ready artifact. Without this step, reproducible serving workflows are brittle.

2) Design for roles and interfaces

Ciarán repeatedly highlights domain experts as operators. That translates to:

  • A UI first, where appropriate, not a CLI
  • Strict separation of privileges to prevent unintentional system‑wide changes
  • Responsibility handover between model building and model operations

3) Treat monitoring as part of the product

OOD detection and drift monitoring aren’t add‑ons; they’re essential for production. Define metrics, thresholds, and escalation paths—including retraining—early.

4) Match infrastructure to reality

Industrial environments often require on‑premises options and low downtime tolerance. Flexibility across cloud and on‑premises, and the ability to execute close to where data is produced (shop floor devices), expands feasibility.

5) Make integration the primary constraint

Predictions only create value once they’re consumed. Prioritize compatibility with existing protocols and data flows; otherwise, every new model risks turning into a bespoke integration project.

The case studies revisited—how value is created

ETA prediction for rail cargo

Augmenting a Transport Management System with model‑driven ETA predictions shows how ML can improve planning without rewriting the core system. Ciarán’s reported outcome—“98% degree of confidence”—emphasizes that high‑quality prediction is possible. The deeper lesson: creating value hinges on putting the prediction into the planning loop.

Virtual sensor in the extruder

Estimating pellet weight illustrates the “virtual sensor” concept: using existing signals and a model to infer a costly measurement at scale. Key learnings include the need for outlier detection (faulty sensors) and the advantage of software‑based scalability.

Why PoCs stall—and how this session addresses the gap

Ciarán’s analysis is straightforward: many ML efforts fail to reach production because operational needs weren’t accounted for from the start. His five‑part checklist (infrastructure, access, monitoring, deployment, integration) reframes the problem: the hard part is not training a model but running it as a dependable service.

Navio is presented as the response—standardized packaging via MLflow, one‑click deployments, user‑centric management, scalable rollouts, and flexible hosting. The message we took away: in production ML, the platform and processes around the model are as important as the model itself.

Culture as an enabler: empowerment and change

Threaded through the story is a cultural stance: empower employees and maintain a friendly, open workplace where people have “the freedom to grow and change.” Ciarán himself moved from software development into a dual role including product ownership of Navio. That kind of internal mobility often underpins the sustained investment needed to build and operate MLOps capabilities.

Field‑tested guidelines for teams on a similar path

Staying within the scope of the talk, here are distilled guidelines engineers can act on:

  • Build on a stable software foundation (e.g., Java/Spring Boot, Angular, PostgreSQL, Jenkins, Docker) and extend it with a focused data stack (Python, Spark, Pandas, TensorFlow/PyTorch, MLflow).
  • Use MLflow (or an equivalent) to standardize model artifacts; without it, deployments and integrations multiply in complexity.
  • Offer on‑premises and cloud operation modes where required; deploy close to data sources when needed (shop‑floor devices).
  • Embed role separation and least‑privilege into the platform; provide a UI for domain experts to operate models safely.
  • Make OOD and drift monitoring part of the product from day one; define alerting and retraining flows.
  • Treat integration as a first‑class constraint; design inputs/outputs and protocols for the consuming systems you already have.

Closing thoughts: building the bridge from prototype to production

“Our Journey towards productionizing AI” by Ciarán Baselmans (craftworks GmbH) outlines a pragmatic route from software projects to production ML. The individual technologies are familiar; what matters is how they’re combined into a reliable, user‑centric MLOps process. According to the session, Navio’s core contributions are one‑click container deployments, an accessible UI, scalability across shop floors, flexible on‑prem/cloud options, and MLflow‑based packaging that supports heterogeneous frameworks.

Ciarán ended with an invitation: “Will you join us on this journey?” From an engineering perspective, the journey is about removing the operational barriers that stop models from creating value. Tackle infrastructure, access, monitoring, deployment, and integration with the same rigor you apply to model development, and the odds of making it past the PoC rise dramatically.