Crayon
Fair AI
Description
In her devjobs.at TechTalk, Annalisa Cadonna of Crayon talks about the careless use of AI and the responsibility data scientists carry in building fair AI models.
Video Summary
In Fair AI, Annalisa Cadonna (Crayon) explains the societal, technical, monitoring, and legal challenges of bias in AI, illustrated by cases from employment services and a COVID-19 grading algorithm that disadvantaged students and was ultimately discarded. She outlines a practical workflow: identify discrimination risks with stakeholders, audit data and models, mitigate via pre-/in-/post-processing, and continuously monitor fairness metrics alongside AUC/precision/recall. Cadonna defines demographic parity and error-balance metrics and highlights IBM AIF360 and Microsoft Fairlearn, enabling teams to build and monitor fairer models, increase trust, and prepare for future legal requirements.
Fair AI at Crayon: Annalisa Cadonna’s blueprint for detecting, measuring, and mitigating bias in real AI systems
Why Fair AI is now an engineering problem
In “Fair AI,” Annalisa Cadonna (Senior Data Scientist, Crayon) presents a grounded, engineering-first approach to fairness: identify how bias enters AI systems, quantify it with the right metrics, and mitigate it with practical techniques and tools. The talk connects the rapid expansion of AI—fueled by abundant data, accessible cloud compute, and breakthroughs in NLP and computer vision—to a simple reality: if AI shapes everyday decisions, engineering teams carry responsibility for transparency, measurement, and informed trade-offs.
Cadonna’s framing is pragmatic. AI is present in daily life—from voice assistants to mobile banking apps—and on both corporate and public agendas. That reach raises the stakes. Fairness cannot be a late-stage add-on; it must be an end-to-end concern spanning data, modeling, deployment, and operations.
Core idea: Fairness is a lifecycle commitment—before, during, and after model development—combining technical methods, stakeholder dialogue, continuous monitoring, and legal awareness.
Two case studies: When historical data amplifies inequity
Cadonna opens with two examples that illustrate a common failure pattern: models trained on historical data can absorb existing inequities and then amplify them at scale.
1) Public employment service in Austria (AMS)
The Austrian public employment service (AMS) used a logistic regression model to estimate the probability that a job seeker would be successfully placed. Based on the predicted probability, job seekers were grouped for resource allocation: those with a high probability were assumed to need minimal help, those with a low probability were considered not worth intensive support, and the middle group received most of the resources.
The problem: the model was trained on historical data and inherited societal biases. Groups that had historically faced disadvantage appeared less likely to succeed according to the model, even when other characteristics were comparable; Cadonna mentions women, older individuals, and non-EU citizens as examples. The algorithm effectively amplified historical bias, exactly what fairness-by-design aims to avoid.
2) Algorithmic grading during COVID-19
When in-person exams were not possible during the pandemic, an algorithm was introduced to predict final school grades. Inputs included prior student performance and data about the attended school. The outcome: students from public schools—especially in disadvantaged districts—were assigned lower grades than their teachers had predicted, while students from “good” or private schools were systematically advantaged. Within the European Union, this algorithm was ultimately not used.
These cases demonstrate the mechanism behind algorithmic unfairness: models learn from historical patterns, including problematic ones. Left unchecked, large-scale or public-facing applications can entrench inequities and impose real-world harm.
Four challenges: Societal, technical, monitoring, legal
Cadonna organizes the practical hurdles into four dimensions. For engineering teams, this reads like a blueprint for Fair-AI delivery.
1) Societal
- Understand how AI decisions affect people’s lives.
- Engage stakeholders: Who are the users? Who is indirectly affected? Which groups bear the risk of error?
- Surface context: Which historical or structural biases might be present in the data?
2) Technical
- Audit data: Is the training set imbalanced or historically biased?
- Analyze model behavior: Does the model amplify biases? What do the metrics say?
- Accept trade-offs: Improving fairness often reduces headline performance (e.g., AUC, precision, recall). Navigating this is an explicit design choice, not an afterthought.
3) Monitoring
- Fairness metrics belong in continuous monitoring—on par with business and performance KPIs.
- Concept and data drift affect fairness too. A model may launch “fair” and become skewed in production; monitoring catches these shifts.
4) Legal
- The more AI shapes individual opportunities, the tighter the regulatory frame.
- Regional and global rules matter. Cadonna points to the European Commission’s AI regulation proposal released in April. Teams that embed fairness are better prepared.
Crayon’s approach: Fair by Design across the AI lifecycle
Crayon collaborates with universities and companies in a project called “Fair by Design,” funded by the Austrian Research Promotion Agency (FFG). The goal: translate state-of-the-art research into practical Fair-AI solutions before, during, and after model training.
Cadonna outlines four core steps:
1) Identify potential for discrimination
- Maintain ongoing communication with business and stakeholders.
- Ask concrete questions: Where do decisions impact people? Which attributes could define sensitive groups? Where might proxies induce indirect discrimination?
2) Check data and models for fairness
- Data review: representativeness, balance, quality, and historical context.
- Fairness auditing: evaluate fairness metrics alongside classical performance metrics.
3) Ensure fairness (bias mitigation)
- Preprocessing: adjust or rebalance data to address structural skew (a reweighting sketch follows this list).
- In-processing: incorporate fairness goals into training (e.g., parameter tuning).
- Post-processing: correct outputs to meet fairness criteria.
4) Monitor fairness metrics over time
- Data shifts and model aging are real; fairness must be monitored just like accuracy or business KPIs.
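To make the preprocessing step concrete, here is a minimal sketch of reweighting: each training sample receives a weight so that, after weighting, the sensitive group and the label are statistically independent. This is the idea behind the reweighing approach implemented, for example, in AIF360; the column names and toy data below are illustrative assumptions, not material from the talk.

```python
import pandas as pd

def reweighing_weights(df: pd.DataFrame, group_col: str, label_col: str) -> pd.Series:
    """Per-row weight = expected frequency / observed frequency of (group, label)."""
    n = len(df)
    p_group = df[group_col].value_counts(normalize=True)      # P(group)
    p_label = df[label_col].value_counts(normalize=True)      # P(label)
    p_joint = df.groupby([group_col, label_col]).size() / n   # P(group, label)

    def weight(row):
        expected = p_group[row[group_col]] * p_label[row[label_col]]
        observed = p_joint[(row[group_col], row[label_col])]
        return expected / observed

    return df.apply(weight, axis=1)

# Illustrative data: "group" is the sensitive attribute, "placed" the historical label.
train = pd.DataFrame({
    "group":  ["A", "A", "A", "B", "B", "B", "B", "B"],
    "placed": [1,   0,   0,   1,   1,   1,   0,   1],
})
train["sample_weight"] = reweighing_weights(train, "group", "placed")
print(train)
# Most estimators accept these weights, e.g. model.fit(X, y, sample_weight=...).
```

The weights upweight combinations that are underrepresented relative to independence (for example, successfully placed members of a historically disadvantaged group) and downweight overrepresented ones, without dropping any data.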
Measuring fairness: Demographic parity and error parity
Which metrics should drive engineering decisions? Cadonna presents two families commonly used in practice.
Demographic parity
- Consider a binary classification task with a positive and a negative outcome (e.g., “grant” vs. “do not grant”).
- Demographic parity requires that different groups receive the same proportion of positive outcomes—regardless of error distributions.
- Relevance: If access or opportunity should be spread evenly, parity is intuitive. But it’s not a quality metric per se because it ignores errors.
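A hedged illustration of the metric: the snippet below computes the gap in positive-prediction rates between groups for a binary classifier. The predictions and group labels are made-up assumptions.

```python
import numpy as np

def demographic_parity_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Largest gap in positive-prediction rates between any two groups."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

# Illustrative predictions for two groups (no ground truth needed for this metric).
y_pred = np.array([1, 0, 1, 1, 0, 0, 1, 0])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(demographic_parity_gap(y_pred, group))  # 0.75 - 0.25 = 0.5
```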
Performance and balance metrics (error parity)
- These metrics require that errors across groups not diverge significantly. In other words, group-wise error rates should be aligned.
- Relevance: In many scenarios—Cadonna mentions computer vision—error equality can be more important than equal positive rates.
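A similar sketch for error balance computes group-wise false positive and false negative rates, whose between-group gaps are what these metrics constrain. Again, the data are illustrative assumptions.

```python
import numpy as np

def group_error_rates(y_true, y_pred, group):
    """Per-group false positive rate (FPR) and false negative rate (FNR)."""
    rates = {}
    for g in np.unique(group):
        t, p = y_true[group == g], y_pred[group == g]
        fpr = ((p == 1) & (t == 0)).sum() / max((t == 0).sum(), 1)
        fnr = ((p == 0) & (t == 1)).sum() / max((t == 1).sum(), 1)
        rates[g] = {"FPR": fpr, "FNR": fnr}
    return rates

# Illustrative labels and predictions for two groups.
y_true = np.array([1, 0, 1, 0, 1, 0, 1, 0])
y_pred = np.array([1, 1, 1, 0, 0, 0, 1, 1])
group  = np.array(["A", "A", "A", "A", "B", "B", "B", "B"])
print(group_error_rates(y_true, y_pred, group))
```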
The practical takeaway: put a set of fairness metrics on the dashboard—next to AUC, precision, and recall—to make trade-offs explicit and decisions accountable.
Tools: Auditing vs. mitigation with AIF360 and Fairlearn
The ecosystem splits into two tool classes:
- Auditing libraries: compute fairness metrics on existing models to reveal imbalances.
- Mitigation libraries: offer algorithms for pre-, in-, and post-processing.
Cadonna highlights two widely used mitigation packages with active communities:
- AIF360 (IBM)
- Fairlearn (Microsoft)
These tools make fairness goals operational in training and evaluation. Still, they complement—not replace—the conceptual work of defining objectives, metrics, and processes.
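As a hedged example of how such a toolkit is typically wired in, the sketch below uses Fairlearn to audit a trained classifier with a MetricFrame and then post-processes it with ThresholdOptimizer toward demographic parity. The synthetic data and the choice of constraint are assumptions made for illustration, not a prescription from the talk; AIF360 offers analogous metrics and mitigation algorithms.

```python
# Minimal Fairlearn sketch: audit group-wise metrics, then post-process.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from fairlearn.metrics import MetricFrame, selection_rate
from fairlearn.postprocessing import ThresholdOptimizer

rng = np.random.default_rng(0)
n = 1000
sensitive = rng.integers(0, 2, n)                      # illustrative 0/1 sensitive attribute
X = np.column_stack([rng.normal(size=n), sensitive])   # feature set leaks the attribute on purpose
y = (X[:, 0] + 0.8 * sensitive + rng.normal(scale=0.5, size=n) > 0.5).astype(int)

model = LogisticRegression().fit(X, y)

# Audit: performance and selection rate broken down by group.
audit = MetricFrame(
    metrics={"accuracy": accuracy_score, "selection_rate": selection_rate},
    y_true=y,
    y_pred=model.predict(X),
    sensitive_features=sensitive,
)
print(audit.by_group)          # per-group values
print(audit.difference())      # largest between-group gap per metric

# Mitigate: adjust decision thresholds toward demographic parity (post-processing).
postproc = ThresholdOptimizer(
    estimator=model,
    constraints="demographic_parity",
    prefit=True,
    predict_method="predict_proba",
)
postproc.fit(X, y, sensitive_features=sensitive)
y_fair = postproc.predict(X, sensitive_features=sensitive)
```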
Why fairness pays off: Trust, social benefit, and regulatory readiness
Cadonna maps out three benefits:
- Trust: Transparency and measurable fairness build user and public confidence, easing AI adoption.
- Positive social return: Fair systems avoid disadvantaging groups and can reduce inequities—especially in public or large-scale settings.
- Regulation readiness: Teams that integrate fairness are better positioned for evolving legal requirements. Cadonna references the European Commission’s proposal released in April as a signpost.
Responsibilities for data scientists: Speak up and educate
Cadonna emphasizes the role of data scientists in shaping how data and AI are applied. Technical teams are often asked for input—or even the final say—on data use. With that comes responsibility:
- Raise potential discrimination: “We cannot remain silent.” If risks are visible, bring them into the conversation.
- Educate the business: Add fairness metrics to the same review as business KPIs and technical metrics. Equip decision-makers with a shared language for trade-offs.
- Stay up to date: Follow evolving regulation and the fairness community; the field moves quickly.
Practical guidance for engineering teams
Based on the session, here is a compact, engineering-oriented checklist that integrates directly into typical MLOps flows.
1) Embed fairness early in the backlog
- Formulate project objectives with sensitive dimensions in mind.
- Plan stakeholder workshops: Where do decisions affect people? Which groups are most exposed to errors?
2) Build auditable data pipelines
- Group-aware profiling: check representativeness and data quality across groups.
- Document historical context: identify where you expect proxy effects or structural skew.
3) Define fairness metrics alongside performance KPIs
- Choose between demographic parity and error parity according to the use case.
- Set measurement and reporting cadences: fairness belongs on the same dashboards as AUC, precision, and recall.
4) Implement bias mitigation in stages
- Preprocessing: rebalance or transform data when foundations are skewed.
- In-processing: integrate fairness goals during training (e.g., via parameter tuning).
- Post-processing: calibrate outputs pre-launch if parity or error rates are off.
5) Continuous monitoring and alerting
- Add drift detection for fairness metrics (see the monitoring sketch after this checklist).
- Schedule regular re-evaluations: models and data evolve; fairness must keep pace.
6) Account for legal requirements
- Depending on the domain (labor, education, public services, finance), obligations differ.
- Early coordination with compliance reduces rework.
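As a hedged sketch of items 3 and 5, the snippet below recomputes a fairness metric per scoring batch and raises an alert when it exceeds a tolerance. The batch layout, the threshold value, and alerting via logging are illustrative assumptions rather than a reference implementation.

```python
import logging
import numpy as np

logging.basicConfig(level=logging.INFO)
DP_ALERT_THRESHOLD = 0.10  # illustrative tolerance for the demographic parity gap

def dp_gap(y_pred: np.ndarray, group: np.ndarray) -> float:
    """Gap between the highest and lowest group-wise positive-prediction rate."""
    rates = [y_pred[group == g].mean() for g in np.unique(group)]
    return max(rates) - min(rates)

def monitor_batch(batch_id: str, y_pred: np.ndarray, group: np.ndarray) -> float:
    """Compute the fairness metric for one scoring batch and alert on drift."""
    gap = dp_gap(y_pred, group)
    if gap > DP_ALERT_THRESHOLD:
        logging.warning("batch %s: demographic parity gap %.3f exceeds %.2f",
                        batch_id, gap, DP_ALERT_THRESHOLD)
    else:
        logging.info("batch %s: demographic parity gap %.3f within tolerance",
                     batch_id, gap)
    return gap

# Illustrative weekly batches of production predictions.
rng = np.random.default_rng(1)
for week in ["2021-W10", "2021-W11"]:
    group = rng.integers(0, 2, 500)
    y_pred = (rng.random(500) < np.where(group == 1, 0.55, 0.45)).astype(int)
    monitor_batch(week, y_pred, group)
```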
Conclusion: Fairness as a continuous engineering practice
“Fair AI” by Annalisa Cadonna (Crayon) reads as a practical plan rather than a moral appeal. The examples, the AMS in Austria and algorithmic grading during COVID-19, make clear how quickly AI can reinforce inequities. The countermeasure is a disciplined process: identify discrimination risks, audit data and models, mitigate bias across pre-, in-, and post-processing, and monitor fairness metrics continuously.
Surround this with strong stakeholder communication, a cross-functional understanding of fairness versus performance trade-offs, and active attention to research, tools, and regulation. Crayon’s guiding belief that technology should drive the greater good frames the goal. Turning that belief into reality demands that engineering teams make fairness measurable, actionable, and part of everyday operations.