
Evaluating 16 Years of an EU-funded programme with AI



Eduardo Salvador

October 28, 2025
5 minute read

Turning 30,000 Documents into an Evaluation Report: How we used AI to evaluate 16 years of an EU-funded programme

The challenge

For sixteen years, the EU’s Active & Assisted Living (AAL) Programme invested nearly €1 billion in research and innovation to improve the lives of older adults. With over 300 projects, 2,000 SMEs, and tens of thousands of end-users involved, it left behind a vast legacy of knowledge, but also a significant challenge: how to evaluate it all?

Traditional methods (manual reviews, expert panels, selective case studies) simply could not cope with the sheer scale of the programme’s documentation. More than 30,000 documents – proposals, reports, reviews, annexes – spanning over a decade of work made it practically impossible to draw systematic, evidence-based conclusions without missing key insights.

That’s where our approach came in. By combining AI methodologies with human expertise, we were able to transform this massive corpus into a structured knowledge base – and from there, generate a comprehensive evaluation report that not only captures what AAL achieved, but also distils lessons that future large-scale programmes can build upon.

A New Way of Evaluating Public Policy

What makes this work innovative is not just the use of AI, but how we used it. We did not simply throw documents into a large language model and ask it for a summary. That approach, while tempting, is both risky and unreliable – especially when dealing with long, complex documents. Recent research (such as Chroma’s Context Rot) shows that performance actually degrades as context windows grow, with models losing track of details or being distracted by irrelevant facts.

Instead, we focused on better context, not bigger context. Our methodology was designed around five principles:

1) Step-by-step transformation of raw data (sketched in code after this list):

  • Convert files into usable formats.
  • Classify them by type and by project.
  • Segment documents into meaningful sections.
  • Extract indicators (technologies used, end-user involvement, outcomes, etc.).
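
To make these four steps concrete, here is a minimal Python sketch of such a pipeline. The file handling, the classification rule, and the indicator names are illustrative placeholders only; the actual pipeline relied on format-specific converters and LLM-assisted classification and extraction.

```python
from dataclasses import dataclass, field
from pathlib import Path

@dataclass
class DocumentRecord:
    project_id: str                      # project the file belongs to
    doc_type: str                        # e.g. "proposal", "report", "review"
    sections: list[str] = field(default_factory=list)
    indicators: dict = field(default_factory=dict)

def convert_to_text(path: Path) -> str:
    """Step 1: convert the raw file into usable text (only .txt handled here;
    real converters for PDF, DOCX, etc. would plug in per format)."""
    return path.read_text(encoding="utf-8", errors="ignore")

def classify(path: Path) -> tuple[str, str]:
    """Step 2: assign a project id and a document type.
    Placeholder rule: derive both from the folder layout and file name."""
    return path.parent.name, path.stem.split("_")[0]

def segment(text: str) -> list[str]:
    """Step 3: split the document into meaningful sections (blank-line blocks)."""
    return [block.strip() for block in text.split("\n\n") if block.strip()]

def extract_indicators(sections: list[str]) -> dict:
    """Step 4: extract indicators; in the real pipeline an LLM fills these."""
    return {
        "n_sections": len(sections),
        "mentions_end_users": any("end-user" in s.lower() for s in sections),
    }

def process_corpus(root: Path) -> list[DocumentRecord]:
    """Run the four steps over every file under `root`."""
    records = []
    for path in root.rglob("*.txt"):
        text = convert_to_text(path)
        project_id, doc_type = classify(path)
        sections = segment(text)
        records.append(DocumentRecord(project_id, doc_type, sections,
                                      extract_indicators(sections)))
    return records
```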

2)  Dynamic creation of categories:

When no predefined taxonomy existed – for example, for technologies or solution types – we let the AI propose candidate classifications based on observed patterns. Experts then refined and validated these categories. This way, the data “spoke for itself,” but the analysis still followed a meaningful, domain-relevant rationale.
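
As an illustration of that pattern, the sketch below lets a model propose candidate labels and keeps only those the experts approve. The prompt wording and function names are hypothetical; any callable that maps a prompt string to a response string can stand in for the model.

```python
def propose_categories(descriptions: list[str], llm) -> list[str]:
    """Ask the model for candidate categories based on observed patterns.
    `llm` is any callable mapping a prompt string to a response string."""
    prompt = (
        "Here are short descriptions of project solutions:\n"
        + "\n".join(f"- {d}" for d in descriptions)
        + "\nPropose a short list of technology categories covering them, "
          "one per line."
    )
    return [line.strip("- ").strip()
            for line in llm(prompt).splitlines() if line.strip()]

def keep_expert_approved(candidates: list[str], approved: set[str]) -> list[str]:
    """Experts refine and validate; only approved labels enter the taxonomy."""
    return [c for c in candidates if c in approved]
```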

3)  Iterative loop with evaluation experts:

AI was never left to operate in isolation. Each cycle of analysis was reviewed by experts, who validated indicators, adjusted taxonomies, and ensured results aligned with the programme’s goals. This human–AI collaboration turned raw outputs into trusted insights.
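
A schematic of that loop is sketched below. The human step is deliberately left as an external callable, and all names are illustrative rather than the actual tooling: experts review a round of results and return a revised taxonomy (or nothing, once they are satisfied), which shapes the next extraction pass.

```python
def evaluation_cycle(records, extract, expert_review, taxonomy, max_rounds=3):
    """Run extraction, hand the results to evaluation experts, and fold their
    feedback (e.g. a revised taxonomy or adjusted indicator definitions) back
    into the next round. Stops when experts sign off or rounds run out."""
    results = []
    for _ in range(max_rounds):
        results = [extract(record, taxonomy) for record in records]
        feedback = expert_review(results)     # human step, outside the code
        if feedback is None:                  # experts approve this round
            break
        taxonomy = feedback                   # revisions shape the next pass
    return results, taxonomy
```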

4)  Structured, queryable database:

We converted the entire corpus into a structured, queryable knowledge base. From there, LLMs were used to extract facts and generate consistent indicators, which were written back to the database. The final product is a reusable data asset that powers analysis and reporting.
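
A minimal sketch of that write-back step, assuming a simple SQLite table with one row per project, call year, and indicator; the real schema and storage engine were richer, and the column names here are placeholders.

```python
import sqlite3

def store_indicators(db_path: str, rows: list[dict]) -> None:
    """Write extracted indicators into a queryable table, one row per
    (project, call year, indicator). `rows` holds dicts with those keys."""
    con = sqlite3.connect(db_path)
    con.execute(
        """CREATE TABLE IF NOT EXISTS indicators (
               project_id TEXT,
               call_year  INTEGER,
               indicator  TEXT,
               value      TEXT)"""
    )
    con.executemany(
        "INSERT INTO indicators VALUES (:project_id, :call_year, :indicator, :value)",
        rows,
    )
    con.commit()
    con.close()
```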

5)  Confidentiality by design:

Crucially, all of this was done with open-source AI models in a controlled environment. No sensitive data was ever shared with proprietary systems like ChatGPT. This ensures full confidentiality and compliance – a decisive advantage for governments, research agencies, and organisations handling sensitive or proprietary material.
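
What this can look like in practice: the sketch below sends a prompt to an open-weights model hosted inside the controlled environment, assuming an OpenAI-compatible HTTP interface of the kind exposed by common self-hosted serving stacks. The endpoint URL and model name are placeholders, not the actual deployment; the point is that no text ever leaves the local infrastructure.

```python
import requests

LOCAL_ENDPOINT = "http://localhost:8000/v1/chat/completions"  # placeholder URL

def ask_local_model(prompt: str, model: str = "open-weights-model") -> str:
    """Query a locally hosted open-source model; nothing is sent to any
    external or proprietary service."""
    response = requests.post(
        LOCAL_ENDPOINT,
        json={
            "model": model,
            "messages": [{"role": "user", "content": prompt}],
            "temperature": 0.0,   # deterministic answers for extraction tasks
        },
        timeout=120,
    )
    response.raise_for_status()
    return response.json()["choices"][0]["message"]["content"]
```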

Why This Matters

This project shows a path forward for evaluating large-scale programmes, where complexity and volume have long been barriers. Instead of cherry-picking a handful of projects or drowning in paperwork, we can now systematically process entire portfolios and extract structured evidence.

More importantly, the methodology demonstrates that the key to AI-assisted evaluation is not automation alone, but orchestration:

  • Knowing what data to include,
  • How to structure it,
  • And how to keep the model focused on the right questions.

The combination of AI expertise with domain and evaluation expertise is what makes this possible. AI can scale the analysis, but only experts can decide which indicators matter, which taxonomies make sense, and how to interpret findings in their policy and societal context.

What this unlocked for AAL

By turning ~30,000 files into a structured evidence base, the evaluation shed light on the evolution of this EU-funded programme. We could see, across 16 years, which societal challenges were actually being tackled (from social isolation to functional decline and caregiver burden), how the technological mix evolved (from sensors to cloud-and-ML ecosystems), and how solution archetypes combined in practice (monitoring + coaching + smart-home services). Because indicators were extracted consistently across projects and calls, we could quantify shifts over time.
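
As an example of the kind of question such an evidence base answers directly, the sketch below (reusing the hypothetical indicators table from the earlier snippet) counts, per call year, how many projects report a given technology indicator – the sort of query behind the shift-over-time findings.

```python
import sqlite3

def projects_per_year(db_path: str, indicator: str) -> list[tuple[int, int]]:
    """Count, per call year, how many distinct projects report `indicator`
    (e.g. a given technology or solution type) with value 'yes'."""
    con = sqlite3.connect(db_path)
    rows = con.execute(
        """SELECT call_year, COUNT(DISTINCT project_id)
           FROM indicators
           WHERE indicator = ? AND value = 'yes'
           GROUP BY call_year
           ORDER BY call_year""",
        (indicator,),
    ).fetchall()
    con.close()
    return rows
```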

The approach also made end-user involvement visible and comparable. We tracked how participation patterns changed for older adults, caregivers, and payers; which call designs increased genuine co-creation; and where ethical or operational constraints depressed engagement. This allowed AAL to identify what actually drives better user participation (e.g., living-lab loops, framing around autonomy vs. risk) and to isolate pitfalls (e.g., one-off surveys, late GDPR retrofits).

At ecosystem level, the analysis mapped the network and stakeholder evolution: the rise of SME leadership, the growing presence of payers and public authorities, and the conditions under which regional uptake happens (open interoperability, modular architectures, and early regulatory planning). It analysed the impact of the programme’s strategic pivot on proposal quality, solution integration, technology choices, and time-to-contract, showing how governance changes translated into on-the-ground results.

The result is not just a retrospective: it’s a playbook for future programmes, with concrete guidance on management instruments, support actions, etc.

Finally, for policymakers considering a similar evaluation, this work shows that AI plus expert oversight can deliver an end-to-end picture – spanning challenges, technologies, user engagement, ecosystem maturation, and strategic governance – without drowning in documentation.

Beyond AAL

While this study was unique in scope – 16 years, 300 projects, 30,000 documents – the lessons go far beyond ageing and care technologies. Any large-scale research and innovation programme leaves behind complex documentation that is difficult to evaluate. Our methodology shows how to turn such data into structured evidence, balancing automation with human judgment.

In short:

  • AI can scale evaluation, but only if paired with structured data pipelines and expert validation.
  • Bigger context windows are not the answer – better context is.
  • Dynamic classification lets data reveal new insights, rather than forcing pre-conceived frameworks.
  • Human–AI collaboration is the key to trustworthy, policy-relevant results.

“The AAL Legacy Study was a demonstration of what the future of evaluation looks like: faster, deeper, more systematic.”

Outcomes of this work