The Demo Culture is Ruining ML... But It Doesn't Have To
Machine learning demos should drive real AI progress, not just hype—learn how to create iterative, production-ready ML systems that truly work.
Due to increasing demand, regular office hours are now 10-minute appointments. Premium office hours are still 15-minute slots. If you are a subscriber, THANK YOU! Please feel free to schedule a time for us to talk using the free or premium office hours link in this email.
Starting next Monday, the MLE Path newsletter will come out at 9am Pacific instead of 3pm.
The Problem with Demo Culture
We need to talk. The way our industry approaches machine learning demos is broken. Instead of being a tool for measuring progress and guiding development, demos have become a way to mask a lack of real results. This culture of flashy but misleading demos is actively harming ML research and production. But it doesn’t have to be this way.
Let me share a real example.
The ML Team That Shipped Nothing for Three Years
At one company I was brought into, the ML team had gone three full years without shipping a single feature. How was that possible? Two main reasons:
Nepotism: The team members were protected, which meant they weren’t under real pressure to deliver anything of value.
Fake Demos: Whenever leadership or stakeholders started asking questions, the team quickly cobbled together a demo that appeared impressive but was mostly smoke and mirrors.
I challenged them to present the pipeline they claimed to have built over the past year. When they finally showed it, I discovered something shocking: they had committed NVIDIA's publicly available example code to Git just a week before the demo—right after I asked them to show their work. In other words, they were faking progress to avoid accountability.
This isn’t an isolated case. Many ML teams—at both startups and large organizations—fall into the same trap. And it extends beyond individual teams to the industry at large.
The OpenAI Effect: Hype Over Substance
At the highest levels, companies like OpenAI and other AI labs frequently release impressive demos to generate excitement. These demos often show cutting-edge capabilities, capturing headlines and fueling hype cycles. But once the excitement fades, we typically receive a model that is only marginally better than its predecessor.
This isn’t to say OpenAI and similar organizations aren’t doing real work—clearly, they are. But their public demos primarily serve as marketing tools, designed to attract attention and investment rather than to drive genuine innovation. The problem is that many other companies misunderstand this distinction. They see these high-profile demos and mistakenly believe that demos should be the primary vehicle for measuring ML progress.
So, should we ban demos altogether?
Why We Still Need Demos
Not so fast! Despite their misuse, demos can be incredibly valuable when done correctly.
Machine Learning is Iterative
As Andrew Ng once said, "Artificial intelligence is the new electricity." This highlights AI's potential to transform industries, but that transformation happens incrementally. ML is an inherently iterative field—models improve through cycles of experimentation, evaluation, and refinement. Demos, when used properly, help us track that progress in a tangible way.
Demos as Progress Indicators
A well-constructed demo provides a snapshot of where the system currently stands. It helps answer questions like:
Is the model improving in meaningful ways?
Are we tackling the right challenges?
Do we understand the limitations of our current approach?
Unlike fake demos that exist purely to impress, good demos serve as checkpoints in an ongoing development process. They highlight real progress and expose areas that need further work.
What Makes a "Good" Demo?
The Concept of Demo Cascades
I think about demos in terms of demo cascades. Just as cross-entropy loss is difficult to evaluate in absolute terms (because it's unbounded above) but excellent for comparing models against each other, demos should be designed to compare iterations of a system.
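The cross-entropy analogy can be made concrete with a tiny sketch. The numbers below are made up, and the two-class probabilities stand in for real model outputs; the point is only that the absolute loss values are hard to interpret, while the delta between two iterations on the same eval set is meaningful:

```python
import math

def cross_entropy(probs, labels):
    """Average negative log-likelihood of the true labels."""
    return -sum(math.log(p[y]) for p, y in zip(probs, labels)) / len(labels)

# Hypothetical predicted probabilities from two iterations on the SAME eval set.
eval_labels = [0, 1, 1, 0]
last_week = [[0.6, 0.4], [0.3, 0.7], [0.4, 0.6], [0.7, 0.3]]
this_week = [[0.8, 0.2], [0.2, 0.8], [0.3, 0.7], [0.9, 0.1]]

loss_then = cross_entropy(last_week, eval_labels)
loss_now = cross_entropy(this_week, eval_labels)

# Is 0.43 a "good" loss? Hard to say. But a negative delta is unambiguous progress.
print(f"last week: {loss_then:.3f}, this week: {loss_now:.3f}, "
      f"delta: {loss_now - loss_then:+.3f}")
```

Demos work the same way: a single demo in isolation tells you little, but a pair of demos on the same scope tells you whether the system moved.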
Each new demo should build on the last. A few examples:
Bad demo cascade: Last week we could detect cats, and this week we can detect cats and strollers. (Adding an entirely new class suddenly is usually a red flag.)
Good demo cascade: Last week we could detect cats during the day, and this week we can detect cats at night. (We addressed an edge case, improving robustness.)
Converging Toward Production Constraints
Good demo cascades naturally move toward real-world deployment. Each new demo should take into account practical constraints such as:
Memory usage
Latency requirements
Throughput capacity
If your early demos run on massive, unrealistic hardware but later iterations gradually move toward a deployable system, you're on the right track. If your demos stay completely disconnected from production needs, you're making cool PowerPoints, not real progress.
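One lightweight way to keep a demo cascade honest about these constraints is to profile every demo against explicit budgets. This is a minimal sketch, not a real benchmarking harness; the budget numbers and the stand-in `predict` function are hypothetical, and you would substitute your actual model and your actual deployment targets:

```python
import time

# Hypothetical production budgets for this system.
LATENCY_BUDGET_MS = 50.0
THROUGHPUT_BUDGET_QPS = 100.0

def profile_demo(predict, inputs):
    """Time a batch of predictions and compare against the production budgets."""
    start = time.perf_counter()
    for x in inputs:
        predict(x)
    elapsed = time.perf_counter() - start
    latency_ms = elapsed / len(inputs) * 1000.0
    qps = len(inputs) / elapsed
    return {
        "latency_ms": latency_ms,
        "qps": qps,
        "meets_latency": latency_ms <= LATENCY_BUDGET_MS,
        "meets_throughput": qps >= THROUGHPUT_BUDGET_QPS,
    }

# Stand-in model: a trivial function in place of real inference.
report = profile_demo(lambda x: x * 2, list(range(1000)))
print(report)
```

Attaching a report like this to every demo turns "does it run in production?" from a vague promise into a tracked number that should converge toward the budget over successive iterations.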
Demos Should Reduce Uncertainty
Focus on the Biggest Unknowns
Each successive demo should tackle the aspect of the system that is the most uncertain. The goal is to reduce uncertainty over time. If you already know how to solve a problem, it’s fine to use a placeholder or facade. But if you’re unsure whether a technique will even work, you need to validate it as soon as possible.
A strong demo culture builds trust by:
Defining a limited scope for each iteration
Delivering on that scope
Identifying the next most uncertain piece to focus on
If you instead chase shiny new features arbitrarily, your demos will feel impressive in the short term but lead nowhere in the long term.
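The "scope, deliver, re-scope" loop above can be sketched as a small prioritization routine. The backlog items and their uncertainty scores are invented for illustration; in practice the scores would come from team judgment, not code:

```python
# Hypothetical backlog of open questions, scored by how uncertain we are (0-1).
unknowns = [
    {"question": "Does detection work at night?", "uncertainty": 0.9},
    {"question": "Can we hit the latency budget on edge hardware?", "uncertainty": 0.7},
    {"question": "Does quantization hurt accuracy?", "uncertainty": 0.4},
]

def next_demo_target(backlog):
    """Scope the next demo around the single most uncertain open question."""
    return max(backlog, key=lambda item: item["uncertainty"])

target = next_demo_target(unknowns)
print(target["question"])
# After the demo resolves this question, re-score the backlog and repeat.
```

The mechanism is trivial on purpose: the discipline is in always picking the biggest unknown, not in the code.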
Actionable Checklist for Meaningful Demos
Want to build a strong demo culture? Here’s a checklist to keep your team honest:
✅ Define clear objectives: What specific challenge is this demo meant to address?
✅ Build incrementally: Each new demo should refine or extend the last one.
✅ Respect production constraints: Make sure the demo’s improvements align with real-world deployment needs.
✅ Maintain a focused scope: Don’t chase distractions—target the highest uncertainty first.
✅ Be transparent: Make it clear what the demo does (and does not do).
✅ Engage users early: Get real-world feedback as soon as possible.
✅ Use demos to test, not sell: The goal is to build better ML systems, not just impress stakeholders.
Demos Should Build, Not Sell
Demos can be great. But we need to stop using them as marketing tricks to "sell" machine learning progress that doesn't actually exist. Instead, we should use them to systematically build better ML systems.
The key is to make demos real. That means designing them to:
Expose weaknesses instead of hiding them
Reduce uncertainty over time
Fit within the constraints of an actual production system
And most importantly, we need to put them in the hands of real users as early as possible. The only way to build ML systems that truly work in the wild is to test them under real-world conditions. That means the demos need to be somewhat real from the start.
If we shift our mindset from “impressive” to “informative,” we can reclaim demos as a useful tool instead of a misleading spectacle.
Let’s stop selling ML with demos and start building ML with them instead.