Python isn't just glue, it's an implicit JIT ecosystem
Python is known as the glue language.
It's not fast.
It's not magical.
And yes, it always has a breaking point.
But that's exactly what makes it special.
You can:
- write it in your sleep
- `import any_magic` as you need (batteries included or a global `pip install`)
- conjure up some Frankensteinian FortRust++ library written in MMIX assembly from the dawn of the machine age
- execute your code line by line until it (predictably) breaks, leaving you in an interpreter to introspect the ruins
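That last step deserves a concrete sketch: you don't even need an interactive session, as the standard library lets a program introspect its own ruins. The `flaky` function and its typo'd payload below are invented for illustration.

```python
# A sketch of "introspecting the ruins" without even leaving the
# program: walk the traceback and read the crash site's locals.
# `flaky` and its typo'd payload are invented for illustration.
import sys
import traceback

def flaky(payload):
    return payload["result"]  # breaks when the key is missing

try:
    flaky({"resutl": 42})  # deliberate typo in the key name
except KeyError:
    _, _, tb = sys.exc_info()
    # the last frame in the traceback is where the exception was raised
    crash_frame = list(traceback.walk_tb(tb))[-1][0]
    print(sorted(crash_frame.f_locals["payload"]))  # → ['resutl']
```

Running the script under `python -i` gives you the same power interactively: an uncaught exception leaves you at a prompt with the wreckage in scope.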
And here's the thing: this turns out to be a remarkably good process for early development!
The combination is what is most profoundly special, captured in the broader picture of how the Python ecosystem evolves and what directs that evolution.
When you write Python code you're not just writing glue; you're an explorer in Python's implicit just-in-time compilation ecosystem.
Every time a newly found Python code path becomes hot enough, the ecosystem responds by forging a new component from the barest of metals, which is then glued into place.
This glue isn't static. It evolves much as a desire path does, based on the patterns of usage across the ecosystem.
Python's role isn't just connecting components - it's discovering which components need to exist.
The Python performance paradox
Python is slow. It is known.
But maybe, just maybe, we might want it that way?
When a Python code path becomes slow enough to matter, something counter-intuitive happens: the ecosystem doesn't optimize the Python, it glues in something else.
Python is slow to run, but fast to experiment with. You act as a scout, finding a new path, and using it enough to show it matters. If it matters you might just find that dirt path already paved by the time you turn around.
This is an emergent optimization strategy that works better than any planning could hope to.
Python, by itself, is the antithesis of premature optimization. It's all about getting something running and only later deciding to make it work faster (if that even matters).
When new paths are found, the focus is on expressivity, ease of use, and simplicity over performance. You're not going to win the performance battle so you don't even try to fight it.
As we hit friction with Python's speed and the capabilities it offers, we dip into the bucket of Fast™ languages rather than reinventing the wheel.
This Pareto optimal API might cover 80% of the necessary use cases (the hot path) while exposing only 20% of the fully fledged bare metal component's capabilities - and that's perfectly fine.
Rather than reinventing the wheel in your preferred Fast™ language - done out of love rather than sensibility - and hoping your rewrite covers that 80% of necessary cases, we instead sticky tape in the best existing solution to fall back on (which you could extend to full functionality if your needs go further). It doesn't matter if it's written in FortRust++, as the Python API bends itself towards being simple and easy to use.
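A standard library example of that shape: Python's `zlib` module exposes only a sliver of the underlying C library's API, yet that sliver covers the overwhelmingly common case.

```python
# The Pareto pattern in miniature: zlib's Python binding exposes a
# small, friendly slice of the underlying C library - one call each
# way covers the hot path.
import zlib

data = b"hello glue " * 1000
packed = zlib.compress(data, level=6)  # the 80% use case, one line
assert zlib.decompress(packed) == data
assert len(packed) < len(data)         # and it genuinely did the work
```

If you ever need streaming, dictionaries, or raw deflate windows, the fuller API is still reachable - but most users never do.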
Python continues bouncing along, optimizing for end user capabilities (ease of use, composition, simplicity, ...) rather than underlying magic. Python doesn't get jealous of other languages in trying to steal that hard won library for itself.
The implicit JIT ecosystem made explicit
Whilst Python is my main language and has lived in my head for half my life, it isn't the only language that exhibits this property.
I might argue it's one of the most successful of the implicit JIT ecosystems but there are many other glue languages with their own claims to fame.
We've seen explicit JIT ecosystems forced into being by many startups and big companies. As each startup grew (or was acquired into a BigCo) we saw the Death Star of their engineering orgs burn new hot paths:
- Python had Instagram (Facebook) and YouTube (Google)
- Ruby / Ruby on Rails had GitHub and Shopify
- PHP had Facebook
We've seen the same story play out more recently as well, with Figma supercharging Javascript with Rust and (separate from any large companies) a lovely small analysis of adding ever larger sprinklings of Rust into Javascript to improve performance.
The implicit JIT ecosystem holds true for more than just Python but Python might have been the largest winner.
Torch started life in Lua until it became PyTorch. That language shift was motivated by many aspects but speed was definitely not one of them. The most likely overarching narrative is that Python was preferred, for the simplicity and the ecosystem, so other libraries had to find their way to Python or face decreased usage.
Python has been at the forefront of the data science / ML ecosystem: `numpy`, `pandas`, `pytorch`, `tensorflow`, `scikit-learn`, ... All of them were already well optimized. Yet they're still seeing further (slow-rolled implicit JIT) gains.
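The glue pattern in miniature, as a rough sketch: Python drives the logic while a compiled kernel (here numpy's `dot`) handles the hot arithmetic.

```python
# The hot numeric loop delegated to compiled code: Python stays in
# charge of the logic while numpy's kernels do the arithmetic.
import numpy as np

xs = np.arange(10_000, dtype=np.float64)

total_slow = 0.0
for x in xs:              # interpreted, element by element
    total_slow += x * x

total_fast = float(np.dot(xs, xs))  # one call into compiled code

# same answer, to within floating point accumulation differences
assert abs(total_slow - total_fast) / total_fast < 1e-9
```

The interpreted loop pays Python's per-iteration overhead ten thousand times; the `dot` call pays it once.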
At this stage some of the most computationally expensive projects in the world are run in Python, the slow language.
Slow Python code that's important is a structural flaw and the ecosystem works to correct it.
The Rust realization
All of this crystallized in my head recently while writing a Rust tutorial.
The code was written around a real world use case. For the tutorial, however, I simplified it a tiny bit.
That real world task I mentioned needed Rust. Yet this tiny simplification was enough to nudge the task from being on a cold path to a hot path in Python's ecosystem.
My love for both languages was not at all diminished. It was about seeing the connection between the two far more clearly than I had before.
Rust first took my interest a decade ago starting when I was at CommonCrawl and was somehow seriously considering a single threaded laptop implementation of PageRank for processing our 128 billion edge web graph.
Over the years it became my preferred Fast™ language - the first Fast™ language that felt comfortable to me.
I most recently used Rust to write a `robots.txt` processing library that was battle tested against 34 million `robots.txt` files and is now used by at least one unicorn.
It clicks into Python perfectly. Beyond perfectly even.
I mentioned on Reddit that I missed Cython's low overhead auto-compiling, where if you `import primes` it'll search for `primes.pyx` and auto-compile + cache it before import.
Three hours later, `messense` created a ticket. A short while later came the Maturin Import Hook, allowing you to import stand-alone Rust files.
If you put these files in a folder (after installing Rust and running `pip install maturin maturin_import_hook`) you can just run `python fib.py` and everything works.
### rfib.rs

```rust
use pyo3::prelude::*;

#[pyfunction]
fn fib(x: usize) -> usize {
    match x {
        0 => 0,
        1 => 1,
        x => fib(x - 1) + fib(x - 2),
    }
}

#[pymodule]
fn rfib(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_wrapped(wrap_pyfunction!(fib))?;
    Ok(())
}
```
### fib.py

```python
import maturin_import_hook
maturin_import_hook.install()

from rfib import fib

print([fib(x) for x in range(0, 16)])
```
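For contrast, here's the same function in pure Python. The glue point is the import: either implementation can sit behind `fib` without callers changing a line.

```python
# Pure-Python fallback with the same interface as the Rust version -
# callers can't tell which implementation sits behind the name.
def fib(x: int) -> int:
    return x if x < 2 else fib(x - 1) + fib(x - 2)

print([fib(x) for x in range(0, 16)])
# → [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]
```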
Thanks to this newfound love I might have overindexed on Rust a bit, reaching for it for both infrastructure (`texting_robots`) and heavy projects.
Yet when I went to write a tutorial for Rust ... the Python implementation was almost there!
I'd made the mistake of forgetting Python's implicit JIT ecosystem existed - of assuming I was special enough that the path I was on had never been tread and that my solution would be faster.
- Python for more than half my life, Rust for the last few; even though I love Rust I've realized I should shuffle back towards Python - not because Rust is bad, but because it's the bare metal I was missing for Python
- The tutorial I was translating from real world Big Data hit the hot path in Python and so was damned close to Rust in speed
- My internal (can't write about it) use case required a few extra twists and turns, so Rust made sense, but the second I fell back to the Pareto optimal path Python won in most cases
The glue language is the LLM language
The next era of interest to me, in regards to Python, is the interaction with LLMs.
Python was optimized to be concise, forgiving, and (relatively) simple for humans - which are exactly the same needs LLMs have.
For most tasks you likely don't want anything beyond three lines (`import`, setup, execution) if you can help it.
Python is optimally placed for that, with no overhead to play, a full ecosystem of interoperable components to thread together, and a set of APIs built to be forgivingly simple to a wayward programmer.
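That three-line shape can be sketched with nothing but the standard library (`sqlite3` here is simply a stand-in for whichever battery you need):

```python
import sqlite3                                   # import
db = sqlite3.connect(":memory:")                 # setup
print(db.execute("SELECT 1 + 1").fetchone()[0])  # execution: prints 2
```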
The Python interpreter itself becomes a powerful environment for self play, suggesting Python's role as a glue language might be more central to the age of AI than giving you a hot path to run the matrix multiplications.
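A toy version of that self-play loop, with the LLM's contribution faked as a hard-coded string:

```python
# Sketch of the generate-run-inspect loop: execute candidate code in a
# fresh namespace and capture what it printed. The "LLM output" here
# is faked as a hard-coded string for illustration.
import contextlib
import io

candidate = "print(sum(x * x for x in range(10)))"  # pretend an LLM wrote this

buf = io.StringIO()
with contextlib.redirect_stdout(buf):
    exec(candidate, {})  # isolated globals for each attempt

result = buf.getvalue().strip()
print("candidate output:", result)  # → candidate output: 285
```

A real loop would feed `result` (or the traceback) back to the model; the cheap, forgiving interpreter is what makes that feedback loop nearly free.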
The circle completes itself - the implicit JIT ecosystem gets a true JIT
After decades of being the catalyst for an implicit JIT ecosystem, Python is finally experimenting with its own JIT in 3.13.
It's a fitting evolution. That slowness we once cursed might well have been a necessary catalyst for the ecosystem to build hot paths that Python alone would never have been capable of.
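Assuming a recent CPython checkout and going by PEP 744 (the exact flags may shift between releases), trying the experimental JIT looks roughly like:

```shell
# build CPython with the experimental copy-and-patch JIT (PEP 744)
./configure --enable-experimental-jit
make -j

# depending on the build flavor, the JIT can be toggled at runtime
PYTHON_JIT=1 ./python script.py
```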
Python's performance constraints didn't just create an ecosystem of optimized components, they allowed us to see exactly what paths needed to be optimized, no more and no less. Sometimes the best path forward is by going both fast and slow.
Python's true strength isn't just in being a glue language - it's in being the scout that finds and shapes what needs to be glued. Its perceived weaknesses - being slow, interpretive, and flexible - are actually features that drive ecosystem-wide optimization. When Python code becomes hot enough, the community doesn't optimize the Python - they create and integrate blazingly fast components in languages like Rust, keeping Python's interface clean and simple. This organic, usage-driven evolution has made Python the backbone of some of computing's most demanding tasks, from processing web-scale data to training massive ML models. As we enter the age of AI and LLMs, Python's role as an exploratory glue language positions it perfectly for the next wave of innovation: where being human-friendly and LLM-friendly might just be the same thing.
Python's genius lies not in its speed but in its role as an ecosystem catalyst. Its "slowness" drives the creation of optimized components exactly where they're needed, while its simplicity keeps those components accessible and composable. As we move into the era of AI, this balance of human-friendly interfaces and ecosystem-driven optimization may prove to be Python's greatest strength yet.
Full notes:
- Title could also be "Of Emojis, Python, Rust, and Common Crawl"
- "Gluing Python, Rust, Emojis, and Common Crawl together"
- Python is the magical glue programming language I've been happy with my entire life
- Low cognitive overhead, rich ecosystem, excellent interoperability, fast development, strong standard library
- Minimal lines of code to get a task done that can be done without a single lookup for most tasks (internal knowledge for syntax and base libraries)
- Most of those lines are high efficiency especially if it's on a common path
- For the uncommon paths that become common it's a structural fault to have it slow because ...
- Python is an ecosystem that agglomerates high efficiency libraries together (glue programming language)
- `numpy`, `pandas`, `pytorch`, `tensorflow`, `scikit-learn`, ...
- Even if not everything is exposed to Python it's usually a Pareto optimal selection of the 20% of the API that covers 80% of the tasks
- Reminded, starkly, of this when writing a tutorial for Rust code and writing a Python implementation
- The example task was "too simple" in that it fell within Python's hot path (file I/O, decompression, JSON, HTTP requests, ...) and hence was only a tad less efficient than Rust
- Glue language and passing over to the fast custom language for the compute heavy core task is a winning strategy
- Perhaps reframe as "Reach for Rust less when you're playing with glue"
- Or, better worded, "Python as Glue: When (Not) to Reach for Rust"
- Python is the optimal glue language, very much meant to have things glued into it, and Rust is the optimal fast language that's made to be glued in
- Almost all of my large scale work has been Python first, efficiency when needed, and Python gets you most of the way most of the time
- Python's "slow" reputation misses the point
- Glue languages are also likely the optimal target for LLMs
- Future LLM training could heavily leverage the Python interpreter which isn't an option for non-interpretive or slow languages
- In training {Generate code (GPU), try to run code (CPU)} is already a natural gap so no training / inference time is lost on the "first pass" through the training dataset
- From later: LLM ends up writing (suboptimal) code that will likely nudge towards the hot path, as like Keras the defaults were usually a good balance (Pareto optimal again)
- Whilst Google might have started with a Python based web crawler I'd have transitioned to Rust at this stage - and indeed I know that my `robots.txt` Rust library that I wrote about creating and testing with 34 million `robots.txt` files has been used by at least one Fortune 500 for their large scale web work
- If I'm going to start a new project of unknown scope, or I'm exploring a library / API / protocol, I'll want to be doing it in Python
- `python` and an import or two is sufficient to start (batteries built in and your extra global batteries are always available (global imports)) versus Rust which could be a single file but ... won't be
- Python interpreter to slowly pull apart a hairy JSON response or to poke and prod an API that's kinda documented but not really documented y'know?
- When productivity isn't about being fast, it's about building the right parts fast
- Why hasn't Python seen the same level of optimization as Javascript regarding JITs and faster implementations? Perhaps the main question is ... does it matter that much, when all the parts you glue to are insanely fast? Possibly so - Python and ML have JITs custom to the task, but that's more limited in scope (and again usually targeting Pareto optimality)
- Note about Mojo trying to super-optimize Python itself (perhaps the Javascript JIT compiler we've been waiting for) and then a big question on whether that's good or bad for the Python implicit JIT ecosystem as a whole
- The Python performance paradox - core only Python is slow and hence the solution when you stray off the optimal path isn't to develop / redevelop a hindered solution, it's to patch in the fastest solution
- Python isn't a JIT language but it is a JIT ecosystem (slow paths become optimized)
- Python favors ease of composition / use over speed but once the path is set the optimization follows
- Is ecosystem-level optimization more effective than language level optimization?
- Optimization on real world usage patterns (i.e. not premature optimization)
- Is this a more sustainable model for performance optimization given how complex systems have become? Python is half about making an entire stack of already glued components (PyTorch, CUDA, datasets in varying formats, data loaders requiring different formats or libraries or decompression, ...) into a (relatively) painless task
- Note that PyTorch started life as Torch in Lua, which was honestly a good deal faster for many parts (Javascript JIT adjacent) but lacked the ecosystem
- Follows the same pattern as suggested for MVPs in startup land except it's at the ecosystem level
- We see a tiny version of the JIT ecosystem historically made explicit in {Python, Ruby, PHP} with {Instagram/YouTube/..., GitHub, Facebook} but Python seems to have the strongest global JIT ecosystem narrative
- What other languages adopt the "glue first" mentality and how do they overcome such an extreme ecosystem disadvantage?
- Need to make it clear - everyone knows Python is a glue language, and I've been writing Python for more than half my life, yet this recent revelation hit me hard - potentially as Rust really is an amazing language and I realized even with that I should be falling back towards Python
- Desire paths for optimization where Python is a park and other languages can feel more like a jungle
- Kelsi mentions she has this with writing - outline versus detailed, where outline allows her to explore
- This is an anti-fragile approach, working out what's broken before investing effort