Python isn't just glue, it's an implicit JIT ecosystem
Python by default is known as the glue language. It's not fast. It's not magical. And it always has a breaking point.
Yet there is something special to that.
You can write it in your sleep, `import any_magic` as you need (batteries included or a global `pip install`), and conjure up some Frankensteinian FortRust++ library written in MMIX assembly at the dawn of the machine age to do that weird thing you need, finally stepping line by line through your code as it breaks horrifically, revealing that you are not, in fact, perfect.
This turns out to be a reasonably good process for most of what we do!
There are aspects of "special" already embedded in the process above - but the part that seems most profoundly special is in the broader picture of how the Python ecosystem evolves.
When you write Python code you're not just writing glue, you're an explorer in Python's implicit just-in-time compilation ecosystem.
Every time a newfound Python code path becomes hot enough, the ecosystem responds by forging a new component from the barest of metals, which is then glued into place.
This glue isn't static. It evolves much as a desire path does, based on the patterns of usage across the ecosystem.
Python's role isn't just connecting components - it's discovering which components need to exist.
The Python performance paradox
- Introduce Python slow, slow being the catalyst to glue fast bare metal components
Python is slow. It is known.
But maybe, just maybe, we might want it that way?
When a Python code path becomes slow enough to matter, something counter-intuitive happens: the ecosystem doesn't optimize the Python, it glues in something else.
Python is slow to run, but fast to experiment with. You act as a scout, finding a new path and walking it enough to show it matters. If it matters you might just find that dirt path already paved by the time you turn around.
This is an emergent optimization strategy that works better than any planning could hope to.
Python, by itself, is the antithesis of premature optimization. It's all about exploring, getting something running, and then later deciding to make it work faster (if that even turns out to matter).
When new paths are found, the focus is on expressivity, ease of use, and simplicity over performance. You're not going to win the performance battle so you don't even try to fight it.
As we hit friction with Python's speed and the capabilities it offers, we dip into our bucket of Fast™ languages rather than reinventing the wheel.
This Pareto optimal API might cover 80% of the necessary use cases (the hot path) while exposing only 20% of the fully fledged bare metal component's capabilities - and that's perfectly fine.
Rather than reinventing the wheel in your preferred Fast™ language - done out of love rather than sense - you sticky tape in the best existing solution, one you can fall back to with full functionality even if it's written in FortRust++.
Python continues bouncing along, optimizing for end user capabilities (ease of use, composition, simplicity, ...) rather than underlying magic.
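As a tiny sketch of what "gluing in something else" looks like in practice (assuming `numpy` is installed; the function names here are mine, purely for illustration):

```python
import numpy as np

# Pure-Python hot loop: interpreter overhead on every iteration.
def dot_py(xs, ys):
    total = 0.0
    for x, y in zip(xs, ys):
        total += x * y
    return total

# The "glued in" version: same answer, but the loop runs inside
# NumPy's compiled core rather than the Python interpreter.
def dot_np(xs, ys):
    return float(np.dot(np.asarray(xs), np.asarray(ys)))

xs, ys = [0.5] * 1_000, [2.0] * 1_000
assert dot_py(xs, ys) == dot_np(xs, ys) == 1000.0
```

Nothing about the Python call site changes; the slow path is simply swapped for a compiled one once it proves hot.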
The Rust realization
I was lucky that Python was my first serious language. It's my pseudo-code and my old, slightly deranged friend. For half my life it has nestled in the folds of my brain.
Rust took hold in the last few years, starting when I was at Common Crawl and somehow seriously considering a single-threaded laptop implementation of PageRank for processing our 128 billion edge web graph.
Since then I've fallen in love with it. I'd found my preferred Fast™ language after many years of looking!
It clicks into Python perfectly. Beyond perfectly even.
I mentioned on Reddit that I missed Cython's low overhead auto-compiling, where if you `import primes` it'll search for `primes.pyx` and auto-compile + cache before import.
Three hours later, messense created a ticket. A short while later came the Maturin Import Hook, allowing you to import stand-alone Rust files.
If you put these files in a folder (after installing Rust and `pip install maturin maturin_import_hook`) you can just run `python fib.py` and everything works.
### rfib.rs

```rust
use pyo3::prelude::*;

#[pyfunction]
fn fib(x: usize) -> usize {
    match x {
        0 => 0,
        1 => 1,
        x => fib(x - 1) + fib(x - 2),
    }
}

#[pymodule]
fn rfib(m: &Bound<'_, PyModule>) -> PyResult<()> {
    m.add_wrapped(wrap_pyfunction!(fib))?;
    Ok(())
}
```
### fib.py

```python
import maturin_import_hook
maturin_import_hook.install()

from rfib import fib

print([fib(x) for x in range(0, 16)])
```
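For a rough comparison point, here's the same doubly-recursive function in pure Python (a sketch; measure with `timeit` on your own machine rather than trusting any numbers here):

```python
import timeit

# Same doubly-recursive Fibonacci as rfib.rs, in pure Python.
def pyfib(x: int) -> int:
    return x if x < 2 else pyfib(x - 1) + pyfib(x - 2)

print([pyfib(x) for x in range(0, 16)])
# [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610]

# Time the pure-Python version; with the Rust module built, swap
# pyfib for the imported fib to see the gap on your hardware.
elapsed = timeit.timeit("pyfib(20)", globals=globals(), number=10)
print(f"pyfib(20) x10: {elapsed:.4f}s")
```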
Thanks to this newfound love I might have overindexed on Rust a bit: infrastructure (`texting_robots`) and heavy projects alike.
Yet when I went to write a tutorial for Rust ... the Python implementation was almost there!
I'd made the mistake of acting as if Python's implicit JIT ecosystem didn't exist - of assuming the path I was headed down had never been tread and that my solution would be faster.
- Python for more than half my life, Rust for the last few; even though I love Rust I've realized I want to shuffle away from it - not because it's bad, but because it's the bare metal I was missing for Python
- The tutorial I was translating from real world Big Data hit the hot path in Python, so it was damned close to Rust
- My internal (can't write about it) use case required a few extra twists and turns, so Rust made sense, but the second I fell back to the Pareto optimal path Python won in most cases
The implicit JIT ecosystem made explicit
- Note the {Python, Ruby, PHP} with {Instagram / YouTube / ..., GitHub / Shopify / ..., Facebook} and Javascript + Rust for Figma
The glue language is the LLM language
- Python was optimized to be concise, forgiving, and (relatively) simple for humans - which is the exact same needs for LLMs
- The Python interpreter could give feedback as a tick tock step during training
- I don't want anything beyond three lines (`import`, setup, execution) if I can help it, and few languages offer that with such low overhead and with such a strong ecosystem that continues to be built out for this case
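A toy sketch of that tick-tock loop (the "generated" snippets below are hard-coded stand-ins for model output; everything else is stdlib):

```python
import subprocess
import sys

def run_candidate(code: str) -> tuple[bool, str]:
    """Run generated code in a fresh interpreter, returning (ok, output)."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=10,
    )
    return proc.returncode == 0, proc.stdout + proc.stderr

# Hypothetical model outputs - the interpreter is the judge.
ok, out = run_candidate("print(sum(range(10)))")
assert ok and out.strip() == "45"

ok, _ = run_candidate("print(undefined_variable)")  # NameError
assert not ok
```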
Full notes:
- Title could also be "Of Emojis, Python, Rust, and Common Crawl"
- "Gluing Python, Rust, Emojis, and Common Crawl together"
- Python is the magical glue programming language I've been happy with my entire life
- Low cognitive overhead, rich ecosystem, excellent interoperability, fast development, strong standard library
- Minimal lines of code to get a task done that can be done without a single lookup for most tasks (internal knowledge for syntax and base libraries)
- Most of those lines are high efficiency especially if it's on a common path
- For the uncommon paths that become common it's a structural fault to have it slow because ...
- Python is an ecosystem that agglomerates high efficiency libraries together (glue programming language)
- `numpy`, `pandas`, `pytorch`, `tensorflow`, `scikit-learn`, ...
- Even if not everything is exposed to Python it's usually a Pareto optimal selection of the 20% of the API that covers 80% of the tasks
- Reminded, starkly, of this when writing a tutorial for Rust code and writing a Python implementation
- The example task was "too simple" in that it fell within Python's hot path (file I/O, decompression, JSON, HTTP requests, ...) and hence was only a tad less efficient than Rust
- Glue language and passing over to the fast custom language for the compute heavy core task is a winning strategy
- Perhaps reframe as "Reach for Rust less when you're playing with glue"
- Or, better worded, "Python as Glue: When (Not) to Reach for Rust"
- Python is the optimal glue language, very much meant to have things glued into it, and Rust is the optimal fast language that's made to be glued in
- Almost all of my large scale work has been Python first, efficiency when needed, and Python gets you most of the way most of the time
- Python's "slow" reputation misses the point
- Glue languages are also likely the optimal target for LLMs
- Future LLM training could heavily leverage the Python interpreter which isn't an option for non-interpretive or slow languages
- In training {Generate code (GPU), try to run code (CPU)} is already a natural gap so no training / inference time is lost on the "first pass" through the training dataset
- From later: LLM ends up writing (suboptimal) code that will likely nudge towards the hot path, as like Keras the defaults were usually a good balance (Pareto optimal again)
- Whilst Google might have started with a Python based web crawler I'd have transitioned to Rust at this stage - and indeed I know that my `robots.txt` Rust library, which I wrote about creating and testing against 34 million `robots.txt` files, has been used by at least one Fortune 500 for their large scale web work
- If I'm going to start a new project of unknown scope, or I'm exploring a library / API / protocol, I'll want to be doing it in Python
- `python` and an import or two is sufficient to start (batteries built in and your extra global batteries are always available (global imports)) versus Rust, which could be a single file but ... won't be
- The Python interpreter to slowly pull apart a hairy JSON response or to poke and prod an API that's kinda documented but not really documented y'know?
- When productivity isn't about being fast, it's about building the right parts fast
- Why hasn't Python seen the same level of optimization as Javascript regarding JITs and faster implementations? Perhaps the main question is ... does it matter that much, when all the parts you glue to are insanely fast? Possibly so - Python and ML have JITs custom to the task, but that's more limited scope (and again usually targeting Pareto optimality)
- Note about Mojo trying to super-optimize Python itself (perhaps the Javascript JIT compiler we've been waiting for) and then a big question on whether that's good or bad for the Python implicit JIT ecosystem as a whole
- The Python performance paradox - core only Python is slow and hence the solution when you stray off the optimal path isn't to develop / redevelop a hindered solution, it's to patch in the fastest solution
- Python isn't a JIT language but it is a JIT ecosystem (slow paths become optimized)
- Python favors ease of composition / use over speed but once the path is set the optimization follows
- Is ecosystem-level optimization more effective than language level optimization?
- Optimization on real world usage patterns (i.e. not premature optimization)
- Is this a more sustainable model for performance optimization given how complex systems have become? Python is half about making an entire stack of already glued components (PyTorch, CUDA, datasets in varying formats, data loaders requiring different formats or libraries or decompression, ...) into a (relatively) painless task
- Note that PyTorch started life as Torch in Lua, which was honestly a good deal faster for many parts (Javascript JIT adjacent) but lacked the ecosystem
- Follows the same pattern as suggested for MVPs in startup land except it's at the ecosystem level
- We see a tiny version of the JIT ecosystem historically made explicit in {Python, Ruby, PHP} with {Instagram/YouTube/..., GitHub, Facebook} but Python seems to have the strongest global JIT ecosystem narrative
- What other languages adopt the "glue first" mentality and how do they overcome such an extreme ecosystem disadvantage?
- Need to make it clear - everyone knows Python is a glue language, and I've been writing Python for more than half my life, yet this recent revelation hit me hard - potentially as Rust really is an amazing language and I realized even with that I should be falling back towards Python
- Desire paths for optimization where Python is a park and other languages can feel more like a jungle
- Kelsi mentions she has this with writing - outline versus detailed, where outline allows her to explore
- This is an anti-fragile approach, working out what's broken before effort
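The "hot path" note above (file I/O, decompression, JSON) can be seen with nothing but the standard library, where each heavy step dispatches to compiled code:

```python
import gzip
import json

# Thin Python glue over compiled cores: gzip wraps zlib (C),
# json uses the _json C accelerator when available.
records = [{"id": i, "text": "hello"} for i in range(3)]
blob = gzip.compress(json.dumps(records).encode("utf-8"))

# Round-trip: decompress and parse, heavy lifting outside the interpreter.
decoded = json.loads(gzip.decompress(blob))
assert decoded == records
```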