Most recent update: 19th November 2024 - 08:23:11 - 9755 characters

Python isn't just glue, it's a JIT ecosystem

Python has always been known as the glue language. It's not fast. It's not particularly special. And it always has a breaking point.

Yet there is something special to it. You can write it in your sleep, import any_magic as you need (batteries or pip installed) to conjure up some scary library written in FortRust++ to do that weird thing you need, and then step through the code as it breaks horrifically revealing that you're not, in fact, perfect.

There are aspects of "special" already embedded in that - but the part that seems most profoundly special is how the Python ecosystem evolves.

When you write Python code you're not just writing glue code, you're participating as an explorer in Python's implicit just-in-time compilation ecosystem.

Every time a newfound Python code path becomes hot enough, the ecosystem responds by forging a new component from the barest of metals and gluing it in place.

This glue isn't static. It evolves based on the ecosystem's usage patterns. We're all part of a distributed performance profiler, raising tickets and pitching in, whether we realize it or not. Python's role isn't just connecting components - it's discovering which components need to exist.

The Python performance paradox

  • Introduce Python as slow, with that slowness being the catalyst for gluing in fast bare metal components

Python is slow. We all know this. Maybe the heretical side of me says, though, that we might almost want this?

When a Python code path becomes slow enough to matter, something counter-intuitive happens: the ecosystem doesn't optimize the Python, it forges an entirely new solution from bare metal.

This is no accident; it's an emergent optimization strategy that works better than any up-front planning could hope to.

Python, by itself, is the antithesis of premature optimization. It's all about exploring, getting something running, and then deciding to make it work faster (if necessary) later.

This means composition over premature optimization. This means real world driven optimization.
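A minimal sketch of that dynamic using only the standard library: the hand-rolled loop below is the slow "discovered" path, and the C-implemented `sum` builtin stands in for the bare metal component the ecosystem glues in once that path gets hot. The numbers and helper name are illustrative, but the shape of the win is the same one numpy and friends deliver at scale.

```python
import timeit

def py_sum(values):
    """The 'discovered' path: a pure-Python accumulation loop."""
    total = 0
    for v in values:
        total += v
    return total

data = list(range(100_000))

# Both produce the same answer...
assert py_sum(data) == sum(data)

# ...but the builtin sum is implemented in C, so it is typically
# several times faster - the hot path has been forged in bare metal
# while the glue code around it stays in Python.
slow = timeit.timeit(lambda: py_sum(data), number=50)
fast = timeit.timeit(lambda: sum(data), number=50)
print(f"pure Python: {slow:.3f}s, builtin (C): {fast:.3f}s")
```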

Python as an implicit JIT ecosystem

  • Favoring ease of use and composition in a single glue language means finding the best paths to take to solve problems - even if they're not fast
  • Real world profiler driven optimization means those dirt roads are rapidly upgraded to bullet trains
    • Pareto optimal desire paths mean you'll get most of the functionality but you might need to dip into the fully fledged bare metal component if necessary - yet that's still likely a better option than a Fast™ language reinventing the wheel

As a glue language, you might think optimizing for ease of use and prototyping over speed and efficiency would permanently hobble you.

The Rust realization

  • Python for more than half my life, Rust for the last few, and even though I love Python I've realized I want to shuffle away from it - not because Python is bad, but because Rust is the bare metal I was missing for Python
  • A tutorial I was translating from real world Big Data work hit the hot path in Python, so it was damned close to Rust
    • My internal (can't write about it) use case required a few extra twists and turns, so Rust made sense, but the second I fell back to the Pareto optimal path Python won in most cases

The glue language is the LLM language

  • Python was optimized to be concise, forgiving, and (relatively) simple for humans - which are exactly the same needs LLMs have
  • The Python interpreter could give feedback as a tick tock step during training
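A hedged sketch of what that tick-tock feedback might look like: generated code goes to a fresh subprocess running the real interpreter, and the pass/fail result plus any traceback comes back as the signal. The candidate snippets and the `run_candidate` helper are purely illustrative, not any real training pipeline.

```python
import subprocess
import sys

def run_candidate(code: str, timeout: float = 5.0) -> tuple[bool, str]:
    """Execute generated code in a fresh interpreter and return
    (succeeded, feedback). The feedback string - stdout on success,
    the traceback on failure - is what would flow back as a signal."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    if proc.returncode == 0:
        return True, proc.stdout
    return False, proc.stderr

# Two hypothetical model outputs: one correct, one buggy.
ok, out = run_candidate("print(sum(range(10)))")
bad, err = run_candidate("print(undefined_variable)")

print(ok, out.strip())            # True 45
print(bad, "NameError" in err)    # False True
```

The CPU-bound subprocess naturally overlaps with GPU-bound generation, which is the "natural gap" the notes below point at.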

The implicit JIT ecosystem made explicit

  • Note the {Python, Ruby, PHP} with {Instagram/YouTube/..., GitHub, Facebook} and Javascript + Rust for Figma

Full notes:

  • Title could also be "Of Emojis, Python, Rust, and Common Crawl"
    • "Gluing Python, Rust, Emojis, and Common Crawl together"
  • Python is the magical glue programming language I've been happy with my entire life
    • Low cognitive overhead, rich ecosystem, excellent interoperability, fast development, strong standard library
  • Minimal lines of code to get a task done that can be done without a single lookup for most tasks (internal knowledge for syntax and base libraries)
  • Most of those lines are high efficiency especially if it's on a common path
  • For the uncommon paths that become common it's a structural fault to have it slow because ...
  • Python is an ecosystem that agglomerates high efficiency libraries together (glue programming language)
    • numpy, pandas, pytorch, tensorflow, scikit-learn, ...
    • Even if not everything is exposed to Python it's usually a Pareto optimal selection of the 20% of the API that covers 80% of the tasks
  • Reminded, starkly, of this when writing a tutorial for Rust code and writing a Python implementation
    • The example task was "too simple" in that it fell within Python's hot path (file I/O, decompression, JSON, HTTP requests, ...) and hence was only a tad less efficient than Rust
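The shape of that "too simple" task can be sketched with nothing but the standard library - gzip decompression and JSON parsing both bottom out in C, so the Python below spends almost all its time in compiled code. The sample records are invented stand-ins; the real tutorial worked over actual crawl data.

```python
import gzip
import io
import json

# Stand-in for a gzipped JSON-lines file such as a web crawl extract.
records = [{"url": f"https://example.com/{i}", "status": 200} for i in range(3)]
raw = gzip.compress("\n".join(json.dumps(r) for r in records).encode("utf-8"))

# The "glue" loop: stream, decompress, and parse. Each step hands the
# heavy lifting to a C implementation (zlib for gzip, the C JSON scanner).
parsed = []
with gzip.open(io.BytesIO(raw), "rt", encoding="utf-8") as fh:
    for line in fh:
        parsed.append(json.loads(line))

print(len(parsed), parsed[0]["url"])  # 3 https://example.com/0
```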
  • Glue language and passing over to the fast custom language for the compute heavy core task is a winning strategy
  • Perhaps reframe as "Reach for Rust less when you're playing with glue"
    • Or, better worded, "Python as Glue: When (Not) to Reach for Rust"
    • Python is the optimal glue language, very much meant to have things glued into it, and Rust is the optimal fast language that's made to be glued in
  • Almost all of my large scale work has been Python first, efficiency when needed, and Python gets you most of the way most of the time
  • Python's "slow" reputation misses the point
  • Glue languages are also likely the optimal target for LLMs
    • Future LLM training could heavily leverage the Python interpreter which isn't an option for non-interpretive or slow languages
    • In training {Generate code (GPU), try to run code (CPU)} is already a natural gap so no training / inference time is lost on the "first pass" through the training dataset
    • From later: LLM ends up writing (suboptimal) code that will likely nudge towards the hot path, as like Keras the defaults were usually a good balance (Pareto optimal again)
  • Whilst Google might have started with a Python based web crawler I'd have transitioned to Rust at this stage - and indeed I know that my robots.txt Rust library that I wrote about creating and testing with 34 million robots.txt files has been used by at least one Fortune 500 for their large scale web work
  • If I'm going to start a new project of unknown scope, or I'm exploring a library / API / protocol, I'll want to be doing it in Python
    • python and an import or two is sufficient to start (batteries built in and your extra global batteries are always available (global imports)) versus Rust which could be a single file but ... won't be
    • Python interpreter to slowly pull apart a hairy JSON response or to poke and prod an API that's kinda documented but not really documented y'know?
    • When productivity isn't about being fast, it's about building the right parts fast
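As a sketch of that poking-and-prodding, here's the kind of throwaway helper you'd type straight into the interpreter: it walks an unfamiliar JSON payload and reports the paths and types it finds. The payload and the `shape` helper are invented for illustration.

```python
import json

def shape(obj, path="$"):
    """Recursively yield (path, type) pairs for a parsed JSON object,
    so you can see a response's structure without reading all of it."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield from shape(value, f"{path}.{key}")
    elif isinstance(obj, list):
        # Sample only the first element; enough to learn the shape.
        if obj:
            yield from shape(obj[0], f"{path}[0]")
    else:
        yield path, type(obj).__name__

payload = json.loads('{"data": {"items": [{"id": 7, "tags": ["a"]}], "next": null}}')
for p, t in shape(payload):
    print(p, t)
# $.data.items[0].id int
# $.data.items[0].tags[0] str
# $.data.next NoneType
```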
  • Why hasn't Python seen the same level of optimization as Javascript regarding JITs and faster implementations? Perhaps the main question is ... does it matter that much, when all the parts you glue to are insanely fast? Possibly so - Python and ML have JITs custom to the task, but that's more limited scope (and again usually targeting Pareto optimality)
    • Note about Mojo trying to super-optimize Python itself (perhaps the Javascript JIT compiler we've been waiting for) and then a big question on whether that's good or bad for the Python implicit JIT ecosystem as a whole
  • The Python performance paradox - core only Python is slow and hence the solution when you stray off the optimal path isn't to develop / redevelop a hindered solution, it's to patch in the fastest solution
    • Python isn't a JIT language but it is a JIT ecosystem (slow paths become optimized)
    • Python favors ease of composition / use over speed but once the path is set the optimization follows
    • Is ecosystem-level optimization more effective than language level optimization?
    • Optimization on real world usage patterns (i.e. not premature optimization)
    • Is this a more sustainable model for performance optimization given how complex systems have become? Python is half about making an entire stack of already glued components (PyTorch, CUDA, datasets in varying formats, data loaders requiring different formats or libraries or decompression, ...) into a (relatively) painless task
    • Note that PyTorch started life as Torch in Lua, which was honestly a good deal faster for many parts (Javascript JIT adjacent) but lacked the ecosystem
    • Follows the same pattern as suggested for MVPs in startup land except it's at the ecosystem level
    • We see a tiny version of the JIT ecosystem historically made explicit in {Python, Ruby, PHP} with {Instagram/YouTube/..., GitHub, Facebook} but Python seems to have the strongest global JIT ecosystem narrative
  • What other languages adopt the "glue first" mentality and how do they overcome such an extreme ecosystem disadvantage?
  • Need to make it clear - everyone knows Python is a glue language, and I've been writing Python for more than half my life, yet this recent revelation hit me hard - potentially because Rust really is an amazing language and I realized even with that I should be falling back towards Python