Interoperable math in python

Scientific computing often brings with it problems that reach large scale and require great resources to compute answers in a reasonable timeframe. There are several ways to reduce the running time of a program. The cheekiest example would be to optimize your code, but what about when your code is already fairly optimized? Well, you could add multithreading/multiprocessing. Depending on the language at hand, this can vary in difficulty. Depending on the nature of your problem, this may also slow you down. For example, if the problem is a sequence of dependent steps involving a small amount of data, parallelizing across multiple cores is likely to slow you down. On the other hand, if your problem is embarrassingly parallel, it may be trivial to fan out work across multiple cores and achieve nearly linear improvements in performance. You may find a barrier after the 1 node breakpoint, where you need to embrace a different architecture (e.g. pub/sub with work stealing and MPI for communication). At this point it may take a month to make your code run on multiple nodes in a cluster, and your program now competes for what are often quite valued resources. What could possibly be of much greater value is to simply choose a faster mathematical backend. Loosely speaking, this is the BLAS behind your code. If there isn’t one, your code can be sped up a thousandfold with minimal effort. You could also experiment with using a GPU - a process that gets easier each year - to elicit a further 100x improvement. A 100,000x reduction in execution time from a BLAS-less codebase to a GPU-enabled one is the difference between a runtime of 28 hours and a runtime of one second, or (often) the difference between a year (“your code cannot be run”) and a small portion of a day (“running your code is no trouble at all”).

prysm, the open source optics module I write, has tried to be CPU/GPU flexible since the early days (v0.7, I believe) in addition to leveraging some JIT tools to fuse kernels for polynomial evaluation. It allows the use of GPU or CPU to be selected at runtime by calling a single function:

from prysm import config

config.change_backend(to='np')  # to='cp', to='cupy'

The implementation of this is relatively simple, but under the hood relies on some dirty tricks and quirks of the python interpreter. Python modules have global variables, and the entire language has “pass by object” semantics, broadly similar to pass by reference.


foo = 1

def chfoo():
    globals()['foo'] = 2

import mod1
from mod1 import foo
print(foo) # 1
print(  # these two are the same

# what happens now?
mod1.chfoo()  # or import the function and call it; no difference
print(foo)  # 1
print( # 2

Whitespace is significant in python, so the consideration is as simple as “any unindented declaration in a python file is a global variable.” These are all able to be accessed via the globals() function, which returns an object reference to the global dictionary, which you can mutate. When you import a varable using “from x import y” syntax, the value is retrieved using something conceptually similar to mod1.globals()[variable_name]. This gives this syntax a generally pass-by-value feel, with the exception of mutable types (dicts, classes, etc.) The reference can be overwritten without a change in another file, but the change won’t propagate. On the other hand, if the value itself is mutated (add or delete a dict key, modify a class attribute, etc) this does propagate.

Modules themselves are themselves are similar to classes, and could be rudimentarily implemented as:

class PythonModule:
    def __init__(self, globals_):
        self._globals = globals_
    def globals(self):
        return self._globals
    def __getattr__(self, key):
        return self.globals()[key]

Since getattr calls globals on any attribute access, the value is updated with any access of a variable through the module name itself. This means that, provided you restrict usage to, you can change bar without the user being any the wiser. This is how prysm allows numpy and cupy to be swapped at runtime,

# from

constants = frozenset((

... # more frozensets of variables in numpy excluded

def change_backend(to):
    if to == 'cu':
        if not cuda_compatible:
            raise ValueError('installation lacks cuda support.')
            target_base = 'cupy'
            target_fft = 'cupy.fft'
            target_linalg = 'cupy.linalg'
            target_rand = 'cupy.random'
            # target_scipy = 'cupyx.scipy'

    elif to == 'np':
        target_base = 'numpy'
        target_fft = 'numpy.fft'
        target_linalg = 'numpy.linalg'
        target_rand = 'numpy.random'
        # target_scipy = 'scipy'

        # two sets of functionality unavailable via cupy
        for func in allfuncs_cupy_missing:
            exec(f'from {target_base} import {func}')
            globals()[func] = eval(func)

        for func in linalgfuncs:
            exec(f'from {target_linalg} import {func}')
            globals()[func] = eval(func)

    for func in allfuncs:
        exec(f'from {target_base} import {func}')
        globals()[func] = eval(func)

    for const in constants:
        exec(f'from {target_base} import {const}')
        globals()[const] = eval(const)

    for func in fftfuncs:
        exec(f'from {target_fft} import {func}')
        globals()[func] = eval(func)

    for func in randomfuncs:
        exec(f'from {target_rand} import {func}')
        globals()[func] = eval(func)
config.backend = config.backend  # trigger import of math functions

This code works as follows:

It iterates though some frozensets (as a security measure vs, say, lists, since they cannot be mutated). In each, it knows what submodule of numpy or cupy to import from and builds a string that would import that function if run.

It then uses exec (eval without the ability to return) to run this string of code. This dumps a new function into the local namespace, but we lack a reference to the function. Using eval on the name of the function itself, we retrieve a reference to the function and mutate the globals dictionary.

This code is a dirty, filthy hack that does achieve something of value. But as Raymond Hettinger would say - there must be a better way.

Today, I thought of a better way. The code is more concise, but will mean a breaking API change. Instead of users writing:

from prysm import mathops as m

ary = m.empty(10)
m.fft(ary)  # FFT at top level namespace

They will write:

from prysm.mathops import engine as e

ary = e.empty(10)
e.fft.fft(ary)  # now you have to know the submodule you want it to come from

So how do we implement such a thing? The implementation is actually pretty terse,

class MathEngine:
    def __init__(self, source=np):
        self.source = np

    def __getattr__(self, key):
            return getattr(self.source, key)
        except AttributeError:
            # function not found, fall back to numpy
            # this will actually work nicely for numpy 1.16+
            # due to the __array_function__ and __array_ufunc__ interfaces
            # that were implemented
            return getattr(np, key)  # this can raise, but we don't *need* to catch

    def change_backend(self, backend):
        if isinstance(backend, str):
            exec(f'import {backend}')
            self.source = eval(backend)
            # backend is a module
            self.source = backend

engine = MathEngine()

All this class does is have the ability to set various modules as an instance attribute and threads calls to getattr through to that module. change_backend has a little bit of logic to deal with strings, but the user can just pass a module instead. This means we no longer have hard-coded numpy and cupy, and the user could drop in a custom module, something like Chainer or TensorFlow or PyTorch, etc. We’ve reduced the amount of code we maintain (this is a plus), gained the ability for a user to inject anything they want for a math backend (another plus), and we no longer get hundreds of linter errors that prysm.mathops has no member xyz.

There are a few naggles having to do with special functions that make the real implementation a little bit more complicated, but conceptually this is all there is to it.