numpy array/item handling functions

Question

Recently I have been using numpy arrays, which have great utility via their broadcasting methods.

I am attempting to write a useful public facing library, and this has me thinking about how best to approach functions to handle both numpy arrays and single items.

Here is a (working, but cut down) version of the Octahedron class I am writing:

import numpy as np


class Octahedron:
    r2 = 2 ** 0.5
    r3 = 3 ** 0.5
    r6 = 6 ** 0.5  # r2 * r3
    i2 = 1.0 / r2
    i3 = 1.0 / r3
    i6 = 1.0 / r6
    i26 = 2. * i6

    ov = {
        'N': (0.0, 0.0, +1.0), 'S': (0.0, 0.0, -1.0),  # North/South 0 1 (z is vertical)       +90, -90 lat
        'E': (0.0, +1.0, 0.0), 'W': (0.0, -1.0, 0.0),  # East/West 2 3 (y is left to right)  +90, -90 lon
        'A': (+1.0, 0.0, 0.0), 'P': (-1.0, 0.0, 0.0),  # Anterior/Posterior 4 5 (x is front to back)  0, 180 lon
    }

    # These rotate a 3D face to the (Z) plane
    matrices = {
        'NEA': np.array([[-i2, 0, i2], [i6, -i26, i6], [i3, i3, i3]]),
        'NEP': np.array([[0, -i2, i2], [i26, i6, i6], [-i3, i3, i3]]),
        'NWA': np.array([[0, i2, i2], [-i26, -i6, i6], [i3, -i3, i3]]),
        'NWP': np.array([[i2, 0, i2], [-i6, i26, i6], [-i3, -i3, i3]]),
        'SEA': np.array([[0, -i2, -i2], [-i26, i6, -i6], [i3, i3, -i3]]),
        'SEP': np.array([[i2, 0, -i2], [-i6, -i26, -i6], [-i3, i3, -i3]]),
        'SWA': np.array([[-i2, 0, -i2], [i6, i26, -i6], [i3, -i3, -i3]]),
        'SWP': np.array([[0, i2, -i2], [i26, -i6, -i6], [-i3, -i3, -i3]]),
    }

    def __init__(self):
        self.signs = {_face: np.sum([self.ov[v] for v in _face], axis=0) for _face in self.matrices}
        self.pt_signs = {tuple(v): k for k, v in self.signs.items()}

    def pt_face(self, uvw):  # Which face does this point belong to?
        return self.pt_signs[tuple(np.sign(uvw).astype(int).tolist())]

    def pts_faces(self, uvw_s):  # face.keys from np_array of 3d Octahedron/Spherical points
        pts = uvw_s if isinstance(uvw_s, np.ndarray) else np.array(uvw_s)
        return np.apply_along_axis(self.pt_face, -1, pts)

    @classmethod
    def pt_valid(cls, pt):  # Constraint: |u|+|v|+|w|=1 (surface of the unit octahedron)
        return np.abs(np.sum(np.abs(pt)) - 1.) < 1e-15

    def pts_valid(self, uvw_s):  # Constraint: |u|+|v|+|w|=1 (surface of the unit octahedron)
        pts = uvw_s if isinstance(uvw_s, np.ndarray) else np.array(uvw_s)
        return np.all(np.apply_along_axis(self.pt_valid, -1, pts))


def random_octahedral(n):
    # return a random points from Octant 0.
    uvw = []
    r_xy = np.random.uniform(0, 1, [n, 2])
    for (a, b) in r_xy:
        if a + b > 1:
            a, b = 1 - a, 1 - b
        uvw.append([a, b, 1 - a - b])
    return np.array(uvw)


if __name__ == '__main__':
    o = Octahedron()
    pts = random_octahedral(100)
    all_good = o.pts_valid(pts)
    for k, v in o.signs.items():
        f_pts = np.copysign(pts, v)
        faces = np.unique(o.pts_faces(f_pts))
        if len(faces) != 1:
            print(f'{k}: mysterious outlier in {faces}')
        elif faces[0] != k:
            print(f'{k}: mysterious result in {faces}')
        else:
            print(f'{k}: looks good!')

While any suggestions for improvement of both readability and efficiency are welcome, my main question is whether or not I should be writing the functions in that way, or whether I should be using alternatives for handling different input types / composites.

J_H · Accepted Answer · 2025-03-05 16:29:22Z

5

force a type

We conditionally construct an NDArray:

    def pts_valid( ... ):  
        pts = uvw_s if isinstance(uvw_s, np.ndarray) else np.array(uvw_s)

It would be enough to unconditionally assign pts = np.array(uvw_s). Even if the input is "large", you'll efficiently obtain a view, with no copying. In other words, let numpy sweat the object type details.

This is similar to code which might possibly obtain a string p and unconditionally assigns

    def report(p: Path | str):
        p = Path(p)
        ...

dead store

    all_good = o.pts_valid(pts)

The OP code never consults that flag. Consider making that method raise ValueError() rather than returning False, if that suits your Use Case. That way the caller can't ignore the trouble.

type hinting

It's optional, but annotating the method signatures would make it easier for folks to correctly call and to analyze them.

Numpy annotations are slightly more involved than basic types such as
s: str. Here is one example that works with mypy and pyright. The extra import plays an important role.

from numpy.typing import NDArray
    ...
    x: NDArray[np.float64]
    ...

answered Mar 5 at 16:29

J_H

40.7k3 gold badges37 silver badges152 bronze badges

\$\begingroup\$ Any further thoughts about pts/pt methods? is np.apply_along_axis a well-trodden practice for dealing with that? \$\endgroup\$
– Konchog
Commented Mar 5 at 16:49
\$\begingroup\$ (soz- on the all_good, I inadvertently left out the 'if all_good...' logic in the example.) \$\endgroup\$
– Konchog
Commented Mar 5 at 16:57
1

\$\begingroup\$ Well, apply along axis is fine if you can’t vectorize. But it’s N python function calls. Best practice with numpy is always to push an operation down to where the BLAS can broadcast it, where feasible. Offhand I do not yet notice how to do that. In your favor, you don’t have a huge array yet. Use python -m cProfile if you want to measure the apply() cost. \$\endgroup\$
– J_H
Commented Mar 5 at 20:37
\$\begingroup\$ Haha - sure... I cannot vectorise everything! I wish I could too! Thanks for the thoughts. \$\endgroup\$
– Konchog
Commented Mar 5 at 20:47
1

\$\begingroup\$ @Konchog you obviously can vectorize pts_valid ((np.abs(np.sum(np.abs(pts), axis=-1) - 1.) < 1e-15).all() probably), but that's a performance vs maintainability tradeoff: if you do that, you have to keep the two definitions in sync should something change. If there are only few points, it's likely not worth duplicating, I'd keep the current version or even leave numpy and use short-circuiting all builtin (all(self.valid(pt) for pt in pts), even without casting to np.array beforehand). Are you interested in answer exploring this route? \$\endgroup\$
– STerliakov
Commented Mar 6 at 14:34

| Show 1 more comment

toolic · Accepted Answer · 2025-03-05 17:12:09Z

Here are some general coding style suggestions.

Documentation

The PEP 8 style guide recommends adding docstrings for classes and functions.

Octahedron seems like a good name for a class, and the comments in the code are helpful. For the class, the docstring could summarize a few items:

Your motivation for creating the class
What types of things does the class do
What are the public properties of the class. For example, describe what signsare.

For functions, you could convert comments:

def pt_face(self, uvw):  # Which face does this point belong to?

to docstrings:

def pt_face(self, uvw):
    """ Which face does this point belong to? """

Readability

I find multiple assignments in a single line hard to read:

a, b = 1 - a, 1 - b

I think think this is easier to read:

a = 1 - a
b = 1 - b

Comment

There is no need for this comment:

r6 = 6 ** 0.5  # r2 * r3

if you just use this:

r6 = r2 * r3

Thanks for the heads up on docstrings! Also other good ptrs, and I appreciate the positive criticism how about the pt_ pts_ methods? Is that reasonable? — Konchog, Commented Mar 5 at 17:28

Reinderien · Accepted Answer · 2025-03-09 13:06:04Z

It's not a great idea to **0.5; just call np.sqrt - once over all three radicands.

ov only seems used for self.signs, but there is a more direct way to construct signs that does not require ov and does not iterate; and puts all of the sign vectors in contiguous memory.

matrices is not used. Do you really need it? And why is it a class variable versus the other variables? At a wild guess, it's not useful or meaningful for this to be a class variable, and it can just be an instance variable. This allows callers to control when matrices is constructed; right now there is no control possible and the time will be spent during import.

matrices, similar to signs, can be constructed in a way that keeps them in contiguous memory.

pt_signs being a dictionary forces a non-vectorised lookup. Instead you can use a vectorised lookup by doing an array comparison. apply_along_axis will be slow.

Add type hints. These can (to a limited degree) express dtype and axis count. If this needs to become a public library, more attention needs to be paid to this than what I've shown; for instance some of the FloatsN3 can actually accept broader array-likes due to the asarray call.

Don't uvw_s if isinstance(uvw_s, np.ndarray) else np.array(uvw_s). Just call asarray; or pass the argument directly to Numpy analysis routines that will coerce this for you.

random_octahedral needs seed control, and should not iterate; it should be vectorised.

uniform(0, 1) is equivalent to random().

In random_octahedral, you have three (!) different axis designation schemes - ab, xy and uvw. Just use one.

All together,

import typing

import numpy as np

if typing.TYPE_CHECKING:
    FloatsN3 = np.ndarray[tuple[int, typing.Literal[3]], np.dtype[np.float64]]
    StringsN = np.ndarray[tuple[int], np.dtype[np.str_]]


class Octahedron:
    __slots__ = 'matrices', 'signs', 'signs_by_vertex'

    vertices = np.array(('NEA', 'SEA', 'NEP', 'SEP', 'NWA', 'SWA', 'NWP', 'SWP'))
    vertices.flags.writeable = False

    def __init__(self) -> None:
        # These rotate a 3D face to the (Z) plane
        z_array = (
            ((-1,  0,  1), ( 1, -2,  1), ( 1,  1,  1)),  # nea
            (( 0, -1, -1), (-2,  1, -1), ( 1,  1, -1)),  # sea
            (( 0, -1,  1), ( 2,  1,  1), (-1,  1,  1)),  # nep
            (( 1,  0, -1), (-1, -2, -1), (-1,  1, -1)),  # sep
            (( 0,  1,  1), (-2, -1,  1), ( 1, -1,  1)),  # nwa
            ((-1,  0, -1), ( 1,  2, -1), ( 1, -1, -1)),  # swa
            (( 1,  0,  1), (-1,  2,  1), (-1, -1,  1)),  # nwp
            (( 0,  1, -1), ( 2, -1, -1), (-1, -1, -1)),  # swp
        )/np.sqrt((2, 6, 3))[:, np.newaxis]
        z_array.flags.writeable = False
        self.matrices = dict(zip(self.vertices, z_array))

        s = np.array((1, -1), dtype=np.int8)
        self.signs = np.stack(
            np.meshgrid(s, s, s), axis=-1,
        ).reshape((-1, 3))
        self.signs.flags.writeable = False

        self.signs_by_vertex = dict(zip(self.vertices, self.signs))

    def pts_faces(self, uvw: 'FloatsN3') -> 'StringsN':
        """face.keys from np_array of 3d Octahedron/Spherical points"""
        uvw_array = np.asarray(uvw)
        lhs = np.sign(uvw_array[..., np.newaxis, :])
        _, idx = (lhs == self.signs[np.newaxis, :, :]).all(axis=-1).nonzero()
        return self.vertices[idx]

    @staticmethod
    def pts_valid(uvw: 'FloatsN3', rtol: float = 1e-5, atol: float = 1e-8) -> bool:
        """Constraint: |u|+|v|+|w|=1 (surface of the unit octahedron)"""
        sums = np.abs(uvw).sum(axis=-1)
        return np.allclose(1, sums, rtol=rtol, atol=atol)


def random_octahedral(n: int, rand: np.random.Generator | None = None) -> 'FloatsN3':
    """Random points from Octant 0"""

    if rand is None:
        rand = np.random.default_rng()

    uvw = np.empty(shape=(3, n))
    uvw[:2] = rand.random(size=(2, n))
    u, v, w, = uvw  # read-write views

    too_high = u + v > 1
    uvw[:2, too_high] = 1 - uvw[:2, too_high]
    w[:] = 1 - u - v

    return uvw.T


def main() -> None:
    rand = np.random.default_rng(seed=0)
    o = Octahedron()
    pts = random_octahedral(n=100, rand=rand)

    all_good = o.pts_valid(pts, rtol=0, atol=1e-15)
    assert all_good

    for k, v in o.signs_by_vertex.items():
        f_pts = np.copysign(pts, v)
        faces = np.unique(o.pts_faces(f_pts))
        if len(faces) != 1:
            print(f'{k}: mysterious outlier in {faces}')
        elif faces[0] != k:
            print(f'{k}: mysterious result in {faces}')
        else:
            print(f'{k}: looks good!')


if __name__ == '__main__':
    main()

I have had a little time to go through your answer. The signs composition is very elegant - I'm wondering if it could be extended to map NS/WE/AP directly to the +1, -1 signs also? Likewise, I see the the final row of each of the z_array is actually the same as the signs, and maybe that could be exploited one way or another? The matrix is great as an instance variable - it just never changes. OTOH, I envisage the Octahedron as most likely going to be a singleton anyhow. The rotation matrices are important - I reduced the code somewhat to ask the questions at the top. — Konchog, Commented Mar 10 at 16:58

Stack Exchange Network

numpy array/item handling functions

3 Answers 3

force a type

dead store

type hinting

Documentation

Readability

Comment

Your Answer

Hot Network Questions

numpy array/item handling functions

3 Answers 3

force a type

dead store

type hinting

Documentation

Readability

Comment

Your Answer

Sign up or log in

Post as a guest

Related

Hot Network Questions