You are slicing your original dataset 1000 times and making 2000 border comparisons for each value in it.
It is much faster to calculate the proper bin for each value using simple math:
bin = int((value - min_data) / dx)
D[bin][1] += 1
And you can keep the original loop to set all the D[bin][0] values.
The only problem is that at max_data the computed bin equals bins (not bins-1 as expected). You can either add an extra if, or simply allocate one extra bin with D = np.zeros([bins+1, 2]), then fold bin bins into bin bins-1 and drop bin bins.
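A minimal sketch of that extra-bin variant, assuming data, min_data, max_data, dx and bins are defined as in the original code:
D = np.zeros([bins + 1, 2])           # one spare bin catches value == max_data
for value in data:
    b = int((value - min_data) / dx)  # direct bin index, no border comparisons
    D[b][1] += 1
D[bins - 1][1] += D[bins][1]          # fold the spare bin into the last real one
D = D[:bins]                          # drop the spare bin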
UPD: So here is my full code (based on the original code, with some refactoring):
import numpy as np
import matplotlib.pyplot as plt

def histogram(a, bins=1000, density=False):
    min_data = np.min(a)
    max_data = np.max(a)
    dx = (max_data - min_data) / bins
    x = np.zeros(bins)
    y = np.zeros(bins + 1)            # one extra bin catches value == max_data
    for i in range(bins):
        x[i] = i * dx + min_data      # left border of each bin
    for v in a:
        bin = int((v - min_data) / dx)
        y[bin] += 1
    y[bins - 1] += y[bins]            # fold the extra bin into the last real one
    y = y[:bins]
    if density:
        inte = sum((x[1] - x[0]) * y)
        y /= inte
    plt.bar(x, y, width=dx)
    return np.column_stack((x, y))
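For context, a hypothetical call (the sample data here is my assumption, not taken from the question):
a = np.random.normal(size=1_000_000)            # hypothetical input
result = histogram(a, bins=1000, density=True)  # returns the (x, y) pairs
plt.show()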
That gave me 15 seconds, compared with 60 seconds for the original code. What else can we do? The most intense calculations are in this loop:
for v in a:
    bin = int((v - min_data) / dx)
    y[bin] += 1
And they are done in pure Python, which is really slow (compiled C is almost 10 times faster, and that is what NumPy uses for its inner calculations). So we'd better do these calculations with NumPy (just change the previous 3 lines to these 3 lines):
a_to_bins = ((a - min_data) / dx).astype(int)
for bin in a_to_bins:
    y[bin] += 1
That gave me a final time of about 7 seconds.
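As an aside, the remaining loop can also be removed entirely with np.bincount; this is my suggestion and is not part of the timings below:
a_to_bins = ((a - min_data) / dx).astype(int)
# bincount returns integer counts, so cast to float for the later y /= inte
y = np.bincount(a_to_bins, minlength=bins + 1).astype(float)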
UPD 2: What is the slowest part now? As we know, intensive calculations in pure Python are slow. Let's check it:
import time

def histogram(a, bins=1000, density=False):
    start_time = time.perf_counter()
    min_data = np.min(a)
    max_data = np.max(a)
    dx = (max_data - min_data) / bins
    print(time.perf_counter() - start_time, 'to calc min/max')
    x = np.zeros(bins)
    y = np.zeros(bins + 1)
    print(time.perf_counter() - start_time, 'to create x, y')
    for i in range(bins):
        x[i] = i * dx + min_data
    print(time.perf_counter() - start_time, 'to calc bin borders')
    a_to_bins = ((a - min_data) / dx).astype(int)
    print(time.perf_counter() - start_time, 'to calc bins')
    for bin in a_to_bins:
        y[bin] += 1
    print(time.perf_counter() - start_time, 'to fill bins')
    y[bins - 1] += y[bins]
    y = y[:bins]
    if density:
        inte = sum((x[1] - x[0]) * y)
        y /= inte
    print(time.perf_counter() - start_time, 'before draw')
    plt.bar(x, y, width=dx)
    print(time.perf_counter() - start_time, 'after draw')
    return np.column_stack((x, y))
Gives me:
0.010483399993972853 to calc min/max
0.011489700002130121 to create x, y
0.012588899990078062 to calc bin borders
0.09252060001017526 to calc bins
7.7265202999988105 to fill bins
7.727168200013693 before draw
8.440735899988795 after draw
So almost all the time was consumed by the Python loop that adds 1 to each bin (note that the preceding NumPy step computed all the bin indices in less than 0.1 seconds). So if we rewrote that loop in compiled code, we would probably get a roughly 10x faster result, which is pretty close to the timing of the original .hist.
By the way, if I use @nb.jit(parallel=True) (Numba), then the time goes down to:
0.014584699994884431 to calc min/max
0.014821599994320422 to create x, y
0.3439053999900352 to calc bin borders
0.5012229999992996 to calc bins
0.8304882999800611 to fill bins
0.8317092999932356 before draw
1.5190471999812871 after draw
While the original .hist on my PC takes 0.8466330000082962 seconds (I don't know whether it uses parallel computation or not).
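For reference, a minimal sketch of how the increment loop can be jitted with Numba (the helper name fill_bins and the decorator placement are my assumptions, since the decorator's exact target is not shown above; a naively parallel version of this loop would race on y, so the sketch stays sequential):
import numba as nb

@nb.njit
def fill_bins(a_to_bins, bins):
    # compiled replacement for the pure-Python increment loop
    y = np.zeros(bins + 1)
    for b in a_to_bins:
        y[b] += 1
    return y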
In total: pure Python calculations are terribly slow, and moving them into compiled code has a huge impact on speed. That is why you'd better use NumPy operations (which run C code under the hood) instead of Python loops. It can give a speedup of around 10x or even more.