python - Numpy vectorized summation with variable number of factors
I am computing a function that contains a summation over an index. The index runs between 0 and the integer part of t; ideally I would like to be able to compute this summation for several values of t at once. In my real-life case, most of the values of t are small, but a small percentage can be 1 or 2 orders of magnitude larger than the average.
What I am doing is:
1) Define the vector t, e.g. (my real-life data have a much larger number of entries, this is just to give the idea):
import numpy as np
t = np.random.exponential(5, 10)
2) Create a matrix containing the factors between 0 and int(t), and zeroes elsewhere (a short sketch of the broadcasting step follows the printed matrix below):
n = int(t.max())
j = ((np.arange(n) < t[:,np.newaxis])*np.arange(1,n+1)).astype(int).transpose()
print(j)
[[ 1 1 1 1 1 1 1 1 1 1]
[ 2 0 2 2 2 0 2 0 2 2]
[ 0 0 3 0 3 0 3 0 3 3]
[ 0 0 4 0 4 0 0 0 4 4]
[ 0 0 5 0 5 0 0 0 5 5]
[ 0 0 6 0 6 0 0 0 6 6]
[ 0 0 7 0 7 0 0 0 0 7]
[ 0 0 8 0 8 0 0 0 0 8]
[ 0 0 9 0 9 0 0 0 0 9]
[ 0 0 0 0 10 0 0 0 0 10]
[ 0 0 0 0 11 0 0 0 0 0]
[ 0 0 0 0 12 0 0 0 0 0]]
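To make it clearer where those columns of 1, 2, ..., n come from, here is a minimal sketch of just the broadcasting comparison, with a tiny made-up t (the values are only illustrative):
import numpy as np

t = np.array([2.3, 0.7, 4.1])           # tiny made-up example, not my real data
n = int(t.max())                         # n = 4
mask = np.arange(n) < t[:, np.newaxis]   # shape (3, 4): True where index < t[i]
print(mask.astype(int))
# [[1 1 1 0]
#  [1 0 0 0]
#  [1 1 1 1]]
j = (mask * np.arange(1, n + 1)).astype(int).transpose()
# Each column i now holds 1 .. (number of indices below t[i], capped at n),
# with zeroes below that.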
3) Generate the individual elements of the summation, using a mask to avoid applying the function to the elements that are zero:
a = np.log(1 + (1 + j) * 5) * (j > 0)
4) Sum along the columns:
a.sum(axis=0)
obtaining:
array([ 5.170484  ,  2.39789527, 29.96464821,  5.170484  , 42.29052851,  2.39789527,  8.21500643,  2.39789527, 18.49060911, 33.9899999 ])
Is there a faster/better way to vectorize this? I have the feeling it is slow due to the large amount of zeroes that do not contribute to the sum, but since I am a beginner with NumPy I couldn't figure out a better way of writing it.
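In plain Python, the equivalent loop would look roughly like this (the helper name loop_app and the exact function are just the illustrative example used above; this only makes explicit what is being summed):
import numpy as np

def loop_app(t):
    # Straightforward loop version of steps 1)-4): for each t[i], sum
    # log(1 + (1 + k) * 5) over k = 1 .. number of indices below t[i],
    # capped at n = int(t.max()) as in the matrix construction above.
    n = int(t.max())
    out = np.empty(t.size)
    for i, ti in enumerate(t):
        m = min(int(np.ceil(ti)), n)      # number of non-zero terms in column i
        k = np.arange(1, m + 1)
        out[i] = np.log(1 + (1 + k) * 5).sum()
    return out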
EDIT: in my actual problem, the function applied to j also depends on a second parameter tau (in a vector of the same size as t), so the items contained in every column are not the same.
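To make that concrete, here is a sketch of what the masked 2D version looks like when each column also has its own tau value; the exact form of the function, log(1 + (1 + j) * tau), is only a made-up example:
import numpy as np

t = np.random.exponential(5, 10)
tau = np.random.exponential(5, 10)   # hypothetical second parameter, same size as t

n = int(t.max())
j = ((np.arange(n) < t[:, np.newaxis]) * np.arange(1, n + 1)).astype(int).transpose()

# tau has one value per column of j, so it broadcasts along the rows:
a = np.log(1 + (1 + j) * tau[np.newaxis, :]) * (j > 0)
result = a.sum(axis=0)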
Looking at j, each column has numbers going from 1 to n, with n decided based on the corresponding t element. Then, summing along each column is the same as summing up to that n, because the rest of the elements are zeros anyway. Those summed values can be calculated with np.cumsum, and the n values that mark the limit of each column in j can be calculated directly from t. These n values are then used as indices to index into the cumsum-ed values to give the final output.
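As a tiny illustration of that cumsum-then-index idea (the numbers are made up):
import numpy as np

vals = np.cumsum([10., 20., 30., 40.])   # -> [ 10.,  30.,  60., 100.]
n_terms = np.array([1, 3, 2])            # how many leading terms each output needs
out = vals[n_terms - 1]                  # -> [10., 60., 30.]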
This should be pretty fast and memory efficient, given that the cumsum computation is done on a 1D array, compared to the summation in the original approach, which works on a 2D array along each column. Thus, we would have a vectorized approach like so -
n = int(t.max())
vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()
out = vals[(t.astype(int)).clip(max=n-1)]   # clip keeps the largest t within vals' valid indices
In terms of memory usage, we are generating three variables -
n    : scalar
vals : 1D array of n elements
out  : 1D array of t.size elements (this is the output anyway)
Runtime test and output verification -
In [5]: def original_app(t):
   ...:     n = int(t.max())
   ...:     j = ((np.arange(n) < t[:,None])*np.arange(1,n+1)).astype(int).transpose()
   ...:     a = np.log(1 + (1 + j) * 5)* (j>0)
   ...:     return a.sum(axis=0)
   ...:
   ...: def vectorized_app(t):
   ...:     n = int(t.max())
   ...:     vals = (np.log(1 + (1 + np.arange(1,n+1)) * 5)).cumsum()
   ...:     return vals[(t.astype(int)).clip(max=n-1)]
   ...:

In [6]: # Input array
   ...: t = np.random.exponential(5, 10000)

In [7]: %timeit original_app(t)
100 loops, best of 3: 9.62 ms per loop

In [8]: %timeit vectorized_app(t)
10000 loops, best of 3: 50.1 µs per loop

In [9]: np.allclose(original_app(t), vectorized_app(t))   # verify outputs
Out[9]: True