I have the following lists:
Events = [0, 0, 0, 1, 1, 0]
Details = ['Start', 'End', 'Start', 'Start', 'End', 'End']
Time = [0, 1, 4, 5, 10, 16]

I need to group the individual events as follows:
Event 0:
Sum of Start Times = 0+4 = 4
Sum of End Times = 1+16 = 17
Total time spent by event 0 = 17-4 = 13
Event 1:
Sum of start times = 5
Sum of end times = 10
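Total time spent by event 1 = 10-5 = 5

I would like a shorthand way of doing this. With a large number of events and times, writing out for/if loops by hand, as one would in Java, becomes time-consuming.
Is there an efficient way to do this?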
Posted on 2014-09-23 05:41:57
As one option, you can do the following:
result = {}
for e, d, t in zip(Events, Details, Time):
    result.setdefault(e, {})
    result[e].setdefault(d, 0)
    result[e][d] += t
print(result)
>>> {0: {'Start': 4, 'End': 17}, 1: {'Start': 5, 'End': 10}}

From there it is easy to produce your expected output.
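As a minimal sketch of that last step, the per-event totals can be turned into the requested durations with a dict comprehension. The `result` literal below simply repeats the output shown above, and the `durations` name and the use of `.get(..., 0)` for an event that lacks a 'Start' or 'End' entry are assumptions made for illustration, not part of the original answer.

```python
# Per-event totals, in the same shape as the result dict built above.
result = {0: {'Start': 4, 'End': 17}, 1: {'Start': 5, 'End': 10}}

# Time spent per event = sum of end times - sum of start times.
# Assumption: a missing 'Start' or 'End' key is treated as a total of 0.
durations = {e: totals.get('End', 0) - totals.get('Start', 0)
             for e, totals in result.items()}

for e, total in sorted(durations.items()):
    print('Total time spent by event {} = {}'.format(e, total))
```

For the sample data this gives 13 for event 0 and 5 for event 1, matching the figures in the question.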
Update:
Thanks to @abarnert:
from collections import Counter
result = {}
for e, d, t in zip(Events, Details, Time):
    result.setdefault(e, Counter())[d] += t
print(result)
>>> {0: Counter({'End': 17, 'Start': 4}), 1: Counter({'End': 10, 'Start': 5})}

Thanks to @AMacK:
result = {}
for e, d, t in zip(Events, Details, Time):
    result.setdefault(e, {}).setdefault(d, []).append(t)
print(result)
>>> {0: {'Start': [0, 4], 'End': [1, 16]}, 1: {'Start': [5], 'End': [10]}}

Regards, Artem
Posted on 2014-09-23 09:06:57
With NumPy, you would do this:
>>> import numpy as np
>>> Events = np.array([0, 0, 0, 1, 1, 0])
>>> Details = np.array(['Start', 'End', 'Start', 'Start', 'End', 'End'])
>>> Time = np.array([0, 1, 4, 5, 10, 16])
>>> is_start = (Details == 'Start')
>>> sum_start = np.bincount(Events[is_start], Time[is_start])
>>> sum_end = np.bincount(Events[~is_start], Time[~is_start])
>>> durations = sum_end - sum_start
>>> durations
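array([ 13., 5.])

If the data is already in NumPy arrays, this is roughly 10x faster than the Python loop-based approaches. If the data is not yet in NumPy arrays, it is only slightly faster than the loop (< 2x), because traversing the large Python list is slower than the counting itself.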
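For readers unfamiliar with `np.bincount`, here is a small sketch of what the weighted call used above does. The arrays are made-up illustration data, not the data from the question. Note that `np.bincount` only accepts non-negative integers as its first argument, so this approach assumes event IDs of that form.

```python
import numpy as np

# Hypothetical illustration data: event IDs and one time value per row.
events = np.array([0, 0, 1, 0])
times = np.array([0.0, 4.0, 5.0, 2.0])

# Without weights, bincount counts how often each integer occurs.
counts = np.bincount(events)                 # event 0: 3 rows, event 1: 1 row

# With weights, it sums the weights of all rows that share an integer.
sums = np.bincount(events, weights=times)    # event 0: 0+4+2, event 1: 5

# minlength pads the result so events without any rows still get a slot.
padded = np.bincount(events, weights=times, minlength=4)

print(counts, sums, padded)
```

Applying the weighted call once to the 'Start' rows and once to the 'End' rows, then subtracting, yields the per-event durations.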
import numpy as np

def evcount(events, details, time):
    # Accepts lists or arrays; uses the module-level nbins defined below.
    events = np.asarray(events)
    details = np.asarray(details)
    time = np.asarray(time)
    is_start = (details == 'Start')
    sum_start = np.bincount(events[is_start], time[is_start], minlength=nbins)
    sum_end = np.bincount(events[~is_start], time[~is_start], minlength=nbins)
    return sum_end - sum_start

def evcount2(events, details, time):
    # Pure-Python loop version, for comparison.
    result = {}
    for e, d, t in zip(events, details, time):
        result.setdefault(e, {}).setdefault(d, []).append(t)
    return result

# Random test data, both as NumPy arrays and as plain Python lists.
n = 20000
nbins = 200
events_arr = np.random.randint(0, nbins, n)
events = events_arr.tolist()
times_arr = np.random.rand(n)
times = times_arr.tolist()
details_arr = np.array(['Start', 'End'])[np.random.randint(0, 2, n)]
details = details_arr.tolist()

def doit_numpy_list():
    evcount(events, details, times)

def doit_numpy_arrays():
    evcount(events_arr, details_arr, times_arr)

def doit_loop():
    evcount2(events, details, times)

and
In [34]: %timeit doit_numpy_list()
100 loops, best of 3: 4.03 ms per loop
In [35]: %timeit doit_numpy_arrays()
1000 loops, best of 3: 781 µs per loop
In [36]: %timeit doit_loop()
100 loops, best of 3: 6.18 ms per loop
https://stackoverflow.com/questions/25987504