我把一些东西放在一起,我想展示一个球队最长的连胜记录,以及这个特别的连胜的开始和结束日期。例如,如果我有以下两个列表:
streak = ["W", "W", "W", "L","W", "W", "W", "W", "W", "L"]
dates = ["2016-06-15", "2016-06-14", "2016-06-13", "2016-06-10", "2016-06-09", "2016-06-08", "2016-06-05", "2016-06-03", "2016-06-02", "2016-06-02"]
如果我想得到最长的条纹,我可以做这样的事情:
from itertools import groupby
longest = sorted([list(y) for x, y in groupby(streak)], key = len)[-1]
print longest
["W", "W", "W", "W", "W"]
现在,我的想法是(让我知道是否可以做得更好)以某种方式获得这个最长的连列的开始和结束指数,所以在本例中:
start, end = get_indices(streak, longest) # 8, 4
print "The longest streak of {}{} was from {} to {}.".format(len(longest), longest[0], dates[start], dates[end])
"The longest streak of 5W was from 2016-06-02 to 2016-06-09.
我该怎么做?或者,是否有更好的方法来做到这一点,例如,将列表拉链在一起并做一些事情呢?
发布于 2016-06-16 05:06:40
考虑到您的代码,您仍然可以继续使用itertools
并使用不合格的takewhile
。
from itertools import takewhile, groupby
import itertools
L = [list(y) for x, y in groupby(streak)]
l = sorted(L, key=len)[-1]
ix = len(list(itertools.chain.from_iterable(takewhile(lambda x: x!=l, L))))
print("the longest streak goes from " + dates[ix+len(l)] + " to " + dates[ix])
#the longest streak goes from 2016-06-02 to 2016-06-09
发布于 2016-06-16 07:02:55
减少临时时间的替代解决方案(但请注意,除非RAM受到严重限制或产生不合理的巨大条纹,否则产生临时线程的速度要比最小的临时选择更快)。没有必要,只是举例说明将迭代器相关工具组合在一起以获得相同结果的其他方法:
from itertools import groupby, tee, zip_longest
from operator import itemgetter, sub
def longeststreak(streaks, dates):
# Create parallel iterators over the first index of each new group
s, e = tee(map(next, map(itemgetter(1), groupby(range(len(streaks)), key=streaks.__getitem__))))
# Advance end iterator so we can zip at offset to create start/end index pairs
next(e, None)
# Find greatest difference between start and end
longend, longstart = max(zip_longest(e, s, fillvalue=len(streaks)), key=lambda es: sub(*es))
# return dates for those indices (must subtract one from end since end index is exclusive)
return dates[longend-1], dates[longstart]
或者另一种方法:
from collections import deque
from itertools import groupby
from operator import itemgetter, sub
def longeststreak(streaks, dates):
# Generator of grouped indices for each streak
streakgroups = map(itemgetter(1), groupby(range(len(streaks)), streaks.__getitem__))
# Get first and last index of each streak without storing intermediate indices
streakranges = ((next(iter(deque(g, 1)), start), start) for g in streakgroups for start in (next(g),))
# As before, find greatest difference and return range
longend, longstart = max(streakranges, key=lambda es: sub(*es))
# End index is inclusive in this design, so don't subtract 1
return dates[longend], dates[longstart]
在这两种情况下,如果在Py2上,您将需要从future_builtins
导入map
,而对于前者,则使用izip_longest
。
此外,为了完整起见,优化了Colonel Beauvel's answer版本,以最小化字节代码执行(在CPython中缓慢),以支持更多的C级执行(CPython中的fast):
def longeststreak(streaks, dates):
# Use map with C-level builtins to reduce bytecode use
streakgroups = list(map(list, map(itemgetter(1), groupby(streaks))))
# Use max with key instead of sorted followed by indexing at -1, to turn
# O(n log n) work into O(n) work
longeststreak = max(streakgroups, key=len)
# Replace lambda with C layer built-in comparator
ix = len(list(chain.from_iterable(takewhile(longeststreak.__ne__, streakgroups))))
# Added -1 missing in original answer; end index should be exclusive,
# so we need to subtract 1; not noticeable on sample data because sample
# data had same data at end of longest streak and beginning of next
return dates[ix+len(longeststreak)-1], dates[ix]
作为记录,由于试图避免创建包含整个条纹作为一个组的list
/tuple
所涉及的各种开销,当我们只需要开始和结束时,我的两个备选解决方案在基本上所有真实世界的数据上运行得更慢;在ipython3
(Python3.5x86-64适用于Linux)上涉及随机条纹长度的测试用例,总共有450 K条目,在我的机器上用了大约35 ms的时间来处理上校的答案的优化版本,用我的第一种解决方案-- tee
--使用了大约50 ms,用了第二种解决方案-- deque
,用了77 ms。
https://stackoverflow.com/questions/37859623
复制相似问题