Q: Ray Tensorflow-gpu 2.0 RecursionError

Stack Overflow user

Asked 2019-09-02 02:47:18

2 answers, 976 views, 0 followers, 2 votes

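System information

OS platform and distribution (e.g., Linux Ubuntu 16.04): Ubuntu 18.04

Ray installed from (source or binary): binary

Ray version: 0.7.3

Python version: 3.7

TensorFlow version: tensorflow-gpu 2.0.0rc0

Exact command to reproduce: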

# Importing packages
from time import time
import gym
import tensorflow as tf
import ray

# Creating our initial model    
model = tf.keras.Sequential([
        tf.keras.layers.Dense(64, input_shape=(24,), activation='relu'),
        tf.keras.layers.Dense(4, activation='softmax')
        ])

# Setting parameters
episodes = 64
env_name = 'BipedalWalker-v2'

# Initializing ray
ray.init(num_cpus=8, num_gpus=1)

# Creating our ray function
@ray.remote
def play(weights):
    actor = tf.keras.Sequential([
        tf.keras.layers.Dense(64, input_shape=(24,), activation='relu'),
        tf.keras.layers.Dense(4, activation='softmax')
        ])
    actor = actor.set_weights(weights)
    env = gym.make('BipedalWalker-v2').env
    env._max_episode_steps=1e20
    obs = env.reset()
    for _ in range(1200):
        action = actor.predict_classes(obs).flatten()[0]
        action = env.action_space.sample()
        obs, rt, done, info = env.step(action)
    return rt

# Testing ray
start = time()
weights = model.get_weights()
weights = ray.put(weights)
results = ray.get([play.remote(weights) for i in range(episodes)])
ray.shutdown()
print('Ray done after:',time()-start)

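Describe the problem

I am trying to parallelize rollouts of OpenAI Gym environments using a TensorFlow 2.0 GPU Keras actor. Every time I try to instantiate a Keras model inside a function decorated with @ray.remote, a "maximum recursion depth exceeded" error is raised. I am following the documentation outlined by Ray, which suggests passing weights rather than models. I am not sure what I am doing wrong. Any ideas?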

Source code / logs

File "/home/jacob/anaconda3/envs/tf-2.0-gpu/lib/python3.7/site-packages/tensorflow/__init__.py", line 50, in __getattr__
    module = self._load()

File "/home/jacob/anaconda3/envs/tf-2.0-gpu/lib/python3.7/site-packages/tensorflow/__init__.py", line 44, in _load
    module = _importlib.import_module(self.__name__)

RecursionError: maximum recursion depth exceeded

tf.keras

ray

tensorflow2.0

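2 Answers

Stack Overflow user

Accepted answer

Answered 2019-09-03 21:32:36

See the response to this issue on GitHub: https://github.com/ray-project/ray/issues/5614

All that needs to be done is to import tensorflow inside the function definition: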

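@ray.remote
def play(weights):
    # The fix: import TensorFlow inside the remote function, not at module level.
    import numpy as np
    import tensorflow as tf
    actor = tf.keras.Sequential([
        tf.keras.layers.Dense(64, input_shape=(24,), activation='relu'),
        tf.keras.layers.Dense(4, activation='softmax')
        ])
    # set_weights updates the model in place and returns None, so do not reassign.
    actor.set_weights(weights)
    env = gym.make('BipedalWalker-v2').env
    env._max_episode_steps = 1e20
    obs = env.reset()
    for _ in range(1200):
        # Wrap the observation in a batch of size 1 before predicting.
        action = actor.predict_classes(np.array([obs])).flatten()[0]
        # As in the question, the predicted action is then replaced by a random sample.
        action = env.action_space.sample()
        obs, rt, done, info = env.step(action)
    # Returns the reward of the last step only.
    return rt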
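
One way to see what changes when the import moves inside the function is to look at which module-level names the function's bytecode refers to. The sketch below uses the standard-library `json` module as a stand-in for `tensorflow`; `global_names`, `uses_global` and `uses_local_import` are hypothetical helpers, not part of Ray or cloudpickle. It rests on the assumption (not confirmed by the linked issue) that the serializer only has to deal with the globals a function actually references.

```python
import dis
import json


def global_names(func):
    """Return the names that func loads from module globals."""
    return {
        ins.argval
        for ins in dis.get_instructions(func)
        if ins.opname == "LOAD_GLOBAL"
    }


def uses_global():
    # Refers to the module-level name `json`.
    return json.dumps({"a": 1})


def uses_local_import():
    # `json` is bound as a local variable each time the function runs.
    import json
    return json.dumps({"a": 1})


print(global_names(uses_global))        # includes 'json'
print(global_names(uses_local_import))  # no module-level reference to 'json'
```

With the local import, the function carries no reference to the module object in its globals, so the module only comes into play when the function body executes on the worker.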

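Votes: 1

Stack Overflow user

Answered 2019-09-02 17:51:34

The core problem seems to be that cloudpickle (which Ray uses to serialize remote functions and ship them to the worker processes) is unable to pickle the tf.keras.Sequential class. For example, I can reproduce the issue as follows: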

import cloudpickle  # cloudpickle.__version__ == '1.2.1'
import tensorflow as tf  # tf.__version__ == '2.0.0-rc0'

def f():
    tf.keras.Sequential

cloudpickle.loads(cloudpickle.dumps(f))  # This fails.

The last line fails with:

---------------------------------------------------------------------------
RecursionError                            Traceback (most recent call last)
<ipython-input-23-25cc307e6227> in <module>
----> 1 cloudpickle.loads(cloudpickle.dumps(f))

~/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py in __getattr__(self, item)
     48 
     49   def __getattr__(self, item):
---> 50     module = self._load()
     51     return getattr(module, item)
     52 

~/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py in _load(self)
     42   def _load(self):
     43     """Import the target module and insert it into the parent's namespace."""
---> 44     module = _importlib.import_module(self.__name__)
     45     self._parent_module_globals[self._local_name] = module
     46     self.__dict__.update(module.__dict__)

... last 2 frames repeated, from the frame below ...

~/anaconda3/lib/python3.6/site-packages/tensorflow/__init__.py in __getattr__(self, item)
     48 
     49   def __getattr__(self, item):
---> 50     module = self._load()
     51     return getattr(module, item)
     52 

RecursionError: maximum recursion depth exceeded while calling a Python object

Interestingly, this succeeds with tensorflow==1.14.0, but I imagine Keras has changed a lot in 2.0.
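
The alternating `__getattr__` and `_load` frames in the traceback are typical of a lazy-loading wrapper whose `__getattr__` depends on an attribute that has not been set yet. The sketch below shows that general pitfall with a hypothetical `LazyProxy` class. It is not TensorFlow's code and is not claimed to reproduce the exact failure path inside TensorFlow or cloudpickle; it only assumes that the object is rebuilt without its usual initialization.

```python
class LazyProxy:
    """Hypothetical stand-in for a lazy-loading module wrapper."""

    def __init__(self, name):
        self.name = name

    def _load(self):
        # Reads an instance attribute that only exists after __init__ has run.
        return self.name

    def __getattr__(self, item):
        # Called only when normal attribute lookup fails.
        target = self._load()
        return getattr(target, item)


# Create an instance without running __init__, similar to how unpickling
# can rebuild an object before its state is restored.
proxy = LazyProxy.__new__(LazyProxy)

try:
    proxy.anything
    recursed = False
except RecursionError:
    recursed = True

print(recursed)  # True
```

Because `name` is missing, looking it up inside `_load` falls back to `__getattr__`, which calls `_load` again, and so on until the interpreter's recursion limit is hit.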

Workaround

As a workaround, you can try defining f in a separate module or file, like

# helper_file.py

import tensorflow as tf

def f():
    tf.keras.Sequential

and then use it in your main script as follows.

import helper_file
import ray

ray.init(num_cpus=1)

@ray.remote
def use_f():
    helper_file.f()

ray.get(use_f.remote())

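The difference here is that when cloudpickle tries to serialize use_f, it does not actually look at the contents of helper_file. When a worker process tries to deserialize use_f, that worker imports helper_file. This extra indirection seems to make cloudpickle work more reliably. It is the same thing that happens when you pickle a function that uses tensorflow or any other library: cloudpickle does not serialize the whole library, it just tells the deserializing process to import the relevant library.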
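
The "serialize a reference, not the library" behaviour described above can be illustrated with the standard-library `pickle` module, used here as a stand-in for cloudpickle (an assumption of this sketch; cloudpickle adds by-value pickling for functions that are not importable, which is not shown).

```python
import json
import pickle

# Pickling a function that lives in an importable module stores only a
# reference to it (module name plus function name), not the function's code.
payload = pickle.dumps(json.dumps)

# Unpickling imports the module and looks the function up by name.
restored = pickle.loads(payload)

print(b"json" in payload)      # True: the module name is in the payload
print(restored is json.dumps)  # True: the very same function object
print(len(payload))            # a few dozen bytes, far smaller than the library
```

This is why the deserializing process must be able to import the module itself: the payload contains a name to look up, not the code.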

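Note: for this to work on multiple machines, helper_file.py must exist and be on the Python path on each machine (one way to achieve this is to install it as a Python module on every machine).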
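
A quick way to check that requirement on a given machine is to ask the import system whether it can find the module. This is a minimal sketch using the standard library; `module_is_importable` is a hypothetical helper name, and `helper_file` is the module name from the example above.

```python
import importlib.util


def module_is_importable(name):
    """Return True if a top-level module called `name` can be found on the Python path."""
    return importlib.util.find_spec(name) is not None


print(module_is_importable("json"))         # standard-library module, always found
print(module_is_importable("helper_file"))  # True only where helper_file.py is on the Python path
```

Running the same check from inside a remote task would report what a worker process sees, which may differ from the driver's environment.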

I verified that this seems to fix the issue in your example. After making that fix, I ran into

  File "<ipython-input-4-bb51dc74442c>", line 3, in play
  File "/Users/rkn/Workspace/ray/helper_file.py", line 15, in play
    action = actor.predict_classes(obs).flatten()[0]
AttributeError: 'NoneType' object has no attribute 'predict_classes'

but that seems like a separate issue.
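
That AttributeError most likely comes from the line `actor = actor.set_weights(weights)` in the question: Keras's `set_weights` updates the model in place and returns None, so the assignment replaces the model with None. The accepted answer's code calls `actor.set_weights(weights)` without reassigning. The sketch below reproduces the pattern with a hypothetical `TinyModel` class standing in for a Keras model, so it runs without TensorFlow.

```python
class TinyModel:
    """Hypothetical stand-in for a Keras model; only mimics set_weights."""

    def __init__(self):
        self.weights = None

    def set_weights(self, weights):
        # Mutates the model in place and implicitly returns None.
        self.weights = weights


weights = [1.0, 2.0]

# Pattern from the question: the return value (None) overwrites the model.
broken_actor = TinyModel()
broken_actor = broken_actor.set_weights(weights)
print(broken_actor is None)  # True

# Corrected pattern: call the method for its side effect only.
fixed_actor = TinyModel()
fixed_actor.set_weights(weights)
print(fixed_actor.weights)  # [1.0, 2.0]
```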

Votes: 1

Original page content provided by Stack Overflow.

Original link:

https://stackoverflow.com/questions/57750920
