I want to create a dataset for my Python notebook from zipped images I uploaded to GitHub. I followed the steps I found, but when I run the command it throws an error.
This is the command I'm running:
!tfds build human_dataset
This is the error I get:
2021-02-26 15:02:11.312748: I tensorflow/stream_executor/platform/default/dso_loader.cc:49] Successfully opened dynamic library libcudart.so.10.1
INFO[build.py]: Loading dataset human_dataset from path: /content/drive/MyDrive/DeepPerceptionLearning/human_dataset/human_dataset.py
2021-02-26 15:02:13.692513: I tensorflow/core/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2021-02-26 15:02:13.880000: I tensorflow/core/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
INFO[build.py]: download_and_prepare for dataset human_dataset/1.0.0...
INFO[dataset_builder.py]: Generating dataset human_dataset (/root/tensorflow_datasets/human_dataset/1.0.0)
Downloading and preparing dataset Unknown size (download: Unknown size, generated: Unknown size, total: Unknown size) to /root/tensorflow_datasets/human_dataset/1.0.0...
2021-02-26 15:02:14.132647: I tensorflow/core/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
2021-02-26 15:02:14.329230: I tensorflow/core/platform/cloud/google_auth_provider.cc:180] Attempting an empty bearer token since no token was retrieved from files, and GCE metadata check was skipped.
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
INFO[download_manager.py]: Skipping download of https://github.com/egjlmn1/DeepPerceptionLearning/archive/master/humans.zip: File cached in /root/tensorflow_datasets/downloads/egjlmn1_DeepPercepti_archive_master_humansqcMvXg2sLJ3_V9MFSR8_gIF2ERgzGdA9jeVBzyg_kvY.zip
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
INFO[download_manager.py]: Reusing extraction of /root/tensorflow_datasets/downloads/egjlmn1_DeepPercepti_archive_master_humansqcMvXg2sLJ3_V9MFSR8_gIF2ERgzGdA9jeVBzyg_kvY.zip at /root/tensorflow_datasets/downloads/extracted/ZIP.egjlmn1_DeepPercepti_archive_master_humansqcMvXg2sLJ3_V9MFSR8_gIF2ERgzGdA9jeVBzyg_kvY.zip.
Dl Completed...: 0 url [00:00, ? url/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Extraction completed...: 0 file [00:00, ? file/s]
Extraction completed...: 0 file [00:00, ? file/s]
Dl Size...: 0 MiB [00:00, ? MiB/s]
Dl Completed...: 0 url [00:00, ? url/s]
Generating splits...: 0% 0/2 [00:00
sys.exit(launch_cli())
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/main.py", line 126, in launch_cli
app.run(main, flags_parser=_parse_flags)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 300, in run
_run_main(main, args)
File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
sys.exit(main(argv))
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/main.py", line 121, in main
args.subparser_fn(args)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/build.py", line 199, in _build_datasets
_download_and_prepare(args, builder)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/scripts/cli/build.py", line 357, in _download_and_prepare
download_config=dl_config,
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/dataset_builder.py", line 452, in download_and_prepare
download_config=download_config,
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/dataset_builder.py", line 1187, in _download_and_prepare
leave=False,
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/dataset_builder.py", line 1182, in
for split_name, generator
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/split_builder.py", line 295, in submit_split_generation
return self._build_from_generator(**build_kwargs)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/split_builder.py", line 366, in _build_from_generator
shard_lengths, total_size = writer.finalize()
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/tfrecords_writer.py", line 222, in finalize
self._shuffler.bucket_lengths, self._path)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/tfrecords_writer.py", line 95, in _get_shard_specs
shard_boundaries = _get_shard_boundaries(num_examples, num_shards)
File "/usr/local/lib/python3.7/dist-packages/tensorflow_datasets/core/tfrecords_writer.py", line 118, in _get_shard_boundaries
raise AssertionError("No examples were yielded.")
AssertionError: No examples were yielded.
What could the error be? It says no examples were yielded, but the zip archive does contain the images I'm extracting. Also, it says it is skipping the download because the data is already cached; maybe that's the problem, but how can I force it to re-download the data?
Posted on 2021-03-01 09:25:59
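On the re-download question: TFDS skips a download when a matching archive already sits in its downloads cache (the log above shows /root/tensorflow_datasets/downloads), so one low-tech way to force a fresh fetch is to delete that cached file before rebuilding. The sketch below does exactly that; purge_cached_downloads is a name I made up for illustration, and it is demonstrated against a throwaway directory rather than the real cache. If I recall the API correctly, tfds also exposes tfds.download.DownloadConfig(force_download=True) for the same purpose.

```python
import pathlib
import tempfile

def purge_cached_downloads(downloads_dir, name_fragment):
    """Delete cached TFDS download files whose names contain `name_fragment`.

    Removing the cached archive forces the next `tfds build` to
    re-download it instead of printing "Skipping download".
    """
    downloads = pathlib.Path(downloads_dir)
    removed = []
    for path in downloads.glob(f"*{name_fragment}*"):
        if path.is_file():
            path.unlink()
            removed.append(path.name)
    return sorted(removed)

# Demo against a temp directory standing in for ~/tensorflow_datasets/downloads.
cache = pathlib.Path(tempfile.mkdtemp())
(cache / "egjlmn1_DeepPercepti_archive_master_humans_abc.zip").write_bytes(b"zip")
(cache / "unrelated.zip").write_bytes(b"zip")
print(purge_cached_downloads(cache, "humans"))
# -> ['egjlmn1_DeepPercepti_archive_master_humans_abc.zip']
```

After purging (or with force_download), rerun !tfds build human_dataset as before.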
Fixed it. What I did was put the zip file in the same directory as the dataset builder, and instead of fetching the dataset from a URL, I just extract it from the local path.
Posted on 2022-02-17 11:36:21
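To make the "extract from a local path" idea concrete, here is a minimal standalone sketch using Python's stdlib zipfile rather than the tfds download manager (the humans.zip name and extract_local_archive helper are illustrative; inside _split_generators, dl_manager.extract can reportedly be given a local path as well):

```python
import pathlib
import tempfile
import zipfile

def extract_local_archive(zip_path, out_dir):
    """Extract a zip sitting next to the dataset builder instead of downloading it."""
    out = pathlib.Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as zf:
        zf.extractall(out)
    return sorted(p.name for p in out.rglob("*") if p.is_file())

# Demo: build a tiny archive, then ingest it from the local path.
work = pathlib.Path(tempfile.mkdtemp())
archive = work / "humans.zip"
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("humans/img_0.png", b"fake png bytes")
    zf.writestr("humans/img_1.png", b"fake png bytes")
print(extract_local_archive(archive, work / "extracted"))
# -> ['img_0.png', 'img_1.png']
```

Because everything is local, nothing is cached under tensorflow_datasets/downloads, which sidesteps the "Skipping download" problem entirely.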
Hi, I'm also using tfds to build my own dataset, and I ran into the same problem as you. My dataset is in a local folder, as png files with no zip. Could that be the issue?
import os

import tensorflow as tf
import tensorflow_datasets as tfds

# Module-level constants from the tfds builder template (contents omitted here).
_DESCRIPTION = """face_grass dataset."""
_CITATION = """"""


class Grass(tfds.core.GeneratorBasedBuilder):
    """DatasetBuilder for face_grass dataset."""

    VERSION = tfds.core.Version('1.0.0')
    RELEASE_NOTES = {
        '1.0.0': 'Initial release.',
    }

    def _info(self) -> tfds.core.DatasetInfo:
        """Returns the dataset metadata."""
        # TODO(grass): Specifies the tfds.core.DatasetInfo object
        return tfds.core.DatasetInfo(
            builder=self,
            description=_DESCRIPTION,
            features=tfds.features.FeaturesDict({
                "lr": tfds.features.Image(),
                "hr": tfds.features.Image(),
            }),
            supervised_keys=("lr", "hr"),
            # If there's a common (input, target) tuple from the
            # features, specify them here. They'll be used if
            # `as_supervised=True` in `builder.as_dataset`.
            # supervised_keys=('image', 'label'),  # Set to `None` to disable
            homepage='https://dataset-homepage/',
            citation=_CITATION,
        )

    def _split_generators(self, dl_manager: tfds.download.DownloadManager):
        """Returns SplitGenerators."""
        # TODO(face_grass): Downloads the data and defines the splits
        # path = dl_manager.download_and_extract('https://todo-data-url')
        # TODO(face_grass): Returns the Dict[split names, Iterator[Key, Example]]
        return [
            tfds.core.SplitGenerator(
                name=tfds.Split.TRAIN,
                gen_kwargs={
                    "lr_path": "../data1/project/grass/train/LR",
                    "hr_path": "../data1/project/grass/train/HR",
                }),
            tfds.core.SplitGenerator(
                name=tfds.Split.VALIDATION,
                gen_kwargs={
                    "lr_path": "../data1/project/grass/valid/LR",
                    "hr_path": "../data1/project/grass/valid/HR",
                }),
        ]

    def _generate_examples(self, lr_path, hr_path):
        """Yields examples."""
        # TODO(face_grass): Yields (key, example) tuples from the dataset
        for root, _, files in tf.io.gfile.walk(lr_path):
            for file_path in files:
                # Select only png files.
                if file_path.endswith(".png"):
                    yield file_path, {
                        "lr": os.path.join(root, file_path),
                        "hr": os.path.join(hr_path, file_path),
                    }
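A common reason _generate_examples yields nothing is that the walk simply matches no files: a relative path like ../data1/... is resolved against the working directory where tfds build runs, not against the builder file, and an uppercase .PNG extension slips past endswith(".png"). A quick sanity check worth running before the build is sketched below, with plain os.walk standing in for tf.io.gfile.walk and count_pngs as a hypothetical helper demonstrated on a temp directory:

```python
import os
import pathlib
import tempfile

def count_pngs(root):
    """Count files matching the builder's `.png` filter, exact and case-insensitive."""
    exact, any_case = 0, 0
    for _, _, files in os.walk(root):
        for name in files:
            if name.endswith(".png"):
                exact += 1
            if name.lower().endswith(".png"):
                any_case += 1
    return exact, any_case

# Demo: an uppercase extension is invisible to the builder's exact check.
lr = pathlib.Path(tempfile.mkdtemp())
(lr / "a.png").write_bytes(b"x")
(lr / "b.PNG").write_bytes(b"x")
print(count_pngs(lr))
# -> (1, 2): one file would be silently skipped
```

If the second number is larger, lower-case the extension before comparing; if both are zero, the gen_kwargs path is wrong relative to the directory the build runs from.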
https://stackoverflow.com/questions/66388210