Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0 when using Google Colab

2023-09-03 08:10:32 Author: 爱情不是曹操,说到就到

I was trying to follow https://tensorflow-object-detection-api-tutorial.readthedocs.io/en/latest/training.html#training-the-model on Google Colab.

Everything went smoothly: I built pycocotools, ran the setup with object_detection/packages/tf2/setup.py, tested with object_detection/builders/model_builder_tf2_test.py, and created the TFRecord files, all without any problems.

But when training starts, it always fails with:

2021-11-24 04:51:47.954507: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-24 04:51:47.958479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
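
The rule stated in this message (same major version, and a runtime minor version equal to or higher than the compiled one) can be sketched as a small check. The helper name `cudnn_runtime_is_compatible` is hypothetical; this only illustrates the rule as worded in the log and is not TensorFlow's actual implementation.

```python
def parse_version(version: str) -> tuple:
    """Turn a dotted version string such as '8.0.5' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def cudnn_runtime_is_compatible(loaded: str, compiled: str) -> bool:
    """Hypothetical helper mirroring the rule quoted in the log message."""
    loaded_major, loaded_minor = parse_version(loaded)[:2]
    compiled_major, compiled_minor = parse_version(compiled)[:2]
    # Major versions must match; the runtime minor must not be lower.
    return loaded_major == compiled_major and loaded_minor >= compiled_minor


# The combination from the log: runtime 8.0.5, compiled against 8.1.0.
print(cudnn_runtime_is_compatible("8.0.5", "8.1.0"))  # False, minor 0 < 1
print(cudnn_runtime_is_compatible("8.1.0", "8.1.0"))  # True
```

Under this rule, the 8.0.5 runtime is rejected because its minor version (0) is lower than the minor version TensorFlow was built against (1).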

The full error is as follows:

2021-11-24 04:51:47.954507: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
2021-11-24 04:51:47.958479: E tensorflow/stream_executor/cuda/cuda_dnn.cc:362] Loaded runtime CuDNN library: 8.0.5 but source was compiled with: 8.1.0.  CuDNN library needs to have matching major version and equal or higher minor version. If using a binary install, upgrade your CuDNN library.  If building from sources, make sure the library loaded at runtime is compatible with the version specified during compile configuration.
Traceback (most recent call last):
  File "model_main_tf2.py", line 115, in <module>
    tf.compat.v1.app.run()
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/platform/app.py", line 40, in run
    _run(main=main, argv=argv, flags_parser=_parse_flags_tolerate_undef)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 303, in run
    _run_main(main, args)
  File "/usr/local/lib/python3.7/dist-packages/absl/app.py", line 251, in _run_main
    sys.exit(main(argv))
  File "model_main_tf2.py", line 112, in main
    record_summaries=FLAGS.record_summaries)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 603, in train_loop
    train_input, unpad_groundtruth_tensors)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 394, in load_fine_tune_checkpoint
    _ensure_model_is_built(model, input_dataset, unpad_groundtruth_tensors)
  File "/usr/local/lib/python3.7/dist-packages/object_detection/model_lib_v2.py", line 176, in _ensure_model_is_built
    labels,
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 1286, in run
    return self._extended.call_for_each_replica(fn, args=args, kwargs=kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/distribute_lib.py", line 2849, in call_for_each_replica
    return self._call_for_each_replica(fn, args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/mirrored_strategy.py", line 671, in _call_for_each_replica
    self._container_strategy(), fn, args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/distribute/mirrored_run.py", line 86, in call_for_each_replica
    return wrapped(args, kwargs)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 885, in __call__
    result = self._call(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/def_function.py", line 950, in _call
    return self._stateless_fn(*args, **kwds)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 3040, in __call__
    filtered_flat_args, captured_inputs=graph_function.captured_inputs)  # pylint: disable=protected-access
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 1964, in _call_flat
    ctx, args, cancellation_manager=cancellation_manager))
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/function.py", line 596, in call
    ctx=ctx)
  File "/usr/local/lib/python3.7/dist-packages/tensorflow/python/eager/execute.py", line 60, in quick_execute
    inputs, attrs, num_outputs)
tensorflow.python.framework.errors_impl.UnknownError: 2 root error(s) found.
  (0) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node model/conv1_conv/Conv2D (defined at /local/lib/python3.7/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1346) ]]
     [[Loss/RPNLoss/BalancedPositiveNegativeSampler_1/Cast_8/_588]]
  (1) Unknown:  Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
     [[node model/conv1_conv/Conv2D (defined at /local/lib/python3.7/dist-packages/object_detection/meta_architectures/faster_rcnn_meta_arch.py:1346) ]]
0 successful operations.
0 derived errors ignored. [Op:__inference__dummy_computation_fn_44910]

I have tried using lower versions of TensorFlow, such as 2.4.0, but the problem persists.

Recommended answer

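I was dealing with the same problem. You need to check the versions here: https://www.tensorflow.org/install/source#gpu The TensorFlow Object Detection API uses TensorFlow 2.6.0, so you need cuDNN 8.1, whereas the Colab runtime uses 8.0.5. To solve it, I registered at https://developer.nvidia.com/cudnn and downloaded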

cudnn-11.2-linux-x64-v8.1.0.77.tgz
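
To see which cuDNN headers a runtime currently has, before or after swapping in the downloaded archive, the version macros can be read from `cudnn_version.h`, where cuDNN 8 keeps them. The following is a minimal sketch: the helper name `read_cudnn_version` is hypothetical, and the header path is an assumption about the runtime image that may need adjusting. Note that it reports the version of the installed headers, which is not necessarily the shared library that TensorFlow loads.

```python
import re
from pathlib import Path


def read_cudnn_version(header_text: str) -> str:
    """Extract 'MAJOR.MINOR.PATCHLEVEL' from the text of cudnn_version.h."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(
            rf"^\s*#define\s+{name}\s+(\d+)", header_text, re.MULTILINE
        )
        if match is None:
            raise ValueError(f"{name} not found in header text")
        parts.append(match.group(1))
    return ".".join(parts)


# Assumed location of the header; adjust it for your runtime.
header = Path("/usr/local/cuda/include/cudnn_version.h")
if header.exists():
    print(read_cudnn_version(header.read_text(errors="replace")))
else:
    print(f"{header} not found; the header may live elsewhere")
```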
Afterwards, I uploaded it to my Drive and ran Colab with the Drive mounted. In the Colab notebook that uses object detection, I put this in the first cell:

!tar -zvxf /content/drive/MyDrive/task/cudnn-11.2-linux-x64-v8.1.0.77.tgz

and later:

%%bash
cd cuda/include
sudo cp *.h /usr/local/cuda/include/

This solved my problem.
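
The cell shown in this answer copies only the header files. Since the error concerns the library loaded at runtime, and NVIDIA's tar-file installation steps also copy the `libcudnn*` shared libraries into `lib64`, the sketch below copies both. It is not part of the original answer: the helper name `copy_cudnn_files` is hypothetical, and it assumes the archive was extracted into a `cuda/` directory containing `include/` and `lib64/`, as the `cd cuda/include` step suggests.

```python
import glob
import shutil
from pathlib import Path


def copy_cudnn_files(extracted_root: str, cuda_root: str) -> list:
    """Copy cuDNN headers and shared libraries into a CUDA install tree.

    Returns the destination paths that were written.
    """
    copied = []
    for subdir, pattern in (("include", "cudnn*.h"), ("lib64", "libcudnn*")):
        dest_dir = Path(cuda_root) / subdir
        dest_dir.mkdir(parents=True, exist_ok=True)
        sources = sorted(glob.glob(str(Path(extracted_root) / subdir / pattern)))
        for src in sources:
            # copy2 follows symlinks, so every libcudnn name ends up
            # as a regular file in the destination directory.
            copied.append(shutil.copy2(src, dest_dir))
    return copied


# Example call (assumed paths; needs write access to the CUDA directory):
# copy_cudnn_files("cuda", "/usr/local/cuda")
```

Restarting the Colab runtime after copying may be needed so that TensorFlow loads the new library rather than one already in memory.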