Programming Notes

lifelong learning & practice makes perfect

cuDNN failed to initialize

  • Problem

    Failed to get convolution algorithm. This is probably because cuDNN
    failed to initialize, so try looking to see if a warning log message was
    printed above.
  • Description

    Ubuntu
    Anaconda
    Jupyter
    tensorflow-gpu 1.14.0
    cuDNN 7.6.0
    RTX 2060 (6 GB)

    Training a CNN in Jupyter fails with this error. Models without convolutional layers run normally.

  • Solution

    You're out of memory

    The error can also show up if you run out of graphics card RAM. With an NVIDIA GPU you can check graphics card memory usage with nvidia-smi. This gives you not only a readout of how much GPU RAM you have in use (something like 6025MiB / 6086MiB if you're almost at the limit) but also a list of the processes that are using GPU RAM.
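A minimal sketch of the same check from Python. It assumes `nvidia-smi` is on the `PATH` and supports its `--query-gpu` CSV output. `parse_gpu_memory` and `query_gpu_memory` are illustrative helper names, not part of any library. The parsing is kept separate so it can be tried without a GPU.

```python
import subprocess


def parse_gpu_memory(csv_text):
    """Parse output of
    `nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits`.

    Returns one (used_mib, total_mib) tuple per GPU.
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        used, total = (int(field.strip()) for field in line.split(","))
        gpus.append((used, total))
    return gpus


def query_gpu_memory():
    """Run nvidia-smi (assumed to be installed) and return per-GPU memory in MiB."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=memory.used,memory.total",
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_gpu_memory(out)
```

On a card that is nearly full, `query_gpu_memory()` would return something like `[(6025, 6086)]`. The process list still comes from plain `nvidia-smi`.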

    If you've run out of RAM, you'll need to restart the process (which should free up the RAM) and then take a less memory-intensive approach. A few options are:
    - reducing your batch size
    - using a simpler model
    - using less data
    - limiting TensorFlow's GPU memory fraction (i.e. reducing GPU memory usage). For example, the following makes sure TensorFlow uses <= 90% of your GPU RAM:

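    import keras
    import tensorflow as tf

    # TensorFlow 1.x session config: cap this process at 90% of the GPU's memory
    config = tf.ConfigProto()
    config.gpu_options.per_process_gpu_memory_fraction = 0.9
    # Make (standalone) Keras run on a session built with this config
    keras.backend.tensorflow_backend.set_session(tf.Session(config=config))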

    This will slow down your model evaluation if not used together with the items above, presumably since the large data set will have to be swapped in and out to fit into the small amount of memory you've allocated.
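The first option in the list above, reducing the batch size, can be automated by halving the batch size until a step fits in memory. This is a minimal sketch. `run_step` is a hypothetical callable that runs training (or one step) at a given batch size and raises an out-of-memory exception on failure. `MemoryError` is only a stand-in default. With TensorFlow you would pass whichever exception class you actually observe, for example `tf.errors.ResourceExhaustedError`.

```python
def find_workable_batch_size(run_step, start_batch_size, min_batch_size=1,
                             oom_errors=(MemoryError,)):
    """Halve the batch size until run_step(batch_size) succeeds.

    run_step is a hypothetical callable; oom_errors is the tuple of exception
    types treated as "out of memory" (MemoryError here is just a placeholder).
    Any other exception propagates unchanged.
    """
    batch_size = start_batch_size
    while batch_size >= min_batch_size:
        try:
            run_step(batch_size)
            return batch_size
        except oom_errors:
            # Out of memory at this size: try half as many samples per step
            batch_size //= 2
    raise RuntimeError("no batch size >= %d fits in memory" % min_batch_size)
```

Halving keeps the number of retries logarithmic in the starting batch size. A process that has already hit an OOM may still hold GPU memory, so restarting it, as the answer suggests, can still be necessary.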

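    I found a plausible explanation on Stack Overflow: the GPU was out of memory. My computer was driving the display with the discrete GPU, so the desktop and the model training were both using the RTX 2060, which left too little memory. The fix is either to set the NVIDIA control panel to prefer the integrated graphics, so that all of the discrete GPU's memory is available for training, or to limit in the program how much memory it uses.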
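When the desktop keeps running on the discrete GPU, one way to choose a value for `per_process_gpu_memory_fraction` is to start from what is already in use. This sketch assumes TensorFlow applies the fraction to the card's total memory. `safe_memory_fraction` is a hypothetical helper, and its `headroom_mib` and `cap` defaults are illustrative choices, not TensorFlow settings.

```python
def safe_memory_fraction(used_mib, total_mib, headroom_mib=256, cap=0.9):
    """Fraction of total GPU memory this process could claim while leaving
    memory already used by other processes (e.g. the desktop) untouched,
    plus some headroom. Returns 0.0 if nothing is left.
    """
    if total_mib <= 0:
        raise ValueError("total_mib must be positive")
    free_mib = total_mib - used_mib - headroom_mib
    if free_mib <= 0:
        return 0.0
    # Never ask for more than `cap` of the card, even when it is idle
    return min(cap, free_mib / total_mib)
```

For example, if the desktop were using a hypothetical 1200 MiB of a 6086 MiB card, this gives (6086 - 1200 - 256) / 6086 ≈ 0.76. The result could then be passed as `per_process_gpu_memory_fraction` in the config shown above.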

  • References

Stack Overflow question: https://stackoverflow.com/questions/53698035/failed-to-get-convolution-algorithm-this-is-probably-because-cudnn-failed-to-in

Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above
