Resolving CUDA_ERROR_ILLEGAL_INSTRUCTION and Event Polling Errors with tf.one_hot on Windows GPU
Executing standard TensorFlow operations such as tf.one_hot on a Windows system equipped with an NVIDIA GPU can trigger specific runtime failures. A common scenario produces the following crash log:
2019-04-02 09:50:47.986024: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1356] Found device 0 with properties:
name: GeForce GTX 1080 Ti major: 6 minor: 1 memoryClockRate(GHz): 1.582
pciBusID: 0000:01:00.0
totalMemory: 11.00GiB freeMemory: 9.10GiB
2019-04-02 09:50:47.991931: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1435] Adding visible gpu devices: 0
2019-04-02 09:50:48.667536: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:923] Device interconnect StreamExecutor with strength 1 edge matrix:
2019-04-02 09:50:48.672794: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:929] 0
2019-04-02 09:50:48.675436: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:942] 0: N
2019-04-02 09:50:48.678921: I C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_device.cc:1053] Created TensorFlow device (/job:localhost/replica:0/task:0/device:GPU:0 with 8795 MB memory) -> physical GPU (device: 0, name: GeForce GTX 1080 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)
2019-04-02 09:50:51.208473: E C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_ILLEGAL_INSTRUCTION
2019-04-02 09:50:51.213582: F C:\users\nwani\_bazel_nwani\swultrt5\execroot\org_tensorflow\tensorflow\core\common_runtime\gpu\gpu_event_mgr.cc:208] Unexpected Event status: 1
This issue is often linked to a bug within the Windows GPU build of TensorFlow when processing the tf.one_hot operation. Similar stack traces involving CUDA_ERROR_LAUNCH_FAILED usually point to the same root cause:
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_driver.cc:1177] could not synchronize on CUDA context: CUDA_ERROR_LAUNCH_FAILED :: No stack trace available
E c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\stream_executor\cuda\cuda_event.cc:49] Error polling for event status: failed to query event: CUDA_ERROR_LAUNCH_FAILED
F c:\tf_jenkins\home\workspace\release-win\device\gpu\os\windows\tensorflow\core\common_runtime\gpu\gpu_util.cc:370] GPU sync failed
To work around these errors without changing the TensorFlow version, explicitly place the encoding operation on the CPU. This may add a small performance overhead, but it keeps execution stable.
import tensorflow as tf

input_labels = tf.constant([0, 2, 1, 3])

# Scope the one-hot encoding to the CPU to avoid the Windows GPU crash
with tf.device('/cpu:0'):
    encoded_labels = tf.one_hot(indices=input_labels, depth=123)
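To check the CPU-placed result, or to precompute the encoding on the host so the GPU kernel is never involved, it helps to know exactly what tf.one_hot computes with its defaults (on_value=1, off_value=0, axis=-1). The sketch below mirrors that behavior for a flat list of labels in plain Python. The helper name one_hot_reference is hypothetical and not part of TensorFlow; it assumes 1-D integer labels only. As in tf.one_hot, negative or out-of-range indices yield a row filled entirely with off_value.

```python
from typing import List, Sequence


def one_hot_reference(indices: Sequence[int], depth: int,
                      on_value: float = 1.0,
                      off_value: float = 0.0) -> List[List[float]]:
    """Hypothetical pure-Python mirror of tf.one_hot for 1-D indices (axis=-1)."""
    rows = []
    for idx in indices:
        # Start every row filled with off_value.
        row = [off_value] * depth
        # Only in-range indices set a position; negative or too-large
        # indices leave the row as all off_value, as tf.one_hot does.
        if 0 <= idx < depth:
            row[idx] = on_value
        rows.append(row)
    return rows


if __name__ == "__main__":
    for row in one_hot_reference([0, 2, 1, 3], depth=4):
        print(row)
```

If the host-side route is used, the resulting nested list can be wrapped with tf.constant before it enters the graph; for large label sets, running tf.one_hot under tf.device('/cpu:0') as above is usually the simpler choice.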