Configuring a Cross-Compilation Environment for TensorFlow on ARMv7 Chips
Most development PCs run Linux on x86 or x86-64 processors, while the target chip is ARMv7, so binaries compiled natively on the PC cannot run on the chip. A cross-compilation environment therefore needs to be configured.
Steps to install the cross-compilation environment:
- Install Bazel
  Method One:
  Method Two:
  sudo apt-get install openjdk-8-jdk
  - Add Bazel source:
    sudo apt-get install curl
    echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | sudo tee /etc/apt/sources.list.d/bazel.list
    curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
  - Install Bazel:
    sudo apt-get update
    sudo apt-get install bazel
    (If the download from the googleapis site fails during installation, retry several times.)
  - Upgrade Bazel:
    sudo apt-get upgrade bazel
- Set up the cross-compilation toolchain
Current cross-compilation SDK directory:
/home/dev/sysroots/x86_64-linux
Add the following instructions at the end of /etc/bash.bashrc:
export ARCH=arm
export PATH=/home/dev/sysroots/x86_64-linux/bin/:$PATH
export CROSS_COMPILE=arm-poky-linux-gnueabi-
export CC=/home/dev/sysroots/x86_64-linux/bin/arm-poky-linux-gnueabi-gcc
export CXX=/home/dev/sysroots/x86_64-linux/bin/arm-poky-linux-gnueabi-g++
export LD=/home/dev/sysroots/x86_64-linux/bin/arm-poky-linux-gnueabi-ld
export AR=/home/dev/sysroots/x86_64-linux/bin/arm-poky-linux-gnueabi-ar
export AS=/home/dev/sysroots/x86_64-linux/bin/arm-poky-linux-gnueabi-as
export RANLIB=/home/dev/sysroots/x86_64-linux/bin/arm-poky-linux-gnueabi-ranlib
Restart the command line after modification for the changes to take effect.
Use the following command to confirm the cross-compilation chain is set:
echo $CC
If it displays /home/dev/sysroots/x86_64-linux/bin/arm-poky-linux-gnueabi-gcc, the cross-compilation chain is set.
To switch back to native compilation, comment out the instructions above and restart the command line.
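Editing /etc/bash.bashrc every time the build target changes is easy to get wrong. As one possible alternative, the same exports could live in a small file that is sourced on demand. This is only a sketch: the function names cross_env_on and cross_env_off and the variable NATIVE_PATH are hypothetical, and the SDK path is the one used in this article.

```shell
# Hypothetical helper file (for example ~/cross_env.sh); source it in a shell
# instead of editing /etc/bash.bashrc.
SDK_BIN=/home/dev/sysroots/x86_64-linux/bin   # SDK path used in this article
TOOL_PREFIX=arm-poky-linux-gnueabi-

cross_env_on() {
    # Remember the native PATH once so that cross_env_off can restore it.
    : "${NATIVE_PATH:=$PATH}"
    export ARCH=arm
    export PATH="$SDK_BIN:$NATIVE_PATH"
    export CROSS_COMPILE="$TOOL_PREFIX"
    export CC="$SDK_BIN/${TOOL_PREFIX}gcc"
    export CXX="$SDK_BIN/${TOOL_PREFIX}g++"
    export LD="$SDK_BIN/${TOOL_PREFIX}ld"
    export AR="$SDK_BIN/${TOOL_PREFIX}ar"
    export AS="$SDK_BIN/${TOOL_PREFIX}as"
    export RANLIB="$SDK_BIN/${TOOL_PREFIX}ranlib"
}

cross_env_off() {
    # Restore the native PATH (if we saved one) and drop the cross variables.
    if [ -n "${NATIVE_PATH:-}" ]; then
        PATH=$NATIVE_PATH
        export PATH
    fi
    unset ARCH CROSS_COMPILE CC CXX LD AR AS RANLIB NATIVE_PATH
}
```

After sourcing the file, running cross_env_on followed by echo $CC should print the cross gcc path, and cross_env_off returns the shell to native compilation without touching any system file.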
- Prepare cross-compilation configuration
The following Bazel configuration scripts need to be created and modified:
./WORKSPACE
./arm_compiler/BUILD
./arm_compiler/build_armv7.sh
./arm_compiler/cross_toolchain_target_armv7.BUILD
./arm_compiler/CROSSTOOL
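The file layout above could be scaffolded with a few lines of shell before the files are filled in. This is a minimal sketch; the function name scaffold_arm_compiler is hypothetical, and it assumes the argument is the TensorFlow source root, where WORKSPACE already exists.

```shell
# Create the arm_compiler directory and empty placeholder files inside a
# TensorFlow checkout. Existing files are never overwritten.
scaffold_arm_compiler() {
    tf_root=$1
    mkdir -p "$tf_root/arm_compiler" || return 1
    for f in BUILD build_armv7.sh cross_toolchain_target_armv7.BUILD CROSSTOOL; do
        [ -e "$tf_root/arm_compiler/$f" ] || : > "$tf_root/arm_compiler/$f"
    done
    # The build script is meant to be executed directly.
    chmod +x "$tf_root/arm_compiler/build_armv7.sh"
}
```

For example, scaffold_arm_compiler ~/tensorflow would create the four placeholders under ~/tensorflow/arm_compiler.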
3.1 Modify WORKSPACE
Add the following content to the WORKSPACE file in the TensorFlow root directory:
new_local_repository(
    name = 'toolchain_target_armv7',  # cross-compiler alias
    path = '/home/dev/sysroots/x86_64-linux',  # cross-compiler path, adjustable
    build_file = 'arm_compiler/cross_toolchain_target_armv7.BUILD',  # cross-compiler description file
)
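The build_file named in the WORKSPACE entry above has to define the target @toolchain_target_armv7//:compiler_pieces, which the BUILD file in 3.4 refers to. A description file along the following lines might serve as a starting point. This is a sketch under assumptions: the glob patterns simply mirror the SDK sub-directories that appear elsewhere in this article and may need adjusting for a different SDK layout, and the function name write_toolchain_build is hypothetical.

```shell
# Write a candidate cross_toolchain_target_armv7.BUILD to the given path.
write_toolchain_build() {
    out=$1
    cat > "$out" <<'EOF'
package(default_visibility = ["//visibility:public"])

# Assumption: these globs mirror the SDK directories used in this article.
filegroup(
    name = "compiler_pieces",
    srcs = glob([
        "usr/bin/arm-poky-linux-gnueabi/**",
        "usr/lib/arm-poky-linux-gnueabi/**",
        "cortexa9hf/usr/include/**",
        "cortexa9hf/usr/lib/**",
        "cortexa9hf/lib/**",
    ]),
)
EOF
}
```

Running write_toolchain_build arm_compiler/cross_toolchain_target_armv7.BUILD from the TensorFlow root would place the file where the WORKSPACE entry expects it.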
3.2 Create the cross-compiler description file cross_toolchain_target_armv7.BUILD.
3.3 Create CROSSTOOL:
major_version: "local"
minor_version: ""
default_target_cpu: "armv7"
default_toolchain {
cpu: "armv7"
toolchain_identifier: "arm-poky-linux-gnueabi"
}
default_toolchain {
cpu: "k8"
toolchain_identifier: "local"
}
toolchain {
abi_version: "gcc"
abi_libc_version: "glibc_2.23"
builtin_sysroot: ""
compiler: "compiler"
host_system_name: "armv7"
needsPic: true
supports_gold_linker: false
supports_incremental_linker: false
supports_fission: false
supports_interface_shared_objects: false
supports_normalizing_ar: true
supports_start_end_lib: false
supports_thin_archives: true
target_libc: "glibc_2.23"
target_cpu: "armv7"
target_system_name: "armv7"
toolchain_identifier: "arm-poky-linux-gnueabi"
tool_path { name: "ar" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-ar" }
tool_path { name: "compat-ld" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-ld" }
tool_path { name: "cpp" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-cpp" }
tool_path { name: "dwp" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-dwp" }
tool_path { name: "gcc" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-gcc" }
tool_path { name: "gcov" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-gcov" }
tool_path { name: "ld" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-ld" }
tool_path { name: "nm" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-nm" }
tool_path { name: "objcopy" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-objcopy" }
objcopy_embed_flag: "-I"
objcopy_embed_flag: "binary"
tool_path { name: "objdump" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-objdump" }
tool_path { name: "strip" path: "/home/dev/sysroots/x86_64-linux/usr/bin/arm-poky-linux-gnueabi/arm-poky-linux-gnueabi-strip" }
compiler_flag: "-nostdinc"
compiler_flag: "-isystem"
compiler_flag: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include"
compiler_flag: "-isystem"
compiler_flag: "/home/dev/sysroots/x86_64-linux/usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/5.3.0/include"
compiler_flag: "-isystem"
compiler_flag: "/home/dev/sysroots/x86_64-linux/usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/5.3.0/include-fixed"
compiler_flag: "-isystem"
compiler_flag: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include/c++/5.3.0"
compiler_flag: "-isystem"
compiler_flag: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include/c++/5.3.0/arm-poky-linux-gnueabi"
cxx_flag: "-isystem"
cxx_flag: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include"
cxx_flag: "-isystem"
cxx_flag: "/home/dev/sysroots/x86_64-linux/usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/5.3.0/include"
cxx_flag: "-isystem"
cxx_flag: "/home/dev/sysroots/x86_64-linux/usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/5.3.0/include-fixed"
cxx_flag: "-isystem"
cxx_flag: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include/c++/5.3.0"
cxx_flag: "-isystem"
cxx_flag: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include/c++/5.3.0/arm-poky-linux-gnueabi"
cxx_flag: "-std=c++11"
cxx_builtin_include_directory: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include"
cxx_builtin_include_directory: "/home/dev/sysroots/x86_64-linux/usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/5.3.0/include"
cxx_builtin_include_directory: "/home/dev/sysroots/x86_64-linux/usr/lib/arm-poky-linux-gnueabi/gcc/arm-poky-linux-gnueabi/5.3.0/include-fixed"
cxx_builtin_include_directory: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include/c++/5.3.0"
cxx_builtin_include_directory: "/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/include/c++/5.3.0/arm-poky-linux-gnueabi"
linker_flag: "-lstdc++"
linker_flag: "-L/home/dev/sysroots/x86_64-linux/lib"
linker_flag: "-L/home/dev/sysroots/x86_64-linux/usr/lib"
linker_flag: "-L/home/dev/sysroots/x86_64-linux/cortexa9hf/lib"
linker_flag: "-L/home/dev/sysroots/x86_64-linux/cortexa9hf/usr/lib"
linker_flag: "-Wl,--dynamic-linker=/lib/ld-linux-armhf.so.3"
unfiltered_cxx_flag: "-no-canonical-prefixes"
linker_flag: "-no-canonical-prefixes"
unfiltered_cxx_flag: "-Wno-builtin-macro-redefined"
unfiltered_cxx_flag: "-D__DATE__=\"redacted\""
unfiltered_cxx_flag: "-D__TIMESTAMP__=\"redacted\""
unfiltered_cxx_flag: "-D__TIME__=\"redacted\""
compiler_flag: "-U_FORTIFY_SOURCE"
compiler_flag: "-fstack-protector"
compiler_flag: "-fPIE"
linker_flag: "-pie"
linker_flag: "-Wl,-z,relro,-z,now"
compiler_flag: "-fdiagnostics-color=always"
compiler_flag: "-Wall"
compiler_flag: "-Wunused-but-set-parameter"
compiler_flag: "-Wno-free-nonheap-object"
compiler_flag: "-fno-omit-frame-pointer"
linker_flag: "-Wl,--build-id=md5"
linker_flag: "-Wl,--hash-style=gnu"
compilation_mode_flags {
mode: DBG
compiler_flag: "-g"
}
compilation_mode_flags {
mode: OPT
compiler_flag: "-g0"
compiler_flag: "-O3"
compiler_flag: "-DNDEBUG"
compiler_flag: "-mcpu=cortex-a9"
compiler_flag: "-mfpu=neon"
}
}
toolchain {
toolchain_identifier: "local"
abi_libc_version: "local"
abi_version: "local"
builtin_sysroot: ""
compiler: "compiler"
compiler_flag: "-U_FORTIFY_SOURCE"
compiler_flag: "-D_FORTIFY_SOURCE=2"
compiler_flag: "-fstack-protector"
compiler_flag: "-Wall"
compiler_flag: "-Wl,-z,-relro,-z,now"
compiler_flag: "-B/usr/bin"
compiler_flag: "-B/usr/bin"
compiler_flag: "-Wunused-but-set-parameter"
compiler_flag: "-Wno-free-nonheap-object"
compiler_flag: "-fno-omit-frame-pointer"
compiler_flag: "-isystem"
compiler_flag: "/usr/include"
cxx_builtin_include_directory: "/usr/include/c++/5.4.0"
cxx_builtin_include_directory: "/usr/include/c++/5"
cxx_builtin_include_directory: "/usr/lib/gcc/x86_64-linux-gnu/5/include"
cxx_builtin_include_directory: "/usr/include/x86_64-linux-gnu/c++/5.4.0"
cxx_builtin_include_directory: "/usr/include/c++/5.4.0/backward"
cxx_builtin_include_directory: "/usr/lib/gcc/x86_64-linux-gnu/5.4.0/include"
cxx_builtin_include_directory: "/usr/local/include"
cxx_builtin_include_directory: "/usr/lib/gcc/x86_64-linux-gnu/5.4.0/include-fixed"
cxx_builtin_include_directory: "/usr/lib/gcc/x86_64-linux-gnu/5/include-fixed"
cxx_builtin_include_directory: "/usr/include/x86_64-linux-gnu"
cxx_builtin_include_directory: "/usr/include"
cxx_flag: "-std=c++11"
host_system_name: "local"
linker_flag: "-lstdc++"
linker_flag: "-lm"
linker_flag: "-Wl,-no-as-needed"
linker_flag: "-B/usr/bin"
linker_flag: "-B/usr/bin"
linker_flag: "-pass-exit-codes"
needsPic: true
objcopy_embed_flag: "-I"
objcopy_embed_flag: "binary"
supports_fission: false
supports_gold_linker: false
supports_incremental_linker: false
supports_interface_shared_objects: false
supports_normalizing_ar: false
supports_start_end_lib: false
supports_thin_archives: false
target_cpu: "k8"
target_libc: "local"
target_system_name: "local"
unfiltered_cxx_flag: "-fno-canonical-system-headers"
unfiltered_cxx_flag: "-Wno-builtin-macro-redefined"
unfiltered_cxx_flag: "-D__DATE__=\"redacted\""
unfiltered_cxx_flag: "-D__TIMESTAMP__=\"redacted\""
unfiltered_cxx_flag: "-D__TIME__=\"redacted\""
tool_path {name: "ar" path: "/usr/bin/ar" }
tool_path {name: "cpp" path: "/usr/bin/cpp" }
tool_path {name: "dwp" path: "/usr/bin/dwp" }
tool_path {name: "gcc" path: "/usr/bin/gcc" }
tool_path {name: "gcov" path: "/usr/bin/gcov" }
tool_path {name: "ld" path: "/usr/bin/ld" }
tool_path {name: "nm" path: "/usr/bin/nm" }
tool_path {name: "objcopy" path: "/usr/bin/objcopy" }
tool_path {name: "objdump" path: "/usr/bin/objdump" }
tool_path {name: "strip" path: "/usr/bin/strip" }
compilation_mode_flags {
mode: DBG
compiler_flag: "-g"
}
compilation_mode_flags {
mode: OPT
compiler_flag: "-g0"
compiler_flag: "-O3"
compiler_flag: "-DNDEBUG"
compiler_flag: "-ffunction-sections"
compiler_flag: "-fdata-sections"
linker_flag: "-Wl,--gc-sections"
}
linking_mode_flags { mode: DYNAMIC }
}
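Because the CROSSTOOL file hard-codes an absolute path for every tool, a single wrong directory only shows up later as a confusing build failure. A quick check such as the following could be run first. It is a sketch that assumes each tool_path entry sits on one line, as in the listing above; the function name check_tool_paths is hypothetical.

```shell
# Print one line for every tool_path entry whose path is not an executable file.
check_tool_paths() {
    crosstool=$1
    # Pull "name path" pairs out of lines like:
    #   tool_path { name: "gcc" path: "/some/dir/gcc" }
    sed -n 's/.*tool_path *{ *name: *"\([^"]*\)" *path: *"\([^"]*\)".*/\1 \2/p' "$crosstool" |
    while read -r tool_name tool_file; do
        if [ ! -x "$tool_file" ]; then
            echo "missing: $tool_name -> $tool_file"
        fi
    done
}
```

Calling check_tool_paths arm_compiler/CROSSTOOL prints nothing when every listed tool exists. Some entries, such as dwp, may legitimately be absent from an SDK and can usually be ignored when fission is disabled.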
3.4 Create BUILD
package(default_visibility = ["//visibility:public"])
cc_toolchain_suite(
name = "toolchain",
toolchains = {
"armv7|compiler": ":cc-compiler-armv7",
"k8|compiler": ":cc-compiler-local",
},
)
filegroup(
name = "empty",
srcs = [],
)
filegroup(
name = "arm_linux_all_files",
srcs = [
"@toolchain_target_armv7//:compiler_pieces",
],
)
cc_toolchain(
name = "cc-compiler-local",
all_files = ":empty",
compiler_files = ":empty",
cpu = "local",
dwp_files = ":empty",
dynamic_runtime_libs = [":empty"],
linker_files = ":empty",
objcopy_files = ":empty",
static_runtime_libs = [":empty"],
strip_files = ":empty",
supports_param_files = 1,
)
cc_toolchain(
name = "cc-compiler-armv7",
all_files = ":arm_linux_all_files",
compiler_files = ":arm_linux_all_files",
cpu = "armv7",
dwp_files = ":empty",
dynamic_runtime_libs = [":empty"],
linker_files = ":arm_linux_all_files",
objcopy_files = ":arm_linux_all_files",
static_runtime_libs = [":empty"],
strip_files = ":arm_linux_all_files",
supports_param_files = 1,
)
3.5 Add nsync cross-compilation support
The cross-compiler settings in the official nsync BUILD file may not cover this toolchain, so nsync's build settings need to be modified in ~/.cache/bazel/_bazel_<user>/<random_dir>/external/nsync/BUILD:
3.5.1 List directories: ls ~/.cache/bazel/ (finds _bazel_<user>)
3.5.2 Navigate inside and find the latest random folder by timestamp.
3.5.3 Example folder: 7924169126bef9c95805dc831e19e9c3. Open external/nsync/BUILD inside it and make these additions:
Add config_setting:
config_setting(
name = "armv7",
values = {"cpu": "armeabi-v7a"},
)
config_setting(
name = "armv8",
values = {"cpu": "arm64-v8a"},
)
In NSYNC_OPTS, extend the select blocks to include ":armv7" and ":armv8".
In NSYNC_SRC_PLATFORM, add:
":armv7": NSYNC_SRC_LINUX,
":armv8": NSYNC_SRC_LINUX,
Similarly update NSYNC_TEST_SRC_PLATFORM.
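Finding the right cache folder by timestamp can be error-prone when several Bazel workspaces exist. As a sketch, the file could instead be located with find. The function name find_nsync_build is hypothetical, and the default root assumes Bazel's usual cache location under the home directory.

```shell
# List every nsync BUILD file below a Bazel cache root
# (defaults to ~/.cache/bazel when no argument is given).
find_nsync_build() {
    root=${1:-"$HOME/.cache/bazel"}
    find "$root" -type f -path '*/external/nsync/BUILD' 2>/dev/null
}
```

If more than one path is printed, running bazel info output_base inside the TensorFlow source tree shows which cache folder belongs to the current workspace.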
- Run the configure script
cd tensorflow && ./configure
Note:
  - Specify the Python library location.
  - Optimization flags: the default -march=native should be changed for cross-compilation, e.g., -march=armv7-a.
  - Select No for all other options.
- Shell script: build_armv7.sh (cross-compile the main TensorFlow part):
bazel build --copt="-fPIC" --copt="-march=armv7-a" --cxxopt="-fPIC" --cxxopt="-march=armv7-a" --verbose_failures --crosstool_top=//arm_compiler:toolchain --cpu=armv7 --config=opt tensorflow/examples/label_image/...
You can set --jobs to specify the number of compilation threads; the default uses CPU cores × 2. Various errors might occur during compilation; refer to TensorFlow cross-compilation error collections for solutions.
After compilation, the following target files are generated:
bazel-bin/tensorflow/libtensorflow_framework.so
bazel-bin/tensorflow/examples/label_image/label_image
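A build that silently fell back to the host compiler still produces files with the same names, so it can be worth confirming that the outputs really are ARM binaries. The sketch below reads the e_machine field of the ELF header, where the value 40 denotes ARM. The function names elf_machine and is_arm_elf are hypothetical, and the code assumes a little-endian ELF file, which is the usual case for ARMv7 Linux.

```shell
# Print the e_machine value (bytes 18-19 of the ELF header, little-endian).
elf_machine() {
    # Check the 4-byte ELF magic first: 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || return 1
    bytes=$(od -An -tu1 -j18 -N2 "$1") || return 1
    # Split the two decimal byte values into $1 and $2.
    set -- $bytes
    [ $# -eq 2 ] || return 1
    echo $(( $1 + 256 * $2 ))
}

# Succeed only if the file is an ELF binary for 32-bit ARM (EM_ARM = 40).
is_arm_elf() {
    [ "$(elf_machine "$1")" = 40 ]
}
```

For example, is_arm_elf bazel-bin/tensorflow/examples/label_image/label_image && echo "ARM binary" gives a quick confirmation. Where the file utility is installed, it reports the same information in a more readable form.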
- Cross-compile TensorFlow Lite
Reference: tensorflow/contrib/lite/g3doc/rpi.md
Before compiling, modify the multi-threading code (default threads=4):
const Eigen::ThreadPoolDevice& GetThreadPoolDevice() {
  int thread_count = 1;
  const char *val = getenv("OMP_NUM_THREADS");
  if (val != nullptr) {
    thread_count = atoi(val);
  }
  static Eigen::ThreadPool* tp = new Eigen::ThreadPool(thread_count);
  static EigenThreadPoolWrapper* thread_pool_wrapper =
      new EigenThreadPoolWrapper(tp);
  static Eigen::ThreadPoolDevice* device =
      new Eigen::ThreadPoolDevice(thread_pool_wrapper, thread_count);
  return *device;
}
Step 1:
./tensorflow/contrib/lite/download_dependencies.sh (once it has succeeded, it does not need to be run again)
Step 2:
Based on ./tensorflow/contrib/lite/build_rpi_lib.sh, create an iMX6 script build_imx6_lib.sh:
#!/bin/bash
set -e
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
cd "$SCRIPT_DIR/../../.."
CC_PREFIX=arm-poky-linux-gnueabi- make -j 3 -f tensorflow/contrib/lite/Makefile TARGET=imx6 TARGET_ARCH=armv7a
Step 3:
Copy the downloaded FlatBuffers into ./tensorflow/contrib/lite/schema.
On success, the following targets are generated:
tensorflow/contrib/lite/gen/lib/rpi_armv7/libtensorflow-lite.a
tensorflow/contrib/lite/gen/bin/rpi_armv7/benchmark_model
- Cross-compile the TensorFlow Lite label_image test tool:
bazel build --copt="-fPIC" --copt="-march=armv7-a" --cxxopt="-fPIC" --cxxopt="-march=armv7-a" --verbose_failures --crosstool_top=//arm_compiler:toolchain --cpu=armv7 --config=opt //tensorflow/contrib/lite/examples/label_image:label_image
- Native-compile the TOCO model conversion tool
The TOCO tool runs on x86 machines. To keep the runtime environment minimal, TensorFlow Lite uses FlatBuffers, so models must be converted with TOCO before testing:
bazel build tensorflow/contrib/lite/toco:toco
(Disable the cross-compilation options for this build.)
Generated target: bazel-bin/tensorflow/contrib/lite/toco/toco (keep it in place for execution)
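For orientation, a TOCO conversion of a frozen float model might look like the command assembled below. This is a hedged sketch, not a verified recipe: the helper toco_cmd is hypothetical and only prints the command, the input shape and array names are placeholders for a MobileNet-style model, and some TOCO versions may expect the singular flag spellings (--input_shape, --input_array, --output_array) instead of the plural ones.

```shell
# Print (without running) a TOCO command that converts a frozen GraphDef
# into a TensorFlow Lite FlatBuffers model.
toco_cmd() {
    graph=$1      # frozen .pb file
    out=$2        # .tflite file to create
    echo bazel-bin/tensorflow/contrib/lite/toco/toco \
        --input_file="$graph" \
        --output_file="$out" \
        --input_format=TENSORFLOW_GRAPHDEF \
        --output_format=TFLITE \
        --inference_type=FLOAT \
        --input_shapes=1,224,224,3 \
        --input_arrays=input \
        --output_arrays=MobilenetV1/Predictions/Reshape_1
}
```

Printing the command first makes it easy to review the placeholders. Once they match the actual model, the printed line can be pasted into a shell in the TensorFlow root directory.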
TensorFlow Lite cross-compilation environment setup complete.