Running TensorFlow Lite for Microcontrollers on Contiki-NG

A guide for running Deep Learning models on Nordic Semiconductor nRF52 and nRF53 series devices

Background

TensorFlow Lite for Microcontrollers (TFLM)

TFLM is a framework that makes it possible to execute a subset of TensorFlow Machine Learning / Deep Learning models on microcontrollers. A microcontroller is a device with very limited computational capacity, typically running a real-time operating system and having no virtual memory or extended disk storage. TFLM helps machine learning engineers work around these constraints by offering a C++ library for run-time interpretation and inference of .tflite models. The library can be linked with many existing operating systems, for example Zephyr and Arm Mbed.

Contiki-NG

Contiki is one of the original sensor network operating systems, started in the early 2000s by Adam Dunkels. It has one of the smallest TCP/IP stacks in existence and is actively used in research to this day. Contiki-NG is the Next Generation version of Contiki: a fork of the original system that is under active development. Contiki-NG keeps the good parts of Contiki, including the powerful network stack (IEEE 802.15.4, 6LoWPAN, TSCH / 6TiSCH, RPL, and TCP/IP protocols) and the lock-free cooperative threading model ("protothreads"). The added focus of Contiki-NG is next-generation devices and standard protocols, with the aim of serving the needs of modern Internet of Things (IoT) applications.

Applications

Microcontrollers are a new research frontier for Deep Learning models. Some of the cool things that people have done on these devices so far include keyword spotting, gesture recognition, and person detection from camera images.

The workflow of deploying TensorFlow models on microcontrollers

Setting up the dependencies

TensorFlow Lite for Microcontrollers (TFLM)

Since June 2021, TFLM has its own repository on GitHub. To get it:

git clone https://github.com/tensorflow/tflite-micro.git
cd tflite-micro
git checkout main

Contiki-NG

The code for this tutorial is in the author’s personal fork of Contiki-NG, based on the Contiki-NG release version 4.7. You can get it by cloning the repository (the example is in the main branch):

git clone https://github.com/atiselsts/contiki-ng-feature-project.git
cd contiki-ng-feature-project
git submodule update --init --recursive # get required libraries

Target hardware

We’ll be using Nordic Semiconductor nRF series devices for the demo example. The nRF52840 is a System-on-Chip based on the Cortex-M4F core, and the more recent nRF5340 is based on the Cortex-M33. The information in this article should be easy to apply to other ARM Cortex-M based platforms as well, as long as they are supported by Contiki / Contiki-NG. To follow this article you will need the GCC toolchain for the target architecture; that is, arm-none-eabi-gcc and friends should be installed in your `$PATH`.

Target application

We’re just going to reuse the “Hello World” example from the TFLM repository. The example aims to predict a simple sin() function using a .tflite model.

The C++ code of this example was ported to Contiki-NG — it was slightly modified and simplified to make it compile for the target OS. The alternative way would have been to add an nRF platform with Contiki-NG OS support to the TFLM codebase. However, adding new target platforms to TFLM is nontrivial and is expected to change in the future.

Building an example

TensorFlow Lite for Microcontrollers (TFLM)

Go to the TFLM top level directory and build the `microlite` target for the target microcontroller architecture:

make -f tensorflow/lite/micro/tools/make/Makefile TARGET=cortex_m_generic TARGET_ARCH=cortex-m4+fp OPTIMIZED_KERNEL_DIR=cmsis_nn microlite

Let’s break down the command:

- TARGET=cortex_m_generic builds the library for a generic ARM Cortex-M target rather than a specific board.
- TARGET_ARCH=cortex-m4+fp selects the Cortex-M4 core with hardware floating point, matching the nRF52840.
- OPTIMIZED_KERNEL_DIR=cmsis_nn uses the CMSIS-NN optimized kernel implementations for ARM cores.
- microlite is the Makefile target that produces the static library libtensorflow-microlite.a.

Expected result:

...
tensorflow/lite/micro/tools/make/downloads/gcc_embedded/bin/arm-none-eabi-ar: creating tensorflow/lite/micro/tools/make/gen/cortex_m_generic_cortex-m4+fp_default/lib/libtensorflow-microlite.a

Contiki-NG

To start using Contiki-NG, we recommend getting the Docker image provided by the maintainers.

Once you have the right branch of the OS, go to the tensorflow-lite-micro example directory and create a link to the microlite library built in the first step. The exact path depends on your folder structure, so change it accordingly!

cd examples/tensorflow-lite-micro/
ln -s /home/atis/source/tflite-micro/tensorflow/lite/micro/tools/make/gen/cortex_m_generic_cortex-m4+fp_default/lib/libtensorflow-microlite.a

Now also edit the Makefile to point to the correct TFLM directory, by changing the second line:

# Change this to your Tensorflow Lite Micro directory:
TFLM=/home/pi/tflite-micro

After that, build the example normally for the target:

make TARGET=nrf BOARD=nrf52840/dk -j

TARGET=nrf selects the Contiki-NG hardware platform that supports several nRF devices, and BOARD=nrf52840/dk selects the nRF52840 development kit (PCA10056) in particular. You can run the command `make TARGET=nrf BOARD=nrf52840/dk savetarget` to store the target and board in a local file; then you can skip explicitly setting them in the next commands.

nRF52840 development kit

Expected result:

MKDIR build/nrf/nrf52840/dk/obj
...
CP build/nrf/nrf52840/dk/application/main.nrf → main.nrf
rm build/nrf/nrf52840/dk/application/obj/nn.o build/nrf/nrf52840/dk/application/main.i16hex main.o

Connect the devkit and flash the application:

make TARGET=nrf BOARD=nrf52840/dk -j main.upload

Now connect with a serial port reader program and observe the output.

make TARGET=nrf BOARD=nrf52840/dk login PORT=/dev/ttyACM0

Expected output:

[INFO: Main ] Starting Contiki-NG-release/v4.7-3-gabb314733-dirty
[INFO: Main ] - Routing: RPL Lite
[INFO: Main ] - Net: sicslowpan
[INFO: Main ] - MAC: CSMA
[INFO: Main ] - 802.15.4 PANID: 0xabcd
[INFO: Main ] - 802.15.4 Default channel: 26
[INFO: Main ] Node ID: 0
[INFO: Main ] Link-layer address: f4ce.3600.0000.0000
[INFO: Main ] Tentative link-local IPv6 address: fe80::f6ce:3600:0:0
running inference…
x_value: 1.0*2^-127, y_value: 1.0*2^-127
running inference…
x_value: 1.2566366*2^-2, y_value: 1.4910722*2^-2

The x_value is the input to the model (the parameter at which sin() should be evaluated), and y_value is the number calculated from the output returned by the model (i.e. the model’s approximation of sin(x_value)).

Looking at the code

The neural network model

The example code already comes with the model provided in a C file called nn-model.c.

The model is converted from the .tflite file provided in the TFLM Hello World example application by using the xxd command. To regenerate the C file:

xxd -i /home/atis/source/tflite-micro/tensorflow/lite/micro/examples/hello_world/hello_world.tflite > nn-model.c

Change the path to match your directory structure, and afterwards edit the C file to rename the model back to hello_world_tflite.

Warning #1: the ARM processor requires that memory accesses are aligned, otherwise it throws a hardware exception. After converting the model, make sure to add the alignas(8) attribute to the byte code array, otherwise the program will crash. See the code here for an example.

Warning #2: if you attempt to use a custom model, be aware that not all TensorFlow Lite operations (kernels) are supported by the TensorFlow Lite for Microcontrollers library! This means that some .tflite models will fail at runtime, during the setup stage.

The C++ code for model setup and inference

The important bits are in the file `nn.cpp`, which shows how to prepare the model and run inference in it.

The functions nn_setup() and nn_run_inference() use the C calling convention and are simply wrappers around the C++ functions setup() and loop() from the example. The main process in a Contiki-NG application must be written in C, which means that it cannot call C++ functions directly.

The functions setup() and loop() are taken from the TFLM Hello World example; their names are a convention used in Arduino sketches. I have changed the return type of these functions to allow returning an error code.

The most important bit in the setup() function is the call to interpreter->AllocateTensors(). This step may fail for several reasons, e.g. the model uses unsupported operations, or its run-time memory requirements are too large to fit in the pre-allocated tensor_arena buffer.

Be aware that the model itself must also be placed in RAM, not in flash memory, as it may be modified by the TFLM runtime library: don’t be tempted to add const in front of the array that contains the model’s binary code.

int setup() {
  tflite::InitializeTarget();

  // Set up logging. Google style is to avoid globals or statics because of
  // lifetime uncertainty, but since this has a trivial destructor it's okay.
  // NOLINTNEXTLINE(runtime-global-variables)
  static tflite::MicroErrorReporter micro_error_reporter;
  error_reporter = &micro_error_reporter;

  // Map the model into a usable data structure. This doesn't involve any
  // copying or parsing, it's a very lightweight operation.
  model = tflite::GetModel(hello_world_tflite);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    TF_LITE_REPORT_ERROR(error_reporter,
                         "Model provided is schema version %d not equal "
                         "to supported version %d.",
                         model->version(), TFLITE_SCHEMA_VERSION);
    return -1;
  }

  // This pulls in all the operation implementations we need.
  // NOLINTNEXTLINE(runtime-global-variables)
  static tflite::AllOpsResolver resolver;

  // Build an interpreter to run the model with.
  static tflite::MicroInterpreter static_interpreter(
      model, resolver, tensor_arena, kTensorArenaSize, error_reporter);
  interpreter = &static_interpreter;

  // Allocate memory from the tensor_arena for the model's tensors.
  TfLiteStatus allocate_status = interpreter->AllocateTensors();
  if (allocate_status != kTfLiteOk) {
    TF_LITE_REPORT_ERROR(error_reporter, "AllocateTensors() failed");
    return -1;
  }

  // Obtain pointers to the model's input and output tensors.
  input = interpreter->input(0);
  output = interpreter->output(0);

  // Keep track of how many inferences we have performed.
  inference_count = 0;
  return 0;
}

The loop() function sets up the inputs (a single number in this case), performs inference by calling interpreter->Invoke(), and prints out the dequantized output of the inference.

int loop() {
  // Calculate an x value to feed into the model. We compare the current
  // inference_count to the number of inferences per cycle to determine
  // our position within the range of possible x values the model was
  // trained on, and use this to calculate a value.
  float position = static_cast<float>(inference_count) /
                   static_cast<float>(kInferencesPerCycle);
  float x = position * kXrange;

  // Quantize the input from floating-point to integer
  int8_t x_quantized = x / input->params.scale + input->params.zero_point;
  // Place the quantized input in the model's input tensor
  input->data.int8[0] = x_quantized;

  // Run inference, and report any error
  TfLiteStatus invoke_status = interpreter->Invoke();
  if (invoke_status != kTfLiteOk) {
    TF_LITE_REPORT_ERROR(error_reporter, "Invoke failed on x: %f\n",
                         static_cast<double>(x));
    return -1;
  }

  // Obtain the quantized output from model's output tensor
  int8_t y_quantized = output->data.int8[0];
  // Dequantize the output from integer to floating-point
  float y = (y_quantized - output->params.zero_point) * output->params.scale;

  // Output the results: log the current x and y values
  TF_LITE_REPORT_ERROR(error_reporter, "x_value: %f, y_value: %f\n",
                       static_cast<double>(x),
                       static_cast<double>(y));

  // Increment the inference_counter, and reset it if we have reached
  // the total number per cycle
  inference_count += 1;
  if (inference_count >= kInferencesPerCycle) {
    inference_count = 0;
  }
  return 0;
}

Acknowledgements

This work was supported by the ERDF Activity 1.1.1.2 "Post-doctoral Research Aid" (№1.1.1.2/VIAA/2/18/282). It was part of the FEATURE project developed at the Institute of Electronics and Computer Science (EDI), https://www.edi.lv/en/