Run TinyML AI Models on ESP32: Complete Guide with Voice Command Recognition Project

Are you excited about adding powerful machine learning (ML) capabilities to your affordable ESP32 microcontroller projects? Thanks to TinyML (Tiny Machine Learning), running AI models directly on embedded devices like the ESP32 is now possible. This enables a whole new level of smart applications — from voice recognition to gesture detection and anomaly monitoring, all without depending on cloud connectivity.

In this comprehensive guide, we’ll cover the basics of TinyML on the ESP32, walk you through a step-by-step deployment process, and then dive into a practical, working example: a voice command recognition system using an I2S microphone. Whether you’re a hobbyist, student, or developer, this article is designed to help you bring AI right to the edge!

What is TinyML?

Simply put, TinyML means running lightweight machine learning models on tiny devices like microcontrollers that have very limited memory and processing power. Instead of sending sensor data to cloud servers for analysis, which introduces latency, privacy concerns, and requires internet access, TinyML performs inference locally on the device itself. This local processing makes your device faster, more secure, and capable of working even when offline.

Why Choose ESP32 for TinyML?

The ESP32 microcontroller has become a favourite in the maker and embedded world, and for good reasons:

  • Affordable and widely available: Great for both prototyping and production
  • Built-in Wi-Fi and Bluetooth: Connect your devices seamlessly
  • Enough RAM and flash: Can run small AI models efficiently
  • Audio input support via I2S microphones: Essential for voice and sound projects
  • Strong developer community and Arduino ecosystem: Tons of resources and examples

This combination makes the ESP32 an ideal platform to experiment with TinyML and bring AI-powered projects to life.

What Can You Build with TinyML on ESP32?

Here are some practical and fun TinyML applications you can develop:

  • Voice Command Recognition: Control appliances or gadgets by saying “yes,” “no,” “on,” or “off”
  • Gesture Detection: Use accelerometers to detect motions like shaking or tapping
  • Image Classification: Identify objects or people with the ESP32-CAM
  • Sensor Anomaly Detection: Monitor vibration, temperature, or other signals for unusual patterns that indicate faults

The possibilities are vast, and TinyML helps you make your projects smart, responsive, and self-contained.

Step-by-Step TinyML Deployment on ESP32

Getting started with TinyML might feel overwhelming, but breaking it down into clear steps makes it manageable.

1. Train Your Machine Learning Model

  • Collect and label your dataset (for example, audio clips for voice commands or sensor data for gestures)
  • Build and train your model using popular ML frameworks like TensorFlow
  • Validate the model to ensure it performs well on unseen data

2. Convert the Model to TensorFlow Lite Format

  • Use the TensorFlow Lite Converter to generate a .tflite file optimised for microcontrollers
  • Apply quantisation to reduce the model size and increase inference speed without sacrificing much accuracy
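The conversion step can be sketched in Python with the TensorFlow Lite Converter API. The tiny Sequential model below is only a stand-in for your trained speech-command model; the converter calls are the point.

```python
import tensorflow as tf

# Stand-in model: replace with your trained speech-command model.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(49, 40, 1)),        # e.g. 49 feature frames x 40 MFCCs
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(4, activation="softmax"),  # 4 output classes
])

# Convert with default post-training quantisation enabled
converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]
tflite_model = converter.convert()

# Save the flatbuffer for the next step
with open("model.tflite", "wb") as f:
    f.write(tflite_model)
```

With `tf.lite.Optimize.DEFAULT`, weights are quantised post-training, which typically shrinks the file to roughly a quarter of its float size.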

3. Generate a C Array from the Model

  • Since microcontrollers can’t read .tflite files directly, convert the model to a C header file

Use the xxd tool in your terminal:

xxd -i model.tflite > model_data.h

  • This header file will be embedded in your ESP32 program
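If xxd is not available on your system (for example on Windows), a few lines of Python produce an equivalent header. The helper name bytes_to_c_header below is our own, not a standard tool:

```python
# A small stand-in for `xxd -i`, useful on systems without xxd.
# bytes_to_c_header is our own helper name, not a standard tool.

def bytes_to_c_header(data: bytes, var_name: str = "model_tflite") -> str:
    """Render raw bytes as a C unsigned char array plus a length constant."""
    lines = [f"unsigned char {var_name}[] = {{"]
    for i in range(0, len(data), 12):
        chunk = ", ".join(f"0x{b:02x}" for b in data[i:i + 12])
        lines.append(f"  {chunk},")
    lines.append("};")
    lines.append(f"unsigned int {var_name}_len = {len(data)};")
    return "\n".join(lines)

# Usage sketch: read the model and write the header
# header = bytes_to_c_header(open("model.tflite", "rb").read())
# open("model_data.h", "w").write(header)
```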

4. Set Up Your Development Environment

  • Install Arduino IDE or PlatformIO — both support ESP32 development
  • Install a TensorFlow Lite Micro library such as TensorFlowLite_ESP32 or Arduino_TensorFlowLite
  • Add ESP32 board definitions in your IDE

5. Write and Upload Your Firmware

  • Include your model_data.h in your project
  • Implement code to acquire sensor or audio data
  • Preprocess this input to match the model’s expected format
  • Run inference using the TensorFlow Lite Micro interpreter
  • Write your application logic to respond to the AI model’s outputs

6. Test and Optimise Your Project

  • Upload your code to the ESP32 and monitor the Serial output for results
  • Tune your data preprocessing for better accuracy and responsiveness
  • If the model is too large or slow, consider retraining with quantisation or pruning

Complete Project: Voice Command Recognition on ESP32 with Audio Capture

Now, let’s bring it all together with a hands-on example. This project recognises simple voice commands such as “yes,” “no,” and “unknown” using an I2S microphone connected to the ESP32.

Hardware Requirements

  • ESP32 development board (for example, the ESP32-WROVER)
  • I2S MEMS microphone module (such as the INMP441)

Hardware Connections

  ESP32 Pin       I2S Microphone Pin
  GPIO22 (SCK)    BCLK
  GPIO21 (WS)     LRCLK
  GPIO23 (SD)     DOUT
  3.3V            VCC
  GND             GND

Step 1: Prepare the Model

Download the pre-trained speech commands model from TensorFlow’s repository:
micro_speech.tflite

Convert it to a C header file using:

xxd -i micro_speech.tflite > model_data.h

Include this header in your Arduino project folder.

Step 2: Arduino Sketch for Voice Recognition

Below is a simplified Arduino sketch. It captures audio via I2S, normalises the audio samples, runs inference with the TinyML model, and prints the recognised command on the Serial Monitor.

#include <driver/i2s.h>
#include <TensorFlowLite.h>
#include "model_data.h"
#include "tensorflow/lite/micro/micro_error_reporter.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_mutable_op_resolver.h"
#include "tensorflow/lite/schema/schema_generated.h"
#include "tensorflow/lite/version.h"
// I2S pin configuration
#define I2S_WS GPIO_NUM_21
#define I2S_SD GPIO_NUM_23
#define I2S_SCK GPIO_NUM_22
#define SAMPLE_RATE 16000
#define SAMPLES_PER_FRAME 512
constexpr int kTensorArenaSize = 10 * 1024;
uint8_t tensor_arena[kTensorArenaSize];
tflite::MicroInterpreter* interpreter;
TfLiteTensor* model_input;
TfLiteTensor* model_output;
int16_t audio_buffer[SAMPLES_PER_FRAME];
float features[SAMPLES_PER_FRAME];
tflite::ErrorReporter* error_reporter;
void setup_i2s() {
  i2s_config_t i2s_config = {
      .mode = (i2s_mode_t)(I2S_MODE_MASTER | I2S_MODE_RX),
      .sample_rate = SAMPLE_RATE,
      .bits_per_sample = I2S_BITS_PER_SAMPLE_16BIT,
      .channel_format = I2S_CHANNEL_FMT_ONLY_LEFT,  // INMP441 with L/R tied to GND outputs on the left slot
      .communication_format = I2S_COMM_FORMAT_I2S_MSB,
      .intr_alloc_flags = ESP_INTR_FLAG_LEVEL1,
      .dma_buf_count = 8,
      .dma_buf_len = 64,
      .use_apll = false,
      .tx_desc_auto_clear = false,
      .fixed_mclk = 0
  };
  i2s_pin_config_t pin_config = {
      .bck_io_num = I2S_SCK,
      .ws_io_num = I2S_WS,
      .data_out_num = -1,
      .data_in_num = I2S_SD
  };
  i2s_driver_install(I2S_NUM_0, &i2s_config, 0, NULL);
  i2s_set_pin(I2S_NUM_0, &pin_config);
}
void setup() {
  Serial.begin(115200);
  delay(2000);
  Serial.println("Starting TinyML Voice Recognition on ESP32");
  setup_i2s();
  static tflite::MicroErrorReporter micro_error_reporter;
  error_reporter = &micro_error_reporter;
  // Note: xxd names the array after the input file (micro_speech.tflite -> micro_speech_tflite)
  const tflite::Model* model = tflite::GetModel(micro_speech_tflite);
  if (model->version() != TFLITE_SCHEMA_VERSION) {
    Serial.println("Model schema mismatch");
    while (1) {}
  }
  static tflite::MicroMutableOpResolver<4> micro_op_resolver;
  micro_op_resolver.AddConv2D();
  micro_op_resolver.AddDepthwiseConv2D();
  micro_op_resolver.AddFullyConnected();
  micro_op_resolver.AddSoftmax();
  static tflite::MicroInterpreter static_interpreter(
      model, micro_op_resolver, tensor_arena, kTensorArenaSize, error_reporter);
  interpreter = &static_interpreter;
  if (interpreter->AllocateTensors() != kTfLiteOk) {
    Serial.println("AllocateTensors() failed");
    while (1) {}
  }
  model_input = interpreter->input(0);
  model_output = interpreter->output(0);
}
void loop() {
  size_t bytes_read;
  i2s_read(I2S_NUM_0, audio_buffer, sizeof(audio_buffer), &bytes_read, portMAX_DELAY);
  for (int i = 0; i < SAMPLES_PER_FRAME; i++) {
    features[i] = (float)audio_buffer[i] / 32768.0f;  // Normalize audio
  }
  // TODO: Replace with proper MFCC feature extraction here using micro_features_generator
  // Copy features into the input tensor, guarding against a size mismatch
  const int input_len = model_input->bytes / sizeof(float);
  for (int i = 0; i < input_len && i < SAMPLES_PER_FRAME; i++) {
    model_input->data.f[i] = features[i];
  }
  if (interpreter->Invoke() != kTfLiteOk) {
    Serial.println("Inference failed");
    return;
  }
  int max_index = 0;
  float max_score = 0.0f;
  for (int i = 0; i < model_output->dims->data[1]; i++) {
    if (model_output->data.f[i] > max_score) {
      max_score = model_output->data.f[i];
      max_index = i;
    }
  }
  const char* labels[] = {"silence", "unknown", "yes", "no"};  // must match your model's output classes, in training order
  Serial.print("Recognized Command: ");
  Serial.println(labels[max_index]);
  delay(1000);
}

Important Notes on the Project

  1. This example uses raw audio samples for inference, which is a simplification. For accurate recognition, you should implement MFCC (Mel Frequency Cepstral Coefficients) feature extraction using TensorFlow’s micro_features_generator. This transforms audio into features the model understands better.
  2. The labels array must correspond exactly to the output classes your model was trained on.
  3. You can find complete feature extraction and voice recognition examples on TensorFlow’s official TinyML GitHub repository.
  4. Make sure to adjust I2S pin assignments and buffer sizes to match your specific hardware.
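To make note 1 concrete, here is a simplified log-spectrogram sketch in Python showing the kind of spectral features such models expect. It is illustrative only, not the exact micro_features pipeline; the function name and parameters are our own.

```python
# Simplified log-spectrogram features, to illustrate (not replicate)
# the kind of transform micro_features_generator performs on-device.
import numpy as np

def log_spectrogram(samples, frame_len=512, hop=256):
    """Window the audio into overlapping frames and take log power spectra."""
    window = np.hanning(frame_len)
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len] * window
        power = np.abs(np.fft.rfft(frame)) ** 2
        frames.append(np.log(power + 1e-6))  # log compression; avoid log(0)
    return np.array(frames)

# Quick check: one second of a 440 Hz tone sampled at 16 kHz
t = np.arange(16000) / 16000.0
features = log_spectrogram(np.sin(2 * np.pi * 440 * t))
```

The real micro_speech front end goes further, mapping the spectrum onto a mel filterbank, but the framing, windowing, and log compression shown here are the core of it.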

Conclusion

Running TinyML models on ESP32 opens up a whole new world of smart, low-power AI applications you can build right on your desk. From voice command recognition to gesture detection and beyond, TinyML empowers embedded devices with intelligence and responsiveness — all while preserving privacy and independence from the cloud.

With this guide and practical example, you’re well on your way to deploying your own TinyML projects on ESP32. If you have any questions or want assistance with expanding this project, such as adding real feature extraction, training custom models, or exploring other AI applications, just leave a comment below!
