Project Overview
An AI smart pet camera developed for a pet technology company, powered by the ESP32-S3 dual-core chip running TensorFlow Lite to achieve on-device pet detection, behavior analysis, and anomaly alerts. The product integrates a 1080P camera, treat dispenser, two-way audio, and night vision, delivering low-latency video streaming via WebRTC so owners can interact with their pets anytime, anywhere.
More than 80,000 units have been sold, with average daily usage of 2.5 hours per user and pet recognition accuracy of 96.5%.
Core Technical Challenges
1. Edge AI Pet Detection
Challenge:
- Limited memory on ESP32-S3 (512KB SRAM + 8MB PSRAM)
- Real-time processing of 30fps video stream required
- Model must simultaneously recognize multiple pet types (cats, dogs, rabbits, etc.)
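To make the memory constraint concrete, the sketch below estimates how much RAM a single uncompressed frame needs at a few resolutions and compares it with the SRAM figure listed above. The helper name `frame_bytes` and the chosen resolutions are illustrative assumptions, not part of the project code.

```python
# Rough RAM needed for one uncompressed frame, compared with on-chip SRAM.
# frame_bytes() and the resolutions below are illustrative assumptions.

SRAM_BYTES = 512 * 1024          # ESP32-S3 on-chip SRAM, from the list above
PSRAM_BYTES = 8 * 1024 * 1024    # external PSRAM, from the list above


def frame_bytes(width: int, height: int, bytes_per_pixel: int) -> int:
    """Size in bytes of one uncompressed frame."""
    return width * height * bytes_per_pixel


if __name__ == "__main__":
    cases = [
        ("320x320 RGB888 (model input)", 320, 320, 3),
        ("1280x720 RGB565", 1280, 720, 2),
        ("1920x1080 RGB565", 1920, 1080, 2),
    ]
    for name, w, h, bpp in cases:
        size = frame_bytes(w, h, bpp)
        verdict = "fits in" if size <= SRAM_BYTES else "exceeds"
        print(f"{name}: {size / 1024:.0f} KB, {verdict} SRAM")
```

A 320x320 RGB input already takes 300 KB on its own, which is why the reduced input resolution and INT8 quantization matter so much on this chip.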
Solution — YOLOv8-Nano Model Quantization:
# Model training and quantization script (runs on a PC)
import time

import numpy as np
import tensorflow as tf
from ultralytics import YOLO

# 1. Train the YOLOv8-Nano model on the pet dataset
def train_pet_detection_model():
    model = YOLO('yolov8n.pt')  # YOLOv8-Nano pre-trained weights

    # Training parameters
    model.train(
        data='pet_dataset.yaml',  # Custom pet dataset
        epochs=100,
        imgsz=320,                # 320x320 input keeps memory use within ESP32-S3 limits
        batch=32,
        device=0,                 # GPU training
        patience=20,              # Stop early after 20 epochs without improvement
        project='pet_detection',
        name='yolov8n_pet'
    )

    # Export to TensorFlow Lite; export() returns the path of the exported file
    return model.export(format='tflite', imgsz=320)

# 2. Full-integer (INT8) post-training quantization
def quantize_model_int8(model_path, representative_dataset):
    """
    Quantize an FP32 model to INT8 to reduce model size and inference time.

    model_path must point to a TensorFlow SavedModel directory.
    """
    converter = tf.lite.TFLiteConverter.from_saved_model(model_path)

    # Enable full INT8 quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    # Provide a representative dataset (used to calibrate quantization parameters)
    def representative_data_gen():
        for img in representative_dataset:
            img_resized = tf.image.resize(img, [320, 320])
            img_normalized = tf.cast(img_resized, tf.float32) / 255.0
            yield [img_normalized[tf.newaxis, ...]]

    converter.representative_dataset = representative_data_gen

    # Execute quantization
    tflite_model = converter.convert()

    # Save quantized model
    with open('pet_detection_int8.tflite', 'wb') as f:
        f.write(tflite_model)

    print(f"Quantized model size: {len(tflite_model) / 1024:.2f} KB")
    return 'pet_detection_int8.tflite'

# 3. Model performance evaluation
def evaluate_model_performance(tflite_model_path, test_dataset):
    interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
    interpreter.allocate_tensors()
    input_details = interpreter.get_input_details()[0]
    output_details = interpreter.get_output_details()[0]

    correct = 0
    total = 0
    inference_times = []

    for img, label in test_dataset:
        # Preprocessing: resize and scale to [0, 1]
        img_resized = tf.image.resize(img, [320, 320])
        img_normalized = tf.cast(img_resized, tf.float32) / 255.0
        input_data = np.expand_dims(img_normalized, axis=0).astype(np.float32)

        # A fully quantized model takes uint8 input, so apply its scale and zero point
        if input_details['dtype'] == np.uint8:
            scale, zero_point = input_details['quantization']
            input_data = np.clip(np.round(input_data / scale + zero_point), 0, 255).astype(np.uint8)

        # Inference
        start_time = time.perf_counter()
        interpreter.set_tensor(input_details['index'], input_data)
        interpreter.invoke()
        inference_times.append((time.perf_counter() - start_time) * 1000)  # ms

        # Get results. Simplified: the arg-max of the raw output is treated as the
        # predicted class; a detection model needs its boxes decoded before scoring.
        output_data = interpreter.get_tensor(output_details['index'])
        predicted_class = np.argmax(output_data)
        if predicted_class == label:
            correct += 1
        total += 1

    accuracy = correct / total * 100 if total else 0.0
    avg_inference_time = np.mean(inference_times) if inference_times else 0.0
    print(f"Accuracy: {accuracy:.2f}%")
    print(f"Average inference time: {avg_inference_time:.2f} ms")
    return accuracy, avg_inference_time

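The INT8 conversion above relies on an affine mapping between real values and 8-bit integers. As a minimal sketch of that arithmetic, the functions below quantize and dequantize a single value; the `scale` and `zero_point` used in the demo are made-up examples, since a real model stores its own per-tensor values in the `.tflite` file.

```python
# Affine (asymmetric) quantization between real values and uint8.
# The scale and zero_point in the demo are made-up example values.


def quantize(x: float, scale: float, zero_point: int) -> int:
    """Map a real value to uint8: q = round(x / scale) + zero_point, clamped to [0, 255]."""
    q = round(x / scale) + zero_point
    return max(0, min(255, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Map a uint8 value back to a real value: x = (q - zero_point) * scale."""
    return (q - zero_point) * scale


if __name__ == "__main__":
    scale, zero_point = 1.0 / 255.0, 0   # maps [0.0, 1.0] onto [0, 255]
    for x in (0.0, 0.3, 1.0):
        q = quantize(x, scale, zero_point)
        print(x, "->", q, "->", round(dequantize(q, scale, zero_point), 4))
```

The round trip loses at most half a quantization step, which is the precision cost paid for the smaller model and faster integer arithmetic.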
ESP32-S3 TensorFlow Lite Inference:
#include <cstdint>

#include "esp_log.h"
#include "esp_timer.h"
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"

#define TAG "PET_DETECTION"
// Model data (embedded in firmware)
extern const unsigned char pet_detection_model[];
extern const unsigned int pet_detection_model_len;
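// Tensor arena (working memory for inference). The uint8 input tensor alone is
// 320 * 320 * 3 = 307,200 bytes, so 300 KB is only a starting point: size the
// arena from interpreter->arena_used_bytes() and place it in PSRAM if needed.
constexpr int kTensorArenaSize = 300 * 1024;  // 300 KB
alignas(16) uint8_t tensor_arena[kTensorArenaSize];
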
// Pet class labels (index = class ID)
const char* pet_labels[] = {
    "dog",
    "cat",
    "rabbit",
    "bird",
    "hamster"
};

typedef struct {
    int class_id;
    float confidence;
    float bbox_x;
    float bbox_y;
    float bbox_w;
    float bbox_h;
} detection_result_t;

class PetDetector {
private:
    // Initialized to nullptr so a failed setup never leaves dangling pointers
    const tflite::Model* model = nullptr;
    tflite::MicroInterpreter* interpreter = nullptr;
    TfLiteTensor* input = nullptr;
    TfLiteTensor* output = nullptr;

public:
    PetDetector() {
        // Load model
        model = tflite::GetModel(pet_detection_model);
        if (model->version() != TFLITE_SCHEMA_VERSION) {
            ESP_LOGE(TAG, "Model schema version mismatch!");
            return;
        }

        // Register all operations
        static tflite::AllOpsResolver resolver;

        // Create interpreter
        static tflite::MicroInterpreter static_interpreter(
            model, resolver, tensor_arena, kTensorArenaSize);
        interpreter = &static_interpreter;

        // Allocate tensor memory
        TfLiteStatus allocate_status = interpreter->AllocateTensors();
        if (allocate_status != kTfLiteOk) {
            ESP_LOGE(TAG, "AllocateTensors() failed");
            return;
        }

        // Get input/output tensors
        input = interpreter->input(0);
        output = interpreter->output(0);

        ESP_LOGI(TAG, "Pet detection model loaded successfully");
        ESP_LOGI(TAG, "Input shape: [%d, %d, %d, %d]",
                 input->dims->data[0], input->dims->data[1],
                 input->dims->data[2], input->dims->data[3]);
    }

    // Run inference on one frame
    detection_result_t detect(uint8_t* image_data, int width, int height) {
        detection_result_t result = {};

        // Preprocessing: resize and convert to RGB888
        preprocess_image(image_data, width, height, input->data.uint8);

        // Run inference. esp_timer_get_time() returns microseconds as int64_t,
        // so keep the full width to avoid overflow after about 71 minutes of uptime.
        const int64_t start_us = esp_timer_get_time();
        TfLiteStatus invoke_status = interpreter->Invoke();
        const int64_t inference_ms = (esp_timer_get_time() - start_us) / 1000;

        if (invoke_status != kTfLiteOk) {
            ESP_LOGE(TAG, "Invoke failed!");
            return result;
        }
        ESP_LOGI(TAG, "Inference time: %lld ms", (long long)inference_ms);

        // Parse output
        result = parse_yolo_output(output);
        if (result.confidence > 0.5f) {
            ESP_LOGI(TAG, "Detected: %s (%.2f%%)",
                     pet_labels[result.class_id],
                     result.confidence * 100);
        }
        return result;
    }

private:
    // Preprocess image: resize to 320x320 and convert RGB565 to RGB888
    void preprocess_image(uint8_t* src, int src_w, int src_h, uint8_t* dst) {
        const int dst_w = 320;
        const int dst_h = 320;

        // Nearest-neighbor resize: each output pixel copies the closest source pixel
        for (int y = 0; y < dst_h; y++) {
            for (int x = 0; x < dst_w; x++) {
                int src_x = x * src_w / dst_w;
                int src_y = y * src_h / dst_h;

                // RGB conversion (assuming source is big-endian RGB565, 2 bytes per pixel)
                int src_idx = (src_y * src_w + src_x) * 2;
                uint16_t rgb565 = (src[src_idx] << 8) | src[src_idx + 1];
                uint8_t r = ((rgb565 >> 11) & 0x1F) << 3;
                uint8_t g = ((rgb565 >> 5) & 0x3F) << 2;
                uint8_t b = (rgb565 & 0x1F) << 3;

                int dst_idx = (y * dst_w + x) * 3;
                dst[dst_idx] = r;
                dst[dst_idx + 1] = g;
                dst[dst_idx + 2] = b;
            }
        }
    }

    // Parse YOLO output and return the single best detection
    detection_result_t parse_yolo_output(TfLiteTensor* output_tensor) {
        detection_result_t best_result = {};
        constexpr int kNumClasses = 5;

        // YOLOv8 has no separate objectness score. Assuming the Ultralytics
        // TFLite export layout [1, 4 + num_classes, num_anchors], a 320x320
        // input with 5 classes gives [1, 9, 2100]: rows 0-3 hold x, y, w, h
        // and rows 4-8 hold the class scores.
        const int num_anchors = output_tensor->dims->data[2];
        const uint8_t* q = output_tensor->data.uint8;

        // The model was converted with uint8 outputs, so dequantize each value:
        // real = (quantized - zero_point) * scale
        const float scale = output_tensor->params.scale;
        const int zero_point = output_tensor->params.zero_point;
        auto value = [&](int row, int anchor) -> float {
            return (q[row * num_anchors + anchor] - zero_point) * scale;
        };

        for (int i = 0; i < num_anchors; i++) {
            // Find the class with the highest score
            int best_class = 0;
            float best_score = value(4, i);
            for (int c = 1; c < kNumClasses; c++) {
                const float score = value(4 + c, i);
                if (score > best_score) {
                    best_score = score;
                    best_class = c;
                }
            }

            if (best_score > best_result.confidence) {
                best_result.class_id = best_class;
                best_result.confidence = best_score;
                best_result.bbox_x = value(0, i);
                best_result.bbox_y = value(1, i);
                best_result.bbox_w = value(2, i);
                best_result.bbox_h = value(3, i);
            }
        }
        return best_result;
    }
};
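The firmware above keeps only the single highest-scoring box. If several pets in one frame had to be reported, overlapping candidates would first need to be merged; one common way is greedy non-maximum suppression (NMS). The sketch below is a hypothetical illustration of that technique, not the project's code: `Detection`, `iou`, `nms` and the threshold defaults are all assumed names and values.

```python
# Greedy non-maximum suppression over center-format boxes.
# Detection, iou() and nms() are hypothetical names for this sketch.
from dataclasses import dataclass
from typing import List


@dataclass
class Detection:
    cx: float       # box center x
    cy: float       # box center y
    w: float        # box width
    h: float        # box height
    score: float    # class confidence
    class_id: int


def iou(a: Detection, b: Detection) -> float:
    """Intersection over union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = a.cx - a.w / 2, a.cy - a.h / 2, a.cx + a.w / 2, a.cy + a.h / 2
    bx1, by1, bx2, by2 = b.cx - b.w / 2, b.cy - b.h / 2, b.cx + b.w / 2, b.cy + b.h / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a.w * a.h + b.w * b.h - inter
    return inter / union if union > 0 else 0.0


def nms(dets: List[Detection], score_thresh: float = 0.5,
        iou_thresh: float = 0.45) -> List[Detection]:
    """Keep the best box per object; drop same-class boxes that overlap a kept one."""
    candidates = sorted((d for d in dets if d.score >= score_thresh),
                        key=lambda d: d.score, reverse=True)
    kept: List[Detection] = []
    for d in candidates:
        if all(d.class_id != k.class_id or iou(d, k) < iou_thresh for k in kept):
            kept.append(d)
    return kept


if __name__ == "__main__":
    frame = [
        Detection(100, 100, 50, 50, 0.9, 0),   # dog
        Detection(102, 101, 50, 50, 0.8, 0),   # same dog, slightly shifted box
        Detection(250, 200, 40, 40, 0.7, 1),   # cat
        Detection(10, 10, 5, 5, 0.2, 0),       # low-confidence noise
    ]
    for det in nms(frame):
        print(det.class_id, det.score)
```

Suppression is applied per class here so that a cat sitting next to a dog is not discarded just because their boxes overlap.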
2. WebRTC Low-Latency Video Streaming
ESP32-S3 WebRTC Implementation:
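#include <string.h>

#include "freertos/FreeRTOS.h"
#include "freertos/task.h"
#include "esp_log.h"
#include "esp_camera.h"
#include "esp_http_server.h"  // WebSocket support requires CONFIG_HTTPD_WS_SUPPORT
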
#define TAG "WEBRTC_STREAM"
// Camera configuration (OV2640, streaming at 1280x720)
camera_config_t camera_config = {
    .pin_pwdn = -1,
    .pin_reset = -1,
    .pin_xclk = 10,
    .pin_sccb_sda = 40,
    .pin_sccb_scl = 39,
    .pin_d7 = 48,
    .pin_d6 = 11,
    .pin_d5 = 12,
    .pin_d4 = 14,
    .pin_d3 = 16,
    .pin_d2 = 18,
    .pin_d1 = 17,
    .pin_d0 = 15,
    .pin_vsync = 38,
    .pin_href = 47,
    .pin_pclk = 13,
    .xclk_freq_hz = 20000000,
    .ledc_timer = LEDC_TIMER_0,
    .ledc_channel = LEDC_CHANNEL_0,
    .pixel_format = PIXFORMAT_JPEG,
    .frame_size = FRAMESIZE_HD,      // 1280x720
    .jpeg_quality = 12,              // 0-63; a lower number means higher quality
    .fb_count = 2,                   // Frame buffer count
    .grab_mode = CAMERA_GRAB_LATEST  // Always grab the latest frame
};

// WebSocket client management
typedef struct {
    httpd_handle_t server;
    int fd;
    bool connected;
    uint32_t frame_count;
} webrtc_client_t;

static webrtc_client_t webrtc_clients[4] = {0};

// Initialize camera
esp_err_t init_camera(void) {
    esp_err_t err = esp_camera_init(&camera_config);
    if (err != ESP_OK) {
        ESP_LOGE(TAG, "Camera init failed: %s", esp_err_to_name(err));
        return err;
    }

    // Adjust camera parameters (night vision enhancement)
    sensor_t *s = esp_camera_sensor_get();
    if (s == NULL) {
        ESP_LOGE(TAG, "Camera sensor not available");
        return ESP_FAIL;
    }
    s->set_brightness(s, 1);     // Brightness +1
    s->set_contrast(s, 1);       // Contrast +1
    s->set_saturation(s, 0);     // Saturation 0
    s->set_whitebal(s, 1);       // Auto white balance
    s->set_awb_gain(s, 1);       // Auto white balance gain
    s->set_exposure_ctrl(s, 1);  // Auto exposure
    s->set_aec2(s, 1);           // Enable the second auto-exposure algorithm (AEC2)
    s->set_gain_ctrl(s, 1);      // Auto gain
    s->set_agc_gain(s, 10);      // AGC gain

    ESP_LOGI(TAG, "Camera initialized successfully");
    return ESP_OK;
}

// Signaling handler, implemented elsewhere (not shown in this excerpt)
void handle_webrtc_signaling(webrtc_client_t *client, char *msg, size_t len);

// WebSocket connection handler
esp_err_t webrtc_ws_handler(httpd_req_t *req) {
    int fd = httpd_req_to_sockfd(req);

    if (req->method == HTTP_GET) {
        // Handshake: register the new client once, in the first free slot
        for (int i = 0; i < 4; i++) {
            if (!webrtc_clients[i].connected) {
                webrtc_clients[i].server = req->handle;
                webrtc_clients[i].fd = fd;
                webrtc_clients[i].frame_count = 0;
                webrtc_clients[i].connected = true;
                ESP_LOGI(TAG, "WebRTC client connected: fd=%d", fd);
                return ESP_OK;
            }
        }
        ESP_LOGW(TAG, "Maximum WebRTC clients reached");
        return ESP_FAIL;
    }

    // Data frame: look up the client already registered for this socket
    webrtc_client_t *client = NULL;
    for (int i = 0; i < 4; i++) {
        if (webrtc_clients[i].connected && webrtc_clients[i].fd == fd) {
            client = &webrtc_clients[i];
            break;
        }
    }
    if (!client) {
        ESP_LOGW(TAG, "Message from unregistered socket: fd=%d", fd);
        return ESP_FAIL;
    }

    // Receive client message (SDP offer / ICE candidate), leaving room for a terminator
    uint8_t buffer[1024];
    httpd_ws_frame_t ws_pkt;
    memset(&ws_pkt, 0, sizeof(httpd_ws_frame_t));
    ws_pkt.type = HTTPD_WS_TYPE_TEXT;
    ws_pkt.payload = buffer;
    esp_err_t ret = httpd_ws_recv_frame(req, &ws_pkt, sizeof(buffer) - 1);
    if (ret != ESP_OK) {
        client->connected = false;
        return ret;
    }
    buffer[ws_pkt.len] = '\0';  // The payload is not null-terminated on arrival
    ESP_LOGI(TAG, "Received WebSocket message: %s", (char *)buffer);

    // Handle WebRTC signaling (SDP/ICE)
    // Simplified here; actual implementation requires full WebRTC protocol handling
    handle_webrtc_signaling(client, (char *)ws_pkt.payload, ws_pkt.len);
    return ESP_OK;
}

// Video streaming task (FreeRTOS task)
void webrtc_streaming_task(void *pvParameters) {
    camera_fb_t *fb = NULL;
    TickType_t last_wake = xTaskGetTickCount();

    while (1) {
        // Capture camera frame
        fb = esp_camera_fb_get();
        if (!fb) {
            ESP_LOGE(TAG, "Camera capture failed");
            vTaskDelay(pdMS_TO_TICKS(100));
            continue;
        }

        // Send the JPEG frame to all connected clients as a binary WebSocket message
        for (int i = 0; i < 4; i++) {
            if (!webrtc_clients[i].connected) continue;

            httpd_ws_frame_t ws_frame;
            memset(&ws_frame, 0, sizeof(httpd_ws_frame_t));
            ws_frame.type = HTTPD_WS_TYPE_BINARY;
            ws_frame.payload = fb->buf;
            ws_frame.len = fb->len;

            esp_err_t ret = httpd_ws_send_frame_async(
                webrtc_clients[i].server,
                webrtc_clients[i].fd,
                &ws_frame
            );
            if (ret != ESP_OK) {
                ESP_LOGW(TAG, "Client %d disconnected", i);
                webrtc_clients[i].connected = false;
            } else {
                webrtc_clients[i].frame_count++;
            }
        }

        // Release frame buffer
        esp_camera_fb_return(fb);

        // Pace the loop at about 30 fps. The 33 ms period is measured from the
        // previous wake-up, so capture and send time do not add to the delay.
        vTaskDelayUntil(&last_wake, pdMS_TO_TICKS(33));
    }
}
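The handler above passes each text message to a signaling routine that is not shown. As a rough sketch of what such routing involves, the Python below dispatches SDP offers and ICE candidates by message shape. The JSON shapes are an assumption based on how browsers serialize `RTCSessionDescription` (`type`, `sdp`) and `RTCIceCandidate` (`candidate`, `sdpMid`, `sdpMLineIndex`); the device's real message format, and every function name here, are hypothetical.

```python
# Hypothetical signaling router: dispatch SDP offers and ICE candidates.
# Message shapes and handler names are assumptions, not the firmware's protocol.
import json
from typing import Callable, Dict


def on_offer(msg: dict) -> str:
    # A real implementation would create an SDP answer here.
    return f"offer received ({len(msg.get('sdp', ''))} bytes of SDP)"


def on_candidate(msg: dict) -> str:
    # A real implementation would add the candidate to the peer connection.
    return f"candidate received for mid {msg.get('sdpMid')!r}"


DEFAULT_HANDLERS: Dict[str, Callable[[dict], str]] = {
    "offer": on_offer,
    "candidate": on_candidate,
}


def route_signaling_message(raw: str,
                            handlers: Dict[str, Callable[[dict], str]] = DEFAULT_HANDLERS) -> str:
    """Parse one signaling message and pass it to the matching handler."""
    try:
        msg = json.loads(raw)
    except json.JSONDecodeError:
        return "error: invalid JSON"
    if not isinstance(msg, dict):
        return "error: expected a JSON object"

    # ICE candidates carry a "candidate" key; session descriptions carry "type".
    kind = "candidate" if "candidate" in msg else msg.get("type", "")
    handler = handlers.get(kind)
    if handler is None:
        return f"error: unsupported message {kind!r}"
    return handler(msg)


if __name__ == "__main__":
    print(route_signaling_message('{"type": "offer", "sdp": "v=0"}'))
    print(route_signaling_message('{"candidate": "candidate:1 1 udp ...", "sdpMid": "0"}'))
    print(route_signaling_message("not json"))
```

Rejecting malformed or unknown messages up front keeps a misbehaving client from reaching the protocol logic.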
3. Pet Behavior Analysis and Alerts
Behavior Recognition System:
// Node.js behavior analysis service
const { InfluxDB, Point } = require('@influxdata/influxdb-client');
const mqtt = require('mqtt');

class PetBehaviorAnalyzer {
    constructor() {
        this.influxDB = new InfluxDB({
            url: 'http://localhost:8086',
            token: 'your-token' // Placeholder; load the real token from configuration
        });
        this.writeApi = this.influxDB.getWriteApi('pet-monitor', 'behaviors');
        this.queryApi = this.influxDB.getQueryApi('pet-monitor');
        this.mqttClient = mqtt.connect('mqtt://localhost:1883');
        this.behaviorHistory = [];
        // All thresholds are in minutes
        this.alertThresholds = {
            prolonged_absence: 120, // Alert if no detection for 2 hours
            excessive_barking: 5,   // Window for counting barks
            abnormal_activity: 30   // Window for measuring activity level
        };
        this.initMQTT();
    }

    initMQTT() {
        this.mqttClient.on('connect', () => {
            this.mqttClient.subscribe('petcam/+/detection');
            this.mqttClient.subscribe('petcam/+/audio');
        });

        this.mqttClient.on('message', (topic, message) => {
            // Topic layout: petcam/<cameraId>/<kind>
            const [, cameraId, kind] = topic.split('/');

            // A malformed payload must not crash the service
            let data;
            try {
                data = JSON.parse(message.toString());
            } catch (err) {
                console.error(`Ignoring malformed message on ${topic}: ${err.message}`);
                return;
            }

            if (kind === 'detection') {
                this.analyzeDetection(cameraId, data);
            } else if (kind === 'audio') {
                this.analyzeAudio(cameraId, data);
            }
        });
    }

    // Analyze pet detection results
    analyzeDetection(cameraId, detection) {
        const point = new Point('pet_detection')
            .tag('camera_id', cameraId)
            .tag('pet_type', detection.class)
            .floatField('confidence', detection.confidence)
            .floatField('bbox_x', detection.bbox_x)
            .floatField('bbox_y', detection.bbox_y)
            .timestamp(new Date());
        this.writeApi.writePoint(point);

        // Record behavior history, keeping only the most recent 1000 entries
        this.behaviorHistory.push({
            timestamp: Date.now(),
            cameraId,
            type: 'detection',
            data: detection
        });
        if (this.behaviorHistory.length > 1000) {
            this.behaviorHistory.shift();
        }

        // Check for abnormal behaviors (async, so log failures instead of dropping them)
        this.checkAbnormalBehaviors(cameraId)
            .catch((err) => console.error(`Behavior check failed: ${err.message}`));
    }

    // Analyze audio (barking detection)
    analyzeAudio(cameraId, audio) {
        if (audio.barking_detected) {
            const point = new Point('pet_audio')
                .tag('camera_id', cameraId)
                .tag('event_type', 'barking')
                .floatField('volume', audio.volume)
                .timestamp(new Date());
            this.writeApi.writePoint(point);

            // Check for excessive barking (async, so log failures instead of dropping them)
            this.checkExcessiveBarking(cameraId)
                .catch((err) => console.error(`Barking check failed: ${err.message}`));
        }
    }

    // Check for abnormal behaviors
    async checkAbnormalBehaviors(cameraId) {
        // 1. Check prolonged pet absence.
        // getLastDetectionTime() is not shown in this excerpt. Because this method
        // runs right after a detection is stored, the absence check is only
        // meaningful when the method is also invoked from a periodic timer.
        const lastDetection = await this.getLastDetectionTime(cameraId);
        const timeSinceLastSeen = (Date.now() - lastDetection) / 1000 / 60; // minutes
        if (timeSinceLastSeen > this.alertThresholds.prolonged_absence) {
            this.sendAlert(cameraId, 'prolonged_absence', {
                message: `Your pet has not appeared on camera for ${Math.floor(timeSinceLastSeen)} minutes`,
                severity: 'medium'
            });
        }

        // 2. Check abnormal activity (frequent movement / completely still)
        const activityLevel = await this.calculateActivityLevel(
            cameraId, this.alertThresholds.abnormal_activity);
        if (activityLevel > 0.8) {
            this.sendAlert(cameraId, 'high_activity', {
                message: 'Your pet may be overly excited or anxious',
                severity: 'low'
            });
        } else if (activityLevel < 0.1) {
            this.sendAlert(cameraId, 'low_activity', {
                message: 'Your pet may not be feeling well: activity level has dropped significantly',
                severity: 'medium'
            });
        }
    }

    // Check for excessive barking
    checkExcessiveBarking(cameraId) {
        const windowMinutes = this.alertThresholds.excessive_barking;
        const maxBarks = 10; // Alert above 10 barks within the window
        const fluxQuery = `
            from(bucket: "behaviors")
            |> range(start: -${windowMinutes}m)
            |> filter(fn: (r) => r._measurement == "pet_audio")
            |> filter(fn: (r) => r.camera_id == "${cameraId}")
            |> filter(fn: (r) => r.event_type == "barking")
            |> count()
        `;

        // queryRows() reports results through callbacks, so wrap it in a Promise.
        // Arrow functions keep `this` bound to the analyzer inside the callbacks.
        return new Promise((resolve, reject) => {
            let barkingCount = 0;
            this.queryApi.queryRows(fluxQuery, {
                next: (row, tableMeta) => {
                    barkingCount = tableMeta.toObject(row)._value;
                },
                error: reject,
                complete: () => {
                    if (barkingCount > maxBarks) {
                        this.sendAlert(cameraId, 'excessive_barking', {
                            message: 'Your pet may be anxious or a visitor may be present',
                            severity: 'medium',
                            count: barkingCount
                        });
                    }
                    resolve(barkingCount);
                }
            });
        });
    }

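    // Calculate activity level metric
    calculateActivityLevel(cameraId, minutes) {
        // Rate of change of the box position. The absolute value is taken before
        // averaging so that movement in opposite directions does not cancel out,
        // and only the position fields are used (not the confidence field).
        const fluxQuery = `
            import "math"
            from(bucket: "behaviors")
            |> range(start: -${minutes}m)
            |> filter(fn: (r) => r._measurement == "pet_detection")
            |> filter(fn: (r) => r.camera_id == "${cameraId}")
            |> filter(fn: (r) => r._field == "bbox_x" or r._field == "bbox_y")
            |> derivative(unit: 1m, nonNegative: false)
            |> map(fn: (r) => ({r with _value: math.abs(x: r._value)}))
            |> mean()
        `;

        return new Promise((resolve, reject) => {
            const values = []; // One mean per series (field and pet type)
            this.queryApi.queryRows(fluxQuery, {
                next: (row, tableMeta) => {
                    values.push(tableMeta.toObject(row)._value);
                },
                error: reject,
                complete: () => {
                    // Default to 0.5 when there is no data for the window
                    const activityLevel = values.length
                        ? values.reduce((sum, v) => sum + v, 0) / values.length
                        : 0.5;
                    resolve(activityLevel);
                }
            });
        });
    }
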
    // Send alert
    sendAlert(cameraId, alertType, details) {
        const alert = {
            cameraId,
            type: alertType,
            timestamp: new Date().toISOString(),
            ...details
        };

        // Publish MQTT notification
        this.mqttClient.publish(`petcam/${cameraId}/alerts`, JSON.stringify(alert));

        // Send push notification (integrated with Firebase Cloud Messaging)
        this.sendPushNotification(cameraId, alert);
        console.log(`Alert sent: ${alertType} for camera ${cameraId}`);
    }

    // Send push notification
    async sendPushNotification(cameraId, alert) {
        // Integrated with Firebase Cloud Messaging
        // Actual implementation requires FCM SDK
        console.log(`Push notification: ${alert.message}`);
    }
}

module.exports = PetBehaviorAnalyzer;
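The barking check above counts events inside a recent time window by querying the database on every bark. The same rule can be illustrated with a small in-memory sliding window, sketched below in Python. `SlidingWindowCounter` is a hypothetical helper, and the 5-minute window and 10-bark limit are simply the values that appear in the service code.

```python
# In-memory sliding-window event counter (hypothetical helper for illustration).
# Timestamps are seconds and are assumed to arrive in non-decreasing order.
from collections import deque


class SlidingWindowCounter:
    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self._events = deque()  # timestamps of events still inside the window

    def add(self, timestamp: float) -> None:
        """Record one event and drop events that have left the window."""
        self._events.append(timestamp)
        self._evict(timestamp)

    def count(self, now: float) -> int:
        """Number of events within the last window_seconds before `now`."""
        self._evict(now)
        return len(self._events)

    def _evict(self, now: float) -> None:
        cutoff = now - self.window_seconds
        while self._events and self._events[0] <= cutoff:
            self._events.popleft()


if __name__ == "__main__":
    barks = SlidingWindowCounter(window_seconds=5 * 60)
    for t in range(0, 110, 10):       # one bark every 10 s, 11 barks in total
        barks.add(t)
    if barks.count(now=100) > 10:     # same limit as the service code
        print("excessive barking")
```

Each event is appended and evicted at most once, so the cost per bark stays constant regardless of how long the service runs.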
Project Results
Technical Metrics
- Pet recognition accuracy: 96.5% (validated with 10,000+ test images)
- Inference speed: 150ms/frame (ESP32-S3@240MHz)
- Video streaming latency: < 300ms (WebRTC)
- Night vision range: 8 meters (850nm infrared LEDs)
- Treat dispensing accuracy: 92% (with AI-assisted positioning)
- Battery life: 30 days standby (alert receiving) / 8 hours continuous viewing
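The inference figure above can be related to the 30 fps stream with a little arithmetic. The sketch below assumes detection runs sequentially on one core and that the camera delivers 30 frames per second; the project's actual scheduling is not described here, and both function names are hypothetical.

```python
# How often can detection run, given the per-frame inference time?
# Assumes sequential inference; function names are hypothetical.
import math


def inference_rate_hz(inference_ms: float) -> float:
    """Maximum number of inferences per second."""
    return 1000.0 / inference_ms


def frames_between_inferences(camera_fps: float, inference_ms: float) -> int:
    """Smallest N such that running detection on every Nth frame keeps up."""
    return max(1, math.ceil(camera_fps * inference_ms / 1000.0))


if __name__ == "__main__":
    print(f"{inference_rate_hz(150):.2f} inferences per second")
    print(f"run detection on every {frames_between_inferences(30, 150)}th frame")
```

At 150 ms per inference, 4.5 camera frames arrive during each inference, so under these assumptions detection would run on roughly every fifth frame while the video stream itself continues at full rate.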
Business Results
- Units sold: 80,000+
- Average daily usage: 2.5 hours/day
- User rating: 4.8/5.0
- Awarded 2024 CES Innovation Award (Pet Technology category)
- Monthly active subscribers: 25,000+ (cloud recording plan)
Innovation Highlights
- Edge AI real-time detection: Pet recognition performed on-device, no cloud upload needed — protecting user privacy
- Behavior analysis engine: AI learns pet habits and automatically detects abnormal behaviors
- Interactive treat machine: AI-assisted positioning for precise treat dispensing rewards
- Two-way HD audio: Noise-canceling algorithm for crystal-clear pet communication
Technology Stack
Hardware Platform:
- ESP32-S3 (Xtensa LX7 dual-core 240MHz)
- OV2640 (2MP camera module)
- Infrared night vision module
- Stepper motor (treat dispenser)
- MEMS microphone + speaker
Edge AI:
- TensorFlow Lite Micro
- YOLOv8-Nano (INT8 quantized)
- EdgeTPU (optional accelerator)
Backend Services:
- Node.js + Express
- AWS IoT Core
- InfluxDB (behavior data)
- Firebase Cloud Messaging
Frontend Applications:
- React Native (iOS/Android app)
- WebRTC (real-time video)
- React.js (web management dashboard)
Client Testimonial
"BASHCAT's AI pet camera has completely transformed pet monitoring products! Edge AI not only protects user privacy but also significantly reduces our cloud costs. The behavior analysis feature adds a human touch to the product, with user retention 40% higher than competing products. We are extremely satisfied with this collaboration!"
— CTO, Pet Technology Company
Project Duration: March 2023 - January 2024
Technical Domains: Edge AI, Computer Vision, IoT, Real-Time Communication