AI / ML·2024

AI Smart Pet Interactive Camera

A smart pet camera with on-device (edge) AI visual recognition that supports pet behavior analysis, treat dispensing, two-way audio, and real-time alerts, keeping owners connected with their pets at any time.

Client: Pet Technology Company
Duration: 10 months
Category: AI / ML
Stack: ESP32-S3, TensorFlow Lite, YOLOv8, WebRTC, React Native, AWS IoT, Computer Vision, Edge AI

Project Overview

An AI smart pet camera developed for a pet technology company. It is built on the dual-core ESP32-S3 running TensorFlow Lite Micro for on-device pet detection, which feeds behavior analysis and anomaly alerts. The product integrates a 1080P camera, a treat dispenser, two-way audio, and night vision, and streams low-latency video over WebRTC so owners can interact with their pets from anywhere.

More than 80,000 units have been sold; users average 2.5 hours of use per day, and pet recognition accuracy is 96.5%.

Core Technical Challenges

1. Edge AI Pet Detection

Challenge:

  • Limited memory on ESP32-S3 (512KB SRAM + 8MB PSRAM)
  • The 30fps video stream must be processed in real time
  • Model must simultaneously recognize multiple pet types (cats, dogs, rabbits, etc.)
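To put these memory limits in numbers, the sketch below estimates the size of the model input and weights. It assumes the 320x320 input used in the training script below and roughly 3 million parameters for YOLOv8-Nano; that parameter count is an approximation for illustration, not a measured figure from this project, and the constant names are hypothetical.

```python
KIB = 1024
MIB = 1024 * KIB

SRAM_BYTES = 512 * KIB      # internal SRAM on the ESP32-S3
PSRAM_BYTES = 8 * MIB       # external PSRAM listed above
APPROX_PARAMS = 3_000_000   # assumption: rough YOLOv8-Nano parameter count


def tensor_bytes(height: int, width: int, channels: int, bytes_per_elem: int) -> int:
    """Size of a dense height x width x channels tensor in bytes."""
    return height * width * channels * bytes_per_elem


input_u8 = tensor_bytes(320, 320, 3, 1)    # quantized (uint8) input image
input_f32 = tensor_bytes(320, 320, 3, 4)   # the same image as FP32
weights_int8 = APPROX_PARAMS * 1           # 1 byte per INT8 weight (ignores metadata)
weights_fp32 = APPROX_PARAMS * 4           # 4 bytes per FP32 weight

if __name__ == "__main__":
    print(f"uint8 input: {input_u8 / KIB:.0f} KiB "
          f"({100 * input_u8 / SRAM_BYTES:.1f}% of SRAM)")   # 300 KiB (58.6% of SRAM)
    print(f"fp32 input:  {input_f32 / KIB:.0f} KiB, fits in SRAM: {input_f32 <= SRAM_BYTES}")
    print(f"weights: int8 ~{weights_int8 / MIB:.1f} MiB, fp32 ~{weights_fp32 / MIB:.1f} MiB "
          f"(PSRAM: {PSRAM_BYTES / MIB:.0f} MiB)")
```

Under these assumptions the uint8 input alone takes more than half of internal SRAM, and FP32 weights (about 11.4 MiB) would not fit even in the 8MB PSRAM, which is why INT8 quantization and placing large buffers in PSRAM or flash matter.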

Solution — YOLOv8-Nano Model Quantization:

# Model training and quantization script (runs on PC)
import time

import numpy as np
import tensorflow as tf
from ultralytics import YOLO

# 1. Train YOLOv8-Nano model (using pet dataset)
def train_pet_detection_model():
    model = YOLO('yolov8n.pt')  # YOLOv8-Nano pre-trained model

    # Training parameters
    results = model.train(
        data='pet_dataset.yaml',  # Custom pet dataset
        epochs=100,
        imgsz=320,  # Reduce resolution to 320x320 (suitable for ESP32)
        batch=32,
        device=0,  # GPU training
        patience=20,
        project='pet_detection',
        name='yolov8n_pet'
    )

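    # Export to TensorFlow Lite. export() returns the path of the exported file;
    # the Ultralytics TFLite export also writes a SavedModel directory along the way.
    tflite_path = model.export(format='tflite', imgsz=320)

    return tflite_path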

# 2. Advanced quantization (INT8)
def quantize_model_int8(model_path, representative_dataset):
    """
    Quantize an FP32 model to full INT8 to reduce model size and inference time.
    model_path must be a SavedModel directory (not a .tflite file).
    """
    converter = tf.lite.TFLiteConverter.from_saved_model(model_path)

    # Enable full INT8 quantization
    converter.optimizations = [tf.lite.Optimize.DEFAULT]
    converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]
    converter.inference_input_type = tf.uint8
    converter.inference_output_type = tf.uint8

    # Provide representative dataset (for calibrating quantization parameters)
    def representative_data_gen():
        for img in representative_dataset:
            img_resized = tf.image.resize(img, [320, 320])
            img_normalized = tf.cast(img_resized, tf.float32) / 255.0
            yield [img_normalized[tf.newaxis, ...]]

    converter.representative_dataset = representative_data_gen

    # Execute quantization
    tflite_model = converter.convert()

    # Save quantized model
    with open('pet_detection_int8.tflite', 'wb') as f:
        f.write(tflite_model)

    print(f"Quantized model size: {len(tflite_model) / 1024:.2f} KB")

    return 'pet_detection_int8.tflite'

# 3. Model performance evaluation
def evaluate_model_performance(tflite_model_path, test_dataset):
    interpreter = tf.lite.Interpreter(model_path=tflite_model_path)
    interpreter.allocate_tensors()

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()

    correct = 0
    total = 0
    inference_times = []

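    # Quantization parameters; a scale of 0 means the tensor is float
    in_scale, in_zero = input_details[0]['quantization']
    out_scale, out_zero = output_details[0]['quantization']

    for img, label in test_dataset:
        # Preprocessing: resize and scale to [0, 1], matching the calibration data
        img_resized = tf.image.resize(img, [320, 320])
        img_normalized = tf.cast(img_resized, tf.float32).numpy() / 255.0
        input_data = np.expand_dims(img_normalized, axis=0)

        # Integer models expect quantized input: q = round(x / scale) + zero_point
        input_dtype = input_details[0]['dtype']
        if in_scale:
            info = np.iinfo(input_dtype)
            input_data = np.clip(np.round(input_data / in_scale) + in_zero,
                                 info.min, info.max)
        input_data = input_data.astype(input_dtype)

        # Inference
        start_time = time.perf_counter()
        interpreter.set_tensor(input_details[0]['index'], input_data)
        interpreter.invoke()
        inference_time = (time.perf_counter() - start_time) * 1000  # ms

        inference_times.append(inference_time)

        # Get results and dequantize: real = (q - zero_point) * scale
        output_data = interpreter.get_tensor(output_details[0]['index']).astype(np.float32)
        if out_scale:
            output_data = (output_data - out_zero) * out_scale

        # YOLOv8 output layout: [1, 4 + num_classes, num_anchors]; rows 4+ are class scores.
        # Image-level top-1 check: use the class of the single most confident box.
        class_scores = output_data[0, 4:, :]
        best_anchor = np.argmax(class_scores.max(axis=0))
        predicted_class = int(np.argmax(class_scores[:, best_anchor]))

        if predicted_class == label:
            correct += 1
        total += 1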

    accuracy = correct / total * 100
    avg_inference_time = np.mean(inference_times)

    print(f"Accuracy: {accuracy:.2f}%")
    print(f"Average inference time: {avg_inference_time:.2f} ms")

    return accuracy, avg_inference_time
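The INT8 conversion relies on TensorFlow Lite's affine quantization, in which a real value x is stored as q = round(x / scale) + zero_point and recovered as (q - zero_point) * scale. The sketch below shows the round trip in plain Python. The scale of 1/255 and zero point of 0 are an assumption that is typical for a uint8 input calibrated on [0, 1]-normalized images; the real values should be read from `input_details[0]['quantization']`.

```python
def quantize(x: float, scale: float, zero_point: int, qmin: int = 0, qmax: int = 255) -> int:
    """Map a real value to an integer code: q = round(x / scale) + zero_point, clamped."""
    q = round(x / scale) + zero_point
    return max(qmin, min(qmax, q))


def dequantize(q: int, scale: float, zero_point: int) -> float:
    """Recover the approximate real value: x = (q - zero_point) * scale."""
    return (q - zero_point) * scale


SCALE = 1.0 / 255.0   # assumption: typical for a [0, 1]-normalized uint8 input
ZERO_POINT = 0

pixel = 200                          # raw 8-bit camera value
normalized = pixel / 255.0           # what the float pipeline would feed the model
q = quantize(normalized, SCALE, ZERO_POINT)
x = dequantize(q, SCALE, ZERO_POINT)
```

With these parameters the quantized code of pixel / 255.0 is the raw pixel value itself, which is why firmware can copy 8-bit RGB bytes straight into a uint8 input tensor without a float normalization step.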

ESP32-S3 TensorFlow Lite Inference:

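#include <cstdint>

#include "esp_log.h"
#include "esp_timer.h"
// AllOpsResolver exists in older TFLM releases; newer releases removed it
// in favor of MicroMutableOpResolver.
#include "tensorflow/lite/micro/all_ops_resolver.h"
#include "tensorflow/lite/micro/micro_interpreter.h"
#include "tensorflow/lite/micro/micro_log.h"
#include "tensorflow/lite/micro/system_setup.h"
#include "tensorflow/lite/schema/schema_generated.h"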

#define TAG "PET_DETECTION"

// Model data (embedded in firmware)
extern const unsigned char pet_detection_model[];
extern const unsigned int pet_detection_model_len;

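// Tensor Arena (working memory for inference). The 320x320x3 uint8 input tensor
// alone occupies 300KB, so intermediate activations need more than this; size the
// arena from interpreter->arena_used_bytes() and consider placing it in PSRAM.
constexpr int kTensorArenaSize = 300 * 1024;  // 300KB
alignas(16) uint8_t tensor_arena[kTensorArenaSize];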

// Pet class labels
const char* pet_labels[] = {
    "dog",    // Dog
    "cat",    // Cat
    "rabbit", // Rabbit
    "bird",   // Bird
    "hamster" // Hamster
};

typedef struct {
    int class_id;
    float confidence;
    float bbox_x;
    float bbox_y;
    float bbox_w;
    float bbox_h;
} detection_result_t;

class PetDetector {
private:
    const tflite::Model* model = nullptr;
    tflite::MicroInterpreter* interpreter = nullptr;
    TfLiteTensor* input = nullptr;
    TfLiteTensor* output = nullptr;
    bool ready = false;  // true only after the model and tensors are fully initialized

public:
    PetDetector() {
        // Load model
        model = tflite::GetModel(pet_detection_model);
        if (model->version() != TFLITE_SCHEMA_VERSION) {
            ESP_LOGE(TAG, "Model schema version mismatch!");
            return;
        }

        // Register all operations (convenient but large; a MicroMutableOpResolver
        // that lists only the ops this model uses saves flash)
        static tflite::AllOpsResolver resolver;

        // Create interpreter
        static tflite::MicroInterpreter static_interpreter(
            model, resolver, tensor_arena, kTensorArenaSize);
        interpreter = &static_interpreter;

        // Allocate tensor memory
        TfLiteStatus allocate_status = interpreter->AllocateTensors();
        if (allocate_status != kTfLiteOk) {
            ESP_LOGE(TAG, "AllocateTensors() failed");
            return;
        }

        // Get input/output tensors
        input = interpreter->input(0);
        output = interpreter->output(0);
        ready = true;

        ESP_LOGI(TAG, "Pet detection model loaded successfully");
        ESP_LOGI(TAG, "Arena used: %u bytes", (unsigned)interpreter->arena_used_bytes());
        ESP_LOGI(TAG, "Input shape: [%d, %d, %d, %d]",
                 input->dims->data[0], input->dims->data[1],
                 input->dims->data[2], input->dims->data[3]);
    }

    // Run inference. image_data must be RGB565; JPEG frames from the camera must be
    // decoded first (esp32-camera's img_converters.h provides conversion helpers).
    detection_result_t detect(uint8_t* image_data, int width, int height) {
        detection_result_t result = {0};
        if (!ready) {
            ESP_LOGE(TAG, "Detector not initialized");
            return result;
        }

        // Preprocessing: resize + RGB565 -> RGB888 into the uint8 input tensor
        preprocess_image(image_data, width, height, input->data.uint8);

        // Run inference (esp_timer_get_time() returns microseconds as int64_t)
        int64_t start_time = esp_timer_get_time();
        TfLiteStatus invoke_status = interpreter->Invoke();
        int64_t inference_time = (esp_timer_get_time() - start_time) / 1000;  // ms

        if (invoke_status != kTfLiteOk) {
            ESP_LOGE(TAG, "Invoke failed!");
            return result;
        }

        ESP_LOGI(TAG, "Inference time: %lld ms", (long long)inference_time);

        // Parse output
        result = parse_yolo_output(output);

        if (result.confidence > 0.5f) {
            ESP_LOGI(TAG, "Detected: %s (%.2f%%)",
                     pet_labels[result.class_id],
                     result.confidence * 100);
        }

        return result;
    }

private:
    // Preprocess image: nearest-neighbor resize to 320x320 and RGB565 -> RGB888.
    // No float normalization is needed if the converter calibrated the uint8 input
    // with scale 1/255 and zero point 0 (check input->params): the quantized value
    // of pixel / 255.0 is then the raw 8-bit pixel value.
    void preprocess_image(const uint8_t* src, int src_w, int src_h, uint8_t* dst) {
        const int dst_w = 320;
        const int dst_h = 320;

        // Nearest-neighbor resize: each output pixel copies the closest source pixel
        for (int y = 0; y < dst_h; y++) {
            for (int x = 0; x < dst_w; x++) {
                int src_x = x * src_w / dst_w;
                int src_y = y * src_h / dst_h;

                // RGB565 -> RGB888 (assumes the high byte of each pixel comes first)
                int src_idx = (src_y * src_w + src_x) * 2;
                uint16_t rgb565 = (src[src_idx] << 8) | src[src_idx + 1];

                uint8_t r = ((rgb565 >> 11) & 0x1F) << 3;
                uint8_t g = ((rgb565 >> 5) & 0x3F) << 2;
                uint8_t b = (rgb565 & 0x1F) << 3;

                int dst_idx = (y * dst_w + x) * 3;
                dst[dst_idx] = r;
                dst[dst_idx + 1] = g;
                dst[dst_idx + 2] = b;
            }
        }
    }

    // Dequantize one element of a uint8 tensor: real = (q - zero_point) * scale
    static float dequant(const TfLiteTensor* t, int idx) {
        return (t->data.uint8[idx] - t->params.zero_point) * t->params.scale;
    }

    // Parse YOLO output, keeping only the single most confident box (no NMS)
    detection_result_t parse_yolo_output(TfLiteTensor* output_tensor) {
        detection_result_t best_result = {0};
        float max_confidence = 0.0f;

        // YOLOv8 output layout: [1, 4 + num_classes, num_anchors], e.g. [1, 9, 2100] at 320x320
        // Rows 0-3 = [cx, cy, w, h], rows 4-8 = class_0..class_4 scores.
        // Unlike YOLOv5, YOLOv8 has no separate objectness score.
        const int num_classes = sizeof(pet_labels) / sizeof(pet_labels[0]);
        const int num_anchors = output_tensor->dims->data[2];

        for (int i = 0; i < num_anchors; i++) {
            // Find the class with the highest score for this anchor
            int best_class = 0;
            float best_class_conf = dequant(output_tensor, 4 * num_anchors + i);
            for (int c = 1; c < num_classes; c++) {
                float score = dequant(output_tensor, (4 + c) * num_anchors + i);
                if (score > best_class_conf) {
                    best_class_conf = score;
                    best_class = c;
                }
            }

            if (best_class_conf > max_confidence) {
                max_confidence = best_class_conf;
                best_result.class_id = best_class;
                best_result.confidence = best_class_conf;
                best_result.bbox_x = dequant(output_tensor, 0 * num_anchors + i);
                best_result.bbox_y = dequant(output_tensor, 1 * num_anchors + i);
                best_result.bbox_w = dequant(output_tensor, 2 * num_anchors + i);
                best_result.bbox_h = dequant(output_tensor, 3 * num_anchors + i);
            }
        }

        return best_result;
    }
};

2. WebRTC Low-Latency Video Streaming

ESP32-S3 WebRTC Implementation (simplified; signaling messages and JPEG frames are carried over a WebSocket):

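#include <string.h>

#include "esp_camera.h"
#include "esp_http_server.h"  // WebSocket support requires CONFIG_HTTPD_WS_SUPPORT=y
#include "esp_log.h"
#include "freertos/FreeRTOS.h"
#include "freertos/task.h"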

#define TAG "WEBRTC_STREAM"

// Camera configuration (OV2640 2MP sensor, streamed at 1280x720)
camera_config_t camera_config = {
    .pin_pwdn = -1,
    .pin_reset = -1,
    .pin_xclk = 10,
    .pin_sccb_sda = 40,
    .pin_sccb_scl = 39,
    .pin_d7 = 48,
    .pin_d6 = 11,
    .pin_d5 = 12,
    .pin_d4 = 14,
    .pin_d3 = 16,
    .pin_d2 = 18,
    .pin_d1 = 17,
    .pin_d0 = 15,
    .pin_vsync = 38,
    .pin_href = 47,
    .pin_pclk = 13,
    .xclk_freq_hz = 20000000,
    .ledc_timer = LEDC_TIMER_0,
    .ledc_channel = LEDC_CHANNEL_0,
    .pixel_format = PIXFORMAT_JPEG,
    .frame_size = FRAMESIZE_HD,     // 1280x720
    .jpeg_quality = 12,             // JPEG quality (0-63, lower is better)
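    .fb_count = 2,                  // Frame buffer count (double buffering)
    .fb_location = CAMERA_FB_IN_PSRAM, // Keep HD frame buffers out of internal SRAM
    .grab_mode = CAMERA_GRAB_LATEST // Always grab the latest frame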
};

// WebSocket client management
typedef struct {
    httpd_handle_t server;
    int fd;
    bool connected;
    uint32_t frame_count;
} webrtc_client_t;

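// Shared by the HTTP server task and the streaming task; production code should
// protect this table with a mutex.
static webrtc_client_t webrtc_clients[4] = {0};

// SDP/ICE signaling handler, implemented elsewhere (not shown)
void handle_webrtc_signaling(webrtc_client_t *client, const char *msg, size_t len);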

// Initialize camera
esp_err_t init_camera(void) {
    esp_err_t err = esp_camera_init(&camera_config);
    if (err != ESP_OK) {
        ESP_LOGE(TAG, "Camera init failed: %s", esp_err_to_name(err));
        return err;
    }

    // Adjust sensor parameters (low-light tuning for night vision)
    sensor_t *s = esp_camera_sensor_get();
    if (!s) {
        ESP_LOGE(TAG, "Camera sensor not found");
        return ESP_FAIL;
    }
    s->set_brightness(s, 1);     // Brightness +1
    s->set_contrast(s, 1);       // Contrast +1
    s->set_saturation(s, 0);     // Saturation 0
    s->set_whitebal(s, 1);       // Auto white balance
    s->set_awb_gain(s, 1);       // Auto white balance gain
    s->set_exposure_ctrl(s, 1);  // Auto exposure
    s->set_aec2(s, 1);           // Auto exposure level 2
    s->set_gain_ctrl(s, 1);      // Auto gain
    s->set_agc_gain(s, 10);      // AGC gain

    ESP_LOGI(TAG, "Camera initialized successfully");
    return ESP_OK;
}

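// WebSocket handler: register the client on the handshake, then process signaling frames
esp_err_t webrtc_ws_handler(httpd_req_t *req) {
    int fd = httpd_req_to_sockfd(req);

    if (req->method == HTTP_GET) {
        // Handshake: claim a free client slot for this socket
        for (int i = 0; i < 4; i++) {
            if (!webrtc_clients[i].connected) {
                webrtc_clients[i].server = req->handle;
                webrtc_clients[i].fd = fd;
                webrtc_clients[i].frame_count = 0;
                webrtc_clients[i].connected = true;
                ESP_LOGI(TAG, "WebRTC client connected: fd=%d", fd);
                return ESP_OK;
            }
        }
        ESP_LOGW(TAG, "Maximum WebRTC clients reached");
        return ESP_FAIL;
    }

    // Subsequent frames: look up the client registered for this socket
    webrtc_client_t *client = NULL;
    for (int i = 0; i < 4; i++) {
        if (webrtc_clients[i].connected && webrtc_clients[i].fd == fd) {
            client = &webrtc_clients[i];
            break;
        }
    }
    if (!client) {
        return ESP_FAIL;
    }

    // Receive client messages (SDP Offer/ICE Candidate).
    // A first call with max_len = 0 reads only the frame header to learn its length.
    httpd_ws_frame_t ws_pkt;
    memset(&ws_pkt, 0, sizeof(httpd_ws_frame_t));
    esp_err_t ret = httpd_ws_recv_frame(req, &ws_pkt, 0);
    if (ret != ESP_OK) {
        client->connected = false;
        return ret;
    }

    uint8_t buffer[1024];
    if (ws_pkt.len >= sizeof(buffer)) {
        ESP_LOGW(TAG, "Signaling message too large: %u bytes", (unsigned)ws_pkt.len);
        return ESP_FAIL;
    }
    ws_pkt.payload = buffer;
    ret = httpd_ws_recv_frame(req, &ws_pkt, ws_pkt.len);
    if (ret != ESP_OK) {
        client->connected = false;
        return ret;
    }
    buffer[ws_pkt.len] = '\0';  // NUL-terminate so the payload can be logged as a string

    ESP_LOGI(TAG, "Received WebSocket message: %s", (char *)buffer);

    // Handle WebRTC signaling (SDP/ICE)
    // Simplified here; actual implementation requires full WebRTC protocol handling
    handle_webrtc_signaling(client, (char *)buffer, ws_pkt.len);

    return ESP_OK;
}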

// Video streaming task (FreeRTOS Task)
void webrtc_streaming_task(void *pvParameters) {
    camera_fb_t *fb = NULL;
    const TickType_t frame_period = pdMS_TO_TICKS(33);  // ~30fps target
    TickType_t last_wake = xTaskGetTickCount();

    while (1) {
        // Capture camera frame
        fb = esp_camera_fb_get();
        if (!fb) {
            ESP_LOGE(TAG, "Camera capture failed");
            vTaskDelay(pdMS_TO_TICKS(100));
            last_wake = xTaskGetTickCount();  // restart the schedule after the pause
            continue;
        }

        // Send the JPEG frame to all connected clients as a binary WebSocket message
        for (int i = 0; i < 4; i++) {
            if (!webrtc_clients[i].connected) continue;

            httpd_ws_frame_t ws_frame;
            memset(&ws_frame, 0, sizeof(httpd_ws_frame_t));
            ws_frame.type = HTTPD_WS_TYPE_BINARY;
            ws_frame.payload = fb->buf;
            ws_frame.len = fb->len;

            esp_err_t ret = httpd_ws_send_frame_async(
                webrtc_clients[i].server,
                webrtc_clients[i].fd,
                &ws_frame
            );

            if (ret != ESP_OK) {
                ESP_LOGW(TAG, "Client %d disconnected", i);
                webrtc_clients[i].connected = false;
            } else {
                webrtc_clients[i].frame_count++;
            }
        }

        // Release frame buffer
        esp_camera_fb_return(fb);

        // Control frame rate: wake at fixed 33ms intervals measured from the previous
        // wake-up, so capture and send time is not added on top of the delay
        vTaskDelayUntil(&last_wake, frame_period);
    }
}
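The streaming loop needs a fixed frame period. Sleeping a constant 33 ms after each capture-and-send pass (the `vTaskDelay` pattern) makes the actual period 33 ms plus the pass duration, whereas FreeRTOS's `vTaskDelayUntil` wakes at absolute multiples of the period measured from a reference time. The simulation below compares the two; the 12 ms per-frame work time is an illustrative assumption, not a measured value, and the helper names are hypothetical.

```python
from typing import List


def relative_delay_starts(n_frames: int, work_ms: float, delay_ms: float) -> List[float]:
    """Start times when the task sleeps a fixed delay after each pass (vTaskDelay pattern)."""
    starts, t = [], 0.0
    for _ in range(n_frames):
        starts.append(t)
        t += work_ms + delay_ms          # the work time adds to every period
    return starts


def absolute_deadline_starts(n_frames: int, work_ms: float, period_ms: float) -> List[float]:
    """Start times when the task wakes at fixed deadlines (vTaskDelayUntil pattern).
    If a pass overruns its deadline, the next pass starts immediately."""
    starts, now, wake = [], 0.0, 0.0
    for _ in range(n_frames):
        start = max(now, wake)
        starts.append(start)
        now = start + work_ms            # capture + send finished
        wake += period_ms                # reference time advances by exactly one period
    return starts


def average_fps(starts: List[float]) -> float:
    """Average frame rate over the span of the recorded start times."""
    return (len(starts) - 1) * 1000.0 / (starts[-1] - starts[0])


if __name__ == "__main__":
    WORK_MS = 12.0  # assumption: illustrative capture + send time per frame
    print(f"fixed delay:    {average_fps(relative_delay_starts(31, WORK_MS, 33)):.1f} fps")
    print(f"fixed deadline: {average_fps(absolute_deadline_starts(31, WORK_MS, 33)):.1f} fps")
```

With 12 ms of work, the fixed-delay loop settles at about 22 fps while the deadline-based loop holds about 30 fps. Note also that with ESP-IDF's default 100 Hz FreeRTOS tick, `pdMS_TO_TICKS(33)` truncates to 3 ticks (30 ms), so the period is quantized to 10 ms steps unless `CONFIG_FREERTOS_HZ` is raised.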

3. Pet Behavior Analysis and Alerts

Behavior Recognition System:

// Node.js behavior analysis service
const { InfluxDB, Point } = require('@influxdata/influxdb-client');
const mqtt = require('mqtt');

class PetBehaviorAnalyzer {
    constructor() {
        this.influxDB = new InfluxDB({
            url: process.env.INFLUX_URL || 'http://localhost:8086',
            token: process.env.INFLUX_TOKEN  // keep credentials out of source code
        });
        this.writeApi = this.influxDB.getWriteApi('pet-monitor', 'behaviors');
        this.queryApi = this.influxDB.getQueryApi('pet-monitor');

        this.mqttClient = mqtt.connect('mqtt://localhost:1883');

        this.behaviorHistory = [];
        this.maxHistory = 1000;                 // cap in-memory history to avoid unbounded growth
        this.lastSeen = new Map();              // cameraId -> time (ms) of the latest detection
        this.lastAlertAt = new Map();           // "cameraId:type" -> time (ms) of the last alert
        this.alertCooldownMs = 10 * 60 * 1000;  // suppress repeats of the same alert for 10 minutes

        this.alertThresholds = {
            prolonged_absence: 120,  // Alert if pet absent for 120 minutes (2 hours)
            barking_window: 5,       // Window (minutes) for counting barks
            barking_count: 10,       // Alert on more than 10 barks within the window
            activity_window: 30      // Window (minutes) used for the activity level
        };

        this.initMQTT();

        // Absence means no detections arrive at all, so it must be checked on a timer
        setInterval(() => this.checkProlongedAbsence(), 60 * 1000);
    }

    initMQTT() {
        this.mqttClient.on('connect', () => {
            this.mqttClient.subscribe('petcam/+/detection');
            this.mqttClient.subscribe('petcam/+/audio');
        });

        this.mqttClient.on('message', (topic, message) => {
            const [, cameraId, kind] = topic.split('/');

            // cameraId is interpolated into Flux queries, so accept only safe characters
            if (!/^[\w-]+$/.test(cameraId)) return;

            let data;
            try {
                data = JSON.parse(message.toString());
            } catch (err) {
                console.warn(`Ignoring malformed payload on ${topic}`);
                return;
            }

            if (kind === 'detection') {
                this.analyzeDetection(cameraId, data);
            } else if (kind === 'audio') {
                this.analyzeAudio(cameraId, data);
            }
        });
    }

    // Analyze pet detection results
    analyzeDetection(cameraId, detection) {
        const point = new Point('pet_detection')
            .tag('camera_id', cameraId)
            .tag('pet_type', detection.class)
            .floatField('confidence', detection.confidence)
            .floatField('bbox_x', detection.bbox_x)
            .floatField('bbox_y', detection.bbox_y)
            .timestamp(new Date());

        this.writeApi.writePoint(point);
        this.lastSeen.set(cameraId, Date.now());

        // Record behavior history (bounded)
        this.behaviorHistory.push({
            timestamp: Date.now(),
            cameraId,
            type: 'detection',
            data: detection
        });
        if (this.behaviorHistory.length > this.maxHistory) {
            this.behaviorHistory.shift();
        }

        // Check for abnormal activity; catch so a failed query is not an unhandled rejection
        this.checkAbnormalActivity(cameraId).catch((err) =>
            console.error(`Activity check failed for ${cameraId}:`, err));
    }

    // Analyze audio (barking detection)
    analyzeAudio(cameraId, audio) {
        if (audio.barking_detected) {
            const point = new Point('pet_audio')
                .tag('camera_id', cameraId)
                .tag('event_type', 'barking')
                .floatField('volume', audio.volume)
                .timestamp(new Date());

            this.writeApi.writePoint(point);

            // Check for excessive barking
            this.checkExcessiveBarking(cameraId).catch((err) =>
                console.error(`Barking check failed for ${cameraId}:`, err));
        }
    }

    // Check for prolonged absence on every known camera (runs on a timer)
    checkProlongedAbsence() {
        const now = Date.now();
        for (const [cameraId, lastDetection] of this.lastSeen) {
            const minutesSinceLastSeen = (now - lastDetection) / 1000 / 60;
            if (minutesSinceLastSeen > this.alertThresholds.prolonged_absence) {
                this.sendAlert(cameraId, 'prolonged_absence', {
                    message: `Your pet has not appeared on camera for ${Math.floor(minutesSinceLastSeen)} minutes`,
                    severity: 'medium'
                });
            }
        }
    }

    // Check abnormal activity (frequent movement / almost completely still)
    async checkAbnormalActivity(cameraId) {
        const activityLevel = await this.calculateActivityLevel(
            cameraId, this.alertThresholds.activity_window);

        if (activityLevel > 0.8) {
            this.sendAlert(cameraId, 'high_activity', {
                message: 'Your pet may be overly excited or anxious',
                severity: 'low'
            });
        } else if (activityLevel < 0.1) {
            this.sendAlert(cameraId, 'low_activity', {
                message: 'Your pet may not be feeling well; activity level has dropped significantly',
                severity: 'medium'
            });
        }
    }

    // Check for excessive barking
    async checkExcessiveBarking(cameraId) {
        const { barking_window, barking_count } = this.alertThresholds;
        const fluxQuery = `
            from(bucket: "behaviors")
                |> range(start: -${barking_window}m)
                |> filter(fn: (r) => r._measurement == "pet_audio")
                |> filter(fn: (r) => r.camera_id == "${cameraId}")
                |> filter(fn: (r) => r.event_type == "barking")
                |> filter(fn: (r) => r._field == "volume")
                |> count()
        `;

        // collectRows() returns a Promise of all rows (queryRows() returns nothing to await)
        const rows = await this.queryApi.collectRows(fluxQuery);
        const barkingCount = rows.length > 0 ? rows[0]._value : 0;

        if (barkingCount > barking_count) {
            this.sendAlert(cameraId, 'excessive_barking', {
                message: 'Your pet may be anxious or a visitor may be present',
                severity: 'medium',
                count: barkingCount
            });
        }
    }

    // Calculate activity level: mean absolute per-minute change of the bbox position
    async calculateActivityLevel(cameraId, minutes) {
        const fluxQuery = `
            import "math"
            from(bucket: "behaviors")
                |> range(start: -${minutes}m)
                |> filter(fn: (r) => r._measurement == "pet_detection")
                |> filter(fn: (r) => r.camera_id == "${cameraId}")
                |> filter(fn: (r) => r._field == "bbox_x" or r._field == "bbox_y")
                |> derivative(unit: 1m, nonNegative: false)
                |> map(fn: (r) => ({ r with _value: math.abs(x: r._value) }))
                |> mean()
        `;

        const rows = await this.queryApi.collectRows(fluxQuery);
        if (rows.length === 0) return 0.5;  // No data: neutral default

        // One row per series (bbox_x, bbox_y); average them into a single level
        return rows.reduce((sum, r) => sum + r._value, 0) / rows.length;
    }

    // Send alert
    sendAlert(cameraId, alertType, details) {
        // Rate-limit: skip if the same alert fired for this camera within the cooldown
        const key = `${cameraId}:${alertType}`;
        const now = Date.now();
        if (now - (this.lastAlertAt.get(key) || 0) < this.alertCooldownMs) return;
        this.lastAlertAt.set(key, now);

        const alert = {
            cameraId,
            type: alertType,
            timestamp: new Date(now).toISOString(),
            ...details
        };

        // Publish MQTT notification
        this.mqttClient.publish(`petcam/${cameraId}/alerts`, JSON.stringify(alert));

        // Send push notification (integrated with Firebase Cloud Messaging)
        this.sendPushNotification(cameraId, alert).catch((err) =>
            console.error('Push notification failed:', err));

        console.log(`Alert sent: ${alertType} for camera ${cameraId}`);
    }

    // Send push notification
    async sendPushNotification(cameraId, alert) {
        // Integrated with Firebase Cloud Messaging
        // Actual implementation requires FCM SDK
        console.log(`Push notification: ${alert.message}`);
    }
}

module.exports = PetBehaviorAnalyzer;
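The activity level behind the 0.8 and 0.1 thresholds is described as a position change rate. The plain-Python sketch below makes one version of that metric explicit: the path length traveled by the bounding-box center per minute, computed from timestamped detections. It assumes `bbox_x`/`bbox_y` are normalized to [0, 1] (the source does not state their units), and the function names are illustrative rather than part of the service above.

```python
import math
from typing import List, Tuple

Sample = Tuple[float, float, float]  # (timestamp_seconds, bbox_x, bbox_y)


def activity_level(samples: List[Sample]) -> float:
    """Path length of the box center per minute across consecutive detections.
    Returns 0.0 when fewer than two samples or no elapsed time are available."""
    if len(samples) < 2:
        return 0.0
    samples = sorted(samples)  # order by timestamp
    distance = 0.0
    for (_, x0, y0), (_, x1, y1) in zip(samples, samples[1:]):
        distance += math.hypot(x1 - x0, y1 - y0)
    minutes = (samples[-1][0] - samples[0][0]) / 60.0
    return distance / minutes if minutes > 0 else 0.0


def classify_activity(level: float, high: float = 0.8, low: float = 0.1) -> str:
    """Map an activity level onto the alert categories used by the analyzer."""
    if level > high:
        return "high_activity"
    if level < low:
        return "low_activity"
    return "normal"


if __name__ == "__main__":
    still = [(0, 0.5, 0.5), (60, 0.5, 0.5), (120, 0.5, 0.5)]
    pacing = [(0, 0.0, 0.0), (30, 0.6, 0.0), (60, 0.0, 0.0)]
    for name, samples in (("still", still), ("pacing", pacing)):
        level = activity_level(samples)
        print(name, round(level, 2), classify_activity(level))
```

Unlike a signed derivative averaged over time, summing absolute displacements prevents back-and-forth movement from cancelling out, which is the behavior a "frequent movement" alert needs.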

Project Results

Technical Metrics

  • Pet recognition accuracy: 96.5% (validated with 10,000+ test images)
  • Inference speed: 150ms/frame (ESP32-S3@240MHz)
  • Video streaming latency: < 300ms (WebRTC)
  • Night vision range: 8 meters (850nm infrared LEDs)
  • Treat dispensing accuracy: 92% (with AI-assisted positioning)
  • Battery life: 30 days standby (alert receiving) / 8 hours continuous viewing
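The 150 ms/frame inference figure and the 30fps video stream together imply that detection runs on a subset of frames. The arithmetic below assumes inference runs back-to-back and blocks until finished; the helper names are illustrative.

```python
import math


def detection_rate_hz(inference_ms: float) -> float:
    """Upper bound on detections per second when inference runs back-to-back."""
    return 1000.0 / inference_ms


def frames_per_detection(stream_fps: float, inference_ms: float) -> int:
    """Stream frames that elapse during one inference pass, rounded up."""
    frame_interval_ms = 1000.0 / stream_fps
    return math.ceil(inference_ms / frame_interval_ms)


if __name__ == "__main__":
    rate = detection_rate_hz(150)             # ~6.7 detections per second at most
    stride = frames_per_detection(30, 150)    # one analyzed frame out of every 5
    print(f"{rate:.1f} Hz max; analyze 1 of every {stride} frames "
          f"(~{30 / stride:.0f} detections/s)")
```

Under these assumptions the 30fps figure describes the video stream, while detection runs at roughly 6 to 7 Hz.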

Business Results

  • Units sold: 80,000+
  • Average daily usage: 2.5 hours/day
  • User rating: 4.8/5.0
  • Awarded 2024 CES Innovation Award (Pet Technology category)
  • Monthly active subscribers: 25,000+ (cloud recording plan)

Innovation Highlights

  1. Edge AI real-time detection: Pet recognition runs on the device, so frames do not need to be uploaded to the cloud for recognition, protecting user privacy
  2. Behavior analysis engine: AI learns pet habits and automatically detects abnormal behaviors
  3. Interactive treat machine: AI-assisted positioning for precise treat dispensing rewards
  4. Two-way HD audio: Noise cancellation for clear two-way communication with pets

Technology Stack

Hardware Platform:

  • ESP32-S3 (Xtensa LX7 dual-core 240MHz)
  • OV2640 (2MP camera module)
  • Infrared night vision module
  • Stepper motor (treat dispenser)
  • MEMS microphone + speaker

Edge AI:

  • TensorFlow Lite Micro
  • YOLOv8-Nano (INT8 quantized)
  • EdgeTPU (optional accelerator)

Backend Services:

  • Node.js + Express
  • AWS IoT Core
  • InfluxDB (behavior data)
  • Firebase Cloud Messaging

Frontend Applications:

  • React Native (iOS/Android app)
  • WebRTC (real-time video)
  • React.js (web management dashboard)

Client Testimonial

"BASHCAT's AI pet camera has completely transformed pet monitoring products! Edge AI not only protects user privacy but also significantly reduces our cloud costs. The behavior analysis feature adds a human touch to the product, with user retention 40% higher than competing products. We are extremely satisfied with this collaboration!"

CTO, Pet Technology Company


Project Duration: March 2023 - January 2024
Technical Domains: Edge AI, Computer Vision, IoT, Real-Time Communication
