AI Robot Voice Control

Created on: February 4, 2025

Question

so like plz help me Project: Ultra-Advanced AI-Powered Talking and Moving Robot with Web-Based Control Prompt:

This project aims to create a sophisticated AI-powered robot that responds to voice commands, generates audio responses using the Google Gemini API (specifically utilizing features from both Gemini Live 2.0 and the Multimodal Live API), and offers a comprehensive, modern, and user-friendly web interface for both manual and AI-based control.

User Background: The user is a beginner (a class 9 student) with limited or no experience in coding, electronics, or robotics. Therefore, all instructions, explanations, and code comments must be extremely detailed, assuming no prior knowledge.

Core Functionality:

Voice Command Recognition:

The user will interact with the robot primarily through voice commands.

The user will speak into their computer's microphone (initially).

The system should leverage the Gemini Live 2.0 project and potentially the Multimodal Live API for real-time, low-latency audio processing.

The Web Speech API will be used within the web UI for continuous voice recognition (listening until the user stops or a button is pressed).

The robot must understand a variety of voice commands, including but not limited to:

"Move forward"

"Move backward"

"Turn left"

"Turn right"

"Spin"

"Stop"

"Play music"

User-defined commands related to uploaded audio files (e.g., "Play [filename]")

The system must be able to recognize keywords within the user's speech to trigger corresponding actions.

The user should be able to define a system prompt to guide the AI's behavior and personality.

AI-Generated Audio Responses:

The system should utilize the Gemini API (Gemini Live 2.0 and/or Multimodal Live API) to generate contextually appropriate text responses to user voice commands and questions.

The generated text responses must be converted to speech (audio) using the API's text-to-speech capabilities.

The audio responses should be played back to the user through the computer's speakers.

Gemini Live 2.0 Documentation: https://github.com/SreejanPersonal/Gemini-Live-2.0

(Code snippet from Gemini Live 2.0's audio_handler.py showing audio input/output)

...

async def listen_audio(self): """Listens to the microphone input and places audio data into the queue for sending.""" mic_info = self.pya.get_default_input_device_info() audio_stream = self.pya.open( format=FORMAT, channels=CHANNELS, rate=SEND_SAMPLE_RATE, input=True, input_device_index=mic_info["index"], frames_per_buffer=CHUNK_SIZE, ) try: print("Listening... You can speak now.") while True: if not self.ai_speaking: data = await asyncio.to_thread( audio_stream.read, CHUNK_SIZE, exception_on_overflow=False ) await self.audio_in_queue.put(data) else: await asyncio.sleep(0.1) except Exception as e: traceback.print_exc() finally: audio_stream.stop_stream() audio_stream.close() print("Stopped Listening.")

async def play_audio(self): """Plays audio data received from the AI session.""" audio_stream = self.pya.open( format=FORMAT, channels=CHANNELS, rate=RECEIVE_SAMPLE_RATE, output=True, ) try: while True: data = await self.audio_out_queue.get() if not self.ai_speaking: self.ai_speaking = True # AI starts speaking print("Assistant is speaking...") await asyncio.to_thread(audio_stream.write, data) if self.audio_out_queue.empty(): self.ai_speaking = False # AI has finished speaking print("You can speak now.") except Exception as e: traceback.print_exc() finally: audio_stream.stop_stream() audio_stream.close()

...

Use code with caution. Multimodal Live API Documentation: https://ai.google.dev/docs/multimodal_live_api_guide

(Code snippet from Multimodal Live API docs showing text input) import asyncio from google import genai

client = genai.Client(api_key="GEMINI_API_KEY", http_options={'api_version': 'v1alpha'}) model_id = "gemini-2.0-flash-exp" config = {"responseModalities": ["TEXT"]}

async def main(): async with client.aio.live.connect(model=model_id, config=config) as session: while True: message = input("User> ") if message.lower() == "exit": break await session.send(input=message, end_of_turn=True)

text
        async for response in session.receive():
            if response.text is None:
                continue
            print(response.text, end="")

if name == "main": asyncio.run(main()) Use code with caution. Robot Movement:

The robot must be capable of the following movements:

Move forward

Move backward

Turn left

Turn right

Spin in place

Stop

Movement commands will be triggered by:

Voice commands recognized by the Gemini AI.

Manual controls in the web UI.

An ESP32 microcontroller will control the robot's motors.

An L298N motor driver will be used to interface with the motors.

The ESP32 will receive commands from the server (running on the user's computer) over Wi-Fi.

Manual Control via Web UI:

The web UI must provide a visually appealing and intuitive way to control the robot manually.

The UI must be mobile-responsive (usable on different screen sizes).

Gamepad-style controls should be implemented for movement control.

A slider should be included for adjusting the robot's speed.

The UI should dynamically display the connection status of the ESP32.

The UI should be implemented using modern web technologies (HTML5, CSS3 with Bootstrap 5, JavaScript).

Music Playback:

The user should be able to upload audio files (MP3, WAV, OGG) to the server via the web UI.

The UI should display a list of uploaded audio files.

The user should be able to select an audio file and trigger its playback through voice commands (e.g., "Play [filename]") or through UI controls.

The ESP32 will play simple tones representing music through a speaker connected via a TDA2030 amplifier. (Full audio file playback on the ESP32 is beyond the scope of this project due to hardware limitations).

Advanced UI Features:

The web UI must be "ultra-advanced," "modern," and have a best-in-class design. It should be organized into multiple pages accessible through a navigation bar:

Dashboard:

Provides a general overview of the robot's status.

Displays the ESP32 connection status (connected/not connected).

Includes a log area to display a history of commands, AI responses, and system messages.

RC Manual Control:

Contains the gamepad-style controls for manual robot movement.

Includes a speed control slider.

AI Live Chat:

A dedicated page for real-time interaction with the Gemini AI.

Features continuous voice input using the Web Speech API (with start/stop buttons).

Displays a live chat log showing both user input (text and voice transcriptions) and AI-generated responses.

Provides input fields for:

Gemini API Key: (Stored securely, ideally on the server-side)

System Prompt: To customize the AI's personality.

Includes a toggle or dropdown to select between "mobile microphone" (computer's mic) and "ESP32 microphone" (currently a placeholder for future expansion).

Settings:

Allows the user to configure:

ESP32 IP Address: The IP address of the ESP32 on the local network.

Gemini API Key: The user's API key for accessing the Gemini API.

System Prompt: A text prompt that guides the AI's behavior.

Microphone Mode: (Mobile/ESP32 - currently, ESP32 mic is a placeholder)

Media:

Provides a form for uploading audio files to the server.

Displays a list of currently uploaded audio files.

Allows the user to play audio files (currently, the ESP32 will play a representation of music using tones).

Hardware Components:

ESP32 Development Board:

The brain of the robot, responsible for:

Connecting to Wi-Fi.

Running a web server to receive commands.

Controlling the motors via the L298N driver.

Reading the sound sensor.

Generating simple tones for audio output.

L298N Motor Driver:

Interfaces with the ESP32 to control the speed and direction of the DC motors.

4x TT Gear Motors:

Provide the robot's movement.

12V Li-ion Battery Pack:

Powers the motors (through the L298N).

Powers the ESP32 (via the L298N's 5V regulator).

KY-038 or LM393-based Sound Sensor Module:

Detects sound above a certain threshold.

Currently used as a simple trigger (e.g., to start/stop listening).

TDA2030 Amplifier:

Amplifies audio signals from the ESP32's DAC output.

Speaker:

Outputs basic audio (tones) generated by the ESP32.

Jumper Wires:

For connecting all the components.

Computer:

Runs the server-side code (Python, Flask, Gemini API).

Hosts the web UI.

Provides audio input (via the computer's microphone) and output (via the computer's speakers).

Software Architecture:

ESP32 (Arduino):

Programming Language: C/C++ (Arduino)

Functionality:

Connects to the user's Wi-Fi network using the provided credentials.

Implements a web server that listens for incoming HTTP requests on specific endpoints (e.g., /forward, /backward, /stop, /speed, /play).

Controls the motors through the L298N driver based on the commands received from the server.

Reads the digital output of the sound sensor to detect loud sounds.

Generates simple tones using the tone() function and outputs them through the speaker via the TDA2030 amplifier.

Sets the motor speed based on commands received from the server.

Code Structure (robot.ino):

/**********************************************************

Ultra Advanced ESP32 Robot Control - Arduino Code
This sketch creates a web server on the ESP32 to control
motors (with speed control) and play audio via a speaker. **********************************************************/ #include <WiFi.h> #include <WebServer.h>

// ***** CONFIGURE YOUR WIFI CREDENTIALS ***** const char* ssid = "YOUR_WIFI_SSID"; // Replace with your Wi-Fi SSID const char* password = "YOUR_WIFI_PASSWORD"; // Replace with your Wi-Fi password

// Create a web server on port 80: WebServer server(80);

// Motor control pins (adjust to your wiring) const int motorLeft_IN1 = 14; const int motorLeft_IN2 = 27; const int motorRight_IN1 = 26; const int motorRight_IN2 = 25;

// Speaker pin for audio output (using tone) const int speakerPin = 32;

// Global speed variable (0-100) int speedValue = 50;

// --- Motor Control Functions ---

void moveForward() { analogWrite(motorLeft_IN1, speedValue * 2.55); // Scale 0-100 to 0-255 for PWM analogWrite(motorLeft_IN2, 0); analogWrite(motorRight_IN1, speedValue * 2.55); analogWrite(motorRight_IN2, 0); }

void moveBackward() { analogWrite(motorLeft_IN1, 0); analogWrite(motorLeft_IN2, speedValue * 2.55); analogWrite(motorRight_IN1, 0); analogWrite(motorRight_IN2, speedValue * 2.55); }

void turnLeft() { analogWrite(motorLeft_IN1, 0); analogWrite(motorLeft_IN2, 0); analogWrite(motorRight_IN1, speedValue * 2.55); analogWrite(motorRight_IN2, 0); }

void turnRight() { analogWrite(motorLeft_IN1, speedValue * 2.55); analogWrite(motorLeft_IN2, 0); analogWrite(motorRight_IN1, 0); analogWrite(motorRight_IN2, 0); }

void spin() { analogWrite(motorLeft_IN1, speedValue * 2.55); analogWrite(motorLeft_IN2, 0); analogWrite(motorRight_IN1, 0); analogWrite(motorRight_IN2, speedValue * 2.55); }

void stopMotors() { analogWrite(motorLeft_IN1, 0); analogWrite(motorLeft_IN2, 0); analogWrite(motorRight_IN1, 0); analogWrite(motorRight_IN2, 0); }

// --- Audio Playback Function (using tone()) ---

void playMusic() { // Example: Play a simple melody tone(speakerPin, 262, 250); // C4 for 250ms delay(300); tone(speakerPin, 294, 250); // D4 for 250ms delay(300); tone(speakerPin, 330, 250); // E4 for 250ms delay(300); noTone(speakerPin); }

// --- Speed Control Function ---

void setSpeed(int val) { speedValue = val; // If you are using a different method for speed control // (e.g., a motor driver that requires different signals), // update the code here accordingly. }

// --- Web Server Setup and Handlers ---

void setupWiFi() { Serial.begin(115200); Serial.print("Connecting to "); Serial.println(ssid);

WiFi.begin(ssid, password);

while (WiFi.status() != WL_CONNECTED) { delay(500); Serial.print("."); }

Serial.println(""); Serial.println("WiFi connected."); Serial.print("IP address: "); Serial.println(WiFi.localIP()); }

void handleRoot() { server.send(200, "text/plain", "ESP32 Robot is online. Send commands to control it."); }

void handleForward() { moveForward(); server.send(200, "text/plain", "Moving Forward"); }

void handleBackward() { moveBackward(); server.send(200, "text/plain", "Moving Backward"); }

void handleLeft() { turnLeft(); server.send(200, "text/plain", "Turning Left"); }

void handleRight() { turnRight(); server.send(200, "text/plain", "Turning Right"); }

void handleSpin() { spin(); server.send(200, "text/plain", "Spinning"); }

void handleStop() { stopMotors(); server.send(200, "text/plain", "Stopped"); }

void handlePlay() { playMusic(); server.send(200, "text/plain", "Playing Music"); }

void handleSpeed() { if(server.hasArg("value")){ int spd = server.arg("value").toInt(); setSpeed(spd); server.send(200, "text/plain", "Speed set to " + String(spd)); } else { server.send(400, "text/plain", "Speed value missing"); } }

void setup() { pinMode(motorLeft_IN1, OUTPUT); pinMode(motorLeft_IN2, OUTPUT); pinMode(motorRight_IN1, OUTPUT); pinMode(motorRight_IN2, OUTPUT); pinMode(speakerPin, OUTPUT); stopMotors(); setupWiFi();

server.on("/", handleRoot); server.on("/forward", handleForward); server.on("/backward", handleBackward); server.on("/left", handleLeft); server.on("/right", handleRight); server.on("/spin", handleSpin); server.on("/stop", handleStop); server.on("/play", handlePlay); server.on("/speed", handleSpeed);

server.begin(); Serial.println("HTTP server started"); }

void loop() { server.handleClient(); } Use code with caution. Arduino Server (Python):

Programming Language: Python 3.8+

Framework: Flask

Libraries: requests, python-dotenv, Werkzeug, google-genai

Functionality:

Hosts the Web UI: Serves the HTML, CSS, and JavaScript files for the user interface.

Handles API Endpoints:

/command: Receives commands from the web UI (e.g., "forward," "backward," "speed").

Forwards these commands as HTTP requests to the ESP32's web server.

/ai_call: Currently simulates interaction with the Gemini API.

Receives user input (text or voice transcription) from the web UI.

Eventually, this endpoint will be modified to make actual calls to the Gemini API (using the provided API key).

Processes the user input and detects keywords related to robot control or other actions (e.g., "play music").

Generates simulated AI responses based on the detected keywords.

Sends commands to the ESP32 based on the detected keywords (e.g., if "move forward" is detected, send a request to the ESP32's /forward endpoint).

Returns the AI's response to the web UI.

/status: Provides the ESP32's connection status to the web UI.

/uploads/<filename>: Serves uploaded audio files.

Handles File Uploads: Allows users to upload audio files through the Media page.

Manages Configuration: Loads configuration settings (ESP32 IP, API key, system prompt, mic mode) from environment variables or a configuration file (currently uses a global CONFIG dictionary for simplicity).

Code Structure (app.py):

import os import requests from flask import Flask, render_template, jsonify, request, redirect, url_for, send_from_directory from werkzeug.utils import secure_filename from dotenv import load_dotenv

Load environment variables from .env file (if it exists)

load_dotenv()

app = Flask(name) app.config['UPLOAD_FOLDER'] = os.path.join('static', 'uploads') ALLOWED_EXTENSIONS = {'mp3', 'wav', 'ogg'}

--- Global Configuration ---

In a production environment, use a more robust method for storing

configuration, such as a database or a dedicated configuration management system.

CONFIG = { "ESP32_IP": os.getenv("ESP32_IP", "192.168.X.X"), # Replace with your ESP32's IP "API_KEY": os.getenv("GEMINI_API_KEY", ""), # Your Gemini API Key here (or in .env) "SYSTEM_PROMPT": os.getenv("SYSTEM_PROMPT", "You are a helpful assistant."), "MIC_MODE": os.getenv("MIC_MODE", "mobile") # "mobile" or "esp32" (placeholder) }

--- Helper Functions ---

def allowed_file(filename): return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS

def check_esp32_connection(): """Sends a simple GET request to the ESP32 to check if it's reachable.""" try: r = requests.get(f"http://{CONFIG['ESP32_IP']}/", timeout=2) return r.status_code == 200 except Exception: return False

--- Flask Routes ---

@app.route("/") def dashboard(): """Serves the main dashboard page.""" esp32_status = check_esp32_connection() return render_template("dashboard.html", esp32_status=esp32_status)

@app.route("/manual") def manual(): """Serves the manual control page.""" return render_template("manual.html")

@app.route("/live_chat") def live_chat(): """Serves the AI live chat page.""" return render_template("live_chat.html")

@app.route("/settings", methods=["GET", "POST"]) def settings(): """Handles the settings page (GET and POST requests).""" if request.method == "POST": CONFIG["ESP32_IP"] = request.form.get("esp32_ip", CONFIG["ESP32_IP"]) CONFIG["API_KEY"] = request.form.get("api_key", CONFIG["API_KEY"]) CONFIG["SYSTEM_PROMPT"] = request.form.get("system_prompt", CONFIG["SYSTEM_PROMPT"]) CONFIG["MIC_MODE"] = request.form.get("mic_mode", CONFIG["MIC_MODE"]) # In a real application, you'd likely save these settings to # persistent storage (e.g., a database or a configuration file). return redirect(url_for("settings")) return render_template("settings.html", config=CONFIG)

@app.route("/media", methods=["GET", "POST"]) def media(): """Handles the media management page (GET and POST for file uploads).""" message = "" if request.method == "POST": if 'file' not in request.files: message = "No file part" else: file = request.files['file'] if file.filename == '': message = "No selected file" elif file and allowed_file(file.filename): filename = secure_filename(file.filename) file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename)) message = "File uploaded successfully" else: message = "Invalid file type" files = os.listdir(app.config['UPLOAD_FOLDER']) return render_template("media.html", files=files, message=message)

@app.route("/uploads/<filename>") def uploaded_file(filename): """Serves uploaded files.""" return send_from_directory(app.config['UPLOAD_FOLDER'], filename)

@app.route("/command", methods=["POST"]) def command(): """ Receives commands from the UI, forwards them to the ESP32, and returns a response. """ data = request.get_json() cmd = data.get("command") params = data.get("params", "") url = f"http://{CONFIG['ESP32_IP']}/{cmd}" if params: url += f"?{params}" try: r = requests.get(url, timeout=5) return jsonify({"status": "success", "response": r.text}) except Exception as e: return jsonify({"status": "error", "error": str(e)}), 500

@app.route("/ai_call", methods=["POST"]) def ai_call(): """ Handles AI interactions. This is currently a SIMULATION.

text
Replace this with actual calls to the Gemini Multimodal API using your API key.
Refer to the Gemini Live 2.0 and Multimodal Live API documentation for details:
  - Gemini Live 2.0: [https://github.com/SreejanPersonal/Gemini-Live-2.0](https://github.com/SreejanPersonal/Gemini-Live-2.0)
  - Multimodal Live API: [https://ai.google.dev/docs/multimodal_live_api_guide](https://ai.google.dev/docs/multimodal_live_api_guide)
"""
data = request.get_json()
user_input = data.get("input").lower()

# --- Simulated AI Response Logic ---
# (Replace this with real Gemini API interaction)
response_text = "Sorry, I didn't get that. Can you repeat?"  # Default response
command_triggered = None

if "move forward" in user_input:
    response_text = "Okay, moving forward now."
    command_triggered = "forward"
elif "move backward" in user_input:
    response_text = "Sure, moving backward."
    command_triggered = "backward"
elif "turn left" in user_input:
    response_text = "Turning left."
    command_triggered = "left"
elif "turn right" in user_input:
    response_text = "Alright, turning right."
    command_triggered = "right"
elif "spin" in user_input:
    response_text = "Spinning around!"
    command_triggered = "spin"
elif "play music" in user_input:
    response_text = "Let's get this party started! Playing music."
    command_triggered = "play"

# --- Send command to ESP32 if a keyword was detected ---
if command_triggered:
    try:
        requests.get(f"http://{CONFIG['ESP32_IP']}/{command_triggered}", timeout=5)
    except Exception as e:
        print(f"Error sending command to ESP32: {e}")

return jsonify({"response": response_text, "command": command_triggered})

@app.route("/status") def status(): """Provides the ESP32 connection status.""" esp32_status = check_esp32_connection() return jsonify({"esp32_connected": esp32_status})

if name == "main": app.run(debug=True) Use code with caution. Python Web UI (HTML, CSS, JavaScript):

HTML (templates/*.html):

Defines the structure of each page (Dashboard, Manual Control, AI Live Chat, Settings, Media).

Uses Bootstrap 5 for responsive design and layout.

Includes appropriate elements for user input (buttons, text fields, forms, etc.).

Includes placeholders for dynamic content (e.g., ESP32 status, chat log).

CSS (static/css/style.css):

Provides custom styling to enhance the visual appearance of the UI.

JavaScript (static/js/main.js):

Handles user interactions (e.g., button clicks, slider changes).

Sends AJAX requests to the server's API endpoints (e.g., /command, /ai_call, /status).

Updates the UI dynamically based on server responses (e.g., updating the ESP32 status, adding messages to the chat log).

Implements continuous voice recognition using the Web Speech API.

Manages audio file playback (currently simulated on the ESP32 with tones).

Project Setup Instructions (Generated by the Bash Script):

Create Project Directory: The bash script will create a directory named ultimate_ai_robot_project and the necessary subdirectories.

Generate Code Files: The script will generate all the code files (app.py, HTML templates, style.css, main.js, robot.ino) with placeholder content and detailed comments.

Create requirements.txt: A file listing the required Python packages will be created.

Next Steps (Printed by the Bash Script):

Navigate to Project Directory:

cd ultimate_ai_robot_project Use code with caution. Bash Create and Activate Virtual Environment:

python3 -m venv venv source venv/bin/activate # Linux/macOS venv\Scripts\activate # Windows Use code with caution. Bash Install Dependencies:

pip install -r requirements.txt Use code with caution. Bash Configure API Key and ESP32 IP:

Obtain a Google Gemini API key.

Update the API_KEY variable in app.py (or set it as an environment variable in a .env file).

Update the ESP32_IP variable in app.py with the actual IP address of your ESP32 after it's connected to Wi-Fi (you'll get this from the Arduino IDE's serial monitor).

Run the Flask Server:

python app.py Use code with caution. Bash Access the Web UI: Open a web browser and go to http://localhost:5000.

Flash the ESP32:

Open robot.ino in the Arduino IDE.

Update the Wi-Fi credentials (SSID and password) in the code.

Adjust motor control pins and speaker pin if necessary.

Select the correct board and port in the Arduino IDE.

Upload the sketch to your ESP32.

Open the serial monitor in the Arduino IDE to get the ESP32's IP address after it connects to Wi-Fi.

Bash Script to Automate Setup:

#!/bin/bash

Ultimate AI Talking & Moving Robot Project - Automated Setup Script

--- Configuration ---

PROJECT_DIR="ultimate_ai_robot_project"

--- Create Project Directory Structure ---

mkdir -p "$PROJECT_DIR"/{templates,static/css,static/js,static/uploads}

--- Create requirements.txt ---

cat << 'EOF' > "$PROJECT_DIR/requirements.txt" Flask requests python-dotenv Werkzeug google-genai EOF

--- Create Flask server (app.py) ---

(The full app.py code from above goes here)

cat << 'EOF' > "$PROJECT_DIR/app.py"

... (Full app.py code from above) ...

EOF

--- Create HTML Templates ---

(The full HTML code for base.html, dashboard.html, manual.html, live_chat.html, settings.html, media.html goes here)

cat << 'EOF' > "$PROJECT_DIR/templates/base.html"

... (Full base.html code from above) ...

EOF

cat << 'EOF' > "$PROJECT_DIR/templates/dashboard.html"

... (Full dashboard.html code from above) ...

EOF

cat << 'EOF' > "$PROJECT_DIR/templates/manual.html"

... (Full manual.html code from above) ...

EOF

cat << 'EOF' > "$PROJECT_DIR/templates/live_chat.html"

... (Full live_chat.html code from above) ...

EOF

cat << 'EOF' > "$PROJECT_DIR/templates/settings.html"

... (Full settings.html code from above) ...

EOF

cat << 'EOF' > "$PROJECT_DIR/templates/media.html"

... (Full media.html code from above) ...

EOF

--- Create CSS file (static/css/style.css) ---

cat << 'EOF' > "$PROJECT_DIR/static/css/style.css"

... (Full style.css code from above) ...

EOF

--- Create JavaScript file (static/js

Use code with caution. Bash 180.0s contonue

--- Create JavaScript file (static/js/main.js) ---

cat << 'EOF' > "$PROJECT_DIR/static/js/main.js" // --- Utility Functions ---

// Log messages to the console and the log output area on the Dashboard function log(message) { console.log(message); let logElement = document.getElementById("logOutput"); if (logElement) { logElement.innerHTML += message + "<br>"; logElement.scrollTop = logElement.scrollHeight; } }

// Send a command to the backend (and subsequently to the ESP32) function sendCommand(command, params = "") { log("Sending command: " + command + (params ? " with params " + params : "")); fetch("/command", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ command: command, params: params }), }) .then((response) => response.json()) .then((data) => { log("Response: " + JSON.stringify(data)); }) .catch((err) => { log("Error: " + err); }); }

// --- Page-Specific Functions ---

// Update the speed value displayed on the Manual Control page function updateSpeed(value) { document.getElementById("speedValue").textContent = value; sendCommand("speed", "value=" + value); }

// Check the ESP32 connection status and update the UI (called periodically) function checkStatus() { fetch("/status") .then((response) => response.json()) .then((data) => { const statusDiv = document.getElementById("status"); const esp32StatusSpan = document.getElementById("esp32-status"); if (statusDiv && esp32StatusSpan) { if (data.esp32_connected) { statusDiv.classList.remove("alert-danger"); statusDiv.classList.add("alert-success"); esp32StatusSpan.textContent = "Connected"; } else { statusDiv.classList.remove("alert-success"); statusDiv.classList.add("alert-danger"); esp32StatusSpan.textContent = "Not Connected"; } } }) .catch((err) => { log("Error checking ESP32 status: " + err); }); }

// --- AI Live Chat Functions ---

function addChatMessage(sender, message) { let chatLog = document.getElementById("chatLog"); let p = document.createElement("p"); p.innerHTML = "<strong>" + sender + ":</strong> " + message; chatLog.appendChild(p); chatLog.scrollTop = chatLog.scrollHeight; }

// Send a text message from the Live Chat input field function sendChatMessage() { let chatInput = document.getElementById("chatInput"); let msg = chatInput.value.trim(); if (msg) { addChatMessage("You", msg); // Call the AI endpoint (simulated) fetch("/ai_call", { method: "POST", headers: { "Content-Type": "application/json" }, body: JSON.stringify({ input: msg }), }) .then((response) => response.json()) .then((data) => { addChatMessage("AI", data.response); }) .catch((err) => { log("Error in AI call: " + err); }); chatInput.value = ""; } }

// --- Initialization ---

document.addEventListener("DOMContentLoaded", function () { // Check ESP32 status on page load and every 5 seconds checkStatus(); setInterval(checkStatus, 5000);

// Send chat message when "Send" button is clicked const sendChatButton = document.getElementById("sendChat"); if (sendChatButton) { sendChatButton.addEventListener("click", sendChatMessage); }

// --- Web Speech API for Voice Input (Live Chat) --- let recognition; const startVoiceButton = document.getElementById("startVoice"); const stopVoiceButton = document.getElementById("stopVoice");

if ( "webkitSpeechRecognition" in window && startVoiceButton && stopVoiceButton ) { recognition = new webkitSpeechRecognition(); recognition.continuous = true; recognition.interimResults = true; recognition.lang = "en-US";

text
recognition.onstart = function () {
  startVoiceButton.disabled = true;
  stopVoiceButton.disabled = false;
};

recognition.onerror = function (event) {
  log("Speech recognition error: " + event.error);
  startVoiceButton.disabled = false;
  stopVoiceButton.disabled = true;
};

recognition.onend = function () {
  startVoiceButton.disabled = false;
  stopVoiceButton.disabled = true;
};

recognition.onresult = function (event) {
  let transcript = "";
  for (let i = event.resultIndex; i < event.results.length; ++i) {
    transcript += event.results[i][0].transcript;
  }
  document.getElementById("chatInput").value = transcript;
};

startVoiceButton.addEventListener("click", function () {
  recognition.start();
});

stopVoiceButton.addEventListener("click", function () {
  recognition.stop();
  // Send the captured voice input as a chat message
  let msg = document.getElementById("chatInput").value.trim();
  if (msg) {
    addChatMessage("You", msg);
    fetch("/ai_call", {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ input: msg }),
    })
      .then((response) => response.json())
      .then((data) => {
        addChatMessage("AI", data.response);
      })
      .catch((err) => {
        log("Error in AI call: " + err);
      });
    document.getElementById("chatInput").value = "";
  }
});

} else if (startVoiceButton) { startVoiceButton.disabled = true; log("Speech recognition not supported in this browser."); } }); EOF

--- Create ESP32 Arduino sketch (robot.ino) ---

cat << 'EOF' > "$PROJECT_DIR/robot.ino" /**********************************************************

ESP32 Robot Control - Arduino Code
This code implements a simple web server on the ESP32
to control motors (with speed control) and play audio
via a speaker.
The server listens for commands sent from the Flask
backend (e.g., /forward, /backward, /left, /right,
/spin, /stop, /play, /speed?value=...). **********************************************************/

#include <WiFi.h> #include <WebServer.h>

// --- Configuration (Update these with your settings) ---

// Wi-Fi Credentials const char* ssid = "YOUR_WIFI_SSID"; // Replace with your Wi-Fi SSID const char* password = "YOUR_WIFI_PASSWORD"; // Replace with your Wi-Fi password

// Motor control pins (adjust to your wiring) const int motorLeft_IN1 = 14; const int motorLeft_IN2 = 27; const int motorRight_IN1 = 26; const int motorRight_IN2 = 25;

// Speaker pin for audio output (using tone()) const int speakerPin = 32;

// --- Global Variables ---

// Web server running on port 80 WebServer server(80);

// Current speed (0-100, adjust as needed for your motors) int speedValue = 50;