Deploy an AI Agent
This guide explains how to deploy and integrate Large Language Models with your Hackerbot system. We'll use Google's Gemini API in this example, but the steps apply similarly to other LLM APIs.
Prerequisites
Latest version of the Hackerbot Python package installed
Python virtual environment configured
Microphone and speaker connected to your robot
Hackerbot AI+ recommended
Access to Google Gemini API (or equivalent LLM provider)
Setup
Move into the Hackerbot tutorials directory where the LLM scripts are located:
cd ~/hackerbot/hackerbot-tutorials/AI
Install the required Python libraries:
pip install -r requirements.txt
# Or with uv
uv pip install -r requirements.txt
Obtain an API Key from Gemini
Go to Google AI Studio.
Sign in with your Google account.
Create a new API Key.
Copy the generated key — you’ll need it for the next step.
Set Up the .env File
Create a .env file in the directory ~/hackerbot/hackerbot-tutorials/ if it does not exist:
touch .env
Open .env and add your Gemini API key:
GOOGLE_API_KEY=your-api-key-here
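The scripts read this key at startup. If you are adapting or rewriting them, a minimal sketch of that loading step, assuming python-dotenv and the google-generativeai SDK (the start_chat call shown below suggests this SDK; the model name is only a placeholder), looks like this:
import os
from dotenv import load_dotenv  # pip install python-dotenv
import google.generativeai as genai

load_dotenv()  # reads GOOGLE_API_KEY from the .env file
genai.configure(api_key=os.getenv("GOOGLE_API_KEY"))
model = genai.GenerativeModel("gemini-1.5-flash")  # placeholder model name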
Customize the AI Agent
The behavior and response format of your Hackerbot AI agent can be customized inside speak_w_gemini.py
(or your main script).
Here are the important sections:
1. Configure the Agent’s Personality
You can set the tone or role of the AI when initializing the chat history:
chat = model.start_chat(history=[
    {
        "role": "user",
        "parts": [{"text":
            "You are a colleague named Robby, and you are experiencing Monday blues.\n"
            ...
        }]
    },
    ...
])
Tip: You can modify the personality to make the robot more cheerful, formal, or specialized (e.g., teacher, tour guide, etc.).
Example alternatives:
"You are an enthusiastic personal trainer motivating someone to exercise."
"You are a formal assistant robot trained to help users navigate a warehouse."
2. Configure the Agent's Voice
In the speak function inside actions.py, make sure you load the appropriate Piper TTS model. For details on how to do this, check the documentation here.
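As a rough illustration, a speak function built on the Piper command-line tool might look like the sketch below; the voice model filename is a placeholder and the real implementation in actions.py may differ:
import subprocess

PIPER_MODEL = "en_US-lessac-medium.onnx"  # placeholder Piper voice model

def speak(text):
    # Synthesize the text to a WAV file with Piper, then play it on the speaker.
    subprocess.run(
        ["piper", "--model", PIPER_MODEL, "--output_file", "speech.wav"],
        input=text.encode("utf-8"),
        check=True,
    )
    subprocess.run(["aplay", "speech.wav"], check=True)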
3. Configure the Response Format
The Gemini agent is instructed to only respond with raw JSON. This allows the robot to parse actions reliably without extra text.
Example of the prompt instructions:
Respond ONLY with JSON in one of the following formats:
- {"action": "action_name"}
- {"action": "speak", "parameters": {"text": "your text"}}
- or a list of such objects if you want the robot to perform multiple actions.
DO NOT add explanations.
DO NOT use markdown formatting (like triple backticks).
This strict format ensures the robot can easily extract and execute actions from the AI’s response.
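For illustration, a reply in this format can be normalized into a list of action dictionaries with a few lines of standard-library code (a sketch; the actual parsing in the tutorial scripts may be structured differently):
import json

def parse_actions(reply_text):
    # Gemini returns either a single JSON object or a list of them;
    # normalize both cases to a list of action dicts.
    data = json.loads(reply_text)
    return data if isinstance(data, list) else [data]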
4. Add New Supported Actions
Supported actions are listed in the same prompt:
Supported actions are: shake_head, nod_head, look_left, look_right, look_up, look_down, spin_right, spin_left, spin_around, and speak.
If you want to add a new action, you must:
1. Define the function in actions.py:
def wave_hand(bot):
    print("Waving hand!")  # Add your robot command here
2. Update the execute_robot_action function in utils.py:
"wave_hand": lambda: wave_hand(bot),
3. Update the Gemini prompt to include the new action name:
Supported actions are: wave_hand, shake_head, nod_head, look_left, look_right, ...
This tells Gemini it can now trigger the new action.
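The dictionary entry in step 2 suggests that execute_robot_action dispatches through a table of lambdas. A sketch of that pattern, with assumed names for the existing handlers:
def execute_robot_action(action, bot, parameters=None):
    # Map action names from Gemini's JSON reply to robot commands.
    handlers = {
        "wave_hand": lambda: wave_hand(bot),
        "nod_head": lambda: nod_head(bot),
        "speak": lambda: speak(parameters["text"]),
    }
    handler = handlers.get(action)
    if handler is None:
        print(f"Unknown action: {action}")
    else:
        handler()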
Run the Robot Assistant
After everything is configured, start the assistant:
python3 speak_w_gemini.py
The robot will:
Listen for your voice commands
Send them to Gemini
Parse the response
Execute the requested action(s)
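Put together, one iteration of that loop looks roughly like the sketch below; listen() stands in for whatever speech-recognition code the script uses, and the other names come from the sketches above:
while True:
    command = listen()                       # capture a voice command as text
    reply = chat.send_message(command)       # forward it to Gemini
    for step in parse_actions(reply.text):   # raw-JSON reply -> list of actions
        execute_robot_action(step["action"], bot, step.get("parameters"))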
Troubleshooting
Authentication Error: Make sure .env is correctly set with your API key.
Speech Recognition Error: Ensure your microphone is accessible and configured, and espeak or espeak-ng is installed.
Action Not Triggering: Confirm the action function exists in actions.py and the action name matches the prompt.
Gemini Response Invalid: If Gemini returns invalid JSON, double-check your prompt to enforce strict JSON responses.
Summary
By following these steps, you can successfully deploy an LLM-powered interaction system on Hackerbot. You can expand functionality further by adding new actions, switching to other LLM APIs, or enhancing the user input handling.