Jetson Examples Quick Start
Introduction
The jetson-examples repository by Seeed Studios provides pre-built, ready-to-run AI examples for NVIDIA Jetson devices. Instead of spending hours setting up environments and dependencies, you can deploy complex AI models with a single command using Docker containers.
This module shows you how to use the jetson-examples CLI tool to quickly deploy LLMs and other AI models on your Jetson device.

What is jetson-examples?
jetson-examples is a command-line tool that:
- Provides one-line deployment for popular AI models
- Uses Docker containers for isolated, reproducible environments
- Includes pre-optimized configurations for Jetson hardware
- Supports multiple frameworks: PyTorch, TensorRT, Ollama, and more
| Feature | Description |
|---|---|
| Easy Installation | Install with pip3 install jetson-examples |
| One-Command Run | reComputer run to deploy |
| Docker-based | Consistent environments across devices |
| Auto-Configuration | Pre-tuned for Jetson performance |
| Regular Updates | New examples added regularly |
Installation
Install the CLI Tool
# Install jetson-examples via pip
pip3 install jetson-examples
# Verify installation
reComputer --helpAlternative: Manual Docker Setup
If you prefer manual control:
# Install Docker (if not already installed)
sudo apt-get update
sudo apt-get install -y docker.io
# Add user to docker group (requires logout/login)
sudo usermod -aG docker $USER
# Verify Docker
docker --version
Available LLM Examples
Here are the LLM-related examples available in the jetson-examples repository:
Text-Only LLMs
| Example | Type | Model Size | Command |
|---|---|---|---|
| llama3 | Text (LLM) | 4.9GB model | reComputer run llama3 |
| gemma4 | Text (LLM) | 2.5GB model | reComputer run gemma4 |
| qwen3.5-4b | Text (LLM) | 2.5GB model | reComputer run qwen3.5-4b |
| qwen3.6-35b | Text (LLM) | 28GB model | reComputer run qwen3.6-35b |
| nemotron-3-nano | Text (LLM) | 24.5GB model | reComputer run nemotron-3-nano |
| text-generation-webui | Text (LLM) | 3.9GB model | reComputer run text-generation-webui |
Vision-Language Models (VLM)
| Example | Type | Model Size | Command |
|---|---|---|---|
| llava-v1.5 | VLM | 13GB model | reComputer run llava-v1.5-7b |
| llava-v1.6 | VLM | 13GB model | reComputer run llava-v1.6-vicuna-7b |
| live-vlm-webui | VLM | 13GB model | reComputer run live-vlm-webui |
| gemma4 | VLM | 2.5GB model | reComputer run gemma4 |
Audio & Speech
| Example | Type | Model Size | Command |
|---|---|---|---|
| whisper | ASR (Speech-to-Text) | 1.5GB model | reComputer run whisper |
| parler-tts | TTS (Text-to-Speech) | 6.9GB image | reComputer run parler-tts |
Inference Server
| Example | Type | Model Size | Command |
|---|---|---|---|
| ollama | Inference Server | 10.5GB image | reComputer run ollama |
Getting Started
Example 1: Running Ollama
The quickest way to start with LLMs on Jetson:
# Deploy Ollama with Docker
reComputer run ollamaOnce running, you can pull and run models:
# Enter the Docker container
docker exec -it ollama bash
# Pull and run a model
ollama run llama3.2:3bExample 2: Running a Vision-Language Model
# Deploy LLaVA for image understanding
reComputer run llava-v1.5-7bThis sets up a model that can understand and describe images.
Example 3: Deploying Text Generation WebUI
For a full-featured web interface to interact with LLMs:
# Deploy text-generation-webui (Oobabooga)
reComputer run text-generation-webuiAccess the web interface at http://localhost:5000.
Example 4: Deploying Whisper for Speech Recognition
# Deploy OpenAI Whisper for speech-to-text
reComputer run whisperPractical Example: Building a Chatbot
Let's put it all together to create a simple chatbot using jetson-examples:
Step 1: Deploy Ollama
reComputer run ollamaStep 2: Pull a Model
# Check running containers
docker ps
# Enter the Ollama container
docker exec -it ollama ollama pull llama3.2:3bStep 3: Test via API
# Test the chat API
curl http://localhost:11434/api/chat -d '{
"model": "llama3.2:3b",
"messages": [
{"role": "user", "content": "What can you help me with?"}
],
"stream": false
}'Step 4: Python Integration
import requests
def chat_with_model(user_message, model="llama3.2:3b"):
response = requests.post(
"http://localhost:11434/api/chat",
json={
"model": model,
"messages": [{"role": "user", "content": user_message}],
"stream": False
}
)
return response.json()["message"]["content"]
# Interactive chat
while True:
user_input = input("You: ")
if user_input.lower() in ['quit', 'exit', 'q']:
break
response = chat_with_model(user_input)
print(f"AI: {response}\n")Managing Containers
Common Docker Commands
# List running containers
docker ps
# List all containers (including stopped)
docker ps -a
# View container logs
docker logs <container-name>
# Stop a container
docker stop <container-name>
# Restart a container
docker restart <container-name>
# Remove a container
docker rm <container-name>
# Enter a running container
docker exec -it <container-name> bashResource Monitoring
# Monitor container resource usage
docker stats
# Check disk usage
docker system dfCustomizing Deployments
Environment Variables
Many examples support configuration via environment variables:
# Example: Setting model parameters for Ollama
docker run -d \
--name ollama \
--gpus all \
-e OLLAMA_HOST=0.0.0.0:11434 \
-p 11434:11434 \
ollama/ollama:latestVolume Mounts
Persist data between container restarts:
# Mount a local directory for model storage
docker run -d \
--name ollama \
--gpus all \
-v /path/to/models:/root/.ollama \
-p 11434:11434 \
ollama/ollama:latestNetwork Configuration
Access services from other devices on your network:
# Bind to all interfaces (not just localhost)
docker run -d \
--name ollama \
--gpus all \
-p 0.0.0.0:11434:11434 \
ollama/ollama:latestPerformance Tips
1. Use GPU Passthrough
Always ensure Docker has GPU access:
# Verify GPU access in container
docker exec -it <container-name> nvidia-smi2. Allocate Appropriate Memory
# Limit container memory (optional, for multi-container setups)
docker run -d \
--name ollama \
--gpus all \
--memory=12g \
--memory-swap=16g \
ollama/ollama:latest3. Use SSD Storage
Models load faster from SSD than SD card or eMMC.
Common Issues and Solutions
Issue 1: Docker Permission Denied
Problem: Got permission denied while trying to connect to the Docker daemon socket
Solution:
# Add user to docker group
sudo usermod -aG docker $USER
# Log out and log back in for changes to take effect
# Or run in current session:
newgrp dockerIssue 2: Container Won't Start
Problem: Container exits immediately after starting
Solution:
# Check container logs for errors
docker logs <container-name>
# Check available resources
free -h
df -h
nvidia-smiExample Repositories and Projects
Explore these examples from the jetson-examples repository:
llama-factory
Fine-tune LLMs on your Jetson device:
reComputer run llama-factoryModel/data size: 13.5GB
live-vlm-webui
Real-time vision-language model interface:
reComputer run live-vlm-webuiFeatures: Live camera feed processing with AI understanding
deep-live-cam
Real-time face swap and processing:
reComputer run deep-live-camPractice Exercise
- Install jetson-examples tool
- Deploy Ollama and run
llama3.2:3b - Deploy LLaVA for image understanding
- Create a Python script that interacts with the deployed model
- Set up Open WebUI for a graphical interface
- Explore other examples in the repository
References
- jetson-examples Repository - Official repository
- jetson-examples Documentation - Complete example list
- Docker Documentation - Docker fundamentals
- NVIDIA Container Toolkit - GPU support in Docker
- Ollama Documentation - Ollama usage guide
Next: Continue to Module 5.6: Building ASR + LLM + TTS Pipeline to create a complete voice assistant!