Offline Text LLMs

02 Large offline text models (unimodal)

- 11.02-01 Meta AI: Llama3.2 model
- 11.02-02 Aliyun: Qwen3 model
- 11.02-03 Microsoft: Phi-4-mini model
- 11.02-04 DeepSeek: DeepSeek-R1 model

11.02-01 Meta AI: Llama3.2 model

Introduction

Meta Llama 3.2 is the latest generation of large language models released by Meta. It builds on Llama 3.1 and introduces true multimodal capability: the vision variants can not only process and generate text but also understand image content, combining language understanding with visual reasoning.

Model size

| Variant | Modality | Typical fit |
| --- | --- | --- |
| Llama 3.2 1B | Text | Smallest local text-only option |
| Llama 3.2 3B | Text | Practical local deployment on Jetson |
| Llama 3.2 11B Vision | Vision-language | Image understanding with higher compute demand |
| Llama 3.2 90B Vision | Vision-language | Server-class deployment rather than single-device use |

Performance

Use Llama 3.2

Run the model with the `ollama run` command; if it has not been downloaded locally, Ollama will download the model before running it:

```bash
ollama run llama3.2:3b
```

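Besides the interactive CLI, a running Ollama instance also exposes a local HTTP API (port 11434 by default). As a minimal sketch, assuming an Ollama server is already running on the same machine, the following sends a non-streaming prompt to the `/api/generate` endpoint using only the Python standard library:

```python
import json
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint

def build_generate_payload(model: str, prompt: str) -> bytes:
    # Non-streaming request: the server replies with one JSON object
    # whose "response" field holds the full generated text.
    return json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")

def generate(model: str, prompt: str) -> str:
    req = urllib.request.Request(
        OLLAMA_URL,
        data=build_generate_payload(model, prompt),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]

# Example (requires a local Ollama server with the model pulled):
# print(generate("llama3.2:3b", "Who are you?"))
```

The same endpoint works for every model in this chapter; only the `model` string changes.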
Dialogue Test

```bash
who are you?
```

End the conversation

Use the Ctrl+D shortcut or type `/bye` to end the conversation.

References

Ollama

Official: https://ollama.com/

GitHub: https://github.com/ollama/ollama

Llama 3.2

Official: https://www.llama.com/docs/model-cards-and-prompt-formats/llama3_2/

Ollama model page: https://ollama.com/library/llama3.2

11.02-02 Aliyun: Qwen3 model

Introduction

Qwen3 is a new generation of open-source large language model families released by the Alibaba Cloud Qwen team in 2025, representing a major upgrade of the series in scale, reasoning, and multilingual ability. Qwen3 consists of six dense models and two mixture-of-experts (MoE) models, ranging from 0.6B to 235B parameters. It supports long contexts (up to 128K tokens) and introduces a “hybrid reasoning” mode that switches intelligently between deep thinking (for complex tasks) and rapid response (for routine tasks), giving it stronger capability on complex tasks such as logical reasoning, mathematics, and coding.

Model size

| Variant | Family type | Notes |
| --- | --- | --- |
| Qwen3 0.6B | Dense | Smallest local deployment option |
| Qwen3 1.7B / 4B / 8B | Dense | Common edge and workstation sizes |
| Qwen3 14B / 32B | Dense | Larger local or server deployment |
| Qwen3 30B-A3B | MoE | Mixture-of-experts model with lighter active parameters |
| Qwen3 235B-A22B | MoE | Largest flagship model for server-scale deployment |

Performance

Use Qwen3

Run the model with the `ollama run` command; if it has not been downloaded locally, Ollama will download the model before running it:

```bash
ollama run qwen3:8b
```

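For multi-turn dialogue, Ollama also provides a `/api/chat` endpoint that accepts a message history rather than a single prompt. A minimal sketch of building such a request body (the `build_chat_payload` helper and the system prompt are illustrative, not part of Qwen3 or Ollama itself):

```python
import json

def build_chat_payload(model: str, messages: list) -> str:
    # Ollama's /api/chat takes a history of {"role", "content"} messages,
    # which lets the model see earlier turns of the conversation.
    return json.dumps({"model": model, "messages": messages, "stream": False})

history = [
    {"role": "system", "content": "You are a concise assistant."},
    {"role": "user", "content": "please tell me a story."},
]
payload = build_chat_payload("qwen3:8b", history)
# POST this payload to http://localhost:11434/api/chat on a running Ollama server.
```

Appending each assistant reply back onto `history` before the next request keeps the conversation stateful.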
Dialogue Test

```bash
please tell me a story.
```

End the conversation

Use the Ctrl+D shortcut or type `/bye` to end the conversation.

References

Ollama

Official: https://ollama.com/

GitHub: https://github.com/ollama/ollama

Qwen3

GitHub: https://github.com/QwenLM/Qwen3

Ollama model page: https://ollama.com/library/qwen3

11.02-03 Microsoft: Phi-4-mini model

Introduction

Phi-4-mini is a small, lightweight, and efficient language model (Small Language Model) in the Microsoft Phi series, a compact member of the Phi-4 family with approximately 3.8B parameters. It uses a decoder-only Transformer architecture and introduces design choices such as a 200K-token vocabulary, grouped-query attention, and shared input-output embeddings, so it can run inference efficiently in compute- and memory-constrained environments while still supporting very long contexts (up to 128K tokens).

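To illustrate the grouped-query attention mentioned above: several consecutive query heads share a single key/value head, which shrinks the KV cache during inference. The head counts below are illustrative examples only, not Phi-4-mini's actual configuration:

```python
def kv_head_for_query_head(q_head: int, n_q_heads: int, n_kv_heads: int) -> int:
    # Consecutive query heads are partitioned into groups;
    # every head in a group reads the same cached key/value head.
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size

# Illustrative setup: 24 query heads sharing 8 KV heads -> groups of 3.
mapping = [kv_head_for_query_head(h, 24, 8) for h in range(24)]
# mapping == [0, 0, 0, 1, 1, 1, ..., 7, 7, 7]
```

With 8 KV heads instead of 24, the KV cache is one third the size of full multi-head attention at the same query-head count.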
Model size

| Variant | Parameters | Notes |
| --- | --- | --- |
| Phi-4-mini | ~3.8B | Compact reasoning-focused model with long-context support |

Model performance

Use the Phi-4-mini model

Run the model with the `ollama run` command; if it has not been downloaded locally, Ollama will download the model before running it:

```bash
ollama run phi4-mini:3.8b
```

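When Ollama is asked to stream a reply (`"stream": true`), it returns one JSON object per line, each carrying a fragment of the answer in its `response` field, with `"done": true` on the final line. A minimal sketch of reassembling such a stream (the sample lines here are made up for illustration):

```python
import json

def join_stream_chunks(ndjson_lines) -> str:
    # Each streamed line is a standalone JSON object; concatenating the
    # "response" fields in order reconstructs the full reply.
    out = []
    for line in ndjson_lines:
        obj = json.loads(line)
        out.append(obj.get("response", ""))
        if obj.get("done"):
            break
    return "".join(out)

sample = [
    '{"response": "Hello", "done": false}',
    '{"response": ", world", "done": true}',
]
print(join_stream_chunks(sample))  # Hello, world
```

Streaming is what makes tokens appear incrementally in the CLI, which matters for perceived latency on a memory-constrained device.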
Dialogue Test

```bash
who are you?
```

End the conversation

Use the Ctrl+D shortcut or type `/bye` to end the conversation.

References

Ollama

Official: https://ollama.com/

GitHub: https://github.com/ollama/ollama

Phi-4-mini

Ollama model page: https://ollama.com/library/phi4-mini

11.02-04 DeepSeek: DeepSeek-R1 model

Introduction

DeepSeek-R1 is an open reasoning large language model (LLM) developed by the Chinese AI lab DeepSeek. Unlike traditional models focused on generating fluent text, it concentrates on thinking through and solving complex problems in logic, mathematics, programming, and similar domains, strengthening its “thinking skills” through reinforcement learning (RL) training rather than simply imitating language outputs.

Model size

| Variant | Typical scale | Notes |
| --- | --- | --- |
| DeepSeek-R1 1.5B / 7B / 8B | Small distilled variants | Easiest to test locally |
| DeepSeek-R1 14B / 32B | Medium distilled variants | Better reasoning quality with higher memory demand |
| DeepSeek-R1 70B and above | Large variants | Better suited to server-class hardware than a single Jetson |

Model performance

Use the DeepSeek-R1 model

Run the model with the `ollama run` command; if it has not been downloaded locally, Ollama will download the model before running it:

```bash
ollama run deepseek-r1
```

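DeepSeek-R1 models served through Ollama typically emit their chain of thought inside `<think>...</think>` tags before the final answer. When only the answer is wanted, that reasoning block can be stripped; a minimal sketch (the sample text is illustrative, not real model output):

```python
import re

# Matches the reasoning block plus any whitespace that follows it.
THINK_RE = re.compile(r"<think>.*?</think>\s*", re.DOTALL)

def strip_reasoning(text: str) -> str:
    # Remove the chain-of-thought block, keeping only the final answer.
    return THINK_RE.sub("", text).strip()

raw = "<think>The user asks who I am...</think>I am DeepSeek-R1, a reasoning model."
print(strip_reasoning(raw))  # I am DeepSeek-R1, a reasoning model.
```

Keeping the tags visible is useful for inspecting how the model reasons; stripping them is useful when piping the answer into another program.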
Dialogue Test

```bash
who are you?
```

End the conversation

Use the Ctrl+D shortcut or type `/bye` to end the conversation.

References

Ollama

Official: https://ollama.com/

GitHub: https://github.com/ollama/ollama

DeepSeek-R1

Ollama model page: https://ollama.com/library/deepseek-r1