How to Run an LLM Locally and Interact with Your Documents

This article is a comprehensive tutorial on building a private, local LLM setup for interacting with your documents without sending any data to third-party servers. It walks through installing Ollama to run the models and OpenWebUI to provide a browser-based interface. Key steps include choosing and installing an LLM that fits your local hardware, installing `nomic-embed-text` for document embeddings, and configuring OpenWebUI settings such as enabling the “Memory” feature and tuning chunking parameters for good retrieval. The guide also covers creating a knowledge base, uploading documents, and setting up a custom model with an optional system prompt for consistent responses. The overall goal is to let you converse securely with sensitive journals and business documents on your own machine.




Most AI tools require you to send your prompts and files to third-party servers. That’s a non-starter if your data includes private journals, research notes, or sensitive business documents (contracts, board decks, HR files, financials). The good news: you can run capable LLMs locally (on a laptop or your own server) and query your documents without sending a single byte to the cloud.

In this tutorial, you’ll learn how to run an LLM locally and privately, so you can search and chat with sensitive journals and business docs on your own machine. We’ll install Ollama and OpenWebUI, pick a model that fits your hardware, enable private document search with nomic-embed-text, and create a local knowledge base so everything stays on-disk.


Prerequisites

You’ll need a terminal (Windows, macOS, and Linux all include one), and either Python and pip or Docker, depending on how you prefer to install OpenWebUI.

Installation

You’ll need Ollama and OpenWebUI. Ollama runs the models, while OpenWebUI gives you a browser interface to interact with your local LLM, like you would with ChatGPT.

Step 1: Install Ollama

Download and install Ollama from its official site. Installers are available for macOS, Linux, and Windows. Once installed, verify it’s running by opening a terminal and executing:

ollama list

If Ollama is running, this will return a list of active models (or an empty list).
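
You can also verify this from a script: Ollama exposes a local HTTP API (on port 11434 by default). Here is a minimal Python sketch, assuming the default port:

import json
import urllib.request

# Ollama lists locally installed models at /api/tags (default port 11434).
with urllib.request.urlopen("http://localhost:11434/api/tags") as resp:
    data = json.load(resp)

# Print the name of each downloaded model (the list is empty if none are pulled yet).
for model in data.get("models", []):
    print(model["name"])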

Step 2: Install OpenWebUI

You can install OpenWebUI either with Python (pip) or with Docker. Here, we will show how to do it with pip, but you can find instructions for Docker in the official OpenWebUI docs.

Install OpenWebUI with the following command:

pip install open-webui

This works on macOS, Linux, and Windows, as long as you have a compatible version of Python installed (the official docs recommend Python 3.11).

Next, start the server:

open-webui serve

Then open your browser and go to:

http://localhost:8080

Step 3: Install a Model

Choose a model from the Ollama model list and pull it locally by copying the command provided.

[Screenshot: the Ollama model page, with the copyable install command in the upper-right corner]

For example:

ollama pull gemma3:4b

If you’re unsure which model your machine can handle, ask an AI to recommend one based on your hardware. Smaller models (1B–4B) are safer on laptops.

I would recommend Gemma3 as a starter (you can download multiple models and easily switch between them). Pick the parameter count at the end (“:4b”, “:1b”, and so on) based on this guide (a quick RAM check sketch follows the list):

  • Tier 1 (small laptops or weak computers): RAM ≤8 GB or no GPU → 1B–2B.

  • Tier 2: RAM 16 GB, weak GPU → 2B–4B.

  • Tier 3: RAM ≥16 GB, 6–8 GB VRAM → 4B–9B.

  • Tier 4: RAM ≥32 GB, 12 GB+ VRAM → 12B+.
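
If you are unsure how much memory your machine has, the sketch below prints total RAM and maps it to a rough tier. It uses the third-party psutil package (install it with pip install psutil); note that it only checks RAM, not GPU VRAM, which you can look up with your system tools (for example nvidia-smi on NVIDIA cards).

# Rough hardware check to help pick a model tier (RAM only, not VRAM).
# Requires the third-party "psutil" package: pip install psutil
import psutil

total_gb = psutil.virtual_memory().total / (1024 ** 3)
print(f"Total RAM: {total_gb:.1f} GB")

if total_gb <= 8:
    print("Tier 1: try a 1B-2B model, e.g. gemma3:1b")
elif total_gb < 32:
    print("Tier 2-3: a 2B-4B model is safe; 4B-9B if you also have 6-8 GB of VRAM")
else:
    print("Tier 4: 12B+ models may be viable with 12 GB+ of VRAM")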

Once you have installed Ollama and your desired model, confirm that they are active by running ollama list in the terminal:

[Screenshot: output of the ollama list command, showing the downloaded models, in this case gemma3:1b]

Run OpenWebUI to launch the browser interface with:

open-webui serve

Then head over to http://localhost:8080/. Now you are ready to start using your LLM locally!
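
If you want to sanity-check the model outside the browser first, you can send it a one-off prompt through Ollama’s local generate endpoint. A minimal sketch, assuming the default port and that you pulled gemma3:1b:

import json
import urllib.request

# Send a single, non-streaming prompt to the local Ollama server.
payload = {
    "model": "gemma3:1b",  # use whichever model you pulled
    "prompt": "Say hello in one short sentence.",
    "stream": False,
}
req = urllib.request.Request(
    "http://localhost:11434/api/generate",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    print(json.load(resp)["response"])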

Note: it will ask you to create login credentials on first launch, but these stay on your machine and don’t really matter if you only intend to use it locally.

[Screenshot: the OpenWebUI homepage, with a central chat input (“How can I help you today?”), a side panel of previous chats and links to Search, Notes, Workspace, and New Chat, and a model selector at the top set to gemma3:1b]

Settings for Documents

Now we are going to set up everything we need to interact with our local documents. First of all, we need to install the “nomic-embed-text” model to process our documents. Install it with:

ollama pull nomic-embed-text

Note: If you are wondering why we need another model (nomic-embed-text) besides our main one:

  • The embedding model (nomic-embed-text) maps each text chunk from your documents to a numerical vector so OpenWebUI can quickly find semantically similar chunks when you ask a question (a short sketch of such a call follows this list).

  • The chat model (for example gemma3:1b) receives your question plus those retrieved chunks as context and generates the natural-language response.
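
To make the first point concrete, here is a minimal sketch of an embedding call against Ollama’s local API (assuming the default port and that nomic-embed-text has been pulled); OpenWebUI performs the equivalent call for every chunk of your documents:

import json
import urllib.request

# Ask the local embedding model to turn a piece of text into a vector.
payload = {
    "model": "nomic-embed-text",
    "prompt": "Quarterly revenue grew 12% year over year.",
}
req = urllib.request.Request(
    "http://localhost:11434/api/embeddings",
    data=json.dumps(payload).encode("utf-8"),
    headers={"Content-Type": "application/json"},
)
with urllib.request.urlopen(req) as resp:
    vector = json.load(resp)["embedding"]

# The result is a long list of floats; semantically similar texts get similar vectors.
print(len(vector), vector[:5])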

Next, you should enable the “memory” feature if you want the LLM to remember the context of your past conversations in your future ones.

Download the Adaptive Memory v3 function from the Open WebUI community functions site. Functions are like plug-ins for OpenWebUI.

[Screenshot: the Adaptive Memory v3 function page, with a “Get” button that asks for your Open WebUI URL (http://localhost:8080 by default) and offers “Import to WebUI” or “Download as JSON export” as a fallback]

Now we will update our settings to enable these features. Click on your name in the bottom-left corner, then “Settings”.

[Screenshot: the menu that opens when clicking your name in the bottom-left corner, listing Settings, Archived Chats, Playground, Admin Panel, and Sign Out]

Click “Settings”, then go to “Personalization” and enable “Memory”.

[Screenshot: the Settings panel with the Personalization tab open and the Memory toggle switched on]

Now we are going to access the other settings panel (“Admin Panel”). Click again on your name in the bottom-left corner and go to Admin panel → Settings → Documents.

[Screenshot: Admin Panel → Settings → Documents, showing the “Chunk Size” field set to 512]

In this section (Admin Panel → Settings → Documents), find the “Embedding” section and set “Embedding Model Engine” to Ollama (using the dropdown on the right). Leave the API Key field blank.

Now, under “Embedding Model”, enter nomic-embed-text. Then go to “Retrieval” and enable “Full Context Mode”.

Chunking settings

You should also set the chunk size and overlap. OpenWebUI splits documents into smaller chunks before indexing them, since models can’t embed or retrieve very long texts in one piece.

A good default is 128–512 tokens per chunk, with 10–20% overlap. Larger chunks preserve more context but are slower and more memory-intensive, while smaller chunks are faster but can lose higher-level meaning. Overlap helps prevent important context from being cut off when text is split.
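
To illustrate what chunking with overlap means, here is a simplified sketch (it approximates tokens with words and is not OpenWebUI’s exact splitter):

def chunk_text(text, chunk_size=256, overlap_ratio=0.15):
    """Split text into overlapping chunks (word-based approximation of tokens)."""
    words = text.split()
    step = max(1, int(chunk_size * (1 - overlap_ratio)))  # how far each window advances
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

# Example: 256-word windows that advance ~217 words at a time (~15% overlap).
print(len(chunk_text("word " * 1000)))  # -> 5 chunks for a 1,000-word document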

Here’s a guiding table, but I recommend getting values tuned to your specific use case and setup by sharing your details (GPU or laptop model, storage, RAM, and so on) with an LLM like ChatGPT or Claude, since changing the chunking/overlap values later requires re-uploading your documents.

Suggested chunk/overlap by tier

Tier / scenario | Typical hardware | Chunk size (tokens) | Overlap (%) | Notes
Tier 1 – constrained | ≤8 GB RAM, no/weak GPU | 128–256 | 10–15 | Prioritizes speed and low memory use.
Tier 2 – mid | 16 GB RAM, modest GPU or strong CPU | 256–384 | 15–20 | Balanced context vs. performance.
Tier 3 – comfortable | ≥16 GB RAM, 6–8 GB VRAM | 384–512 | 15–20 | More semantics per chunk, still practical.
Dense technical PDFs / legal docs | Any, but especially Tier 2–3 | 384–512 | 15–20 | Keeps paragraphs and arguments intact.
Short notes, tickets, emails | Any | 128–256 | 10–15 | Items are small; large chunks not needed.
Very long queries, many retrieved chunks | Any with a larger context window | 256–384 | 10–15 | Smaller chunks fit more pieces into context.

Now, the final step: uploading your documents! Go to “Workspace” in the side panel, then “Knowledge”, and create a new collection (database). You can start uploading files here.

Screenshot of the "Workspace" page (after clicking on "workspace" in the side panel) highlight the "Workspace" button on the lefthand side, the "Knowledge" tab being selected from the options at the top within this Workspace page, then "Upload files" which is the first option shown on the list after clicking the "+" (plus) sign button at the right of the text input with the placeholder that says "Search Collection".

Watch for any errors during the upload; unfortunately, they only show as temporary pop-ups. Some errors may be caused by the format of your files, so check the console (the terminal where open-webui serve is running) for further error logs.

Then, within “Workspace”, switch to the “Models” tab and create a new custom model. Creating a custom model and attaching your knowledge base tells OpenWebUI to automatically search your document collection and include the most relevant chunks as context whenever you ask a question.
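
Under the hood, this is a standard retrieval-augmented generation (RAG) flow: your question is embedded with nomic-embed-text, compared against the stored chunk embeddings, and the closest chunks are added to the prompt. A rough sketch of the ranking step, assuming you already have the vectors:

import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_chunks(question_vec, chunk_vecs, chunks, k=3):
    """Return the k document chunks whose embeddings are closest to the question."""
    scored = sorted(
        zip(chunks, chunk_vecs),
        key=lambda pair: cosine_similarity(question_vec, pair[1]),
        reverse=True,
    )
    return [chunk for chunk, _ in scored[:k]]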

Screenshot of the "Workspace" page (after clicking on "workspace" in the side panel), highlighting the first tab/option in the upper menu named "Models", which when clicked shows the list of custom models and an option to create new ones (in this case the user has created one called "Gemma-custom-knowledge")

Here, make sure to select your model (in my case “gemma3:1b”) and attach your knowledge base.

[Screenshot: the model creation page, highlighting the “Base model (from)” selector (set to gemma3:1b) and the “Select Knowledge” button under “Knowledge”, with the system prompt field under “Model Params” and the “Filters” options (which list installed functions) also visible]

[Screenshot: the “Select Knowledge” dialog, listing the “Test-knowledge-base” collection with the description “adding my documents”]

(Optional) Adding a system prompt

When creating your custom model in Workspace → Models, you can define a system prompt that the model will use for context throughout all your conversations.

Here are some examples of information you might want to add:

  • context about yourself (“I am a 20-year-old student in bioengineering interested in…”)

  • your preferred communication style (“no fluff”, “be direct”, “be analytical”…)

  • context about how your data is structured

Example system prompt:

You are a thoughtful, analytical assistant helping me explore patterns and insights in my personal journals. Be direct, avoid speculation, and clearly distinguish between facts from the documents and interpretation.

This prompt will automatically apply to every chat using this custom model, helping keep responses consistent and aligned with your goals.

How to Run Your LLM Locally

Now open a new chat and make sure to select your custom model:

Screenshot showing the "New chat" page after clicking on the "+" (plus) symbol/button next to the custom model name. It shows the options shown when clicking on the input field that says "Search a model" as a placeholder, and the option highlighted within it is the name of the custom model (in this case the author chose the name "Gemma-custom-knowledge")

Now you are ready to chat with your own docs in a private local environment!

Note: By default, the browser will stop streaming the response after five minutes, even though the server keeps processing your query in the background. If your query takes longer than five minutes, the result won’t appear automatically; reload the page and click “continue response” to get the latest output.

💡 I recommend installing the Enhanced Context Tracker function (plugin) to get more visibility into the progress of your query.

Conclusion

You now have a private LLM stack (Ollama for models, OpenWebUI for the UI, and nomic-embed-text for embeddings) wired to your on-disk knowledge base. Your journals and business docs stay local; nothing is sent to third parties. The main dials are simple: pick a model that fits your hardware, enable memory and full-context retrieval, use sensible chunk/overlap, and check the console when runs stall.

If you need more headroom, deploy the same setup on your own server and keep the privacy guarantees. From here, iterate on model choice, chunking, and prompts, and add the optional functions if you need deeper visibility during long jobs.

