DEMO. ANALYZE. DOCUMENT. REPEAT.
Goals of this Substack QS
This substack QS (quick start) describes (very briefly) the steps we (Copilot and I) used to
Fine-tune a HuggingFace (HF) model smolLMv3.
Deploy the model to HF.
Deploy Gradio (UI) script to HF.
It took one day to create the demo code and test successfully. It took another 3 days to
Document the (somewhat chaotic) workflow.
Create this Substack post.
Conclusion: The challenges for devs in the AI future: Devs will have to
Have very broad systems knowledge.
Effectively maintain control over the immense power of (unintelligent) AI LLMs (to avoid chaos).
For details see Gdrive docx #431.
Workflow:
Final result: https://huggingface.co/spaces/terrytaylorbonn/431_smollmv3
TL;DR? Skip ahead to the chapter “Prerequisites”.
TOC
Chapters in this post
Terminology
Concepts
Prerequisites
Part 1: Run model locally
Part 2: Fine-tune Model
Part 3: Deploy to HF (model repo)
Part 4: Test Gradio UI local
Part 5: Deploy app.py to HF spaces (code repo) and test
Terminology
In this QS:
CPLT = Copilot (from Microsoft).
Co-pilot = generic term for products like CPLT (usually inside VS Code).
HF “space” = HF repo that contains scripts for user app UI (to use a model).
HF “model” = HF repo that contains model files.
Concepts
C1 Wiki AI LLM stacks / sections 3/4
C2 Working with co-pilots
C3 Workflows
C4 Repos (3 total)
C1 Wiki AI LLM stacks / sections 3/4
See diagram below.
Section 3 “Youtube demos”. I previously focused on this Wiki section (doing Youtube demos, with the help of AI co-pilots). This was good for getting wide exposure to the ecosystem.
Section 4 “GPT/CPLT demos”. In July 2025 the focus shifted to this section. These are “clean-slate” demos using only AI co-pilots (ChatGPT or MS Copilot) for assistance. This QS (#431) is the latest of these demos.
C2 Working with co-pilots
C2.1 SW dev has changed. SW (and doc) dev is now a much more interesting and productive field to be in. It used to be that the main qualifications were knowing (from years of trial and error):
Workflow details
How to solve low-level errors
Product design was secondary (or handed off to someone else). Now the SW dev has much more time to focus on high-level requirements.
C2.2 Tech docs will change. I myself have always preferred to read docs that show the exact setup/steps for specific demos, not a general description with lots of verbiage.
Now with SW dev co-pilots, doc content requirements will change. The reader will be working with a trusted co-pilot and therefore needs short and concise workflow/command instructions. The co-pilot can
Explain the details
Fix errors
Provide directions specific to the reader’s configuration
C2.3 SW devs must maintain control over co-pilots. This demo almost got away from me. It only took a day to code (with me struggling to keep up with CPLT), but it took a few days to review what happened and figure out some of the details. If you don’t maintain control (understand the code and keep the co-pilot from doing something wrong or overly complex), then you quickly lose control of your project (and it will fail). Co-pilots have no intelligence and are not qualified to be pilots.
C3 Workflows
The following shows the workflows in this QS. Note:
(4) = command (4) in this QS post
“CPLT17” = CPLT prompt 17 (the instructions Copilot gave me)
It’s a bit complicated, because there was an error in Part 3. I need to redo this entire demo without that error and verify.
The 5 parts of this QS are shown in the diagram below:
Part 1 Run model locally
1a Clone (from github) the #431 repo (4) CPLT17.
1b LLM model auto-cloned (from the HF model repo) when the Python scripts run (11, 12, 14) CPLT18.
Part 2 Fine-tune LLM (15) CPLT18.
Part 3 Deploy the model to HF model repo (16-23) CPLT19-CPLT20. Note: The model had an error (discovered and fixed in part 4).
Part 4
4a Test Gradio (HF meta tensor error) (24) CPLT21.
4b Fix error and test locally (25-27) CPLT22-CPLT28.
4c Redeploy v2 (model_CPLT29) to HF model repo (28) CPLT29.
Part 5
5a Create space (HF app code repo),
5b Deploy app.py,
5c Test (29-37) CPLT30-CPLT34 (BINGO).
The following shows the final successful test.
https://huggingface.co/spaces/terrytaylorbonn/431_smollmv3
C4 Repos (3 total)
See chapters 11 and 12 “C4 Repos” in #431 for more details.
There are 3 repos:
R1 GITHUB (python dev)
R2 HF LLM model files
R3 HF SPACE (app.py and final user UI)
The HF website takes a while to get used to.
It’s important to keep clear the distinction between
R2 HF model
R3 HF space
Prerequisites
CPLT17
(1) Microsoft Copilot (or other co-pilot)
You have access to my repo, but there are always errors when setting this up. Your chances of success are vastly better with a co-pilot to fix the errors.
(2) nvidia-smi (verify GPU)
Verify this first. If your GPU is not working, it could take a long time to fix (installing different drivers, CUDA chaos, etc). You need:
NVIDIA GPU (4GB+ VRAM recommended)
Windows 11 + WSL2 or Linux
CUDA 12.0+
Python 3.10+
(3) Create a HF (HuggingFace) account and an access token with write permissions (needed later to upload the model and create the space)
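Once the Part 1 requirements are installed, you can sanity-check the token from Python. This is a minimal sketch using huggingface_hub (pulled in as a transformers dependency); the inline login() call is just one way to supply the token, not a step from the original workflow:

from huggingface_hub import login, whoami

login(token="hf_xxx")  # paste your real token here (never commit it to git)
print(whoami()["name"])  # prints your HF username if the token is valid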
Part 1: Run model locally
CPLT17
(4) git clone https://github.com/terrytaylorbonn/431_smolLMv3 431_smolLMv3_QS
(5) cd 431_smolLMv3_QS
(6) code .
(7) python3 -m venv venv
(8) source venv/bin/activate
CPLT18
(9) pip3 install -r requirements.txt (required; took ~40 mins!)
torch>=2.0.0
transformers>=4.40.0
accelerate>=0.20.0
bitsandbytes>=0.41.0
(10) python3 -c "import torch; print(torch.cuda.is_available())"
Took 30 secs.
(11) python3 smollm_360m.py
Took about 2 mins.
(12) python3 smollm3_simple.py
Took about 2 mins.
The output is basically identical.
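For orientation, a script along the lines of smollm_360m.py can be as small as the following. This is a hedged sketch assuming the public model id HuggingFaceTB/SmolLM-360M; the actual scripts in the repo may differ in details:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "HuggingFaceTB/SmolLM-360M"  # assumption; check the repo script for the exact id
tokenizer = AutoTokenizer.from_pretrained(model_id)  # first run auto-clones from the HF model repo
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype=torch.float16,
    device_map="auto",  # uses the GPU you verified with nvidia-smi
)

inputs = tokenizer("What is gravity?", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=100)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))

The “auto-clone” in workflow step 1b is just the from_pretrained() download cache (~/.cache/huggingface by default).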
Part 2: Fine-tune Model
(13) pip3 install -r requirements_finetuning.txt
peft>=0.7.0
datasets>=2.14.0
evaluate>=0.4.0
scikit-learn>=1.3.0
Took 12 mins.
(14) python3 fine_tune_smollm.py
This only took 4 mins (the tuning in this example is just going through the motions). Expected: training loss decreases from ~3.7 to ~3.1 over 3 epochs.
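For orientation, the core of a LoRA fine-tuning script like fine_tune_smollm.py typically looks something like this. A hedged sketch using peft + transformers; the model id, dataset file, output paths, and hyperparameters are illustrative assumptions, not the repo’s exact values:

from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

model_id = "HuggingFaceTB/SmolLM-360M"  # assumption
tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_id)

# Wrap the base model with small trainable LoRA adapters (base weights stay frozen)
lora = LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM")
model = get_peft_model(model, lora)

# Illustrative dataset: one training text per line in train.txt
ds = load_dataset("text", data_files={"train": "train.txt"})["train"]
ds = ds.map(lambda ex: tokenizer(ex["text"], truncation=True, max_length=256), batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3, per_device_train_batch_size=2),
    train_dataset=ds,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),  # causal LM labels
)
trainer.train()
model.save_pretrained("out/lora_adapter")  # saves only the small adapter weights, not the base model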
(15) python3 test_finetuned_model.py
Took 10 mins.
NOTE: I did not add sample data (I forgot!). You should probably do this to verify.
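If you do add your own test prompts, a check along these lines works (a sketch, assuming the adapter path from the fine-tuning sketch above):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "HuggingFaceTB/SmolLM-360M"  # assumption, as above
base = AutoModelForCausalLM.from_pretrained(base_id)
model = PeftModel.from_pretrained(base, "out/lora_adapter")  # loads the LoRA adapter on top
tokenizer = AutoTokenizer.from_pretrained(base_id)

inputs = tokenizer("Explain LoRA in one sentence.", return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=80)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))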
Part 3: Deploy to HF (model repo)
CPLT19-CPLT20
(16) git config --global credential.helper store
(17) git config --global --get credential.helper (verify; should print “store”)
(18) python3 deploy_to_hf.py
(19) Enter repo name: 431_smollm-model
(20) N
(21) Enter token
Note: Be careful when entering the token. Do this:
CTRL-C to copy the token,
Right-click into the entry location, then
CTRL-V to paste.
(22) Y
(23) Open the HF model repo URL
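For reference, the upload that the deploy script performs (steps 18-22) boils down to a few huggingface_hub calls. A sketch; the local folder name is a hypothetical placeholder:

from huggingface_hub import HfApi, create_repo

repo_id = "terrytaylorbonn/431_smollm-model"  # from step (19); use your own username
create_repo(repo_id, repo_type="model", exist_ok=True)  # uses your saved HF token (see Prerequisites)
HfApi().upload_folder(
    folder_path="./fine_tuned_smollm",  # hypothetical local model folder
    repo_id=repo_id,
    repo_type="model",
)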
Part 4: Test Gradio UI local
(GRADIO FAIL / MODEL MERGED / REDEPLOY TO HF)
CPLT21
(24) Create script files (see GIT REPO)
A simple Gradio interface: gradio_demo.py (a minimal sketch of this shape follows below)
Requirements file for the Gradio demo: requirements_gradio.txt.
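A local Gradio script of this shape can be as simple as the following. This is a minimal sketch, not the repo’s exact gradio_demo.py; the local model path is an assumption (see step (27)):

import torch
import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "./merged_model"  # assumed local path; see step (27)
tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(model_path, torch_dtype=torch.float16, device_map="auto")

def generate(prompt):
    inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
    outputs = model.generate(**inputs, max_new_tokens=200)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

gr.Interface(fn=generate, inputs="text", outputs="text", title="431 smolLMv3").launch()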
(25) pip install -r requirements_gradio.txt
gradio>=4.0.0
torch>=2.0.0
transformers>=4.40.0
accelerate>=0.20.0
CPLT22-CPLT27 Fix the meta tensor error
(26) python3 simple_gradio_demo.py
CPLT28
(27) Load merged_model
(28) Enter prompt (BINGO)
CPLT: “Summary of what we accomplished:”
1 Diagnosed the issue - Your model was a LoRA/PEFT model that needed special loading
2 Created a merged model - Combined base model + your fine-tuning into a standalone model
3 Fixed the Gradio demos - Resolved the dict/input_ids bug
4 Successfully tested online - Your model is generating quality responses
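The merge in point 2 above is a standard PEFT operation; in outline it looks like this (a sketch with assumed ids and paths, not the exact commands CPLT produced):

from peft import PeftModel
from transformers import AutoModelForCausalLM, AutoTokenizer

base_id = "HuggingFaceTB/SmolLM-360M"  # assumption
base = AutoModelForCausalLM.from_pretrained(base_id)
peft_model = PeftModel.from_pretrained(base, "out/lora_adapter")  # the fine-tuned adapter
merged = peft_model.merge_and_unload()  # folds the LoRA weights into the base weights
merged.save_pretrained("./merged_model")  # standalone model; loads without any special PEFT handling
AutoTokenizer.from_pretrained(base_id).save_pretrained("./merged_model")  # ship the tokenizer too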
CPLT29
(29) Redeploy (fixed/merged) model CPLT29 to HF
MODEL CPLT29 https://huggingface.co/terrytaylorbonn/431_smollmv3_CPLT29
https://huggingface.co/terrytaylorbonn/431_smollmv3_CPLT29/tree/main
Part 5: Deploy app.py to HF spaces (code repo) and test (BINGO BINGO)
CPLT30-CPLT34
(30) Go to the Spaces Page
(31) Click “Create new Space”
You need to be logged in to your HuggingFace account.
(32) Fill in the details
(33) Create the space (repo)
SPACE huggingface.co/spaces/terrytaylorbonn/431_smollmv3
(34) Click “Files and versions” > “Add file” > “Create new file”
(35) Add code for app.py
NOTE:
Do NOT use gradio_demo.py directly—it expects a local model path and may have device logic that’s not ideal for Spaces.
Use the template I provided earlier (or below), which is designed for Spaces and loads your model from the HuggingFace Hub, not from disk.
Ready-to-paste app.py for your Space:
NOTE: Verify the HF model repo name in app.py (see line 6); it must match your model repo name.
Get the code from #431 (search for “code for app.py”).
→ Paste this as app.py in your Space.
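For orientation only, a Spaces-style app.py of this kind looks roughly like the following. It is a sketch that loads the model from the Hub as described above; it is not the exact code from #431:

import gradio as gr
from transformers import AutoModelForCausalLM, AutoTokenizer

# The "line 6" to verify: this repo id must match your HF model repo name
model_id = "terrytaylorbonn/431_smollmv3_CPLT29"

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id)  # CPU is fine on a free Space

def generate(prompt):
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=200)
    return tokenizer.decode(outputs[0], skip_special_tokens=True)

gr.Interface(fn=generate, inputs="text", outputs="text", title="431 smolLMv3").launch()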
(36) Optional: Add print statements (for debugging)
app.py is available at either of these links (again, verify the HF model repo name on line 6 of app.py):
https://huggingface.co/spaces/terrytaylorbonn/431_smollmv3/resolve/main/app.py
#431 (search for “app.py with print statements”).
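The kind of debug print statements meant here, shown as a modified generate() from the sketch above (illustrative; the output appears in the Space’s runtime logs):

def generate(prompt):
    print(f"Prompt received: {prompt!r}")  # visible in the Space's runtime logs
    inputs = tokenizer(prompt, return_tensors="pt")
    outputs = model.generate(**inputs, max_new_tokens=200)
    text = tokenizer.decode(outputs[0], skip_special_tokens=True)
    print(f"Response length: {len(text)} chars")
    return text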
(37) Create requirements.txt
For a HuggingFace Space, you only need the packages required to run the demo, not to fine-tune.
Use the content of requirements_gradio.txt (or just these lines):
gradio
torch
transformers
Do NOT include peft, datasets, or scikit-learn unless your Space needs to fine-tune or evaluate models (which it does not for inference/chat).
(38) Test (BINGO)
https://huggingface.co/spaces/terrytaylorbonn/431_smollmv3
DEMO. ANALYZE. DOCUMENT. REPEAT.