Suzanne
A well-researched student project.
Template description
This repository contains the starter materials for your thesis in Computer Science 600 and 610. The main directory of this repository contains the Markdown template for a project designed for use with GitHub Classroom. To learn more about the course in which these assignments were completed, please refer to the README.md file.
The template specifies various settings in the _metadata.yml file included in the repository. Change the appropriate values under the Project-specific values heading. Changing other values outside of that section may cause the project to fail to build. Modify these values at your own risk.
Author your thesis in the index.md document using appropriate Markdown hierarchy and syntax; GitHub Actions will automatically create a PDF from the relevant files. Consult the README of the proposal repository to learn how to properly build and release these PDFs.
Citations and references
Including references throughout requires a specific pseudo-Markdown tag, demonstrated in the following blockquote. (Inspect the thesis.md file to see the format.)
> A citation, when included correctly, will appear as it does at the end of this sentence [@plaat1996research].
Labeling figures
To label a figure (i.e., an image), reference the image using standard Markdown image syntax; the image's alternative text will automatically become the figure's caption.
Labeling tables
To provide a label for a table, write a short caption for the table and prefix the caption with Table: as in the example below:
Table: A two-row table demonstrating tables
|Row number | Description |
|:----------|:------------|
|1 |Row 1 |
|2 |Row 2 |
Other template information
Keep two template-specific points in mind:
- It is your responsibility to remove this description section before building the PDF version you plan to defend.
- References will appear only if they are cited correctly in the text.
Note on LaTeX commands
Documents may include specific LaTeX commands in Markdown. To render these, surround the commands with markup denoting LaTeX. For example:
Checkmark character: $\checkmark$
Superscript character: $^{\dag}$
If using a special package not included in the template, add the desired LaTeX package or command/macro to the header-includes property in config.yaml.
Should this package not be included in the environment shipped with this template, you may also need to add the package to the GitHub Actions Workflow.
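As a sketch, assuming the template follows pandoc-style metadata conventions, a header-includes entry for an extra package might look like the following (the pifont package and tick macro are purely illustrative):

```yaml
# Hypothetical excerpt from config.yaml
header-includes: |
  \usepackage{pifont}           % extra symbol font (example only)
  \newcommand{\tick}{\ding{51}} % custom checkmark macro (example only)
```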
Direct any questions about issues to your first reader.
Introduction
Project overview
Blender is a free, open-source 3D creation suite used for modeling, animation, effects, simulation, and rendering [3, 12]. It powers professional production pipelines and is increasingly used in research, engineering, and higher education. Studies show that Blender’s Python architecture, open-source licensing, and advanced rendering capabilities make it suitable even for scientific and engineering workflows, such as generating synthetic digital image correlation images for computational experiments [11]. Because Blender is both powerful and freely available, it is widely adopted by students, independent artists, and early-career creators building portfolios on limited budgets.
However, Blender’s strength comes with a cost: a steep learning curve. Prior analyses of Blender’s interface highlight how beginners struggle with its dense layout, multi-editor environment, and mode-dependent tool system [12]. New users must manage concepts such as object versus mesh data, operator conventions, modifier order, shading settings, and Python-driven tools long before they can produce high-quality work. This can slow progress, reduce confidence, and limit the number of completed portfolio pieces.
This project introduces Suzanne, a Blender add-on that lives in the right-hand N-panel and provides short, numbered, in-viewport steps for common tasks. Instead of searching externally for guidance while working, users receive instructions directly beside the 3D Viewport. The goal is to reduce context switching, help users execute reliably, and increase the number of polished artifacts that students and independent artists can publish.

Key terms and concepts (grounding the reader)
N-panel. A vertical sidebar in the 3D Viewport, toggled with the N key, that hosts add-on panels and tools. Suzanne lives here to keep guidance directly inside the creative workspace [3].
Mode. Blender tools are mode-specific (e.g., Object Mode vs. Edit Mode). Many operators behave differently or are unavailable depending on the mode, making explicit mode requirements essential in step-by-step guidance [3, 12].
Operator. Any action invoked by menus, buttons, shortcuts, or the F3 search. Naming operators (e.g., Mesh > Normals > Recalculate Outside) is key for reproducibility and structured documentation [3].
Modifier stack. A series of non-destructive operations whose order changes results. For example, applying Bevel before Subdivision Surface produces a different silhouette and shading than the reverse [3].
Grounding. Retrieval-Augmented Generation (RAG) combines large language models with authoritative sources. Integrating RAG ensures instructional steps match Blender’s official terminology and correct behavior [6].
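As a toy illustration of grounding (not Suzanne's actual retrieval pipeline), the sketch below scores a few invented manual snippets against a query by keyword overlap and prepends the best match to the model prompt; snippet texts and function names are made up for this example:

```python
# Toy retrieval-style grounding: pick the manual snippet that best
# matches the user's query, then build a prompt that cites it.
# Snippet texts are invented stand-ins for Blender Manual passages.

MANUAL_SNIPPETS = [
    "Shade Smooth: Object > Shade Smooth renders faces with interpolated normals.",
    "Bevel Modifier: rounds edges; order in the modifier stack changes the result.",
    "Recalculate Outside: Mesh > Normals > Recalculate Outside fixes flipped normals.",
]

def tokenize(text: str) -> set[str]:
    """Lowercase words with trailing punctuation stripped."""
    return {word.strip(".,:;>?").lower() for word in text.split()}

def best_snippet(query: str, snippets: list[str]) -> str:
    """Return the snippet sharing the most words with the query."""
    q = tokenize(query)
    return max(snippets, key=lambda s: len(q & tokenize(s)))

def grounded_prompt(query: str) -> str:
    """Build a prompt that pairs the retrieved manual text with the question."""
    return f"Manual excerpt: {best_snippet(query, MANUAL_SNIPPETS)}\nUser question: {query}"

print(grounded_prompt("My normals look flipped, how do I recalculate them?"))
```

A real pipeline would use embeddings and the full Manual; the point here is only the shape: retrieve first, then generate against the retrieved text.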

Motivation (why this matters for students and portfolios)
In my personal experience learning Blender over the past five years, most challenges involved “micro-execution”—not understanding the concept of what to do, but figuring out which operator to call, what mode to be in, and what order to perform actions. Research has shown that Blender’s growing scope and tool density can overwhelm newcomers and slow their learning process [12]. Even small tasks such as beveling edges without shading artifacts or setting up lighting often require watching multiple tutorials or searching through forums.
Meanwhile, early-career 3D creators—students, hobbyists, and emerging artists—grow primarily through their portfolios. Recruiters and instructors evaluate:
- Clean topology visible in wireframe or clay renders
- High-quality lighting and presentation
- Turntables and breakdowns
- UV layouts and readable materials
- Clear process documentation
Students typically post these artifacts to ArtStation, Behance, GitHub Pages, or social media. However, producing consistent, high-quality work requires fluid execution, and execution is often slowed by searching for instructions outside the application.
Suzanne aims to address this gap: deliver high-clarity, minimal-step instructions inside Blender, grounded on authoritative documentation and consistent terminology.
This approach is supported by educational research showing that AI-based learning tools improve cognitive outcomes when instructional content is concise, context-specific, and directly actionable [9]. Suzanne follows these principles by placing guidance beside the active viewport and presenting it as small, verifiable steps that can be performed immediately.
Problem statement (what gap this work addresses)
Blender learners lose time, motivation, and project momentum because most guidance lives outside the application, spread across long YouTube videos, scattered forum posts, or generic documentation. These sources are rarely tailored to the user’s current mode, object selection, or workflow context. As a result, beginners struggle with:
- Mode confusion
- Misordered modifiers
- Shading artifacts
- Inconsistent steps from mixed-version tutorials
- Difficulty reproducing actions from memory
The core gap is micro-execution: mode, operator, panel path, and modifier order.
This project addresses that gap by building an in-viewport assistant that:
- Returns verifiable steps inside the N-panel
- Grounds instructions on the official Blender Manual
- Uses retrieval techniques aligned with RAG best practices to maintain correctness [6]
- Supports small troubleshooting branches to handle common issues
Because Blender is increasingly used in research and engineering [11], reliable execution and clear documentation also support academic reproducibility.
Project goals (what this thesis will deliver)
Step-based teaching. Provide concise, numbered instructions that reflect Blender’s operator names and panel paths, always stating prerequisites (mode, selection, object type).
Documentation grounding. Integrate retrieval over the Blender Manual to maintain terminology accuracy and consistency with Blender’s UI labels [3, 6].
Learning iteration. Support follow-ups like next step, repeat this, and corrective branches (e.g., faceting → Shade Smooth → Auto Smooth → Recalculate Normals).
Optional code execution with guardrails. Present small Python snippets that can be safely executed after user confirmation, leveraging Blender’s scripting environment [11].
Portfolio-driven defaults. Provide guidance for modeling, lighting, and rendering workflows commonly used in portfolio pieces.
Evaluation. Assess Suzanne through software verification and task-based Blender experiments focused on reliability, instructional clarity, and workflow support.
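As a sketch of the "optional code execution with guardrails" goal above, a confirm-before-run gate might be structured like this (all names are hypothetical; inside Blender the snippet body would call the bpy API instead of plain arithmetic):

```python
# Minimal confirm-before-run guardrail: a generated snippet is stored,
# shown to the user, and executed only after an explicit confirmation.

class PendingScript:
    def __init__(self, code: str):
        self.code = code
        self.confirmed = False

    def preview(self) -> str:
        """Return the code for display in the panel, never executing it."""
        return self.code

    def confirm(self) -> None:
        """Called only by the user's explicit confirmation button."""
        self.confirmed = True

    def run(self, namespace: dict) -> None:
        """Execute the snippet, refusing if the user has not confirmed."""
        if not self.confirmed:
            raise PermissionError("User has not confirmed execution.")
        exec(self.code, namespace)

script = PendingScript("result = 2 + 2")
ns: dict = {}
try:
    script.run(ns)      # blocked: no confirmation yet
except PermissionError:
    pass
script.confirm()
script.run(ns)          # runs only after explicit opt-in
print(ns["result"])     # → 4
```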
Assumptions, limitations, and delimitations
Assumptions
This project rests on several practical assumptions about the environment in which Suzanne is used and the people who choose to install it:
Installed Blender and access to the N-panel.
It is assumed that users already have a working installation of Blender and understand how to access the right-hand N-panel in the 3D Viewport [3]. Suzanne is implemented as an N-panel add-on rather than a standalone application, so users must be able to enable add-ons, save .blend files, and navigate basic interface regions. The project does not attempt to teach operating-system installation, GPU drivers, or basic Blender navigation from scratch.
Informed consent for API usage.
When users submit text or audio to Suzanne’s AI features, those requests are processed through a third-party API. The system assumes that users are aware of this and consent to it at the point of configuration—specifically, when they paste an API key into the add-on preferences and enable AI-driven features. The design presumes that users (or instructors, in a classroom setting) have reviewed relevant institutional or personal policies regarding the use of external AI services.
English-language interface as the baseline.
The initial release assumes that Blender’s UI is set to English and that users can work with English operator names, menu labels, and instructions. Suzanne’s retrieval pipeline targets the English Blender Manual [3], and the language model is prompted to mirror that vocabulary. Multilingual support and localization are identified as important directions for future work but are not treated as requirements for the first version.
Intermediate technical comfort.
Although Suzanne targets novices in terms of Blender skill, it assumes a basic level of technical comfort: users can install add-ons, manage API keys, and understand the idea of “saving before running code.” The system does not, for example, guide users through package manager installation, shell configuration, or advanced debugging.
Limitations
Despite careful design choices, Suzanne has several important limitations that shape how its results should be interpreted:
Residual inaccuracy in LLM-generated steps.
Grounding on the Blender Manual and using retrieval-augmented generation improves factuality and terminology alignment, but it does not eliminate mistakes [6]. The model may still suggest slightly outdated menu paths, omit necessary steps, or assume a different initial selection than the user has. Users are therefore advised to treat Suzanne’s instructions as high-quality suggestions, not guaranteed truths, and to cross-check against the Manual or their own experience when something appears wrong.
Restricted code execution scope.
For safety reasons, code execution is intentionally limited to a safe subset of Blender’s Python API, focusing on object creation, transforms, lights, cameras, and shader nodes. While this reduces the risk of destructive or security-sensitive operations, it also means that Suzanne cannot assist with every possible automation scenario. Advanced scripting tasks—such as complex rigging tools, file I/O, or integration with external render farms—are explicitly out of scope for the current version.
Version- and hardware-dependent behavior.
Blender evolves rapidly, and operator locations, defaults, or UI layouts can change between releases [3]. Suzanne targets a specific range of Blender versions during development; outside that range, some instructions may no longer match the interface exactly. Similarly, behavior can vary by platform (Windows, macOS, Linux) and hardware configuration (GPU vs CPU, different input devices). The project cannot guarantee identical behavior across all environments, and some user-reported issues may stem from these external differences.
Limited awareness of scene context.
Suzanne has only partial visibility into the user’s scene state. While future versions might integrate deeper inspection tools, the current implementation often infers context from user prompts and a small set of requested scene details. This can lead to misalignment when the scene contains unusual setups (e.g., non-standard hierarchies, heavily customized keymaps, or add-on-specific data structures).
Evaluation scope.
The evaluation described later in this thesis is constrained in the number of tasks and participants and in the time available. Results about time-on-task, perceived usefulness, or error reduction should be interpreted as initial evidence, not definitive proof of general effectiveness across all Blender workflows or learner populations.
Delimitations
In addition to inherent limitations, the project also includes deliberate design choices that narrow its scope. These delimitations are not weaknesses, but boundaries set so the work remains feasible and coherent:
Focus on learnability, modeling, shading, and presentation.
Suzanne is explicitly aimed at core workflows that support beginner and intermediate portfolio pieces: modeling (especially hard-surface and simple organic forms), shading, lighting, and basic presentation (turntables, still renders). Advanced areas such as character rigging, complex simulation (fluids, cloth, smoke), geometry nodes systems, and compositing are intentionally left out of the initial feature set. This allows the project to concentrate on the “bread-and-butter” tasks that most early-career artists must master to build a credible portfolio.
No live microphone-based transcription in v0.x.
Although speech-to-text could further reduce friction for some users, the current version does not implement live microphone capture or continuous audio transcription. All prompts are entered as text, and any audio-based features are limited to explicitly uploaded or recorded snippets. This avoids additional privacy and consent complexity and keeps the interaction model simpler for evaluation.
No end-user analytics or behavioral tracking.
Suzanne does not collect telemetry or analytics about how users interact with the add-on. There are no built-in dashboards tracking which prompts are most common, which steps cause difficulty, or how often scripts are executed. While such data could be valuable for iterative design and research, it would also introduce significant privacy and governance concerns. Instead, this thesis relies on local software verification and authored task demonstrations rather than behavioral tracking or human-subject data collection.
No human-subject dataset in this thesis.
The thesis does not report survey responses, timing logs, or other human-subject study data. Claims are therefore limited to implemented system behavior, automated reliability checks, and the documented task-based experiments presented later. Formal user studies remain future work.
Single primary documentation source.
For grounding, Suzanne relies primarily on the Blender Manual and does not, in this version, integrate other textual sources such as third-party books, course notes, or forum archives [3]. This delimitation keeps the retrieval pipeline manageable and the terminology consistent, but it also means that insights from community best practices (for example, from popular tutorial series or studio pipelines) are only reflected indirectly through prompt design, not through direct retrieval.
By making these assumptions, limitations, and delimitations explicit, the thesis clarifies the conditions under which Suzanne is expected to work well and the boundaries beyond which its claims should be treated cautiously. Later chapters return to these points when interpreting evaluation results and outlining directions for future work.
Ethical considerations (narrative)
Privacy and consent
Submitted prompts and audio files are processed through a third-party API. Research shows that large language models (LLMs) face serious risks related to privacy, data leakage, and unintended memorization of sensitive content, even when providers claim to filter or anonymize data [5]. In the context of student work and personal projects, leaked prompts could reveal identifying details, coursework, or unpublished research, including descriptions of in-progress thesis ideas, screenshots of original models, or references to real people. Since Blender is often used to create highly personal or autobiographical work, these risks are not abstract; a prompt describing a “self-portrait scene in my dorm room at Allegheny” can easily become identifying if mishandled.
To reduce these risks, Suzanne adopts a local-first design wherever possible. API keys are stored only on the user’s machine, inside Blender’s add-on preferences, and are never transmitted to any external server controlled by the add-on developer. Suzanne does not implement its own logging of prompts or responses; once a session ends, there is no add-on-level history of user queries. The only data sent to the third-party provider is the text and/or audio that the user explicitly submits as part of a request. The interface includes clear warnings about avoiding sensitive material (e.g., real names, proprietary data, or confidential assets), and the documentation encourages users to consult institutional policies on AI tool usage before integrating Suzanne into graded coursework or research workflows.
In any future formal study of Suzanne, volunteers should be informed about what data leaves their machine, which provider processes it, and how long it may be retained according to that provider’s terms [5]. They should also be able to disable API-based features entirely and still use the add-on as a structured reminder of manual workflows grounded in the Blender Manual [3]. In classroom settings, instructors are encouraged to provide alternative, non-AI pathways to complete assignments so that students who are uncomfortable with third-party processing are not penalized.
Reliability and user control
While grounding on the Blender Manual and retrieval-augmented generation reduces some errors, LLMs remain fallible and can still propose incorrect or incomplete sequences of steps [5, 6]. For example, a generated workflow might reference an operator that moved in a newer Blender version, assume the wrong selection mode, or omit a crucial modifier step. Survey research on AI systems emphasizes that tools used in high-stakes or educational contexts must be designed around human oversight, transparency, and reversible actions to maintain user trust [5]. An assistant that silently edits the scene or hides its reasoning would be misaligned with these principles.
Suzanne therefore treats the model as an advisor, not an authority. It always displays instructional steps before any changes are applied and explicitly labels them as suggestions that should be verified by the user. When code snippets are generated, they appear in a dedicated panel where users can inspect the Python before deciding whether to run it. Code execution is strictly opt-in: Suzanne never executes code automatically in response to a prompt. Users must press a separate confirmation button, reinforcing the mental separation between “seeing advice” and “changing the scene.”
Blender’s own undo stack is highlighted as the primary recovery mechanism if something behaves unexpectedly. The add-on’s documentation recommends that users save incremental versions of their .blend file (for example, scene_v03.blend, scene_v04.blend) before experimenting with code-driven changes. The interface also encourages users to cross-check instructions against the Blender Manual when results look suspicious or differ from expectations [3]. In effect, the design continually nudges users to maintain interpretive control: Suzanne can suggest the next move, but the user decides whether it is appropriate for their current scene and learning goals.
Bias and inclusivity
Educational research on AI-based learning tools highlights the importance of clear, accessible feedback that supports diverse learners, rather than favoring only those with high prior knowledge or specific linguistic backgrounds [9]. Blender itself already presents a high barrier to entry: the interface is dense, the terminology is specialized, and much community documentation assumes familiarity with English technical jargon and gaming culture. Without care, an AI assistant could easily amplify these barriers—by using slang, skipping explicit prerequisites, or tailoring examples to a narrow subset of users.
Suzanne’s instruction style is therefore intentionally plain and procedural. Each step names the relevant mode, operator, and UI path instead of assuming tacit knowledge or relying on vague phrases like “clean up the mesh.” For instance, instead of saying “fix the shading,” Suzanne might say “Switch to Object Mode, select the object, then choose Object > Shade Smooth and enable Auto Smooth in the Object Data Properties > Normals panel.” This benefits students, self-taught artists, and non-native English speakers who may be less familiar with community slang or informal tutorial styles. It also supports learners who prefer to map instructions carefully to the interface rather than following along with a video at the instructor’s pace.
At the same time, the project acknowledges that underlying language models can encode societal biases in examples, metaphors, or suggested asset names [5]. To mitigate this, Suzanne intentionally scopes its responses toward technical actions (operators, modes, and parameters) and away from content that labels or describes people. The documentation discourages prompts that rely on demographic stereotypes (e.g., asking for “typical” appearances of certain groups) or that seek value judgments about whose work “looks better.” When portfolio examples are mentioned, they are framed in terms of topology cleanliness, lighting clarity, and presentation conventions, not in terms of personal attributes. The long-term goal is to support skill-building and confidence, particularly for learners who may not see themselves represented in mainstream 3D education spaces.
Cost transparency and outages
LLM providers can change pricing, rate limits, and model availability with little notice. This volatility is especially relevant for students and independent artists working with limited budgets, who may not be able to absorb unexpected charges or interruptions. A tool that quietly consumes API credits in the background or fails without explanation would undermine both trust and accessibility.
Suzanne addresses this by making API usage explicit and interruptible. The add-on requires users to paste their own API key rather than bundling any shared or hidden key, which makes the cost relationship clear: any charges are between the user and the provider. When an API request fails—because of quota exhaustion, authentication errors, or network issues—Suzanne surfaces explicit error messages rather than silently falling back to an empty response. Users are pointed toward their provider dashboard to check usage and are encouraged to set their own spending limits.
Whenever API-based features are unavailable, Suzanne falls back on workflows grounded in Blender’s official practices—for instance, pointing users directly to relevant sections of the Blender Manual or suggesting manual operator paths [3]. In classroom scenarios, instructors can choose to disable the API-dependent features entirely and still use the add-on as a structured, manual recipe panel. This ensures that the tool remains a useful learning aid even when AI services are unavailable or unaffordable, and it reinforces that the core knowledge lives in Blender’s open documentation rather than in any single commercial model.
Security and scope
Because unrestricted Python execution in Blender can cause serious harm—from deleting or corrupting scenes to interacting with the file system or network—Suzanne deliberately limits what kind of code it can propose. Security surveys of LLMs warn that generated code can be manipulated or misused to escalate privileges, exfiltrate data, or perform other unintended actions, especially when execution is automated or opaque [5]. Blender add-ons that expose “run arbitrary code” endpoints without constraints effectively grant the model the same power as an expert user with full access to the scene and environment.
In response, Suzanne restricts script generation to a narrow slice of Blender’s API: object creation, transforms, lights, cameras, and shader nodes. Operations such as deleting objects, applying all modifiers, or resetting entire scenes are either excluded or heavily discouraged. The add-on explicitly avoids file operations (opening, saving, or deleting files), external network calls, or direct system-level access, thereby reducing the potential attack surface. Any script that appears in the UI is kept short enough for a motivated user to skim, and it is formatted clearly so that parameter values and operator names are visible.
Suzanne also leverages Blender’s existing safety mechanisms. Scripts run within Blender’s Python environment, which already exposes undo and redo for most scene operations. Users are encouraged to save .blend files frequently and to experiment on copies rather than production scenes. The documentation includes a “safety checklist” that recommends: (1) saving before executing any script, (2) inspecting code for obviously destructive calls, and (3) using undo immediately if an unexpected change occurs. These guardrails do not eliminate all risk, but they align Suzanne with best-practice recommendations for LLM-driven code execution: minimize permissions, maximize visibility, and keep humans firmly in control [5]. In combination with the privacy and consent measures above, this scoped design aims to make Suzanne a responsible, student-friendly integration of AI into Blender rather than a source of hidden technical or ethical debt.
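A minimal sketch of this scoping, assuming a simple allowlist policy (the allowed module names and forbidden builtins below are illustrative, not Suzanne's exact rules), could scan a generated script's syntax tree before it is ever offered for execution:

```python
# Allowlist check for generated scripts: reject any import outside a
# small set of modules and any use of obviously dangerous builtins.
# Module and builtin lists are illustrative policy examples.
import ast

ALLOWED_MODULES = {"bpy", "math", "mathutils"}
FORBIDDEN_NAMES = {"open", "eval", "exec", "__import__"}

def is_safe(source: str) -> bool:
    """Return False for unparseable code, disallowed imports,
    or references to forbidden builtins."""
    try:
        tree = ast.parse(source)
    except SyntaxError:
        return False
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            if any(alias.name.split(".")[0] not in ALLOWED_MODULES
                   for alias in node.names):
                return False
        elif isinstance(node, ast.ImportFrom):
            if (node.module or "").split(".")[0] not in ALLOWED_MODULES:
                return False
        elif isinstance(node, ast.Name) and node.id in FORBIDDEN_NAMES:
            return False
    return True

print(is_safe("import bpy\nbpy.ops.mesh.primitive_cube_add()"))  # True
print(is_safe("import os\nos.remove('scene.blend')"))            # False
```

Static checks like this cannot catch every misuse, which is why the design pairs them with the visibility and opt-in execution measures described above.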
What does not belong in the Introduction
Practical modeling recipes, shading fixes, or operator sequences should not be included here. These belong in Methods or an Appendix, where Suzanne’s generated steps can be presented clearly. The Introduction is focused on background, motivation, problem definition, goals, scope, and ethics.
Chapter roadmap
The remainder of this thesis proceeds as follows:
- Related Work reviews Blender learning challenges, prior add-ons, AI tutoring systems, RAG-based grounding, and safe model usage.
- Methods details the N-panel UI, retrieval pipeline, grounding strategy, and safe code execution model.
- Evaluation presents the study design, tasks, metrics, and analysis.
- Discussion interprets results and limitations.
- Future Work explores multilingual support, richer scene graph awareness, and expanded tool-calling capabilities.
Methods
Chapter purpose and methodological stance
This chapter explains how Suzanne was designed, implemented, and prepared for evaluation as an in-viewport Blender assistant. The Introduction established the core problem as micro-execution friction (mode, operator, panel path, and action order), while Related Work positioned Suzanne against two common alternatives: external learning resources and automation-first AI add-ons [1, 3, 7, 12]. Methods therefore focuses on how the system operationalizes those insights in software, interface behavior, and safety controls.
The project follows a design-and-build methodology common in applied HCI and educational tooling:
- Define requirements from literature and practice (Blender learning pain points, in-context tutoring needs, and safety constraints).
- Build an executable prototype inside Blender’s N-panel.
- Iterate on reliability and usability through repeated local testing in authentic modeling workflows.
- Prepare measurable outputs for a later experimental chapter (task time, completion quality, and perceived usefulness).
Rather than treating model responses as opaque outputs, the implementation treats each interaction as a reproducible pipeline: user input -> validated request -> model response -> formatted procedural output in the viewport. This pipeline orientation is central to the methodological goal of reducing context switching and improving repeatable task execution.
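The pipeline above can be sketched as explicit stages (the validation rule, the stand-in model response, and the step-splitting format are illustrative only):

```python
# Each pipeline stage is an explicit function so a single request can
# be traced end to end: input -> validated request -> model response
# -> formatted procedural output.
import re

def validate(prompt: str) -> str:
    """Reject empty input before any network call is attempted."""
    cleaned = prompt.strip()
    if not cleaned:
        raise ValueError("Empty prompt")
    return cleaned

def call_model(prompt: str) -> str:
    """Stand-in for the real model request; returns a canned response."""
    return "1. Press Tab to enter Edit Mode. 2. Select all with A."

def format_steps(response: str) -> list[str]:
    """Split a numbered response into individual steps for the panel."""
    return [s.strip() for s in re.split(r"\d+\.\s*", response) if s.strip()]

def handle(prompt: str) -> list[str]:
    return format_steps(call_model(validate(prompt)))

print(handle("How do I select everything in Edit Mode?"))
# → ['Press Tab to enter Edit Mode.', 'Select all with A.']
```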
Development process
Phase 1: Requirement extraction
Requirements were extracted from three sources: (a) Blender documentation and interface behavior [3], (b) prior studies on beginner friction in Blender [12], and (c) AI-learning-tool findings emphasizing in-context and actionable guidance [9].
The resulting requirement set emphasized:
- Locality: assistance must appear in the active workspace, not in a separate website.
- Procedural clarity: responses should be short, ordered, and immediately actionable.
- Safety: no silent scene modifications and no hidden execution of generated code.
- Practical deployment: installation and operation should fit student hardware and software constraints.
Phase 2: Prototype architecture and implementation
The first implementation target was a Blender add-on loaded through the standard add-on registration system (register() / unregister()), with persistent interaction state stored in Scene properties, user-level configuration stored in add-on preferences, and local conversation history persisted to disk with a temporary-directory fallback. This choice aligns with Blender’s architecture and keeps the workflow entirely in-app [3].
Core interaction paths were implemented as operators:
- Text path: submit prompt -> optionally attach recent conversation turns and Blender Info history -> receive response.
- Voice path: start/stop microphone capture -> transcribe -> optionally attach recent context -> submit transcript -> receive response.
- Conversation path: create, select, rename, delete, and preview local conversations.
- Utility path: API-key validation, model refresh, microphone/transcription diagnostics, and recordings-folder access.
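The local conversation persistence described for this phase, including the temporary-directory fallback, might be sketched as follows (paths and file layout are assumptions, not the add-on's exact scheme):

```python
# JSON-backed conversation storage: prefer the add-on's data directory,
# fall back to the system temp directory if it is not writable.
import json
import tempfile
from pathlib import Path

def storage_dir(preferred: Path) -> Path:
    """Use the preferred directory if writable, else fall back to temp."""
    try:
        preferred.mkdir(parents=True, exist_ok=True)
        probe = preferred / ".write_test"
        probe.write_text("ok")
        probe.unlink()
        return preferred
    except OSError:
        return Path(tempfile.gettempdir()) / "suzanne_conversations"

def save_conversation(name: str, turns: list, base: Path) -> Path:
    """Persist one conversation as a JSON file and return its path."""
    d = storage_dir(base)
    d.mkdir(parents=True, exist_ok=True)
    path = d / f"{name}.json"
    path.write_text(json.dumps(turns, indent=2))
    return path

def load_conversation(path: Path) -> list:
    return json.loads(path.read_text())

base = Path(tempfile.mkdtemp()) / "addon_data"
p = save_conversation("demo", [{"role": "user", "content": "How do I bevel?"}], base)
print(load_conversation(p)[0]["content"])  # → How do I bevel?
```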
Phase 3: reliability hardening
After the initial feature set worked end-to-end, iteration prioritized failure behavior rather than feature expansion. The main hardening tasks were:
- Clear status signaling (`Ready`, `Recording...`, `Sending...`, `Error`).
- Explicit handling for missing keys, missing files, empty transcripts, empty outputs, and HTTP failures.
- Local fallback logic for recordings and conversation storage when add-on directories are not writable.
- UI formatting logic for long responses, output previews, and empty states so multi-step instructions remain readable in the panel.
These hardening steps were chosen because the dominant user risk in educational contexts is not only wrong answers, but interrupted or confusing workflows that break learner momentum.
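The status signaling and guard checks above can be sketched as follows (the status strings mirror the UI labels; the function and its checks are illustrative):

```python
# Pre-flight guard checks run before any network call: missing keys
# and empty prompts surface as explicit error states rather than
# silent failures.
from enum import Enum

class Status(Enum):
    READY = "Ready"
    RECORDING = "Recording..."
    SENDING = "Sending..."
    ERROR = "Error"

def guarded_send(api_key, prompt: str) -> Status:
    """Return the status the panel should display for this request."""
    if not api_key:
        return Status.ERROR  # missing key is surfaced, never hidden
    if not prompt.strip():
        return Status.ERROR  # empty transcript or prompt
    return Status.SENDING

print(guarded_send(None, "hi").value)           # → Error
print(guarded_send("example-key", "hi").value)  # → Sending...
```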
Phase 4: evaluation readiness
The final development phase prepared the system for controlled comparison in later chapters by stabilizing the feature surface and defining what is considered in-scope behavior for experiments. At this stage, Suzanne is treated as a mixed-initiative assistant: it recommends, the user decides, and all scene edits remain user-mediated.
System requirements and traceability
To keep claims testable, each major thesis goal was mapped to an implementation responsibility and observable system behavior.
| Requirement | Design decision | Observable behavior |
|---|---|---|
| In-viewport assistance | N-panel integration in VIEW_3D | User never leaves Blender to ask for help |
| Procedural responses | Prompt shaping + UI formatting for numbered steps | Output appears as short action sequence |
| Input flexibility | Text prompt plus microphone-driven flow | Both typed and spoken intents are supported |
| Context-aware help | Optional conversation memory and Blender Info-history attachment | Responses can incorporate recent workflow context when enabled |
| API transparency | Explicit key entry in preferences and key-test operator | User can verify connectivity before tasks |
| Fault tolerance | Guard checks and HTTP/IO error handling | Failures are visible and recoverable |
| Safety-first behavior | No automatic scene mutation from generated text | User remains the final actor |
| Responsible deployment | Local storage of settings, conversations, and recordings with no telemetry path | Lower privacy exposure for student use |
This traceability table shaped coding priorities and chapter-level evaluation planning.
System architecture
Suzanne is implemented as a Blender-resident, event-driven assistant with three layers:
- Interface layer: collapsible `Status`, `Ask`, `Voice`, `Context`, `Conversation`, and `Latest Output` cards in the N-panel.
- Orchestration layer: operators that manage validation, request sequencing, recording toggles, conversation management, and state transitions.
- Service layer: network calls for transcription and response generation, local audio capture utilities, and local JSON-backed conversation storage.
The architecture is intentionally simple because reliability and transparency were prioritized over autonomous behavior. Instead of hidden background orchestration, each major transition is user-triggered and surfaced in the UI.

In the implemented pipeline, the assistant does not introspect the full scene graph automatically. Scene awareness is inferred primarily from user prompts plus optional attached context: recent local conversation turns and the last 100 lines of Blender’s Info history. This keeps integration lightweight while still allowing limited context-sensitive assistance, though it constrains precision for unusual scenes or workflows not visible in the recent interaction history.
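The context-attachment behavior described above can be sketched as a small helper. This is a minimal illustration, not the add-on's actual code; the function name and prompt wording are hypothetical, but the tail-of-history and recent-turns logic mirrors the design described in this section.

```python
def build_context_block(info_lines, conversation, max_info_lines=100, max_turns=3):
    """Assemble the optional context block: the tail of Blender's Info
    history plus the most recent local conversation turns (sketch only)."""
    parts = []
    if info_lines:
        # Keep only the last N lines of Info history, as described above.
        tail = info_lines[-max_info_lines:]
        parts.append("Recent Blender actions:\n" + "\n".join(tail))
    if conversation:
        # Attach the most recent (user, assistant) turns.
        recent = conversation[-max_turns:]
        parts.append("Recent conversation:\n" + "\n".join(
            f"User: {q}\nAssistant: {a}" for q, a in recent))
    return "\n\n".join(parts)
```

The key property is that both sources are bounded, so the attached context stays small regardless of session length.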
Blender integration details
Registration and state model
Suzanne follows Blender add-on conventions for class registration and property initialization [3]. Runtime interaction data is maintained in Scene properties, including:
- Current status string.
- Current message prompt.
- Last transcript text and last model response.
- Last recorded audio file path.
- Active conversation selection and conversation-context settings.
- Info-history attachment toggle and the last captured Info-history block.
- Latest-output view and transcript/response expansion state.
Configuration values (API key, response/transcription model selections, file prefix, conversation auto-save behavior, and diagnostics feedback) are stored in add-on preferences. This separation was chosen so task state and configuration state remain distinct and easier to reason about during testing.
Panel design and interaction constraints
The panel is structured around a set of Blender-native collapsible cards: Status, Ask, Voice, Context, Conversation, and Latest Output. This layout keeps the primary interaction loop visible while allowing supporting controls to remain compact until needed.
The interaction loop is:
- Enter (or dictate) intent.
- Optionally attach recent conversation turns or Blender Info history.
- Send request.
- Read the latest transcript or step-oriented response.
- Apply steps manually in the scene.
A single microphone button toggles recording on/off to reduce control-surface complexity for beginners. The status card updates on each transition, functioning as lightweight feedback for asynchronous operations (recording, network request, response rendering). Empty states in the Conversation and Latest Output cards make it clear when no history or response is available yet, and long transcripts or responses can be expanded in place when necessary.
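The status-card behavior amounts to a small mapping from the raw status string to a presentation triple. The sketch below is illustrative only: the icon identifiers and return shape are assumptions, not the shipped panel code.

```python
def status_presentation(status):
    """Map a raw status string to a (label, icon, is_error) triple for the
    status card. Icon names are illustrative Blender-style identifiers."""
    if "error" in status.lower():
        return ("Error", "ERROR", True)
    if status.startswith("Recording"):
        return ("Recording", "REC", False)
    if status.startswith("Sending"):
        return ("Sending", "SORTTIME", False)
    return ("Ready", "CHECKMARK", False)
```

Keeping this mapping in one pure function makes each transition easy to unit-test without a live UI, which is how the verification suite later exercises it.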
Cross-platform audio capture strategy
Because Blender is cross-platform and student devices vary, audio capture was implemented with OS-specific command paths:
- Linux: `ffmpeg` with ALSA input.
- Windows: `ffmpeg` with DirectShow input.
- macOS: bundled `atunc` utility for capture.
Recorded files are normalized to mono 16 kHz WAV for consistent transcription behavior. If the add-on directory cannot store recordings, the system falls back to a temporary directory. This avoids hard failures in locked-down lab environments.
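The writable-directory fallback can be sketched with a simple write probe. The helper name is hypothetical; the fallback-to-temp behavior is the design point from the paragraph above.

```python
import os
import tempfile

def resolve_recordings_dir(preferred):
    """Return a writable directory for recordings, falling back to the
    system temporary directory when the preferred path is locked down."""
    try:
        os.makedirs(preferred, exist_ok=True)
        # Probe writability directly rather than trusting permission bits.
        probe = os.path.join(preferred, ".write_probe")
        with open(probe, "w") as fh:
            fh.write("ok")
        os.remove(probe)
        return preferred
    except OSError:
        return tempfile.gettempdir()
```

Probing with an actual write catches cases (network shares, sandboxed lab images) where permission metadata alone is misleading.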
End-to-end interaction workflows
Text workflow
The text path was designed for direct, low-latency interaction from the viewport.
Algorithm 1: Text request handling
```
Input: user_prompt
Output: formatted procedural response in N-panel
1: if user_prompt is empty then
2:     show validation error in panel
3:     return
4: end if
5: read API key from add-on preferences
6: if API key missing then
7:     show key error and return
8: end if
9: collect optional conversation context and Blender Info history if enabled
10: apply Blender-only prompt prefix and build request payload
11: send request to response model endpoint
12: parse output_text (or structured fallback content)
13: store response in scene state and append local conversation exchange
14: render wrapped lines in response box
```
This workflow is intentionally explicit and synchronous from the user’s perspective. There are no hidden retries or silent fallbacks that could obscure what happened during a request.
Voice workflow
The voice path extends the text workflow by inserting capture and transcription stages.
Algorithm 2: Voice request handling
```
Input: microphone toggle events
Output: transcript + procedural response in N-panel
1: on first press, start recording process and set status=Recording
2: on second press, stop process and wait for output file
3: if file missing then show error and abort
4: send audio file to transcription endpoint
5: if transcript empty then show error and abort
6: collect optional conversation context and Blender Info history if enabled
7: apply Blender-only prompt prefix
8: send transcript to response endpoint
9: store transcript, file path, response, and local conversation exchange
10: render transcript and response in panel
```
This two-press model was selected over push-to-talk hold behavior because it lowers motor-demand complexity for novices and allows longer utterances without continuous key holding.
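The two-press toggle reduces to a two-state machine with injected capture hooks. This is a minimal sketch of the model described above, not the add-on's operator code; the status strings mirror those used in the panel.

```python
class RecordingToggle:
    """Two-press microphone model: first press starts capture, second
    press stops it. start_fn/stop_fn are injected capture hooks; stop_fn
    returns the output file path, or None on failure."""

    def __init__(self, start_fn, stop_fn):
        self.recording = False
        self._start, self._stop = start_fn, stop_fn

    def press(self):
        if not self.recording:
            self.recording = True
            self._start()
            return "Recording..."
        self.recording = False
        # A missing output file is surfaced as an error, per Algorithm 2.
        return "Transcribing..." if self._stop() else "Error"
```

Modeling the toggle this way makes the "file missing" branch of Algorithm 2 a directly testable state transition.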
Network and model interaction layer
API endpoints and payload flow
The implementation uses HTTPS requests to model APIs for two tasks:
- Audio transcription (`/v1/audio/transcriptions`) with multipart file payloads.
- Text response generation (`/v1/responses`) with JSON payloads.
A lightweight key-test operation (`/v1/models`) is provided in preferences to reduce setup uncertainty before first use. This small affordance significantly reduced setup friction during internal testing because users can distinguish key issues from prompt-quality issues.
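The key test amounts to a single authenticated GET against the models listing. The sketch below shows the request shape under the assumption of a standard bearer-token scheme; the base URL is a placeholder, since the actual endpoint host is configured elsewhere.

```python
import urllib.request

API_BASE = "https://api.example.com"  # placeholder, not the configured host

def build_key_test_request(api_key):
    """Build the GET request used to validate an API key against the
    models listing endpoint before any real task is attempted."""
    return urllib.request.Request(
        API_BASE + "/v1/models",
        headers={"Authorization": f"Bearer {api_key}"},
        method="GET",
    )
```

A 200 response confirms connectivity and key validity; an HTTP 401 on this call tells the user the problem is the key, not their prompt.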
Error handling and response robustness
The system treats network interaction as failure-prone and therefore includes guarded parsing and user-facing error messages for:
- Missing or malformed API keys.
- HTTP transport failures.
- Non-JSON or unexpected response structures.
- Empty transcripts or empty model outputs.
When primary response fields are absent, the parser attempts structured fallback extraction from nested output content. This improves resilience across model-response format differences while keeping the UI contract stable.
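The guarded parse can be sketched as follows. Field names (`output_text`, nested `output`/`content`/`text`) follow the description above; treat the exact structure as an assumption about the response format rather than a guaranteed schema.

```python
def extract_response_text(data):
    """Guarded parsing: prefer the top-level `output_text` field, then
    fall back to nested output content; return None if nothing usable."""
    if isinstance(data, dict):
        text = data.get("output_text")
        if isinstance(text, str) and text.strip():
            return text
        # Structured fallback: walk nested output items for a text part.
        for item in data.get("output", []):
            for part in item.get("content", []):
                text = part.get("text")
                if isinstance(text, str) and text.strip():
                    return text
    return None
```

Returning `None` rather than raising lets the caller map every unusable response onto the same visible "empty output" error state.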
Grounding and response-formation strategy
A major methodological goal is grounding outputs in authoritative Blender language so instructions remain reproducible and verifiable [3, 6]. The full grounding strategy is defined in three layers:
- Domain constraint layer. An always-applied Blender-only prefix prevents off-domain drift and keeps responses task-focused.
- Terminology alignment layer. Prompting style favors explicit mode names, operator names, and panel paths.
- Retrieval layer. A retrieval-augmented extension is specified to inject relevant Manual passages before generation.
The current evaluated build implements layers (1) and (2) directly and is architected to accept layer (3) as a modular extension. This allows transparent reporting of what is already operational versus what is specified for the full thesis target.
For the retrieval extension, passage ranking follows standard vector-similarity scoring [6]:
\[ \mathrm{score}(q, d_i) = \cos(\mathbf{e}_q, \mathbf{e}_{d_i}) = \frac{\mathbf{e}_q \cdot \mathbf{e}_{d_i}}{\|\mathbf{e}_q\|\,\|\mathbf{e}_{d_i}\|} \]
where \(\mathbf{e}_q\) is the query embedding and \(\mathbf{e}_{d_i}\) is the embedding of document chunk \(d_i\). Top-ranked chunks are then inserted into the generation context to reduce terminology drift and menu-path hallucination.
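A minimal implementation of this scoring, using plain lists as stand-in embeddings, is shown below. Real chunking and embedding generation are part of the scoped extension and are not depicted here.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def top_k_chunks(query_emb, chunk_embs, k=3):
    """Rank document chunks by cosine similarity to the query embedding
    and return the indices of the top-k matches."""
    ranked = sorted(range(len(chunk_embs)),
                    key=lambda i: cosine(query_emb, chunk_embs[i]),
                    reverse=True)
    return ranked[:k]
```

The returned indices identify the Manual passages to insert into the generation context.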
Response schema for procedural clarity
Regardless of input modality, the response format is designed to preserve instructional structure:
- Short, ordered steps.
- Explicit prerequisite states (mode, selection assumptions).
- Concrete operator/menu naming where possible.
- Troubleshooting branches when likely failure points are detected.
This schema reflects findings from AI-learning literature that actionable, context-proximate feedback is more useful than generic prose [9].
Safe code-assistance model
Related work showed that automation-first Blender copilots often execute generated code quickly, which improves speed but can increase risk [1, 5, 7]. Suzanne’s method is deliberately conservative:
- Primary output is human-readable procedure, not autonomous execution.
- Any code-like content is treated as optional scaffolding for user inspection.
- Scene changes remain user-initiated in Blender.
For the planned guarded execution extension, the policy model includes:
- Explicit confirmation before any run action.
- Restricted operation classes (object creation/transforms/lights/cameras/shader nodes).
- Blocked operations for high-risk file/network/system effects.
- Immediate rollback guidance using Blender’s undo stack.
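The planned policy gate can be sketched as a whitelist over operator prefixes plus a blocklist over dangerous tokens. The specific operator names and tokens below are illustrative assumptions, not the shipped policy.

```python
# Illustrative whitelist of low-risk operator classes.
ALLOWED_PREFIXES = (
    "bpy.ops.mesh.primitive_",    # object creation
    "bpy.ops.transform.",         # transforms
    "bpy.ops.object.light_add",   # lights
    "bpy.ops.object.camera_add",  # cameras
)
# Illustrative blocklist of file/network/system facilities.
BLOCKED_TOKENS = ("os.", "subprocess", "open(", "urllib", "socket", "eval(", "exec(")

def is_operation_allowed(call):
    """Sketch of the guarded-execution policy: reject anything touching
    blocked facilities, then require a whitelisted operator prefix."""
    if any(tok in call for tok in BLOCKED_TOKENS):
        return False
    return call.startswith(ALLOWED_PREFIXES)
```

Default-deny is the important property: any call outside the whitelist is refused, which keeps the permission surface minimal.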
By separating advice from execution authority, the method keeps user agency central and aligns with security guidance on minimizing model-side permissions [5].
Responsible-computing controls in implementation
Ethical concerns were translated into concrete implementation controls rather than left as abstract policy.
Privacy and data minimization
- API keys are user-supplied in local add-on preferences.
- No separate telemetry service is embedded in the add-on.
- Conversation history and recordings are stored locally on the user’s machine.
- Data sent externally is limited to explicit user inputs plus any context blocks the user chooses to attach (recent conversation turns and/or recent Blender Info history).
This local-first approach reduces unnecessary data propagation while acknowledging that third-party API processing remains part of the architecture [5].
Transparency and cost visibility
The system surfaces failures directly (e.g., key, quota, network, decode) instead of silently degrading output quality. Making failure modes visible helps users manage API budgets and prevents misattributing infrastructure issues to user competence.
Inclusivity by instruction style
UI output is cleaned and line-wrapped for readability, and markdown-heavy formatting is normalized before display. The intent is to improve clarity for novices and non-native readers by emphasizing operational language over stylistic flair.
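The cleaning-and-wrapping step can be sketched with the standard library. The marker set and panel width are assumptions for illustration; the add-on's actual normalization rules may differ.

```python
import re
import textwrap

def format_for_panel(text, width=48):
    """Normalize markdown-heavy output for the N-panel: strip emphasis
    markers, backticks, and heading hashes, then hard-wrap each line."""
    cleaned = re.sub(r"[*_`#]+", "", text)
    lines = []
    for line in cleaned.splitlines():
        # textwrap.wrap drops empty lines, so preserve them explicitly.
        lines.extend(textwrap.wrap(line, width) or [""])
    return lines
```

Hard-wrapping in Python rather than relying on the UI toolkit keeps multi-step instructions readable at any panel width.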
Implementation environment and reproducibility
Software stack
The prototype runs as a Blender add-on for Blender 5.0.0+ [3], using Python within Blender’s runtime and standard libraries for process and HTTP orchestration. External tooling dependencies are intentionally minimal:
- `ffmpeg` (Linux/Windows) for microphone capture.
- Bundled `atunc` utility (macOS) for microphone capture.
- Network access to model APIs for transcription and response generation.
Reproducibility protocol
To support repeatable demonstrations and evaluation setup, the following run protocol was used:
- Install/enable add-on in Blender.
- Configure API key and response/transcription model settings in preferences.
- Run built-in diagnostics such as `Test API Key`, `Test Microphone`, and `Test Transcription` as needed.
- Set context options (`Use Conversation Context`, `Context Turns`, `Include Info History (100 lines)`) for the trial.
- Run fixed benchmark prompts/tasks and record completion observations.
Because Blender versions and OS audio stacks vary, environment metadata (OS, Blender version, selected models, and whether conversation or Info-history context was enabled) is logged as part of experiment setup documentation.
Versioned capability statement
To avoid overclaiming, methods reporting distinguishes current implementation from scoped extension work.
| Capability area | Implemented in current build | Scoped extension |
|---|---|---|
| In-viewport text assistant | Yes | N/A |
| Voice capture and transcription | Yes | N/A |
| Blender-only domain gating | Yes (always-on prompt prefix) | Richer intent classification |
| Conversation/context support | Yes (local conversation memory + optional Info-history attachment) | Deeper scene introspection and richer grounding |
| Retrieval grounding from Manual | Partial (prompt-level alignment) | Full chunk retrieval + citation injection |
| Procedural step formatting | Yes | Adaptive difficulty/fading |
| Code execution inside add-on | No autonomous execution | Guarded, opt-in constrained runner |
| User safety controls | Yes (validation/status/errors) | Formal policy engine and audit trails |
This separation supports methodological integrity: the chapter captures both delivered engineering work and the explicit next-step architecture required to fully realize the thesis design goals.
Methods-level limitations
Several methodological constraints influence interpretation of later results:
- Scene-context inference is still indirect: it relies on prompts, recent conversation turns, and Info-history snapshots rather than full scene introspection.
- Grounding is currently strongest at terminology/prompt levels, with full RAG integration staged as extension work.
- API-dependent behavior introduces latency and availability variability outside Blender control.
- Microphone quality and device configuration can affect transcription quality and therefore downstream instruction quality.
These limits are not hidden defects; they are declared boundaries that shape valid claims in evaluation and discussion chapters.
Transition to evaluation
This Methods chapter established the system design and implementation pipeline used to operationalize Suzanne as an in-viewport instructional assistant. The next chapter evaluates this method through software verification and task-based experiments aligned with portfolio-relevant Blender tasks.
Experiments
This chapter evaluates Suzanne at two levels. First, it reports a completed software-verification pass over the Blender add-on implementation. Second, it presents three task-based experiments that demonstrate how Suzanne supports representative portfolio-oriented Blender workflows inside the viewport. This structure keeps the claims matched to the evidence: the automated suite evaluates deterministic software behavior, while the task experiments evaluate whether Suzanne can deliver usable instructional support for simple, complex, and context-aware workflows.
Experimental Design
Evaluation goals and research questions
The evaluation is organized around three practical research questions:
- RQ1: Reliability. Do Suzanne’s core interaction paths behave consistently enough to support repeated use inside Blender?
- RQ2: Task support. Can Suzanne provide usable in-viewport guidance for representative Blender tasks without forcing the workflow out into external search?
- RQ3: Instructional clarity and context. Do Suzanne’s responses remain clear, actionable, and context-sensitive enough to support learning-oriented Blender work?
These questions follow directly from the claims made in the Introduction and Methods chapters. Suzanne is not framed as a fully autonomous copilot; it is framed as an in-viewport instructional tool whose value depends on stable interface behavior, actionable task guidance, and user trust [4, 9].
Staged evaluation structure
The evaluation is staged across deterministic verification and task-based demonstrations.
| Evaluation layer | Purpose | Evidence presented here |
|---|---|---|
| Software verification | Confirm deterministic behavior of operators, panel rendering, preferences, and state transitions | Automated Python test suite with 65 passing checks |
| Task-based evaluation | Demonstrate usable in-viewport guidance across representative Blender workflows | Three authored experiments covering basic question answering, complex procedure generation, and context-aware action reconstruction |
This design fits the realities of Blender add-on development. Some claims are best tested through automation, such as whether blank prompts are rejected or whether an error state is rendered correctly. Other claims are best illustrated through concrete task runs in the Blender interface, where the usefulness of the resulting guidance can be seen directly.
Evaluation scope
This chapter reports completed software-verification results and task demonstrations executed inside Blender. Accordingly, the claims supported here are about system stability, procedural usefulness, and task coverage rather than population-level usability outcomes.
Evaluation
Completed software verification
Before the task demonstrations, the build was tested as a software artifact. The Suzanne repository includes a Python test suite covering operators, panel behavior, preferences, state registration, and shared utility functions. These tests use a mocked bpy environment so that Blender-specific logic can be exercised repeatably without manual clicking inside the UI. This is especially useful for edge cases that are tedious to reproduce by hand, such as missing API keys, offline requests, empty output panes, or repeated property registration.
The first set of checks verifies that Suzanne rejects invalid input early and with a readable message:
```python
def test_send_message_execute_rejects_blank_prompt():
    modules = load_suzanne_modules()
    context = make_context(
        modules.common.ADDON_MODULE,
        scene=make_scene(suzanne_va_prompt=" "),
    )
    operator = modules.operators.SUZANNEVA_OT_send_message()
    result = operator.execute(context)
    assert result == {"CANCELLED"}
    assert operator._reports[-1][1] == "Please type a message first."
```

This test matters because a tutoring tool that fails opaquely can interrupt learner momentum faster than one that simply refuses an invalid request. The desired behavior is not only cancellation, but cancellation with a concrete, beginner-readable explanation.
The next example verifies that network failure surfaces a visible error state instead of silently failing:
```python
with mock.patch.object(
    modules.operators,
    "_call_chatgpt",
    side_effect=modules.operators.URLError("offline"),
):
    result = operator.execute(context)

assert result == {"CANCELLED"}
assert scene.suzanne_va_status == "Idle (error)"
assert "Send failed" in operator._reports[-1][1]
```

This supports one of the main design goals established in Methods: failure should be explicit and recoverable. In a classroom or portfolio workflow, a user needs to know whether a poor outcome came from their Blender steps, their prompt, or the external service connection.
A third group of checks verifies that the panel communicates state clearly:
```python
scene.suzanne_va_status = "Idle (error)"
assert sidebar._status_presentation(scene, False) == (
    "Error",
    "Idle (error)",
    "ERROR",
    True,
)
```

Even though this is a small UI test, it is directly relevant to the thesis argument. Suzanne is meant to reduce micro-execution confusion, so its own status language must stay simple and legible. The suite also checks other presentation branches, including `Ready`, `Sending...`, `Recording...`, conversation empty states, and the hiding of API-key details in preferences.
The 65 passing checks map directly onto the verification areas summarized in the following table. The table is therefore not a separate high-level abstraction; it is a grouped explanation of what those 65 checks were actually checking for across the add-on.
| Test family | What the checks verified | Why it matters |
|---|---|---|
| Prompt validation | Blank or whitespace-only prompts are rejected and reported in readable language | Prevents confusing empty requests and keeps feedback beginner-friendly |
| Send pipeline | Successful requests update transcript, latest response, conversation state, and status fields consistently | Confirms the core interaction loop works end to end |
| Error handling | Offline or API failures surface explicit error states and readable diagnostics | Makes failures recoverable instead of silent |
| Panel usability | Status cards, empty states, previews, and UI presentation branches render expected labels and messages | Keeps the interface legible during real task work |
| Preference safety | API-key controls and diagnostics behave correctly without exposing sensitive setup details | Supports trust and safer configuration |
| State lifecycle | Scene properties register idempotently, persist correctly, and clear without corruption | Prevents instability across repeated runs |
Together, these categories account for the breadth of the 65 checks. The significance of the test suite is therefore not only the number itself, but the spread of coverage across validation, interaction flow, failure behavior, interface clarity, and add-on state management.
When the local test suite was executed, all 65 checks passed in 0.30 seconds. This is not the same as proving that every model-generated instruction is correct. What it does show is that the non-model scaffolding around Suzanne is stable: validation works, user-facing failures are surfaced, panel states are coherent, and repeated runs do not corrupt add-on state. For a system intended to support learners, this reliability layer is a necessary prerequisite for broader usability claims.
Task-based evaluation
The three task-based experiments below move from a simple instructional query to a longer procedural workflow and then to a context-aware reconstruction task. Together, they show how Suzanne behaves when used as an in-viewport guide for increasingly demanding kinds of support.
Experiment 1: Basic question answering in the Blender viewport
The first task-based experiment tests Suzanne’s most basic instructional path: whether a user can ask a simple Blender question inside the add-on and receive a correct, readable, and immediately actionable response without leaving the interface. For this trial, the prompt entered into Suzanne was, “How do I add a light to my scene in Blender?” This is a suitable first test because lighting is a common beginner workflow, the expected procedure is easy to verify, and the task exercises the core text-query pipeline from prompt entry to visible response.

Suzanne returned a numbered, step-by-step answer that instructed the user to remain in Object Mode, open the Add menu with Shift + A, choose a light type, position the light, and then adjust its properties in the right-side panels. This response is functionally correct for standard Blender interaction and uses interface terms that match what a beginner would actually see on screen. Just as importantly, the answer is procedural rather than vague. Instead of describing lighting conceptually, Suzanne gives the user a sequence they can follow immediately in the same workspace.
| Aspect | Observation |
|---|---|
| Goal | Verify that Suzanne can answer a simple Blender question correctly inside the N-panel |
| Prompt | “How do I add a light to my scene in Blender?” |
| Observed output | Suzanne produced a short numbered procedure for adding, placing, and adjusting a light |
| Outcome | Successful |
| Interpretation | Suzanne’s baseline question-answering workflow functioned correctly and provided usable in-viewport guidance |
This experiment establishes that Suzanne’s simplest interaction loop is already useful at the point of use. Before the system can be trusted with longer workflows, it must first show that it can handle ordinary interface questions correctly and clearly.
Experiment 2: Complex procedural guidance for fire simulation
The second task-based experiment tests whether Suzanne can support a more advanced Blender workflow that requires multiple ordered setup steps rather than a short, single-action answer. For this trial, the prompt entered into Suzanne was, “How do I create a basic fire simulation in Blender?” Fire simulation is a stronger stress test than the first experiment because it involves several connected systems, including a simulation domain, a flow emitter, physics settings, material setup, and baking or caching behavior. In other words, the task is complex enough that an incomplete or poorly ordered answer would be much harder for a user to apply successfully.
Because Suzanne’s response exceeded the visible height of the N-panel, the result was captured in two screenshots.


Suzanne returned a structured, ordered procedure that included creating a domain object, configuring the domain as a gas simulation, adding a separate emitter, setting the emitter as a fire flow source, baking the simulation, assigning a material, and adjusting render settings. This is the kind of longer procedural answer that Suzanne is intended to support: it keeps the user inside Blender while still presenting a workflow that would otherwise require searching across multiple external references.
The generated response is also broadly consistent with the standard workflow described in the Blender Manual, which explains that gas simulations require at least a domain object and a flow object, followed by material assignment and cache baking [3]. Suzanne’s answer therefore appears substantively correct at the workflow level, even though this experiment documents procedural completeness rather than a timed benchmark.
| Aspect | Observation |
|---|---|
| Goal | Evaluate whether Suzanne can provide usable guidance for a more complex Blender simulation task |
| Prompt | “How do I create a basic fire simulation in Blender?” |
| Observed output | Suzanne produced a multi-step workflow covering domain setup, emitter setup, bake steps, material assignment, and render considerations |
| Outcome | Successful as a complex-response test |
| Interpretation | Suzanne handled a longer, more technically demanding query and returned guidance that broadly matches documented Blender workflow |
This second experiment strengthens the evaluation by showing that Suzanne is not limited to very short beginner questions. It can also generate longer instructional sequences for tasks that involve several dependent setup stages, which is central to the thesis claim that in-viewport guidance can reduce friction for practical Blender work.
Experiment 3: Context-aware reconstruction of recent Blender actions
The third task-based experiment evaluates Suzanne’s context feature rather than its general question-answering ability alone. In this trial, the Include Info History (100 lines) option was enabled in the Context panel, and the user asked, “what actions did I just perform in Blender? Please summarize them in order and explain what I was trying to do.” This is an important test because it asks Suzanne to infer recent activity from Blender session history instead of answering a generic procedural question from prior knowledge alone.
As with the previous experiment, the response extended beyond the visible panel height and was captured in two screenshots.


Suzanne responded with an ordered summary of recent scene operations, including deleting objects, adding a cube, scaling it, rotating it, translating it, resizing it again, adding a bevel modifier, and applying smooth shading. It then interpreted those actions as part of a likely modeling workflow, specifically suggesting that the cube may have been prepared as a domain object for a fire simulation. This is a meaningful result because it shows Suzanne using recent session context to produce a situationally grounded explanation rather than only returning generic help text.
Methodologically, this experiment is especially relevant to the thesis because it addresses one of the core limitations of many external help sources: they do not know what the user has just done. By contrast, Suzanne can incorporate Blender’s recent Info history and reflect it back into the conversation. In the captured session, conversation context was also enabled, so this example should be interpreted as evidence of context-aware assistance rather than as an isolated benchmark of Info-history retrieval alone. Even with that caveat, the response clearly tracks recent viewport activity in a way that ordinary static documentation cannot.
| Aspect | Observation |
|---|---|
| Goal | Evaluate whether Suzanne can use recent Blender session context to infer and summarize user actions |
| Prompt | “what actions did I just perform in Blender? Please summarize them in order and explain what I was trying to do.” |
| Observed output | Suzanne reconstructed an ordered sequence of recent modeling operations and inferred the likely purpose of the workflow |
| Outcome | Successful as a context-aware assistance test |
| Interpretation | Suzanne used recent session context to generate a grounded, workflow-specific explanation rather than only generic Blender advice |
Together, the three experiments illustrate a clear progression of capability: basic question answering, longer procedural guidance for a complex task, and context-aware interpretation of recent user actions. That progression supports the thesis claim that Suzanne is not merely a generic chatbot embedded in Blender, but a more situated instructional assistant designed to reduce micro-execution friction inside the viewport.
Answers to the research questions
RQ1: Reliability
RQ1 is answered positively. The software-verification layer shows that Suzanne’s core interaction paths behave predictably across input validation, request handling, failure recovery, panel presentation, preference controls, and state registration. The 65 checks matter because they cover the full support structure around the add-on rather than only a single happy-path request.
RQ2: Task support
RQ2 is also supported by the task-based evaluation. Across the three experiments, Suzanne kept guidance inside Blender for a beginner lighting question, a longer fire-simulation workflow, and a context-aware explanation of recent user activity. In each case, the output was specific enough to support continued work in the viewport rather than forcing the user out to search for the next step elsewhere.
RQ3: Instructional clarity and context
RQ3 is supported by the observed response quality. Suzanne’s outputs were readable, step-oriented, and aligned with Blender terminology, and the third experiment showed that the system can incorporate recent Info-history context to produce situationally grounded guidance. Taken together, the experiments show that Suzanne is not only present in the interface, but capable of producing assistance that is concrete enough to be useful for learning-oriented work.
Threats to Validity
Internal validity
The task-based experiments use author-selected prompts and scene setups, so task choice can influence how strong Suzanne appears. Familiar workflows may naturally produce stronger outputs than unusual scenes or ambiguous prompts. External API availability and network latency can also affect response timing and wording across runs.
Construct validity
Time-on-task and perceived trust are not directly measured in this chapter, so the evaluation should not be read as a complete learning study. The automated test suite measures software reliability rather than pedagogical truthfulness, and the task demonstrations show procedural usefulness for selected workflows rather than long-term retention or transfer. Passing checks show that Suzanne behaves consistently as an add-on; they do not guarantee that every generated instruction sequence is correct in every Blender scene.
External validity
The selected tasks emphasize beginner and intermediate portfolio workflows. That is appropriate for Suzanne’s scope, but it limits generalization. Results from lighting setup, fire simulation, and modeling-context reconstruction should not be overstated as evidence for advanced rigging, simulation pipelines, compositing, or studio-scale production work. Generalization is also constrained by Blender version, English-language UI assumptions, prompt phrasing, and local hardware differences.
Conclusion validity
The evidence in this chapter is descriptive and qualitative rather than inferential. It supports claims about reliability and demonstrated task support, but it does not justify broad quantitative claims about efficiency gains or superiority over alternative tools. The defensible conclusion is that Suzanne is technically stable and demonstrably useful for the representative workflows evaluated here.
Overall, these validity threats do not negate the value of the evaluation. They clarify the kind of claim this chapter can support: initial evidence that a carefully scoped, in-viewport Blender assistant is technically reliable and practically useful for reducing micro-execution friction on common portfolio-building tasks.
Conclusion
This thesis began from a practical problem: Blender is powerful, but beginners often lose momentum not because they lack creative ideas, but because they get stuck on micro-execution. They need to know which operator to call, which mode to use, which panel to open, and what order to follow. Existing help systems often answer these questions outside Blender through manuals, videos, forums, or automation-heavy AI tools. Suzanne was developed as a different response to that problem: an in-viewport instructional assistant that lives in Blender’s N-panel, keeps help close to the active scene, and returns short, actionable guidance rather than opaque automation.
Across the thesis, Suzanne was framed not as a fully autonomous copilot, but as a mixed-initiative learning tool shaped by three design priorities. First, it keeps assistance local to the workspace, reducing the need to switch away from Blender [4]. Second, it emphasizes procedural clarity through step-based responses aligned with Blender terminology and interface structure [3, 9]. Third, it preserves user control through explicit feedback, bounded scope, and the refusal to silently modify scenes or execute arbitrary actions without oversight [5].
Summary of Results
The implemented system shows that this design is technically viable. Suzanne now supports typed prompts, microphone-based prompt capture, local conversation memory, optional inclusion of recent Blender Info history, status-aware interface feedback, and a structured Latest Output review workflow. The Methods chapter demonstrated that these features are not just interface decoration; they are part of a reproducible interaction pipeline with clear validation, state transitions, and recovery behavior. In architectural terms, Suzanne successfully operationalizes the central thesis idea that AI guidance can be embedded directly into Blender without collapsing into hidden automation.
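The status lifecycle behind that pipeline can be sketched as a small state machine. The state names and transition table below are assumptions made for illustration, not the add-on's actual implementation:

```python
# Illustrative status lifecycle: idle -> waiting -> ready/failed,
# with retry as the recovery path. Names are assumed, not real code.
TRANSITIONS = {
    ("idle", "send"): "waiting",
    ("waiting", "response"): "ready",
    ("waiting", "error"): "failed",
    ("failed", "retry"): "waiting",
    ("ready", "send"): "waiting",
}

def step(state: str, event: str) -> str:
    """Advance the panel status; unknown events leave the state unchanged."""
    return TRANSITIONS.get((state, event), state)

state = "idle"
state = step(state, "send")      # request in flight
state = step(state, "error")     # recoverable failure
state = step(state, "retry")     # user retries
state = step(state, "response")  # Latest Output is ready for review
assert state == "ready"
```

Keeping the transitions in one explicit table is what makes the recovery behavior easy to test: every (state, event) pair either appears in the table or is a deliberate no-op.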
The strongest completed evidence in the thesis is the software-verification layer. The automated test suite passed all 65 checks, covering prompt validation, send behavior, failure handling, panel rendering, preference safety, and state lifecycle behavior. This does not prove that every generated instruction is always correct, but it does show that the non-model scaffolding around Suzanne is stable enough to support repeated use and task-based evaluation. That reliability matters because instructional tools fail pedagogically when their own interface behavior is confusing or inconsistent.
The three use-case experiments also support the thesis argument at a practical level. The first experiment showed that Suzanne can answer a straightforward Blender question with a clear, usable procedure inside the N-panel. The second showed that Suzanne can generate a longer workflow for a more complex task, in this case a basic fire simulation, without collapsing into vague or purely conceptual advice. The third showed that Suzanne can use recent Blender session context to summarize what the user has just done, suggesting a path toward more situationally grounded assistance. Together, these results support the claim that Suzanne is more than a generic chatbot embedded in Blender. It functions as a scoped, context-sensitive instructional add-on aimed at reducing friction in portfolio-oriented Blender work.
At the same time, the thesis remains methodologically careful about what has and has not yet been demonstrated. The current evidence is strongest on implementation quality, reliability, and demonstrated task support. The most defensible conclusion is that Suzanne is a credible and well-scoped prototype with clear evidence of technical stability, practical usefulness in representative Blender workflows, and a design that aligns with both prior literature and practical Blender learning needs.
Future Work
The most immediate next step is to complete the planned human-facing evaluation. A within-subject comparison between Suzanne and external-search workflows would make it possible to measure time-on-task, completion quality, context switching, recovery burden, and perceived usefulness under controlled conditions. That study would be especially valuable for testing whether the instructional advantages suggested by the current experiments translate into measurable gains for students and early-career creators working on realistic Blender tasks.
Beyond evaluation, Suzanne itself has several natural expansion paths. One major direction is deeper grounding. The current build already aligns strongly with Blender terminology and can attach recent interaction context, but a fuller retrieval layer over the Blender Manual could improve menu-path accuracy, reduce hallucinated steps, and make it possible to cite or surface specific documentation passages alongside responses [6]. Another direction is richer scene awareness. Right now Suzanne infers context indirectly through prompts, conversation memory, and Info-history snippets. Future versions could inspect selected objects, active modes, visible modifiers, material slots, or render settings directly, allowing the assistant to tailor guidance more precisely to the user’s actual scene state.
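A toy sketch of the proposed retrieval layer illustrates the idea: score manual snippets by keyword overlap with the prompt and surface the best match alongside the response. The snippets and scoring below are invented for illustration; a real layer would index the actual Blender Manual and likely use embeddings rather than bag-of-words overlap:

```python
# Toy retrieval over manual snippets: rank titles by keyword overlap
# with the prompt. Corpus and scoring are invented for illustration.
def retrieve(prompt: str, docs: dict, k: int = 1) -> list:
    words = set(prompt.lower().split())
    scored = sorted(
        docs,
        key=lambda title: len(words & set(docs[title].lower().split())),
        reverse=True,
    )
    return scored[:k]

manual = {
    "Lights": "add a point light from the add menu and set its power",
    "Fire": "quick smoke and fire use the fluid domain settings",
}
assert retrieve("how do I add a point light", manual) == ["Lights"]
```

Even this crude ranking suggests the payoff: the retrieved passage can be cited next to the generated steps, giving the user a documentation anchor for verifying menu paths.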
Another important future direction is adaptive pedagogy. Suzanne currently returns concise procedural steps, but later versions could adjust explanation depth for different learners, provide optional troubleshooting branches automatically, or gradually fade support as users become more confident. For example, a beginner-facing mode might emphasize every prerequisite and panel path, while a more advanced mode could focus on only the critical actions or likely failure points. This would move Suzanne closer to a true tutoring system while preserving its in-viewport usability.
The future I find most exciting, however, is broader than Suzanne alone. This project suggests the beginning of a family of Blender add-ons built around the same philosophy: small, safe, in-context tools that reduce friction for creators without taking control away from them. Suzanne could grow into a wider ecosystem that includes add-ons for portfolio review, lighting critique, material setup guidance, scene-preparation checklists, topology analysis, or pipeline documentation. One add-on might help users prepare clean presentation renders; another might explain shader nodes in plain language; another might assist with organizing assets or documenting reproducible scene workflows. In that sense, Suzanne is not only a single tool but also a proof of concept for a broader design pattern in Blender add-on development.
That larger add-on ecosystem could also support my own future work as a developer and researcher. Rather than building one increasingly monolithic assistant, I can imagine creating a set of specialized tools that share a common design language: clear status feedback, grounded terminology, strong safety guardrails, and support for learning through doing. Suzanne can remain the general in-viewport tutor, while other add-ons explore adjacent needs in 3D creation. This would allow future projects to remain focused and maintainable while still contributing to the same overall goal: making Blender more approachable, more teachable, and more productive for students and independent artists.
Future Ethical Implications and Recommendations
If Suzanne or future related add-ons are released more broadly, ethical considerations will remain central. The first major issue is privacy. Because prompts, audio, and optional context may be sent to a third-party provider, users must understand what leaves their machine and what does not. Future public versions should continue to make API usage explicit, avoid hidden telemetry, and clearly label any contextual data attached to a request. In educational settings, users should always have a non-AI alternative so that students are not forced into third-party processing to complete coursework [5, 9].
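One concrete way to uphold that transparency principle is to build the outgoing request so that every contextual field is explicitly labeled and auditable before it leaves the machine. The field names in this sketch are assumptions, not the add-on's real request schema:

```python
# Illustrative request builder: attached context is opt-in, explicitly
# flagged, and inspectable. Field names are assumed, not the real schema.
def build_request(prompt: str, info_history=None, include_context=False):
    """Assemble a request payload that makes attached context visible."""
    payload = {"prompt": prompt, "context_attached": False}
    if include_context and info_history:
        payload["context_attached"] = True
        payload["info_history"] = list(info_history)
    return payload

req = build_request("Why is my render dark?",
                    info_history=["bpy.ops.object.light_add(...)"],
                    include_context=True)
assert req["context_attached"] is True

bare = build_request("Why is my render dark?")
assert "info_history" not in bare and bare["context_attached"] is False
```

Because the payload is an ordinary dictionary assembled in one place, a privacy-conscious user (or auditor) can inspect exactly what a request contains before it is sent.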
The second issue is reliability and over-trust. Even when grounded on documentation, language-model outputs can still be incomplete, outdated, or slightly misaligned with a given scene. Future versions should therefore keep Suzanne’s advisor role visible. The system should continue to present suggestions as suggestions, preserve opt-in behavior for any code execution, and encourage verification against Blender’s official documentation when appropriate [3]. As richer scene awareness or automation features are added, the design should remain conservative: visibility before action, confirmation before execution, and undo-friendly workflows wherever possible.
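The "confirmation before execution" guardrail can be expressed as a small wrapper that refuses to run any suggested action without an explicit opt-in. The names (`run_suggestion`, `confirm`) are hypothetical; the point is the control flow:

```python
# Minimal sketch of a confirmation gate: nothing runs unless the user
# explicitly opts in. Function names here are hypothetical.
def run_suggestion(action, confirm, log):
    """Execute a suggested action only after an explicit confirmation."""
    if not confirm():
        log.append("skipped: user did not confirm")
        return False
    action()
    log.append("executed after confirmation")
    return True

log = []
assert run_suggestion(lambda: None, lambda: False, log) is False
assert run_suggestion(lambda: None, lambda: True, log) is True
assert log == ["skipped: user did not confirm", "executed after confirmation"]
```

Logging both outcomes also supports the undo-friendly workflow goal: every executed suggestion leaves a visible record the user can review or roll back from.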
The third issue is equity and access. Tools like Suzanne can help lower the barrier to Blender, but they can also create new inequalities if they depend on paid APIs, stable internet access, or English-only documentation. Future development should therefore prioritize cost transparency, graceful degradation when AI services are unavailable, and eventual exploration of cheaper or local model options. If I expand into additional add-ons, I should carry forward the same principle: the tool should help learners build skill and confidence, not create new hidden dependencies that only some users can afford.
Overall, this project argues that AI in creative software is most promising when it is narrow enough to be trustworthy, visible enough to be inspectable, and supportive enough to help users keep ownership of their work. Suzanne does not solve every Blender learning problem, and it is not yet the final form of an intelligent in-viewport tutor. What it does show is that there is real value in designing AI assistance around instructional clarity, user agency, and workflow locality. That combination offers a practical foundation not only for Suzanne’s continued growth, but also for a wider future of Blender add-ons that teach, assist, and empower rather than simply automate.
