PowerShellGPT User Manual (v1.6)

1. Introduction to PowerShellGPT

Welcome to PowerShellGPT, a revolutionary Windows application designed to transform how you interact with your computer and the web. It uniquely combines the conversational power of leading AI models with the practical execution capabilities of Windows PowerShell and web browser automation using JavaScript.

What is PowerShellGPT?

Think of PowerShellGPT as more than just a chatbot. It’s an intelligent assistant that can:

  • Understand Your Requests: Communicate naturally using text or voice in over 80 languages.
  • Generate Code: Leverage AI models like Gemini, Claude, ChatGPT, Grok or even local models (via LM Studio) to write PowerShell scripts, JavaScript snippets, Python code, C# applications, and more.
  • Execute Actions: Run the generated PowerShell commands directly on your Windows system or execute JavaScript within dedicated browser environments to automate web tasks.
  • Learn and Adapt: Analyze the results or errors from executed actions and use that feedback to refine its approach, correct mistakes, or proceed with the next step in a complex task.
  • Speak Responses: Utilize an advanced Text-To-Speech (TTS) system, offering over 1400 voices in 90 languages to read out AI responses or custom text.

PowerShellGPT operates within three distinct browser environments:

  • AI Model Browser: The main window where you interact directly with your chosen AI model (Gemini, Claude, ChatGPT, etc.). This browser includes a dedicated AI Output Control Panel for managing how responses are captured and for TTS control.
  • BrowserGPT (Injection Browser): A powerful, multi-tabbed browser designed for web automation. The AI can inject and run JavaScript here to interact with websites, fill forms, click buttons, or extract data. It also hosts the primary TTS engine tab.
  • Console Browser: A specialized interface for viewing the output of PowerShell commands executed by the AI or user, featuring controls for common actions and a unique, user-customizable UI rendered via JavaScript.

By connecting AI understanding with real-world execution, feedback, and versatile voice output, PowerShellGPT turns your AI assistant into an active participant in managing your system and navigating the web.

Core Concept: The Intelligent Feedback Loop

The magic behind PowerShellGPT lies in its unique feedback loop:

  1. Input: You provide a request via text or voice.
  2. AI Processing: The AI interprets your request and generates the necessary code (e.g., a PowerShell command or JavaScript).
  3. Execution: PowerShellGPT detects the code, asks for your permission (if needed), and runs it in the appropriate environment (PowerShell engine or BrowserGPT).
  4. Feedback: The application captures the result, output, or any errors from the execution. This includes both the primary output and a specially prepared text-to-speech (TTS) version.
  5. AI Learning: This feedback is sent back to the AI, allowing it to understand the outcome, learn from errors, and plan the next action.
  6. Voice Output (Optional): The TTS-prepared version of the AI’s response can be automatically spoken aloud.

This continuous cycle enables the AI to tackle complex, multi-step tasks, debug its own code, and interact dynamically with both your system and websites.

Who Is PowerShellGPT For?

PowerShellGPT is designed for a wide range of users:

  • Power Users & Automation Enthusiasts: Automate repetitive tasks, create custom workflows, and manage your system more efficiently.
  • Developers & DevOps: Accelerate development with AI-assisted code generation, testing, and execution across multiple languages.
  • Accessibility Users: Control your computer powerfully using voice commands and receive spoken feedback in your preferred language and voice.
  • Learners & Experimenters: Explore PowerShell, JavaScript, AI interaction, and web automation in a hands-on environment.
  • DIY Agent Builders: Create personalized AI agents and custom interfaces tailored to specific needs.

Overview of Key Capabilities

  • Multi-AI Support: Choose from Gemini, Claude, ChatGPT, Grok, or local LM Studio models.
  • Advanced Voice Control & Universal TTS: Natural language commands in 80+ languages, multiple recognition modes, and a powerful TTS system with 1400+ voices in 90 languages for all AI models.
  • Direct PowerShell Execution: Run any PowerShell command or script.
  • Browser Automation (BrowserGPT): Inject JavaScript to control websites, featuring multi-tab browsing, bookmarks, advanced directives for tab control, and two-way communication.
  • Customizable PowerShell UI (Console Browser): Modify the look and functionality of the PowerShell output window using standard HTML/JS/CSS.
  • Cross-Language Execution (MACFARI): Generate and run Python, C#, Ruby, Node.js, etc., via PowerShell wrappers.
  • Rich Command System: Save/load prompts, commands, scripts; assign voice triggers and aliases; use dynamic [KEYWORD] commands; chain actions with and then; introduce delays with wait for.
  • Agent Bridge: Enable communication and command execution between different PowerShellGPT instances or from external applications.
  • Plugin System: Automatically run custom JavaScript on page loads in browser environments.
  • AI Self-Correction: Automatic error feedback loop for iterative code refinement.
  • AI vs. AI Conversations: Orchestrate debates between different AI models, with spoken output in distinct voices.
  • Real-World Interaction: Examples include COM port control and face detection integration.
  • Security Features: Permission prompts and command password protection.
  • In-Browser AI Control Panels: For managing response capture and TTS for each AI model.
  • LM Studio Internet Search: Enables local models to request and process web search results.

PowerShellGPT empowers you to leverage AI not just for information, but for action and audible interaction. Let’s explore how to get started!

2. Getting Started

This section guides you through the initial setup and requirements for PowerShellGPT.

System Requirements

To run PowerShellGPT effectively, ensure your system meets the following minimum requirements:

  • Operating System: Windows 10 (32-bit or 64-bit) or Windows 11. PowerShellGPT is a 32-bit application, but it runs on 64-bit Windows.
  • Internet Connection: Required for connecting to online AI models (Gemini, Claude, ChatGPT, Grok), checking for application updates, validating registration (if applicable), and potentially for web browsing/automation tasks, including the LazyPy TTS service.
  • (Optional) PowerShell Version: While PowerShell is built into Windows, ensure you have a reasonably modern version (PowerShell 5.1 or later recommended) for best compatibility with generated scripts.
  • (Optional) Microphone: Required for using the Voice Control features.
  • (Optional) Webcam: Required for the Face Detection feature.
  • (Optional) Python Installation: Required if you intend to use the Face Detection feature or ask the AI to generate and run Python code via the MACFARI system. Ensure Python is added to your system’s PATH environment variable during installation.
  • (Optional) .NET Framework: Required for compiling and running C# code generated via the MACFARI system (specifically, the version targeted by the csc.exe compiler referenced in the system prompt, typically v4.x).
  • (Optional) Other Interpreters/Compilers: Required if you plan to use MACFARI for other languages (Ruby, Node.js, etc.).

Installation

Simply extract the folder’s contents to a location of your choice (e.g., C:\Documents\PowerShellGPT or your Desktop).

First Launch & Initial Setup

When you run PowerShellGPT for the first time:

  • Initial AI Model: By default, PowerShellGPT will load the ChatGPT interface in its main AI Model Browser window. You can change this later via the interface or voice commands.
  • Interface Load: The main application window will appear displaying the Console Browser. The BrowserGPT window will also initialize.
  • Settings Load: Default settings will be loaded.
  • System Prompt: The application will automatically send the primary system prompt to the loaded AI model to initialize its understanding of how to interact with PowerShellGPT’s features. You will see the AI’s confirmation response shortly after startup.

You are now ready to start interacting with PowerShellGPT! The next section will familiarize you with the main user interface.

3. The Main Interface

The PowerShellGPT main window serves as your central hub for interacting with the AI, managing commands, and accessing settings. Here’s a breakdown of its key components:

AI Model Browser

Purpose: This is where the web interface of your selected AI model (Gemini, Claude, ChatGPT, Grok, or the LM Studio interface) is displayed. You can interact with the AI directly through this browser just as you would in a standard web browser.

Interaction: PowerShellGPT monitors this browser for AI-generated code (@PowerShellGPT@ or @JsGPT@ tags) and can inject prompts or JavaScript into it. Each AI model’s browser view also hosts an “AI Output Control Panel” (see Chapter 4).

Command/Prompt Management

Purpose: This area allows you to manage saved PowerShell commands and AI prompts, and to view the last executed command.

Components:

  • Mode Selector: Toggles the Settings Panel’s visibility and switches the view between editors for:
    • Prompts: Shows controls for managing saved AI prompts, Save/Delete buttons.
    • Commands: Shows controls for managing saved PowerShell commands/scripts, Save/Delete buttons.
    • Last Command: Displays the most recently executed PowerShell command for review, editing, saving, or re-running. Includes Undo/Redo buttons.
  • Editor: Displays the content of the selected saved item or the last executed command. You can edit content here before saving or executing.
  • Dropdown Lists: Select saved items to load them into the editor.
  • Action Buttons: For Submit Prompt, Run Command, Save Prompt, Save Command, Delete Command, Delete Prompt. Context-sensitive buttons appear depending on the selected mode.

Settings Panel

Purpose: Contains the primary application settings. Accessed by selecting “Settings” in the Mode Selector.

Direct PowerShell Input

Purpose: Allows you to type or paste a PowerShell command directly and execute it immediately by pressing Enter or clicking the adjacent Run button.

Status and Control Icons

  • AI Model Selection: The AI model icons allow switching between loaded AI models. The active model’s icon is in color; inactive ones are greyscale.
  • Voice Recognition Status: A black (off), green (on), red (error), or animated microphone icon indicates the current state. Clicking toggles listening. Right-clicking or double-clicking opens the Voice Recognition settings window.
  • BrowserGPT Access: The BrowserGPT icon (Expand/Collapse BrowserGPT) and the Console Browser icon (Show Console Browser) provide access. Right-clicking the BrowserGPT icon opens it in “Collapsed” (Console) mode.
  • Command Alias Manager: The alias icon opens the Command Alias manager window.
  • YouTube Help: The YouTube icon links to the PowerShellGPT tutorial playlist.
  • Permission Request Panel: Appears over the settings panel when PowerShell or BrowserGPT JavaScript execution requires user confirmation, containing Allow, Deny, and Preview buttons.

4. Interacting with the AI

PowerShellGPT acts as an intelligent intermediary between you and powerful AI models. This section explains how to communicate with the AI and understand its unique interactions within the application.

Choosing Your AI Model

PowerShellGPT offers the flexibility to connect to different AI providers, allowing you to choose the best model for your needs.

Supported Models:

  • Gemini
  • Claude
  • ChatGPT
  • Grok
  • Local Models via LM Studio (requires LM Studio running with a compatible API endpoint, typically http://127.0.0.1:1234)

Switching Models:

  • Interface Icons: Use the dedicated AI model icons (Gemini, Claude, ChatGPT, Grok, and LM Studio logos) on the main interface. Clicking an inactive model’s icon loads its interface into the AI Model Browser. The active model’s icon is shown in color, while inactive models’ icons are greyed out.
  • Voice Commands: Use commands like "Switch to Gemini", "Switch to Claude", etc.

Loading: When you switch models, PowerShellGPT navigates the main AI Model Browser to the respective service’s URL or the LM Studio interface.

Sending Prompts to the AI

You can communicate your requests to the loaded AI model in several ways:

  • Direct Typing: Type directly into the AI model’s input area within the embedded AI Model Browser.
  • Voice Input (Speech Recognition): Activate voice recognition and speak your prompt. Depending on settings, it may be sent automatically or require confirmation.
  • Using Saved Prompts: Select from the “Prompts” dropdown, and click “Submit Prompt”.
  • Using the AI Feedback/Prompt Input Editor (BrowserGPT): Text sent from BrowserGPT (via postMessage) appears in this editor. Edit it if needed, then click “Send Output To Model”, or enable automatic sending.
  • Sending PowerShell/Browser Output: If enabled in settings, results/output from executed PowerShell or JavaScript are automatically sent back to the AI.

Understanding AI Responses with Code Tags

The AI must wrap executable code in specific tags within a markdown code block:

  • @PowerShellGPT@ ... @/PowerShellGPT@: For PowerShell commands.
  • @JsGPT@ ... @/JsGPT@: For JavaScript code.

PowerShellGPT extracts code between these tags for execution. Any text outside is treated as conversational.
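
The extraction step can be pictured with a short sketch. This is an illustration only: the function names, the returned object shape, and the regular-expression approach are assumptions made for this example, and PowerShellGPT’s own parser may work differently.

```javascript
// Illustrative sketch only; not PowerShellGPT's actual parser.
const TAGS = {
  powershell: ["@PowerShellGPT@", "@/PowerShellGPT@"],
  javascript: ["@JsGPT@", "@/JsGPT@"],
};

// Escape a literal string so it can be embedded safely in a RegExp.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Return every tagged block found in an AI response, in order of appearance.
function extractTaggedCode(response) {
  const found = [];
  for (const [kind, [open, close]] of Object.entries(TAGS)) {
    const pattern = new RegExp(
      escapeRegExp(open) + "([\\s\\S]*?)" + escapeRegExp(close),
      "g"
    );
    for (const match of response.matchAll(pattern)) {
      found.push({ kind, code: match[1].trim(), index: match.index });
    }
  }
  return found.sort((a, b) => a.index - b.index);
}

const sample = [
  "Here are the first five processes:",
  "@PowerShellGPT@",
  "Get-Process | Select-Object -First 5",
  "@/PowerShellGPT@",
  "Let me know if you need more.",
].join("\n");

console.log(extractTaggedCode(sample));
```

Everything outside the tags (the first and last lines of the sample) is left out of the result, which mirrors how conversational text is separated from executable code.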

The AI Output Control Panel

A floating control panel appears within each AI Model’s browser window (ChatGPT, Claude, Grok, Gemini, LM Studio) providing enhanced control over how AI responses are captured and vocalized. This panel is draggable and will auto-collapse to the edge of the screen if dragged far enough, becoming a small tab that can be clicked to expand.

Panel Features:

  • Output Mode Selection (Radio Buttons):
    • Full Text: When selected, the [modelsresponse] placeholder will be populated with the AI’s entire last textual response. @PowerShellGPT@..@/PowerShellGPT@ and @JsGPT@..@/JsGPT@ tags will be detected OUTSIDE of markdown code blocks.
    • Code Only (Default for most): When selected, [modelsresponse] will contain only the text from the *last* code block found in the AI’s response. If no code block is found, [modelsresponse] will be empty.
    • This choice directly impacts what data your scripts receive when using the [modelsresponse] placeholder.
  • Recapture Response Button:
    • Clicking this button manually re-triggers the content extraction process based on the currently selected Output Mode. The extracted content is then sent to the application.
    • This is useful if an automatic capture was missed, if you changed the Output Mode after a response was generated, or if you want to re-process the AI’s last visible output.
  • Integrated TTS Speaker Icon:
    • Click this icon to have the AI’s last response (the TTS-prepared version, ModelsTTSResponse) read aloud using the external TTS engine LazyPy via a BrowserGPT tab.
    • The icon changes to a stop icon during speech and to a loading icon while loading.
  • (ChatGPT Specific) “Use ChatGPT TTS” Checkbox:
    • This checkbox appears only when the ChatGPT model is active.
    • Checked: PowerShellGPT allows ChatGPT’s own native text-to-speech feature to be used if the main “Read aloud ChatGPT’s responses” setting (in Settings Panel) is also enabled.
    • Unchecked: PowerShellGPT sends the full, code-stripped ModelsTTSResponse to the external TTS engine, allowing the LazyPy voices to be used for ChatGPT’s output.
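
To make the difference between the output modes concrete, the sketch below derives the three kinds of text from one sample response. It is a rough model under stated assumptions: the function names are hypothetical, code blocks are assumed to be standard triple-backtick markdown fences, and the panel’s real scraping logic is not documented here.

```javascript
// Built at runtime so this example can itself be shown inside a fenced block.
const FENCE = "`".repeat(3);

// Matches one fenced markdown block: fence, optional language label, body, fence.
function fencePattern() {
  return new RegExp(FENCE + "[^\\n]*\\n([\\s\\S]*?)" + FENCE, "g");
}

// "Full Text": the whole response, unchanged.
function fullText(response) {
  return response;
}

// "Code Only": text of the last code block, or "" when there is none.
function codeOnly(response) {
  const blocks = [...response.matchAll(fencePattern())];
  return blocks.length ? blocks[blocks.length - 1][1].trim() : "";
}

// TTS-prepared text: the response with code blocks removed.
function ttsText(response) {
  return response.replace(fencePattern(), "").replace(/\n{3,}/g, "\n\n").trim();
}

const response = [
  "First attempt:",
  FENCE + "powershell",
  "Get-Date",
  FENCE,
  "Better version:",
  FENCE + "powershell",
  "Get-Date -Format o",
  FENCE,
  "Done.",
].join("\n");

console.log(codeOnly(response)); // Get-Date -Format o
console.log(ttsText(response));
```

With Code Only selected, a script that reads [modelsresponse] would see only the second command in this sample, while the spoken version would contain just the three sentences.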

The Feedback Loop in Action

This cycle allows for dynamic, iterative interaction:

  1. You prompt the AI.
  2. AI generates code (e.g., @PowerShellGPT@...) or text.
  3. PowerShellGPT detects code, asks permission (if needed), executes it.
  4. Output/errors are captured. The “AI Output Control Panel” also scrapes the response for the [modelsresponse] and [modelsttsresponse] placeholders.
  5. If enabled, PowerShell output or BrowserGPT postMessage feedback is sent to the AI.
  6. The AI uses this to learn, correct, or continue.
  7. If TTS is active, the ModelsTTSResponse is spoken.

AI Self-Correction Explained

If executed code produces an error, the error message is sent back to the AI. A well-prompted AI can analyze this, identify its mistake, and generate corrected code.

Preventing Conversational Loops

If the same AI output and application response pair repeats multiple times (and “Prevent Looping” is enabled), a specific intervention prompt is sent to the AI to break the cycle and encourage a new approach.
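
One way to picture this check is a small counter keyed on the output/response pair. The sketch below is hypothetical: the class name, the threshold of three, and the choice to count only consecutive repeats are assumptions, since the manual does not state how many repeats trigger the intervention.

```javascript
// Hypothetical loop detector; the threshold and "consecutive" rule are assumptions.
class LoopDetector {
  constructor(maxRepeats = 3) {
    this.maxRepeats = maxRepeats;
    this.lastKey = null;
    this.count = 0;
  }

  // Record one exchange. Returns true when an intervention prompt would be due.
  record(aiOutput, appResponse) {
    const key = JSON.stringify([aiOutput, appResponse]);
    this.count = key === this.lastKey ? this.count + 1 : 1;
    this.lastKey = key;
    return this.count >= this.maxRepeats;
  }
}

const detector = new LoopDetector(3);
const aiOutput = "@PowerShellGPT@Get-Foo@/PowerShellGPT@";
const appResponse = "Get-Foo : The term 'Get-Foo' is not recognized.";

console.log(detector.record(aiOutput, appResponse)); // false
console.log(detector.record(aiOutput, appResponse)); // false
console.log(detector.record(aiOutput, appResponse)); // true
```

Any different exchange resets the counter, so ordinary back-and-forth never trips the check.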

LM Studio Internet Search Capability

When using a local model via LM Studio, the model might not have real-time internet access. To address this:

  • AI Instruction: The editable system prompt for LM Studio instructs it: “When asked for up-to-date information, output a VARIATION of I’ll look online now<Internet>https://www.google.com/search?q=SUBJECTOFENQUIRY</Internet> then say what, if any current info you have.”
  • Tag Detection: PowerShellGPT detects these <Internet>URL</Internet> tags in LM Studio’s output.
  • URL Extraction: The URL is extracted into the [LMStudioSearchURL] variable.
  • JavaScript Execution: A pre-saved JavaScript command, “LM studio internet search”, is designed to work with this. It opens the extracted search URL in a BrowserGPT tab.
  • Scraping & Summarization: The rest of the “LM studio internet search” script then scrapes the text content of the loaded search results page and sends it back to LM Studio with a prompt asking it to “Summarize this AS BRIEFLY AS POSSIBLE… Start your output with ‘I found this on the Internet:’”.
  • Result: This allows LM Studio models to effectively perform and utilize web searches via BrowserGPT.
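
The URL extraction step might look something like the following. This is a sketch, not the shipped implementation: it assumes the <Internet>URL</Internet> tag format described for the [lmstudiosearchurl] placeholder, and the function name and the http/https check are additions made for this example.

```javascript
// Hypothetical helper: pull the search URL out of a model reply.
// Returns null when no well-formed http(s) URL is found inside the tag.
function extractSearchUrl(modelOutput) {
  const match = modelOutput.match(/<Internet>\s*(\S+?)\s*<\/Internet>/i);
  if (!match) return null;
  try {
    const url = new URL(match[1]);
    const isWeb = url.protocol === "http:" || url.protocol === "https:";
    return isWeb ? url.href : null;
  } catch {
    return null; // The tag content was not a parseable URL.
  }
}

const reply =
  "I'll look online now<Internet>https://www.google.com/search?q=weather+in+Paris</Internet> " +
  "My own information may be out of date.";

console.log(extractSearchUrl(reply));
```

Rejecting anything that is not an http or https address is a defensive choice for the sketch, so that a malformed model reply cannot steer the browser to an unexpected scheme.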

5. Executing PowerShell Commands

PowerShellGPT empowers your chosen AI model to interact directly with your Windows system by generating and executing PowerShell commands and scripts. This section covers how this process works, how output is handled, and how you can customize the experience.

How AI-Generated PowerShell is Handled

  1. AI Generates Code: Based on your prompt (e.g., “List the running processes” or “Create a new text file on the desktop”), the AI model generates the necessary PowerShell code.
  2. Tagging: Crucially, the AI must wrap this code within the special tags: @PowerShellGPT@ at the beginning and @/PowerShellGPT@ at the end, typically presented within a markdown code block. Example:
    @PowerShellGPT@
    Get-Process | Select-Object -First 5
    @/PowerShellGPT@
  3. Detection & Parsing: PowerShellGPT continuously monitors the AI’s responses in the AI Model Browser. When it detects these specific tags, it extracts the PowerShell code contained between them. Any text outside these tags is ignored for execution purposes.
  4. Permission Request (Optional): If you haven’t granted permanent PowerShell access (see below), a confirmation dialog will appear over the Settings Panel, asking for your permission via “Allow” or “Deny” buttons. You can optionally preview and edit the full command before granting permission.
  5. Execution: If permission is granted (either via the dialog or permanently), PowerShellGPT uses its internal PowerShell Manager to execute the extracted command in an invisible PowerShell process. This manager handles process creation, capturing output streams (stdout and stderr), and process termination.
  6. Output Capture: The PowerShell Manager captures all text output generated by the executed command, including results and error messages.

Viewing Output: The Console Browser

While the main window has a simple log, the primary interactive interface for PowerShell results is the Console Browser.

  • Purpose: Provides a dedicated, scrollable view for PowerShell command output, mimicking a traditional console experience but rendered within a browser environment. It also includes interactive controls.
  • Display: As PowerShell commands execute, their output is dynamically sent to the Console Browser, where it’s displayed.
  • Default Interface: The standard Console Browser interface (defined by a default JavaScript plugin) includes:
    • A main output area.
    • A top control bar with the title “PowerShellGPT Console” and buttons for common actions (e.g., Volume controls, Settings toggle, Face Detection activation).
    • A side panel with quick-action icons (e.g., Refresh Console, TTS controls for ChatGPT, Hotspot toggle, Email Clipboard, Empty Recycle Bin, Change Background, Overlay toggles).
    • A hint bar at the bottom displaying tooltips for hovered buttons.

Customizing the Console Browser UI (via JavaScript Plugin)

This is one of PowerShellGPT’s most unique features. The entire Console Browser interface is generated by JavaScript.

  • Location: The default script responsible for the Console Browser’s UI is named PowerShell Console System.
  • Modification: You can edit this JavaScript using BrowserGPT’s JavaScript Scratchpad or an external editor.
  • Capabilities: By modifying the HTML structure, CSS styling, and JavaScript logic within this file, you can:
    • Change the appearance (colors, fonts, layout).
    • Add new buttons that trigger specific saved PowerShell commands or JavaScript snippets via window.chrome.webview.postMessage("[runcommand][BROWSERCOMMANDPASSWORD]YourCommandName").
    • Remove or rearrange existing controls.
    • Integrate custom data displays or visualizations.
  • Requirement: Requires knowledge of HTML, CSS, and JavaScript.
  • Caution: Be careful when editing, as errors in the script could break the Console Browser interface. Always back up the original file before making significant changes.
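
As a starting point for a custom button, here is a minimal sketch. Only the message format and window.chrome.webview.postMessage come from this manual; the helper names, the injected doc and post parameters, and the choice of document.body as the container are assumptions, and the default script’s actual structure may differ.

```javascript
// Build the documented message that triggers a saved command by name.
function runCommandMessage(commandName) {
  return "[runcommand][BROWSERCOMMANDPASSWORD]" + commandName;
}

// Create a button that posts the message when clicked. `doc` and `post` are
// passed in so the helper can also be exercised outside the Console Browser.
function addCommandButton(doc, container, label, commandName, post) {
  const button = doc.createElement("button");
  button.textContent = label;
  button.addEventListener("click", () => post(runCommandMessage(commandName)));
  container.appendChild(button);
  return button;
}

// Wire it up only when running inside a WebView2-hosted page.
if (typeof window !== "undefined" && window.chrome && window.chrome.webview) {
  addCommandButton(
    document,
    document.body, // Assumed container; the real layout may use another element.
    "Empty Recycle Bin",
    "empty the recycle bin",
    (message) => window.chrome.webview.postMessage(message)
  );
}
```

Keeping the message builder separate from the DOM code makes it easy to reuse the same command from several controls.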

Handling Dangerous Commands

Executing arbitrary PowerShell code can be risky. PowerShellGPT includes safeguards:

  • Detection: The application checks commands against a predefined list of potentially dangerous keywords (e.g., Remove-Item, Stop-Process, reg, iex, registry paths like HKLM:, -Force flag).
  • Confirmation Setting: If the “Confirm critical commands” setting is enabled, and a command contains a dangerous keyword, an additional warning dialog appears before the standard Allow/Deny prompt.
  • Warning Dialog: This dialog highlights the dangerous keyword found and asks for explicit confirmation (Yes/No) before proceeding to the Allow/Deny step. Clicking No cancels the execution immediately.
  • Disabling Warning: Disabling this function bypasses this extra warning layer, allowing potentially harmful commands to execute after only the standard Allow/Deny prompt (or immediately if permanent access is granted). Use extreme caution if disabling this feature.
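
The keyword check can be approximated as follows. Treat this as a sketch under assumptions: the list contains only the examples named above, not the application’s full list, and the whole-token, case-insensitive matching rule is a guess made so that “reg” does not flag commands such as Register-ScheduledJob.

```javascript
// Keywords taken from the examples above; the application's full list may differ.
const DANGEROUS_KEYWORDS = ["Remove-Item", "Stop-Process", "reg", "iex", "HKLM:", "-Force"];

// Return the listed keywords that appear in a command as whole tokens.
function findDangerousKeywords(command) {
  const lower = command.toLowerCase();
  return DANGEROUS_KEYWORDS.filter((keyword) => {
    const escaped = keyword.toLowerCase().replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
    // Require a non-word character (or string edge) on both sides of the keyword.
    const pattern = new RegExp("(^|[^a-z0-9_])" + escaped + "($|[^a-z0-9_])");
    return pattern.test(lower);
  });
}

console.log(findDangerousKeywords("Remove-Item C:\\Temp\\old.log -Force"));
console.log(findDangerousKeywords("Get-Process | Select-Object -First 5"));
```

A non-empty result corresponds to the case where the extra warning dialog would be shown before the standard Allow/Deny prompt.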

Running Other Languages via PowerShell (MACFARI Concept)

PowerShellGPT facilitates executing code in languages like Python, C#, Ruby, Node.js, etc., through a technique named MACFARI (Make A Code File And Run It).

Process:

  1. You ask the AI to write code in a specific language (e.g., “Write a Python script to…”).
  2. The AI, guided by its system prompt, generates a PowerShell command.
  3. This PowerShell command embeds the target language code (Python, C#, etc.) within the script.
  4. The PowerShell script saves this embedded code to a temporary file.
  5. The same PowerShell script then invokes the appropriate interpreter or compiler for that file (e.g., python script.py, csc.exe app.cs).
  6. If requested, the PowerShell script will include a confirmation dialog asking if you want to execute the newly created/compiled file.

Result: This allows the AI to effectively “run” code in various languages by leveraging PowerShell as the orchestrator for file creation and process execution. The output from the executed script/application is captured by PowerShell and returned to the AI via the standard feedback loop.
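
To show the shape of such a wrapper, the sketch below assembles one as text. It is a hypothetical illustration: the helper name, the temporary file location, and the exact PowerShell lines are assumptions, and the commands the AI actually generates depend on its system prompt.

```javascript
// Hypothetical helper: build the text of a MACFARI-style PowerShell wrapper.
// `fileName` and `runner` (for example "python") are supplied by the caller.
function buildMacfariCommand(code, fileName, runner) {
  // A single-quoted PowerShell here-string ends at a line starting with '@.
  if (/^'@/m.test(code)) {
    throw new Error("Code contains a line starting with '@, which would end the here-string early.");
  }
  return [
    "@PowerShellGPT@",
    "$path = Join-Path $env:TEMP '" + fileName + "'", // Steps 3-4: choose a temp file
    "@'",
    code, // The embedded target-language code, kept literal
    "'@ | Set-Content -Path $path -Encoding UTF8", // Step 4: save the code to the file
    runner + " $path", // Step 5: invoke the interpreter on the file
    "@/PowerShellGPT@",
  ].join("\n");
}

console.log(buildMacfariCommand('print("hello from Python")', "macfari_demo.py", "python"));
```

The single-quoted here-string is used so that PowerShell does not expand dollar signs or backticks that happen to appear in the embedded code.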

By integrating deeply with PowerShell and offering the customizable Console Browser, PowerShellGPT provides a powerful and transparent environment for system-level automation driven by AI.

6. BrowserGPT: The Injection Browser

Separate from the AI Model Browser and the Console Browser, PowerShellGPT features BrowserGPT. This isn’t just for viewing web pages; it’s a powerful, multi-tabbed environment designed specifically for web automation and interaction, driven by AI-generated or user-written JavaScript.

Purpose: Web Automation and Interaction

BrowserGPT acts as the primary execution ground for JavaScript code intended to interact with live websites. It allows PowerShellGPT’s AI (or you directly) to:

  • Navigate web pages.
  • Fill out forms automatically.
  • Click buttons and links.
  • Extract (scrape) data from page content.
  • Modify the appearance or behavior of websites dynamically.
  • Communicate information back from the webpage to the AI or user.

The Multi-Tab Interface

BrowserGPT provides a familiar tabbed browsing experience:

  • Tab Bar: Displays open web pages, each in its own tab. Tabs show a loading animation while content is loading and a favicon once loaded.
  • Creating Tabs:
    • Manually: Click the “New Tab” button to open a new tab navigated to your defined Home Page.
    • Via Command/Script: Use the //load page in new tab [URL]//
      [CODE] directive.
    • Via Link Clicks: Right-clicking a link provides an “Open link in new tab” option.
  • Switching Tabs: Click on the desired tab header. Use //switch to tab...// directives for scripted switching.
  • Closing Tabs: Right-click on a tab header (except the first/main tab) and select “Close Tab”.
  • Tab Identification: Tabs can be targeted by User ID, Title, or URL via directives, enabling precise automation.
  • Saving and Loading Sessions: The entire multi-tab layout (including URLs, User IDs, and the active tab) can be saved and restored using the Session Management (Save/Load Session) icon. Right-click to save your session to a file, and left-click to load a session from a file. This powerful feature is explained in detail in Chapter 13.

Navigation Controls

Standard browser navigation elements are present:

  • Address Bar: Displays the active tab’s URL. Type a URL or search term and press Enter or click Go.
  • Back, Forward, Refresh, Home, Go/Search buttons function as expected. Home page is customizable.

Bookmarks Bar

Functionality: Click an icon to navigate to its bookmarked URL.

Setting/Deleting Bookmarks: Right-click a bookmark slot. Select “Bookmark current page” to save the active tab’s URL and favicon. Select “Delete Bookmark” to clear the slot.

Visibility: Toggle with the Bookmark Button icon.

The JavaScript Scratchpad & AI Feedback/Prompt Input Editor

This panel (toggle with Expand/Collapse or Shift+Enter) includes:

  • JavaScript Scratchpad: For writing, editing, loading, saving, and running JavaScript snippets. Includes Undo/Redo.
  • AI Feedback/Prompt Input Editor: Displays messages from webpages (via postMessage) for review, editing, and sending to the AI. Automatic forwarding to AI is optional.

Executing JavaScript

  • Direct Execution: From Scratchpad via “Run JavaScript” button. Placeholders are replaced before execution.
  • AI-Generated: From @JsGPT@... tags.
  • Saved Scripts/Voice/Agent Bridge: Trigger execution of saved JavaScript.
  • Plugins: Auto-run on page loads.
  • HTML Rendering: Code in Scratchpad starting <!DOCTYPE html> is rendered as HTML.

Dynamic Placeholders in JavaScript Execution

When JavaScript is executed in BrowserGPT, the following placeholders are replaced with dynamic content:

  • [modelsresponse]: Replaced with the AI’s last response. Its content (Full Text or Code Only) depends on the “Output Mode” selected in the AI Model Browser’s control panel.
  • [modelsttsresponse]: Replaced with the AI’s last full textual response *after* any code blocks have been stripped out. Ideal for sending to TTS engines for cleaner speech.
  • [defaultvoice]: Replaced with the voice name string set as the “Default Voice” in PowerShellGPT’s main settings. Useful for scripting TTS voice selection (e.g., with the LazyPy TTS plugin).
  • [agentname]: Replaced with the current “Agent Name” configured in PowerShellGPT’s settings.
  • [lmstudiosearchurl]: If the last AI response was from LM Studio and contained an <Internet>URL</Internet> tag, this placeholder is replaced with that extracted URL.
  • [browserstate]: Replaced with a real-time JSON string that describes the current state of all open BrowserGPT tabs, including their index, active status, user-assigned ID, title, and URL. This gives the AI situational awareness of the browser, allowing it to make intelligent decisions about tab management (e.g., checking if a tab is already open before creating a new one).

Tip: Use JavaScript template literals (delimited by backticks, `) when using these placeholders to handle multi-line content and special characters correctly.
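
The reason for this tip can be demonstrated with a small simulation. It rests on an assumption: that each placeholder is swapped for the raw text before the script runs, with no escaping. The substitutePlaceholders and parses helpers are hypothetical stand-ins written for this example, not part of PowerShellGPT.

```javascript
// Simulates raw substitution: each known [placeholder] is replaced by plain text.
function substitutePlaceholders(script, values) {
  return script.replace(/\[(\w+)\]/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(values, name) ? values[name] : whole
  );
}

// Report whether a piece of source text is syntactically valid JavaScript.
function parses(source) {
  try {
    new Function(source);
    return true;
  } catch {
    return false;
  }
}

// A reply containing a double quote and a line break, as AI output often does.
const values = { modelsresponse: 'Line one says "hi"\nLine two' };

const quoted = substitutePlaceholders('const reply = "[modelsresponse]";', values);
const literal = substitutePlaceholders("const reply = `[modelsresponse]`;", values);

console.log(parses(quoted)); // false
console.log(parses(literal)); // true
```

Under the same assumption, a response that itself contains a backtick or the sequence ${ could still break a template literal, so scripts that handle arbitrary AI output may need extra care.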

Suppressing Browser Notifications

//silent// Directive

If you include the comment //silent// anywhere within a JavaScript block (whether in the Scratchpad, a saved file, or AI-generated), the blue pop-up notification (e.g., “Executing script…”, “Loading HTML…”) that normally appears in BrowserGPT for that specific execution will be suppressed. This is useful for background scripts or frequent plugin actions where notifications might be intrusive.

Targeted Code Execution

Browser directives (see Appendix A) control where JavaScript runs (active tab, specific ID/Title/URL, AI Model Browser, Console Browser), including options to create tabs if they don’t exist (//orcreate [URL]//).

Two-Way Communication: The Browser’s Voice

BrowserGPT isn’t just a one-way street; web pages can send messages back to the PowerShellGPT application. This is crucial for web scraping, monitoring tasks, and providing feedback to the AI. This is achieved using a single JavaScript function, but its behavior is controlled by special prefixes.

The Core Function: window.chrome.webview.postMessage()

To send a message from a webpage in BrowserGPT, your JavaScript must call this function.

Security: The Browser Command Password

Every message intended for the application must include the [BROWSERCOMMANDPASSWORD] placeholder. PowerShellGPT automatically replaces this with the real password, ensuring that only trusted, internally-run scripts can trigger application actions.

window.chrome.webview.postMessage("[BROWSERCOMMANDPASSWORD]Your message here...");

The Communication Protocol: Choosing Your Message Type

What happens to your message depends on the prefix you use. This protocol gives you precise control over how feedback is logged and when the AI is prompted.

1. No Prefix (Default Behavior)

  • Action: Replaces all content in the “AI Feedback / Prompt Input Editor” with your message.
  • AI Prompt: Prompts the AI with this message only if the “Send browser output to model” setting is checked.
  • Use Case: Ideal for displaying the final, singular result of a script, where previous messages are no longer relevant.
    // Shows "Task Complete." in the feedback editor.
    window.chrome.webview.postMessage("[BROWSERCOMMANDPASSWORD]Task Complete.");

2. The [newlog] Prefix

  • Action: Replaces all content in the Feedback Editor, starting a fresh log with your message.
  • AI Prompt: Never prompts the AI.
  • Use Case: Use this as the very first step of a new multi-step task to clear out any old, irrelevant logs and provide a clean starting point.
    // Clears the editor and starts it with this line.
    window.chrome.webview.postMessage("[newlog][BROWSERCOMMANDPASSWORD]Starting user profile update sequence...");

3. The [log] Prefix

  • Action: Appends your message to a new line in the Feedback Editor, building a running log.
  • AI Prompt: Never prompts the AI.
  • Use Case: The AI’s equivalent of console.log(). Use this to record intermediate steps, status updates, or debug values during a long process without interrupting the AI’s workflow.
    // Adds a new line to the existing log in the editor.
    window.chrome.webview.postMessage("[log][BROWSERCOMMANDPASSWORD]Step 2 of 5: Profile picture uploaded.");

4. The [sendlog] Prefix

  • Action: Appends your message to a new line in the Feedback Editor.
  • AI Prompt: Prompts the AI with the entire accumulated log content from the Feedback Editor, but only if the “Send browser output to model” setting is checked.
  • Use Case: For critical milestones in a multi-step task. It records the step in the log and gives the AI the full context of the task so far, allowing it to make an informed decision on what to do next.
    // Adds this line, then sends the whole log to the AI if the setting is on.
    window.chrome.webview.postMessage("[sendlog][BROWSERCOMMANDPASSWORD]Update failed at validation step. Please review log and advise.");

5. The [runcommand] Prefix

  • Action: Does not affect the Feedback Editor. Instead, it directly triggers an internal application command.
  • AI Prompt: Does not prompt the AI.
  • Use Case: Allows a webpage to trigger saved PowerShell/JavaScript commands or built-in functions.
    // Executes the saved command named "empty the recycle bin".
    window.chrome.webview.postMessage("[runcommand][BROWSERCOMMANDPASSWORD]empty the recycle bin");

This flexible system allows you to design sophisticated web automation scripts that can provide silent logging, request user/AI intervention at key points, or simply report a final status, giving you complete control over the communication flow.
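
The prefix rules above can be wrapped in a small helper so that scripts do not assemble message strings by hand. The following is a minimal sketch, not part of PowerShellGPT: the names buildMessage and sendToApp are hypothetical, and the check for chrome.webview is only there so the snippet can also be run outside BrowserGPT. Only the prefixes and the [BROWSERCOMMANDPASSWORD] placeholder come from this manual.

```javascript
// Hypothetical helper; only the prefixes and the password placeholder come from this manual.
const PASSWORD_PLACEHOLDER = "[BROWSERCOMMANDPASSWORD]";
const MESSAGE_TYPES = ["default", "newlog", "log", "sendlog", "runcommand"];

// Builds "[prefix][BROWSERCOMMANDPASSWORD]text"; the "default" type has no prefix.
function buildMessage(type, text) {
  if (!MESSAGE_TYPES.includes(type)) {
    throw new Error("Unknown message type: " + type);
  }
  const prefix = type === "default" ? "" : "[" + type + "]";
  return prefix + PASSWORD_PLACEHOLDER + text;
}

// Posts the message when running inside BrowserGPT; always returns the string it built.
function sendToApp(type, text) {
  const message = buildMessage(type, text);
  const webview = globalThis.chrome && globalThis.chrome.webview;
  if (webview) {
    webview.postMessage(message);
  }
  return message;
}

// A multi-step task: start a fresh log, record progress silently, then hand the log to the AI.
sendToApp("newlog", "Starting user profile update sequence...");
sendToApp("log", "Step 2 of 5: Profile picture uploaded.");
sendToApp("sendlog", "Update failed at validation step. Please review log and advise.");
```

Rejecting unknown types up front is a design choice of this sketch: a mistyped prefix would otherwise be sent as ordinary message text.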

7. Mastering Voice Control & Universal TTS

PowerShellGPT integrates powerful voice recognition and a universal Text-To-Speech (TTS) system, allowing you to interact with the AI, execute commands, and receive spoken feedback in a vast array of languages and voices.

Voice Recognition Settings

Activating and Deactivating Voice Recognition

Control voice input using the microphone icons (Off, On, Error) or by saying "Stop listening". The command "Cancel cancel cancel" immediately aborts and restarts voice recognition.

Voice recognition can also be activated by running the command Activate voice recognition in any of three ways: through the Agent Bridge, from JavaScript with window.chrome.webview.postMessage("[runcommand][BROWSERCOMMANDPASSWORD]Activate voice recognition"), or by entering it as a command to run when the application starts.

Voice Recognition Modes

Accessed via Settings Panel:

  • Click to Talk: Activates on mic click, stops after speech.
  • Constant: Continuously listens and processes after activation.
  • Wake Word: Continuously listens, processes speech after hearing the “Agent Name”. Plays an audio cue (“Listening…”) after detecting the wake word. The “Speech Recognition Finalization Delay” setting (0.5-5.0s) fine-tunes how long it waits after you stop speaking.

Language Selection

Choose from over 80 languages and dialects in the Voice Recognition window (right-click/double-click mic icon) for optimal accuracy.

Assigning an Agent Name

Set in the main Settings panel. It acts as the Wake Word and the identifier for the Agent Bridge.

Universal Text-To-Speech (TTS) System

PowerShellGPT features a powerful, centralized TTS system that works across all supported AI models (Gemini, Claude, ChatGPT, Grok, LM Studio), offering over 1400 voices in 90 languages.

How it Works:

  • External TTS Engine (LazyPy): The system primarily utilizes the LazyPy TTS service (https://lazypy.ro/tts/). PowerShellGPT automatically manages a dedicated BrowserGPT tab that loads the LazyPy interface.
  • Cleaned Text for Speech: When an AI model responds, PowerShellGPT prepares a special version of the text for TTS by stripping out code blocks and other non-speech elements. This cleaned version is available to JavaScript via the [modelsttsresponse] placeholder.
  • Sending Text to TTS Tab:
    • If the “Read aloud Model’s responses” setting is checked, the [modelsttsresponse] is automatically sent to the LazyPy TTS tab for speech synthesis.
    • The built-in voice command "Read that to me" also sends the AI’s last response to the TTS tab.
    • Custom JavaScript commands (like AI Chat Speak [keyword]) can directly send text to the TTS tab.
  • Voice Selection:
    • Default Voice: The voice selected on the LazyPy TTS page is used as the “Default Voice” (e.g., “Ryan (English, United Kingdom)”). The TTS system uses this voice unless it is overridden.
    • Dynamic Voice Change with [SETVOICE]: Text sent to the LazyPy TTS tab can include the command [SETVOICE][Voice Name From LazyPy] (e.g., [SETVOICE][Aria (English, American)]Hello there!). This dynamically changes the voice for the subsequent text in that speech request. This is key for the “AI vs. AI” feature.
    • The JavaScript in the LazyPy tab (TextToSpeech ENGINE) handles these [SETVOICE] commands and manages the voice queue.
  • Interaction with ChatGPT’s Native TTS:
    • The “Use ChatGPT TTS” checkbox (within ChatGPT’s AI Output Control Panel, see Chapter 4) fine-tunes which voice is used:
      • If “Use ChatGPT TTS” is checked, PowerShellGPT allows ChatGPT’s own voice to play (if the main “Read aloud…” is also checked).
      • If “Use ChatGPT TTS” is unchecked, the code-stripped text is sent to LazyPy, and you’ll hear the selected LazyPy/Default voice.
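
As a rough illustration of the [SETVOICE] syntax, the sketch below builds a speech request in which two voices alternate. The function name buildSpeechRequest is hypothetical, and the idea that several [SETVOICE] segments can be concatenated in one request is inferred from the description above rather than confirmed. The voice names are the ones quoted in this chapter.

```javascript
// Hypothetical helper: turns [{ voice, text }] segments into one TTS request string.
// A segment without a voice is left unprefixed, so the Default Voice would apply.
function buildSpeechRequest(segments) {
  return segments
    .map(({ voice, text }) => (voice ? "[SETVOICE][" + voice + "]" : "") + text)
    .join(" ");
}

const request = buildSpeechRequest([
  { voice: "Aria (English, American)", text: "Hello there!" },
  { voice: "Ryan (English, United Kingdom)", text: "Good evening." },
]);
// request is now:
// "[SETVOICE][Aria (English, American)]Hello there! [SETVOICE][Ryan (English, United Kingdom)]Good evening."
```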

Executing Saved Items via Voice Trigger

Speak the exact saved name of a command, prompt, or JavaScript snippet to execute it. This bypasses sending the text to the AI.

Built-in Voice Commands

Refer to Appendix B for a full list of commands that control the application itself. These take precedence over saved items.

8. Managing Commands, Prompts, and Scripts

PowerShellGPT allows you to save and reuse frequently used AI prompts, PowerShell commands/scripts, and JavaScript snippets. This significantly speeds up workflow and enables powerful automation through voice triggers and the Agent Bridge.

Accessing Management Areas

Use the Mode Selector on the main window to switch between different management views:

  • Prompts: Manage saved AI prompts.
  • Commands: Manage saved PowerShell commands and scripts.
  • Last Command: View and manage the most recently executed PowerShell command. Includes Undo/Redo buttons for the editor.
  • JavaScript (in BrowserGPT): Manage saved JavaScript snippets using the dedicated controls on the JavaScript Scratchpad.

Saving and Loading AI Prompts (Prompts Mode)

Purpose: Save text prompts that you frequently send to the AI model (e.g., system instructions, specific question formats).

Interface:

  • Dropdown list: Lists all saved prompts by name. Select a prompt to load its content.
  • Editor: Displays the content of the selected prompt. You can also type or paste new prompt text here.
  • Save Button: Saves the current content of the editor.
  • Delete Button: Deletes the prompt currently selected in the dropdown list.
  • Submit Button: Sends the content of the editor to the current AI model.

Saving:

  1. Enter or load the desired prompt text into the Prompt Editor.
  2. Click the “Save Prompt” button.
  3. You’ll be prompted to enter a name for the prompt. This name also becomes its voice trigger. Choose a descriptive and easy-to-say name. Avoid reserved names (see Appendix B) and characters invalid for filenames.
  4. The prompt is saved in the Prompts subfolder.

Loading:

Select the desired prompt name from the dropdown list. Its content appears in the Prompt Editor.

Executing (Voice):

Activate voice recognition and say the exact saved prompt name.

Executing (Manual):

Load the prompt into the Prompt Editor and click “Submit Prompt”.

Saving and Loading PowerShell Commands/Scripts (Commands Mode)

Purpose: Save reusable PowerShell commands or multi-line scripts.

Interface:

  • Dropdown list: Lists saved PowerShell commands/scripts by name.
  • Editor: Displays the content of the selected command/script. Type or paste new PowerShell code here.
  • Save Button: Saves the content of the editor.
  • Delete Button: Deletes the command selected in the dropdown list.
  • Run Saved Button: Executes the PowerShell code currently loaded in the Command Editor.

Saving:

  1. Enter or load the PowerShell code into the Command Editor.
  2. Click the “Save Command” button.
  3. Enter a name. This name is also the voice trigger.
  4. The command is saved in the Commands subfolder.

Loading:

Select the command name from the dropdown list of saved PowerShell commands. Its content appears in the Command Editor.

Executing (Voice):

Activate voice recognition and say the exact saved command name.

Executing (Manual):

Load the command into the Command Editor and click “Run Command”.

Using the ‘Last Command’ View (Last Command Mode)

Purpose: Review, reuse, or save the most recently executed PowerShell command (whether generated by AI, run directly, or a saved command).

Interface:

  • Editor: Displays the last executed PowerShell command. You can edit it here.
  • Run Button: Executes the command currently shown in the Last Command Editor.
  • Save Button: Saves the command currently in the Last Command Editor as a new named command (prompts for name/voice trigger).

Saving and Loading JavaScript Snippets (BrowserGPT)

Purpose: Save reusable JavaScript code for web automation or interaction.

Interface:

  • Dropdown list: Lists saved JavaScript snippets by name.
  • Scratchpad: Displays the content of the selected snippet. Type or paste new JavaScript here. Includes Undo/Redo buttons.
  • Save Button: Saves the content of the Scratchpad.
  • Delete Button: Deletes the script selected in the dropdown list.
  • Run JavaScript Button: Executes the JavaScript currently in the Scratchpad (respecting browser directives).
  • Plugin Checkbox: Indicates/sets if the script is a plugin (//plugin//).
  • HTML Checkbox: Indicates/sets if the content should be treated as raw HTML (<!DOCTYPE html>).

Saving:

  1. Enter or load the JavaScript code into the Scratchpad.
  2. Click the “Save” button.
  3. Enter a name, which also serves as its voice trigger.
  4. The script is saved as a .js file in the JavaScript folder.

Loading:

Select the script name from the Dropdown list of saved JavaScript. Its content appears in the Scratchpad.

Executing (Voice):

Activate voice recognition and say the exact saved script name.

Executing (Manual):

Load the script into the Scratchpad in the BrowserGPT window and click “Run JavaScript”.

Deleting Saved Items

  1. Select the appropriate mode (“Prompts” or “Commands”), or switch to the BrowserGPT window for JavaScript.
  2. Select the item you wish to delete from the corresponding dropdown list.
  3. Click the relevant “Delete” button.
  4. Confirm the deletion when prompted.

Effectively managing your saved prompts, commands, and scripts allows you to build a powerful library of reusable automations accessible via simple clicks or voice commands.

9. Advanced Automation Techniques

Beyond simple command execution, PowerShellGPT offers several powerful features for creating sophisticated, flexible, and sequential automations.

Command Aliases: Creating Shortcuts

Alias Manager

Purpose: Assign simpler, more natural, or alternative phrases to trigger existing saved commands (PowerShell, JavaScript, or Prompts) or even built-in voice commands. This allows you to use phrasing that feels more intuitive to you without renaming the underlying saved file.

How it Works:

  1. Access the Command Alias Manager by clicking its icon or using the "Show command aliases" voice command.
  2. The manager displays a list of existing aliases and allows you to add, edit, or delete them.
  3. When adding/editing:
    • Select the Original Command: Choose the command you want to create an alias for from the dropdown list of all saved commands.
    • Specify the Alias Command: Enter the new phrase you want to speak.
  4. Aliases are stored in the CommandAlias file in the application directory.

Execution:

When you speak an alias phrase, PowerShellGPT first checks its alias list. If a match is found, it substitutes the alias with the corresponding Original Command and processes that command instead.

Example:

  • Original Command Name: empty the recycle bin
  • Alias Command: take out the trash
  • Saying “take out the trash” will execute the empty the recycle bin command.
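
The substitution step can be pictured as a simple lookup. This is only a sketch: the alias table below is an in-memory stand-in (the format of the CommandAlias file is not documented here), resolveAlias is a hypothetical name, and the case-insensitive comparison is an assumption.

```javascript
// Hypothetical in-memory alias table: spoken alias -> original command name.
const aliases = new Map([["take out the trash", "empty the recycle bin"]]);

// Returns the original command for a known alias, or the phrase unchanged.
function resolveAlias(phrase, table) {
  const spoken = phrase.trim();
  const key = spoken.toLowerCase(); // assumption: aliases are matched case-insensitively
  return table.has(key) ? table.get(key) : spoken;
}

const resolved = resolveAlias("Take out the trash", aliases);
// resolved === "empty the recycle bin"
```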

Dynamic Commands with [KEYWORD]

Purpose: Create versatile command templates where part of the command is filled in dynamically by your spoken words at the time of execution. This allows one saved item to perform many variations of a task.

How it Works:

  • Saving: When saving a PowerShell command or JavaScript script, include the placeholder [KEYWORD] in the content wherever the dynamic input should be inserted. Within the content, the placeholder is matched case-insensitively.
  • Voice Trigger Naming: The name you save the file with must also include [KEYWORD] at the position representing the variable part of your intended voice command.

Execution:

Activate voice recognition and speak the command, replacing [KEYWORD] in the spoken phrase with your desired dynamic text.

Processing:

  1. PowerShellGPT matches your spoken phrase against the names of saved JavaScript snippets, PowerShell commands, and prompts that contain [KEYWORD].
  2. It extracts the portion of your speech that corresponds to the [KEYWORD] placeholder in the filename.
  3. It loads the content of the matched PowerShell command, JavaScript snippet, or prompt.
  4. It replaces all instances of [KEYWORD] within the loaded code with the text extracted from your speech.
  5. It executes the resulting modified PowerShell or JavaScript code.

Example (PowerShell):

  • Saved PowerShell Command Name: output the text [KEYWORD] in powershell
  • File Content: echo "[KEYWORD]"
  • Speak: “Computer, output the text this is a test in powershell”
  • Extracted Keyword: “this is a test”
  • Executed Code: echo "this is a test"

Example (JavaScript):

  • Saved JavaScript Name: show an alert that says [KEYWORD]
  • File Content: alert("[KEYWORD]");
  • Speak: “Computer, show an alert that says hello world”
  • Extracted Keyword: “hello world”
  • Executed Code: alert("hello world");

Key Points: This feature relies on matching the spoken phrase structure to the filename structure containing [KEYWORD]. It works best for commands where the variable part comes naturally at the end or middle of a phrase. For dynamic variables specific to JavaScript executed in BrowserGPT, also see [modelsresponse], [modelsttsresponse], [defaultvoice], [agentname], and [lmstudiosearchurl] detailed in Chapter 6.
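
The processing steps above can be sketched as a match, an extraction, and a replacement. This is an illustration under stated assumptions, not the application's actual code: extractKeyword and fillTemplate are hypothetical names, the sketch handles a name with exactly one [KEYWORD], and the wake word (the Agent Name) is assumed to have been removed from the spoken phrase already.

```javascript
// Escapes characters that have a special meaning in regular expressions.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Matches a spoken phrase against a saved name containing one [KEYWORD]
// and returns the dynamic part, or null when the phrase does not fit.
function extractKeyword(savedName, spoken) {
  const parts = savedName.split(/\[keyword\]/i);
  if (parts.length !== 2) return null;
  const pattern = new RegExp(
    "^" + escapeRegExp(parts[0]) + "(.+)" + escapeRegExp(parts[1]) + "$",
    "i"
  );
  const match = spoken.trim().match(pattern);
  return match ? match[1].trim() : null;
}

// Replaces every [KEYWORD] in the saved content (case-insensitive).
function fillTemplate(content, keyword) {
  return content.replace(/\[keyword\]/gi, () => keyword);
}

const keyword = extractKeyword(
  "output the text [KEYWORD] in powershell",
  "output the text this is a test in powershell"
);
const code = fillTemplate('echo "[KEYWORD]"', keyword);
// keyword === "this is a test"; code === 'echo "this is a test"'
```

The replacement uses a function rather than a plain string so that characters such as $ in the spoken text are inserted literally.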

Command Chaining with and then

Purpose: Execute multiple commands (saved items, built-in commands, keyword commands, aliases) sequentially from a single voice instruction or Agent Bridge call.

Syntax: Separate individual commands within your spoken phrase or Agent Bridge command string using the Chain Command Phrase (default: " and then ", note the surrounding spaces).

Execution:

  1. PowerShellGPT detects the chain phrase.
  2. It splits the entire instruction into individual command parts based on the phrase.
  3. It processes each part sequentially:
    • Checks for Wait commands (see below).
    • Resolves aliases.
    • Checks for [KEYWORD] commands.
    • Checks for saved items or built-in commands.
    • Executes the resolved command/prompt/script.

Example: "Show browser and then search youtube for relaxing music and then wait for 5 seconds and then play relaxing music"

Adding Delays with wait for [number] seconds/minutes

Purpose: Introduce timed pauses within a command chain.

Syntax: Use the Wait Command Phrase (default: "wait for ") followed by a number (1-60, either digits like 5 or words like five) and the unit (seconds or minutes). This entire phrase acts as one “command” within the chain.

Execution:

When the CheckCommand function encounters a wait command within a chain:
  1. It extracts the number and the unit (seconds or minutes).
  2. It converts the number word (if used) to an integer.
  3. It pauses the execution of the chain for the specified duration.
  4. After the delay, it proceeds to the next command in the chain.

Example: "Empty the recycle bin and then wait for ten seconds and then show settings"

Combining chaining, waiting, aliases, and keyword commands allows for the creation of highly sophisticated, timed, and flexible automation sequences triggered by a single voice command or external call.
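
To make the splitting and wait handling concrete, here is a minimal sketch that turns one instruction into an ordered plan. It is not the CheckCommand implementation: all function names are hypothetical, only the default phrases " and then " and "wait for " come from this manual, and accepting the singular forms "second" and "minute" is an added leniency.

```javascript
const CHAIN_PHRASE = " and then "; // default Chain Command Phrase
const WAIT_PHRASE = "wait for ";   // default Wait Command Phrase

const SMALL = ["zero", "one", "two", "three", "four", "five", "six", "seven",
  "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen", "fifteen",
  "sixteen", "seventeen", "eighteen", "nineteen"];
const TENS = new Map([["twenty", 20], ["thirty", 30], ["forty", 40], ["fifty", 50], ["sixty", 60]]);

// Converts "5", "five", or "twenty five" to a number; returns NaN if unrecognized.
function parseCount(text) {
  if (/^\d+$/.test(text)) return Number(text);
  const words = text.toLowerCase().split(/[\s-]+/);
  if (words.length === 1) {
    if (TENS.has(words[0])) return TENS.get(words[0]);
    const n = SMALL.indexOf(words[0]);
    return n >= 0 ? n : NaN;
  }
  if (words.length === 2 && TENS.has(words[0])) {
    const n = SMALL.indexOf(words[1]);
    return n >= 1 && n <= 9 ? TENS.get(words[0]) + n : NaN;
  }
  return NaN;
}

// Returns the delay in milliseconds for a wait command, or null for anything else.
function parseWait(part) {
  const text = part.trim().toLowerCase();
  if (!text.startsWith(WAIT_PHRASE)) return null;
  const match = text.slice(WAIT_PHRASE.length).match(/^(.+?)\s+(seconds?|minutes?)$/);
  if (!match) return null;
  const count = parseCount(match[1]);
  if (!(count >= 1 && count <= 60)) return null; // documented range is 1-60
  return count * (match[2].startsWith("minute") ? 60000 : 1000);
}

// Splits an instruction on the chain phrase and labels each part.
function planChain(instruction) {
  return instruction
    .split(CHAIN_PHRASE)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const ms = parseWait(part);
      return ms === null ? { type: "command", name: part } : { type: "wait", ms };
    });
}

const plan = planChain("Empty the recycle bin and then wait for ten seconds and then show settings");
// plan has three entries: a command, a 10000 ms wait, and another command.
```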

10. The Agent Bridge: Multi-Instance Coordination

The Agent Bridge (agent_bridge.exe) is a lightweight yet powerful companion utility included with PowerShellGPT. Its purpose is to enable communication and command execution between different running instances of PowerShellGPT or to allow external applications, scripts, or even simple Windows shortcuts to trigger actions within a specific PowerShellGPT instance.

Concept: Named Agents

  • Each running instance of PowerShellGPT can be assigned a unique Agent Name (configured in the main Settings panel).
  • This name acts as an identifier, allowing the Agent Bridge to target a specific running PowerShellGPT process.

How it Works

  1. Targeting: The agent_bridge.exe utility is designed to be run from the command line (or called by another program/script). It requires two primary arguments: the target Agent Name and the command or prompt to be executed.
  2. Finding the Instance: When executed, the Agent Bridge searches through the currently running processes on the system. It specifically looks for a window whose title contains the exact Agent Name provided as the first argument. (Therefore, it’s important that the Agent Name is unique among running instances).
  3. Sending the Command: Once it finds the window matching the Agent Name, the Agent Bridge sends the second argument (the command/prompt) directly to that specific PowerShellGPT instance.
  4. Receiving and Processing: The target PowerShellGPT instance receives the message. It extracts the received command/prompt.
  5. Execution: The received command/prompt is then processed. This means the received command/prompt can be:
    • The name of a saved AI prompt.
    • The name of a saved PowerShell command/script.
    • The name of a saved JavaScript snippet.
    • A built-in voice command phrase (e.g., "stop listening", "show settings").
    • A command chain using and then and wait for.
    • A dynamic [KEYWORD] command.
    • An alias defined in the target agent’s saved command aliases.
  6. Action: The target PowerShellGPT instance performs the action corresponding to the resolved command (sends prompt to AI, runs PowerShell, executes JavaScript, changes a setting, etc.).

Command Line Usage

agent_bridge.exe "TargetAgentName" "CommandOrPromptToExecute"
  • "TargetAgentName": The exact Agent Name (case-sensitive, enclosed in quotes) of the running PowerShellGPT instance you want to control.
  • "CommandOrPromptToExecute": The exact name of the saved item, built-in command, alias, or chained command sequence (enclosed in quotes) that you want the target agent to execute.

Example Usage Scenarios:

  • Run a saved PowerShell script on Agent “ServerMonitor”:
    agent_bridge.exe "ServerMonitor" "check disk space"
  • Send a specific system prompt to Agent “MainAssistant”:
    agent_bridge.exe "MainAssistant" "system prompt"
  • Trigger a complex chained command on Agent “HomeControl”:
    agent_bridge.exe "HomeControl" "turn on living room light and then wait for 2 seconds and then play relaxing music"
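
For callers that prefer a script over a raw command line, the same invocation can be sketched in JavaScript. This assumes Node.js is installed, which this manual does not otherwise require, and that agent_bridge.exe can be found on the PATH or at the path you pass in. The names buildBridgeArgs and sendToAgent are hypothetical.

```javascript
// Builds the two arguments agent_bridge.exe expects: agent name first, then the command.
function buildBridgeArgs(agentName, command) {
  if (!agentName || !command) {
    throw new Error("Both an agent name and a command are required");
  }
  return [agentName, command];
}

// Launches the bridge. The default path is an assumption; adjust it to your installation.
function sendToAgent(agentName, command, bridgePath = "agent_bridge.exe") {
  const { execFile } = require("child_process");
  return new Promise((resolve, reject) => {
    execFile(bridgePath, buildBridgeArgs(agentName, command), (error) => {
      // The bridge is a one-way sender, so success only means the process ran.
      if (error) reject(error);
      else resolve();
    });
  });
}

// Example call (not executed here):
// sendToAgent("ServerMonitor", "check disk space").catch(console.error);
```

Passing the arguments as an array lets Node.js handle the quoting, so agent names and commands containing spaces do not need manual escaping.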

Agentic Possibilities & Multi-Agent Coordination

The Agent Bridge unlocks significant potential for creating more complex, distributed systems:

  • Specialized Agents: Run multiple PowerShellGPT instances, each configured with a unique Agent Name and perhaps a specific set of commands/prompts or AI model, dedicated to particular tasks (e.g., one for system monitoring, one for web scraping, one for general assistance).
  • Hierarchical Control: One “master” PowerShellGPT agent could use the Agent Bridge (triggered by its own AI or commands) to delegate tasks to other “worker” agents.
  • External Integration: Integrate PowerShellGPT actions into larger automation scripts (Python, AutoHotkey, standard batch files) by calling agent_bridge.exe.
  • Scheduled Tasks: Use Windows Task Scheduler to run agent_bridge.exe at specific times to trigger PowerShellGPT commands automatically.

Important Considerations:

  • Window Title Dependency: The bridge relies on finding a window title containing the Agent Name. Ensure Agent Names are unique.
  • No Direct Feedback: The Agent Bridge itself typically does not provide feedback to the caller about whether the command was successfully received or executed by the target agent. It’s primarily a one-way command sender.

The Agent Bridge is a simple yet powerful tool that significantly enhances PowerShellGPT’s extensibility and potential for complex, coordinated automation scenarios.

11. Plugins: Extending Browser Functionality

PowerShellGPT includes a Plugin system that allows you to automatically execute custom JavaScript code whenever web pages load within its browser environments. This is ideal for persistently modifying website behavior, adding helper functions, injecting custom tools, or automating actions across different sites.

What are Plugins?

  • A Plugin is simply standard JavaScript saved as a command in BrowserGPT.
  • What makes it a Plugin is the inclusion of the special comment //plugin// anywhere within the file’s content (usually placed near the top for clarity).

How Plugins Work

  1. Loading: When PowerShellGPT starts, it loads the content of all saved JavaScript snippets into memory.
  2. Execution: When the DOMContentLoaded event fires in any tab (i.e., the webpage has loaded):
    • The application iterates through the loaded scripts in its memory.
    • For each script, it checks if the content contains the //plugin// marker.
    • If //plugin// is found, the entire content of that script (after replacing dynamic variables like [modelsresponse] and [agentname]) is executed within the context of the browser tab/window that just finished loading.
  3. Targeting: By default, a script marked only with //plugin// will run in any BrowserGPT tab when its content loads. However, you can control where the plugin executes using browser directives within the plugin code itself.

Controlling Plugin Execution with Directives

You can refine where a plugin runs by including specific //run in...// directives within the plugin’s code:

  • //plugin//
    //run in console browser//
    • Behavior: This plugin will only execute when content loads in the Console Browser. It will not run in BrowserGPT tabs or the AI Model Browser.
    • Use Case: Modifying the PowerShell output interface, adding custom controls specifically for PowerShell interaction. The default Console Browser UI script is a prime example of this.
  • //plugin//
    //run in ai browser//
    • Behavior: This plugin will only execute when content loads in the main AI Model Browser.
    • Use Case: Automatically modifying the UI of the specific AI model’s web interface (e.g., adding custom buttons, changing styles), though use this with caution as AI web interfaces can change frequently.
  • //plugin// (with no //run in...// directive)
    • Behavior: This plugin will execute in any BrowserGPT tab whenever its DOMContentLoaded event fires, which occurs when the webpage has finished loading. It will not run in the main AI Model Browser or the Console Browser.
    • Use Case: Injecting general-purpose helper functions, applying consistent style changes across browsed pages, adding universal tools to web pages viewed in BrowserGPT.
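
The targeting rules above amount to a small decision function. The sketch below is a hypothetical classifier (pluginTarget is not an application function) that mirrors those rules using exact, case-sensitive marker matching, which is an assumption.

```javascript
// Decides where a saved script would auto-run, following the directive rules above.
// Returns null when the script is not a plugin at all.
function pluginTarget(content) {
  if (!content.includes("//plugin//")) return null;
  if (content.includes("//run in console browser//")) return "console";
  if (content.includes("//run in ai browser//")) return "ai";
  return "browsergpt"; // no //run in...// directive: BrowserGPT tabs only
}

const target = pluginTarget("//plugin//\n//run in console browser//\nconsole.log('ui');");
// target === "console"
```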

Creating a Plugin

  1. Write your desired JavaScript code. Remember you can use dynamic variables like [modelsresponse] and [agentname].
  2. Add the comment line //plugin// somewhere in the file (near the top for clarity).
  3. (Optional) Add a //run in console browser// or //run in ai browser// directive on a separate line if you want to restrict its execution.
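
A complete plugin following these steps might look like the sketch below. It carries no //run in...// directive, so under the rules above it would run in BrowserGPT tabs only. The styling it applies and the name buildHighlightCss are purely illustrative.

```javascript
//plugin//
// Illustrative plugin: outlines every link on pages loaded in BrowserGPT.

// Builds the CSS rule; kept separate so the color is easy to change.
function buildHighlightCss(color) {
  return "a { outline: 2px solid " + color + "; }";
}

// Only touch the page when a DOM is available.
if (typeof document !== "undefined") {
  const style = document.createElement("style");
  style.textContent = buildHighlightCss("orange");
  document.head.appendChild(style);
}
```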

Example: The Default Console Browser UI Plugin

The script that defines the entire user interface (output area, buttons, styles) for the Console Browser is itself implemented as a plugin. It contains:

  • //plugin// : Marks it as a plugin.
  • //run in console browser// : Ensures it only runs within the Console Browser environment.
  • The HTML, CSS, and JavaScript necessary to build and manage that interface.

This demonstrates the power of the plugin system for deep customization of specific application components.

Use Cases for Plugins:

  • Injecting custom CSS to restyle websites consistently.
  • Adding utility buttons or functions to specific sites you frequent.
  • Automatically filling parts of common forms.
  • Creating persistent on-page tools or information displays.
  • Modifying the core Console Browser interface.

Plugins provide a robust way to extend and customize the browser environments within PowerShellGPT automatically.

12. Special Features & Examples

Beyond the core functionality, PowerShellGPT incorporates several special features and examples that showcase its versatility and power for real-world interaction and advanced automation.

Face Detection Integration

Concept: PowerShellGPT can interact with an external Python script (face_detector.py, included) that uses your webcam to detect human faces.

How it Works:

  1. Activation: You run a saved PowerShell command "activate face detection" or use the corresponding voice trigger.
  2. PowerShell Script: This command executes the face_detector.py script using python. It passes your current Agent Name and a “detection command” (e.g., "face detected") as arguments to the Python script. It can also include parameters for detection interval, threshold, timestamping, and camera device index.
  3. Python Script (face_detector.py):
    • Initializes the specified webcam using OpenCV.
    • Uses a Haar Cascade classifier (haarcascade_frontalface_default.xml) to scan the video feed for faces.
    • Includes logic to manage detection thresholds (how long a face must be present) and notification intervals (how often to notify if a face remains present).
    • Crucially: When a face is detected according to the set parameters, the Python script calls the Agent Bridge (agent_bridge.exe).
    • It passes the Agent Name (received from PowerShell) and the detection command ("face detected", also received from PowerShell) to the Agent Bridge.
    • (Optional) If timestamping is enabled, it saves a snapshot of the detected face to the FaceCaptures folder.
  4. Agent Bridge Action: The Agent Bridge receives the call from Python, finds the PowerShellGPT instance with the matching Agent Name, and sends the “detection command” "face detected" to it.
  5. PowerShellGPT Reaction: PowerShellGPT receives the command via the bridge and executes the saved command named "face detected". The default example (face detected) plays a random .wav file from the audio subfolder.

Result: An interactive system where the application can audibly greet you or perform a predefined action whenever it “sees” you via the webcam.

Control: You can pause/resume the Python detection script using the "pause face detection" and "resume face detection" commands or by using the toggle button in the Console Browser UI.

Serial Port (COM) Communication Example

Concept: Demonstrates the ability for PowerShellGPT to execute PowerShell code generated by AI Models to interact with physical hardware connected via a serial COM port.

How it Works:

  1. You use a dynamic [KEYWORD] command like "send [message] over the serial port".
  2. The PowerShell script (send [keyword] over the serial port):
    • Configures serial port parameters (Port Name, Baud Rate, Parity, etc. – these must be edited in the file to match your hardware).
    • Uses the .NET System.IO.Ports.SerialPort class.
    • Opens the specified COM port.
    • Takes the [KEYWORD] text extracted from your voice command and sends it over the serial port using $serialPort.WriteLine().
    • (Optional) Waits for a response from the connected device and displays it.
    • Closes the port.

Result: Allows voice commands, AI prompts, or the AI models themselves to send data to and potentially receive data from microcontrollers (like Arduino), sensors, legacy equipment, or any device using serial communication.

C# App Compilation & Execution Example

Concept: Showcases the MACFARI capability – AI generating, compiling, and running code in a language other than PowerShell, using PowerShell as the orchestrator.

How it Works:

  1. You prompt the AI using a [KEYWORD] command like "write an application in c sharp that [your app description]". The AI, guided by the specific instructions in that prompt file:
    • Generates the C# source code for the requested application.
    • Generates a PowerShell script.
  2. The PowerShell script saves the C# code to a .cs file.
  3. It then calls the .NET C# compiler (csc.exe, path specified in the prompt) to compile the .cs file into a .exe executable.
  4. If compilation is successful, it uses PowerShell to display a Windows Forms confirmation dialog asking if you want to run the newly created .exe.
  5. If you click “Yes”, it uses Start-Process to execute the compiled C# application.

Result: A seamless flow from a natural language request to a running, compiled Windows application, all orchestrated by the AI and PowerShell.

Flight Finder Web Automation Example

Concept: Demonstrates BrowserGPT’s ability to automate complex web interactions using AI-generated JavaScript, including form filling, date selection, button clicking, and data scraping.

How it Works:

  1. You use a dynamic [KEYWORD] voice command like "find me a flight from [origin] to [destination] on [date] returning [date]" (based on search flight [keyword]).
  2. PowerShellGPT sends the full request and the content of search flight [keyword] (which contains example JavaScript) to the AI model.
  3. The AI uses the example JavaScript as a template:
    • It extracts the origin, destination, and dates from your specific request ([KEYWORD]).
    • It modifies the example JavaScript, replacing placeholders or example values with the details from your request.
    • It sends back the modified JavaScript wrapped in @JsGPT@... @/JsGPT@ tags.
  4. PowerShellGPT executes this tailored JavaScript in BrowserGPT:
    • The script likely first navigates to Google Flights.
    • It programmatically fills the “Where from?” and “Where to?” input fields.
    • It clicks to open the date pickers.
    • It includes logic to navigate the calendar (clicking “Next”) until the correct months are visible.
    • It clicks the specific departure and return dates.
    • It clicks the “Search” button.
    • After waiting for results, it scrapes flight details (times, airlines, price, stops) from the results page.
    • It formats the scraped results (often as JSON).
    • It sends the results back to PowerShellGPT using window.chrome.webview.postMessage("[BROWSERCOMMANDPASSWORD]Summarize these flights... " + results).
  5. PowerShellGPT receives the results and sends them (along with the “Summarize…” prefix) to the AI.
  6. The AI summarizes the flight options conversationally.

Result: A voice-controlled flight search and summarization experience, automating a multi-step web interaction.
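
The final reporting step of such a script can be sketched as follows. The flight object is made-up sample data whose fields mirror the details listed above, and buildFlightReport is a hypothetical name. The scraping itself is omitted because it depends on the page's current markup.

```javascript
// Made-up sample of what a scraper might collect; real values come from the results page.
const sampleFlights = [
  { airline: "Example Air", departs: "08:15", arrives: "11:40", stops: 0, price: "$199" },
];

// Formats the results as JSON behind the "Summarize these flights... " instruction.
// No prefix is used, so the message replaces the Feedback Editor content and is sent
// to the AI when "Send browser output to model" is checked.
function buildFlightReport(flights) {
  return "[BROWSERCOMMANDPASSWORD]Summarize these flights... " + JSON.stringify(flights);
}

const report = buildFlightReport(sampleFlights);
if (globalThis.chrome && globalThis.chrome.webview) {
  globalThis.chrome.webview.postMessage(report);
}
```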

Agent Vision Example

Concept: Allows the AI to “see” by describing an image captured from the user’s webcam.

How it Works:

  1. You use the voice command "what can you see".
  2. This triggers the saved JavaScript What can you see.
  3. The script first ensures BrowserGPT is visible (//show browser//).
  4. It then navigates BrowserGPT to an image-to-text AI tool website (//load page...//).
  5. After the page loads, the main part of the script executes:
    • It accesses the user’s webcam.
    • It briefly displays the video feed (optionally).
    • It takes a snapshot from the video stream.
    • It converts the snapshot to a data URL (Base64).
    • It programmatically “drops” this image data onto the image upload area of the image-to-text website.
    • It clicks the “Generate Description” button on the website.
    • It waits for the website to generate the text description.
    • It scrapes the generated description text from the website.
    • It sends this scraped description back to PowerShellGPT using window.chrome.webview.postMessage("[BROWSERCOMMANDPASSWORD][Description: ... ] The user asked: what can you see?").
  6. PowerShellGPT receives this message and sends it as a prompt to the AI model (Gemini/Claude/ChatGPT/Grok).
  7. The AI receives the description and the original question (“what can you see?”) and formulates a response as if it were seeing the scene described.

Result: Creates the illusion that the AI can perceive the user’s immediate environment via the webcam, responding contextually to the visual information.
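
The data URL step can be illustrated in isolation. The sketch below assumes the snapshot is a base64 data URL (the form canvas.toDataURL() returns) and shows one way to turn it back into raw bytes before handing it to an upload area. The decodeDataUrl helper is hypothetical and is not part of the saved script.

```javascript
// Hypothetical helper: splits a data URL into its MIME type and raw bytes.
// Assumes a base64-encoded data URL such as the one canvas.toDataURL() returns.
function decodeDataUrl(dataUrl) {
  const match = /^data:([^;,]+);base64,(.*)$/.exec(dataUrl);
  if (!match) {
    throw new Error('Not a base64 data URL');
  }
  const binary = atob(match[2]); // atob is available in browsers and in Node.js 16+
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) {
    bytes[i] = binary.charCodeAt(i);
  }
  return { mimeType: match[1], bytes };
}

// Tiny stand-in payload (the text "Hi") so the sketch runs without a webcam.
const decoded = decodeDataUrl('data:text/plain;base64,SGk=');
console.log(decoded.mimeType, Array.from(decoded.bytes));
```

In a browser tab, the resulting bytes could then be wrapped in a File object (for example new File([bytes], 'snapshot.png', { type: mimeType })) before the drop is simulated.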

AI vs. AI: Orchestrating Debates and Conversations

Concept: Leverage PowerShellGPT’s multi-browser architecture and TTS capabilities to have two AI models engage in a conversation or “argument” on a topic you specify, with their responses spoken aloud in distinct voices.

How it Works:

  1. Initiation: You use a dynamic voice command, typically a [KEYWORD] command, like "Have an argument with [AI Model Name] about [Subject]".
    • Example: "Have an argument with Claude about the future of AI".
    • This triggers a saved JavaScript file (e.g., Have an argument with claude about [keyword]).
  2. Targeting and Setup:
    • The initiating JavaScript ensures the main AI Model Browser is active and the specified “opponent” AI model is loaded into a designated BrowserGPT tab. For instance, //run in tab ID claudechittychat//orcreate https://claude.ai// ensures Claude is loaded in a tab with ID claudechittychat.
    • The script sends an initial prompt to the “opponent” AI in BrowserGPT, instructing it to engage in a combative argument on the [KEYWORD] subject and to format its responses for TTS and for sending back to the main AI. This prompt includes a specific format like:
      @JsGPT@//run in tab ID [Main_AI_Tab_ID]//speak(`YOUR RESPONSE HERE`)@/JsGPT@
      (Where `speak()` is a custom JavaScript function that handles TTS and cross-AI prompting).
    • Simultaneously, the script sends a similar instructional prompt to the AI model in the main AI Browser, telling it to also format its responses for the debate.
  3. The “speak()” JavaScript Function: This is a crucial helper function, defined within a plugin named `Right to left all in 1`. When called by an AI’s generated @JsGPT@ tag:
    1. It takes the AI’s textual response as an argument.
    2. It sends this text to the LazyPy TTS tab (systemtts1) for speech synthesis, using a [SETVOICE][Voice Name] command to ensure each AI has a distinct voice (e.g., “AI Chat Speak [text]” or “AI Chat Speak2 [text]” which internally use different voice commands).
    3. It then takes the same text and injects it as a prompt into the *other* AI’s browser window, continuing the conversation.
  4. The “Argument”: Each AI receives the other’s spoken response as a new prompt, formulates a counter-argument according to its instructions, and outputs its reply using the @JsGPT@...speak(`...`)@/JsGPT@ format. This loop continues, creating an audible and textual “argument.”

Result: An engaging (and often amusing) way to explore different AI perspectives on a topic, with the added dimension of spoken dialogue in distinct voices, leveraging the universal TTS system.
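
To make the reply format concrete, here is a small sketch that builds the tagged output for one turn of the debate. The wrapDebateReply helper is hypothetical (in practice the AI writes this format itself). It mainly shows why backticks in a reply need escaping when the text is placed inside speak(`...`).

```javascript
// Hypothetical helper: wraps one AI's reply in the tag format described above so that
// PowerShellGPT runs speak() in the other AI's tab. The tab ID is supplied by the caller.
function wrapDebateReply(targetTabId, replyText) {
  // Escape backslashes, backticks and ${ so the reply cannot break out of the template literal.
  const safe = replyText
    .replace(/\\/g, '\\\\')
    .replace(/`/g, '\\`')
    .replace(/\$\{/g, '\\${');
  return '@JsGPT@//run in tab ID ' + targetTabId + '//speak(`' + safe + '`)@/JsGPT@';
}

console.log(wrapDebateReply('claudechittychat', 'I disagree: `AI` will not replace editors.'));
```

If a model's replies regularly contain code or backticks, adding an instruction about escaping to the initial debate prompt is one way to keep the loop from breaking.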

These examples illustrate how PowerShellGPT’s core features combine to enable complex, interactive, and practical applications beyond simple command-and-response interactions.

13. Advanced Browser & Session Management

Session Management transforms BrowserGPT from a simple browser into a persistent and intelligent workspace. These features allow you to manually save and restore your work and, more importantly, give your AI agent situational awareness of its own browser environment through a live state variable.

Saving and Loading Your Browser Session

The Session Manager allows you to save your entire BrowserGPT state to a file and restore it later, controlled by the Session Management icon.

How to Use the Session Manager:

  • Save Session (Right-Click): Right-click the Session Management icon. This captures the state of all open tabs (their URLs, assigned User IDs, and which tab is active) and prompts you to assign a name before it is saved.
  • Load Session (Left-Click): Click the Session Management icon. You will be prompted to select a previously saved session. PowerShellGPT will close all current tabs and restore the tabs from the file, navigating each to its saved URL, reapplying any User IDs, and re-activating the previously active tab.

Use Cases:

  • Save your research tabs for a specific project.
  • Quickly switch between different sets of commonly used websites.
  • Recover your workspace after restarting the application or your computer.

The `[browserstate]` Variable: Giving the AI Situational Awareness

This manual save/load feature is the user-facing side of a powerful internal mechanism: the [browserstate] variable. This is a live, real-time JSON string, automatically maintained in Agent Memory, that describes the current state of all open BrowserGPT tabs.

JSON Structure Example:

[
  {
    "tabIndex": 0,
    "isActive": false,
    "user_id": "main_tts_engine",
    "title": "LazyPy Text-to-Speech",
    "url": "https://lazypy.ro/tts/"
  },
  {
    "tabIndex": 1,
    "isActive": true,
    "user_id": "research_docs",
    "title": "PowerShellGPT User Manual",
    "url": "file:///C:/.../Manual.html"
  }
]

Why This Matters:

Because this state is available as a variable, you can pass it to the AI model. This allows the AI to make intelligent, context-aware decisions about how to manage the browser.

Example Prompts Unlocked by `[browserstate]`:

  • "Please analyze the current [browserstate] and tell me what URLs are open."
  • "Based on this [browserstate], is there a tab with the User ID 'research_docs' already open? If not, create one by running the command 'open new research tab'."
  • "Using the [browserstate], generate a JavaScript command to switch to the tab whose title contains 'Google Flights'."

By leveraging the [browserstate] variable, you can create prompts and scripts that allow your AI agent to manage its own workspace intelligently, avoiding duplicate tabs and performing actions based on what’s already happening in the browser.
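
The kind of script the AI might generate from such a prompt can be sketched as below. The two lookup helpers are hypothetical, and the sample data simply repeats the JSON structure shown earlier. In the application, the JSON string would come from the [browserstate] variable instead.

```javascript
// Sample value in the structure shown above; in the app this would come from [browserstate].
const browserStateJson = JSON.stringify([
  { tabIndex: 0, isActive: false, user_id: 'main_tts_engine', title: 'LazyPy Text-to-Speech', url: 'https://lazypy.ro/tts/' },
  { tabIndex: 1, isActive: true, user_id: 'research_docs', title: 'PowerShellGPT User Manual', url: 'file:///C:/.../Manual.html' },
]);

// Hypothetical helper: returns the tab with the given User ID, or null if none is open.
function findTabByUserId(stateJson, userId) {
  const tabs = JSON.parse(stateJson);
  return tabs.find((tab) => tab.user_id === userId) || null;
}

// Hypothetical helper: returns the first tab whose title contains the text (case-insensitive).
function findTabByTitle(stateJson, text) {
  const tabs = JSON.parse(stateJson);
  const needle = text.toLowerCase();
  return tabs.find((tab) => tab.title.toLowerCase().includes(needle)) || null;
}

console.log(findTabByUserId(browserStateJson, 'research_docs').tabIndex);
console.log(findTabByTitle(browserStateJson, 'google flights'));
```

Returning null for a missing tab gives the calling script a simple signal for deciding whether a new tab needs to be created.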

Automatic Session Persistence

In addition to manual saving and loading, PowerShellGPT provides an automatic session persistence feature to enhance reliability and convenience.

  • Automatic Save on Exit: When you close PowerShellGPT, it automatically takes the final state of BrowserGPT (all open tabs, their URLs, User IDs, and the active tab) and saves it to the Windows Registry.
  • Automatic Restore on Startup: If the “Restore Browser Session When App starts” option is enabled in the main Settings panel, PowerShellGPT will automatically load this last-saved state from the registry when it next launches. This ensures your workspace is restored exactly as you left it, providing recovery from crashes or normal restarts.

This automated system works in tandem with the manual file-based save/load, giving you both a persistent default state and the ability to manage specific, named session configurations.

14. Remote Voice Relay: Commanding PowerShellGPT from Any Device

The Remote Voice Relay is a powerful feature designed to overcome the limitations of typical remote desktop applications, which often don’t forward microphone access to the host PC. This allows you to use your mobile phone or tablet’s excellent built-in dictation capabilities as a remote microphone to issue voice commands to PowerShellGPT.

What It Solves

When you’re controlling your PC remotely using a tool like RustDesk, Chrome Remote Desktop, or TeamViewer from a tablet or phone, you can see your PowerShellGPT interface, but the application can’t hear your voice. The Remote Voice Relay bridges this gap, providing a text-based “inbox” for your dictated commands.

How It Works

  1. You open the small “Remote Voice Relay” window in PowerShellGPT on your host PC.
  2. On your remote device (phone/tablet), you tap into the Relay’s text input box.
  3. You use your mobile keyboard’s built-in microphone/dictation feature to speak a command.
  4. Your spoken words are transcribed into text and appear in the Relay’s input box on your PC.
  5. The Relay window then automatically (or manually) sends this text command to your main PowerShellGPT agent for execution.

This process effectively turns your mobile device into a high-quality, remote microphone for PowerShellGPT.

Using the Remote Voice Relay

1. Connect Remotely: First, establish a remote connection to your PC where PowerShellGPT is running.

2. Open the Relay Window: On the main PowerShellGPT interface, click the Remote Voice Relay icon to open the Relay window. It will position itself conveniently near the main or BrowserGPT window.

3. Dictate Your Command: On your remote device, tap to focus the text input box in the Relay window. Use your virtual keyboard’s microphone icon to start dictation and speak your command (e.g., “show browser and then search YouTube for epic space battles”).

4. Submit the Command: You have two options for submission:

  • Manual Submission: Simply press Enter on your virtual keyboard or tap the “Submit” button in the Relay window.
  • Automatic Submission: Check the “Auto Submit” box. Now, after you finish speaking, the Relay will wait for the number of seconds specified in the spin-edit box (e.g., 2 seconds of silence) and then automatically submit the command for you. This is ideal for a more hands-free experience.

Configuration

  • Auto Submit Checkbox: Toggles the automatic submission feature.
  • Delay (Seconds): When Auto Submit is on, this sets the duration of silence (in seconds) the Relay will wait for before submitting the dictated text. Adjustable from 1 to 10 seconds.
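
The Auto Submit timing can be pictured with a small model. This is a sketch of the idea only, assuming that "silence" means no new text has arrived for the configured delay. The AutoSubmitTracker class is hypothetical and takes timestamps as arguments so the logic can be followed without real timers.

```javascript
// Hypothetical model of the Auto Submit behaviour: submit once no new text has
// arrived for `delaySeconds`. Timestamps are passed in so the logic is easy to test.
class AutoSubmitTracker {
  constructor(delaySeconds) {
    if (delaySeconds < 1 || delaySeconds > 10) {
      throw new RangeError('Delay must be between 1 and 10 seconds');
    }
    this.delayMs = delaySeconds * 1000;
    this.text = '';
    this.lastInputAt = null;
  }

  // Call whenever the dictated text changes.
  onInput(text, nowMs) {
    this.text = text;
    this.lastInputAt = nowMs;
  }

  // Returns the text to submit once the quiet period has elapsed, otherwise null.
  poll(nowMs) {
    if (this.lastInputAt === null || this.text.trim() === '') return null;
    if (nowMs - this.lastInputAt < this.delayMs) return null;
    const ready = this.text;
    this.text = '';
    this.lastInputAt = null;
    return ready;
  }
}

const tracker = new AutoSubmitTracker(2);
tracker.onInput('show browser', 0);
tracker.onInput('show browser and then search YouTube', 1500);
console.log(tracker.poll(3000)); // 1.5 s since the last input: still waiting
console.log(tracker.poll(3500)); // 2 s of silence: the dictated text is returned
```

A longer delay suits slower dictation with pauses between phrases, while a shorter one makes quick single commands feel more responsive.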

Use Cases

  • Remote System Administration: Run PowerShell scripts and check system status from your tablet while in another room.
  • Media Center Control: Control a PC connected to your TV from the couch using your phone to play music or videos via PowerShellGPT’s AI DJ feature.
  • Accessibility: For users who find mobile dictation more accurate or convenient than desktop speech recognition, the Relay provides an excellent alternative input method.

15. Agent Memory: The AI’s Live, Shared Notepad

Agent Memory is one of PowerShellGPT’s most advanced features, giving your AI assistant a session-wide, short-term memory. This system allows you to store, append, and recall information—like text, URLs, or data—in temporary variables that are shared across the entire application, including all BrowserGPT tabs and PowerShell scripts.

This transforms your AI from a simple command-response tool into a dynamic assistant that can perform complex, stateful, multi-step tasks by carrying information from one step to the next seamlessly.

Two Ways to Use Memory: Static Placeholders vs. The Live API

PowerShellGPT offers two complementary systems for using Agent Memory, each suited for different tasks.

1. The Classic Placeholder System (For Simple Injection)

This is the simplest way to use a stored variable. When you write a command, you can include a variable name wrapped in square brackets. PowerShellGPT will perform a one-time “find and replace” before the command is executed.

Syntax: [variableName]

Best for: Quick, one-shot commands where you just need to inject a stored value at the moment of execution.

Example (PowerShell):

# First, set a variable using the Live API (see below) or the Memory Panel.
# For this example, assume [projectName] has been set to "Report-Q4".

# Your command:
New-Item -ItemType Directory -Path "C:\Work\[projectName]"

# What PowerShellGPT actually runs:
New-Item -ItemType Directory -Path "C:\Work\Report-Q4"

Limitation: This method is static. If the value of `[projectName]` changes later, scripts that have already been run will not be affected.
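
The one-time "find and replace" can be sketched in a few lines. This illustrates the idea under stated assumptions and is not PowerShellGPT's actual implementation: the substitutePlaceholders helper and the plain object standing in for Agent Memory are hypothetical.

```javascript
// Hypothetical sketch of one-time placeholder substitution.
// `memory` stands in for Agent Memory; unknown names are left untouched.
function substitutePlaceholders(command, memory) {
  return command.replace(/\[([A-Za-z_][A-Za-z0-9_]*)\]/g, (whole, name) =>
    Object.prototype.hasOwnProperty.call(memory, name) ? String(memory[name]) : whole
  );
}

const memory = { projectName: 'Report-Q4' };
const command = 'New-Item -ItemType Directory -Path "C:\\Work\\[projectName]"';
console.log(substitutePlaceholders(command, memory));
```

Leaving unknown names untouched means bracketed PowerShell syntax such as [int] passes through unchanged unless a variable with that exact name has been stored.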


2. The Live Agent Memory API (For Dynamic and Cross-Tab Workflows)

For advanced and interactive tasks, PowerShellGPT injects a powerful JavaScript “bridge” object, window.powershellGpt, into every browser tab. This API allows scripts to communicate with the Agent’s memory in real-time.

Core Concept: Cross-Tab Communication: Think of this as a shared digital notepad that every browser tab is watching simultaneously. When a script in Tab A writes a new note, Tab B sees the change instantly and can react to it without needing a page refresh. This is the key to creating interconnected agentic workflows.

How to Use the Live API: The Basic Actions

You can perform all memory actions from any JavaScript environment within BrowserGPT. You can write these snippets in the JavaScript Scratchpad or have the AI generate them for you.

Action 1: Set or Overwrite a Variable
This creates a new variable or completely replaces the value of an existing one.

// Stores a file path under the name 'projectPath'
window.powershellGpt.setVar('projectPath', 'C:\\Reports\\Q4');

Action 2: Append to a Variable
This adds new text to the end of a variable, automatically starting on a new line. It’s perfect for building lists or logs.

// Assuming 'todoList' was already set to "My Tasks:", this adds a new item.
window.powershellGpt.appendVar('todoList', '- Finish the slides');

Action 3: Get the Current Value of a Variable
This lets your script retrieve the latest value directly from the agent’s memory.

async function showPath() {
  const path = await window.powershellGpt.getVar('projectPath');
  alert('The current project path is: ' + path);
}
showPath();

Action 4: Clear a Variable from Memory

window.powershellGpt.clearVar('projectPath');

The Most Powerful Feature: Subscribing to Live Changes

You can make a script “subscribe” to a variable. This means a function you write will automatically run every time that variable’s value is changed, from anywhere in the application.

Syntax: window.powershellGpt.onVarChange('variableName', (newValue) => { /* Your code here */ });

Live Dashboard Example

This demonstrates how two different tabs can communicate in real-time.

  1. Step 1: Launch a Monitoring Dashboard in Tab 1
    Run a script that creates a display box and subscribes to changes for the `currentStatus` variable.
    // Script for Tab #1
    const displayBox = document.createElement('div');
    displayBox.style.cssText = 'position:fixed; top:20px; left:20px; padding:15px; background:blue; color:white; border-radius:5px; z-index:9999;';
    document.body.append(displayBox);
    
    // This function will run automatically EVERY time 'currentStatus' changes
    window.powershellGpt.onVarChange('currentStatus', (newValue) => {
      displayBox.textContent = `AGENT STATUS: ${newValue || '(idle)'}`;
    });
    
    // Also get the initial value when the script first loads
    window.powershellGpt.getVar('currentStatus').then((initialValue) => {
      displayBox.textContent = `AGENT STATUS: ${initialValue || '(idle)'}`;
    });
  2. Step 2: Run a Task in Tab 2
    In a completely different tab, run commands that update the status.
    // Run this first in Tab #2
    window.powershellGpt.setVar('currentStatus', 'Processing data...');
  3. The Live Result: The moment the script in Tab 2 sets the variable, the display box in Tab 1 will instantly update its text to “AGENT STATUS: Processing data…” without any user interaction or page reloads.

By combining the simplicity of static placeholders for basic tasks with the power of the live API for dynamic workflows, Agent Memory provides a complete toolkit for building sophisticated and state-aware automation solutions.
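
When drafting scripts outside the application, a small in-memory stand-in for the bridge can help. The sketch below is hypothetical: it mirrors the method names described in this section, but the real window.powershellGpt object is supplied by PowerShellGPT. Details such as what getVar returns for an unset variable, or whether clearing a variable notifies subscribers, are assumptions.

```javascript
// Hypothetical in-memory stand-in for window.powershellGpt, for trying scripts
// outside the app. It mirrors the method names described above only.
function createMemoryMock() {
  const vars = new Map();
  const listeners = new Map();

  function notify(name) {
    const value = vars.get(name);
    (listeners.get(name) || []).forEach((callback) => callback(value));
  }

  return {
    setVar(name, value) {
      vars.set(name, String(value));
      notify(name);
    },
    appendVar(name, value) {
      // Appended text starts on a new line, as described for Action 2.
      const current = vars.get(name);
      vars.set(name, current === undefined || current === '' ? String(value) : current + '\n' + value);
      notify(name);
    },
    getVar(name) {
      // Returns a Promise, matching the awaited usage shown in Action 3.
      return Promise.resolve(vars.get(name));
    },
    clearVar(name) {
      vars.delete(name);
      notify(name); // assumption: subscribers are told about cleared variables
    },
    onVarChange(name, callback) {
      if (!listeners.has(name)) listeners.set(name, []);
      listeners.get(name).push(callback);
    },
  };
}

const api = createMemoryMock();
const seen = [];
api.onVarChange('currentStatus', (newValue) => seen.push(newValue || '(idle)'));
api.setVar('currentStatus', 'Processing data...');
api.appendVar('currentStatus', 'Step 2 of 3');
api.clearVar('currentStatus');
console.log(seen);
```

Unlike the real bridge, this mock lives in a single page and does not synchronize across tabs. It is only useful for checking a script's own logic.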

16. Settings Deep Dive

PowerShellGPT offers a range of settings, now primarily stored in the Windows Registry (under HKEY_CURRENT_USER\Software\PowerShellGPT\Settings), to customize its behavior. Access the main Settings Panel by selecting “Settings” in the Mode Selector or using the "Show settings" voice command. Settings are automatically saved when changed.

Main Settings:

  • Send system prompt when app starts: Sends Prompts\system prompt on launch. (Default: Checked)
  • Run command when app starts: Runs a user-specified command on launch. (Default: Unchecked)
  • Send PowerShell output to model: Automatically sends PowerShell results back to the AI. (Default: Checked)
  • Submit speech to model as prompt: Automatically sends finalized speech to the AI. (Default: Checked)
  • Grant permanent PowerShell access: Allows AI-generated PowerShell commands to run without prompts. (Default: Unchecked)
  • Confirm critical commands: Adds an extra warning for potentially dangerous PowerShell. (Default: Checked)
  • Read aloud AI’s responses:
    • Checked (Default): Enables automatic Text-To-Speech for AI responses.
      • For ChatGPT: Works in conjunction with the “Use ChatGPT TTS” checkbox in ChatGPT’s in-browser AI Control Panel. If “Use ChatGPT TTS” is checked, ChatGPT’s native voice is used. If unchecked, the external LazyPy TTS system is used.
      • For other AI Models (Gemini, Claude, Grok, LM Studio): Uses the external LazyPy TTS system with the configured Default Voice.
    • Unchecked: AI responses are text-only unless TTS is triggered manually (e.g., “Read that to me” or the speaker icon in the AI Control Panel).
  • Prevent Looping: Activates detection and intervention for repetitive AI-application cycles. (Default: Checked)
  • Restore Browser Session When App starts: Restores BrowserGPT’s tabs when the application launches. (Default: Unchecked)
  • Speech Recognition Mode:
    • Click to Talk
    • Constant
    • Wake Word
  • Agent Name (AgentName): Sets the Wake Word and Agent Bridge identifier. (Default: “Computer”)
  • Speech Recognition Finalization Delay: Controls speech input timeout (1-10, representing 0.5s-5.0s). (Default: 4 (2.0 seconds))
  • Default TTS Voice (DefaultVoice): Specifies the default voice used by the LazyPy TTS system when no [SETVOICE] command is active. The value is a string that must match a voice name available in LazyPy. (Default: “Scientist (English, American)”)
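
As a quick reference for the Speech Recognition Finalization Delay, the setting value maps to seconds as sketched below. The helper name is hypothetical, and the linear 0.5-second step is inferred from the documented range (1 to 10 representing 0.5s to 5.0s).

```javascript
// Converts the 1-10 setting value into seconds, assuming the linear 0.5 s steps
// implied by the documented range (1 -> 0.5 s, 10 -> 5.0 s).
function finalizationDelaySeconds(setting) {
  if (!Number.isInteger(setting) || setting < 1 || setting > 10) {
    throw new RangeError('Setting must be an integer from 1 to 10');
  }
  return setting * 0.5;
}

console.log(finalizationDelaySeconds(4)); // the default value of 4 corresponds to 2 seconds
```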

BrowserGPT Settings (JavaScript Execution & Web Browsing):

  • Grant permanent browser access: Allows AI-generated JavaScript to run in BrowserGPT without prompts. (Default: Unchecked)
  • Send browser output to model: Automatically forwards postMessage data from BrowserGPT to the AI. (Default: Checked)
  • Home Page: URL for new tabs/Home button. (Default: “https://www.google.com”)
  • Default Editor: External editor for Scratchpad/output. (Default: “notepad.exe”)
  • Bookmark Bar Visibility: Remembers if the bar was visible. (Default: False)
  • Bookmark URLs: Stores the URLs for the 13 bookmark slots.

Other Settings

  • Language: Stores selected speech recognition language/dialect codes. (Default: US English)
  • Join Commands Phrase: Phrase for chaining commands. (Default: “and then”)
  • Wait Phrase: Phrase for timed delays in chains. (Default: “wait for”)

Run Command on Startup

This setting allows you to define a command that will run automatically every time PowerShellGPT starts. It’s perfect for personalizing your environment and automating your initial workflow.

How to Configure:

  1. Navigate to the main Settings Panel.
  2. Click the checkbox next to “Run Command When App Starts”. An input dialog will appear (right-clicking the checkbox also works).
  3. In the input dialog, enter the exact command you want to run. This can be:
    • The name of a saved PowerShell command (e.g., check system status).
    • The name of a saved JavaScript script (e.g., load my work tabs).
    • The name of a saved AI prompt (e.g., load developer persona).
    • A built-in command (e.g., show browser).
    • A chained command (e.g., show browser and then wait for 3 seconds and then load my work tabs).
  4. Click “OK”. The command is now saved.

The next time you launch the application, this command will execute after the core components have initialized, setting up your workspace just the way you like it.
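
To show how a chained startup command breaks down into steps, here is a small parsing sketch. It assumes the default Join Commands Phrase ("and then") and Wait Phrase ("wait for"). The parseChain function is hypothetical, and PowerShellGPT's own parser may behave differently.

```javascript
// Hypothetical parser for chained commands, using the default Join Commands Phrase
// ("and then") and Wait Phrase ("wait for"). PowerShellGPT's own parser may differ.
function parseChain(input, joinPhrase = 'and then', waitPhrase = 'wait for') {
  return input
    .split(' ' + joinPhrase + ' ')
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      if (part.toLowerCase().startsWith(waitPhrase + ' ')) {
        const seconds = parseFloat(part.slice(waitPhrase.length));
        if (!Number.isNaN(seconds)) {
          return { type: 'wait', seconds };
        }
      }
      return { type: 'command', name: part };
    });
}

console.log(parseChain('show browser and then wait for 3 seconds and then load my work tabs'));
```

Reading a chain this way is a useful check before saving it: each step should be either a delay or the exact name of a saved or built-in command.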

Run command on wake-word

  • Run command on wake-word: When checked, you can define a command (saved command, prompt, script, alias, or chain) that will execute automatically every time the Wake Word is detected. This allows for powerful, context-free actions to be triggered just by getting the agent’s attention. Configure it by right-clicking or left-clicking the checkbox. (Default: Unchecked)

17. Security Considerations

PowerShellGPT is an incredibly powerful tool precisely because it allows AI models to execute code directly on your system (via PowerShell) and interact with web pages (via JavaScript). This power comes with inherent risks that you must understand and manage responsibly.

Understanding the Risks

PowerShell Execution:

PowerShell commands can do almost anything on your Windows system, including:
  • Deleting files or entire directories (Remove-Item, del).
  • Modifying system settings via the Registry (reg, Set-ItemProperty).
  • Stopping critical processes (Stop-Process, kill).
  • Downloading and running external scripts or executables (Invoke-WebRequest, Start-Process).
  • Changing security policies (Set-ExecutionPolicy).
An AI, especially if poorly prompted or encountering unexpected errors, could potentially generate commands that cause data loss, system instability, or security vulnerabilities if executed without scrutiny.

JavaScript Execution (BrowserGPT):

JavaScript executed within BrowserGPT runs with the context of the loaded webpage. While generally sandboxed within the browser, malicious or poorly written JavaScript could potentially:
  • Attempt phishing attacks by mimicking legitimate login forms within the browser context.
  • Try to exploit browser vulnerabilities.
  • Interact with websites in unintended ways if the AI misunderstands the page structure or your request.
  • Send sensitive information from the page back to the AI (if postMessage is used improperly).

PowerShellGPT’s Security Features

The application includes several features designed to mitigate these risks:

  • Permission Prompts (Default Behavior):
    • PowerShell: By default, PowerShellGPT always prompts you with an Allow/Deny dialog before executing any PowerShell code generated by the AI. You see the command and must explicitly allow it.
    • BrowserGPT JavaScript: By default, you are prompted before any AI-generated JavaScript is executed within BrowserGPT tabs.
    • Importance: These prompts are your primary line of defense. Always review the code shown in the prompt before clicking “Allow”. If you don’t understand what it does, click “Deny”.
  • Critical Command Warnings:
    • Purpose: Provides an extra layer of warning specifically for PowerShell commands identified as potentially dangerous (file deletion, registry modification, process killing, etc.).
    • Behavior: If enabled (default), you get a second warning dialog before the main Allow/Deny prompt for these specific commands, highlighting the risky keyword.
    • Recommendation: Keep this enabled unless you are an advanced user fully aware of the risks and comfortable reviewing all PowerShell code yourself.
  • Browser Command Password (BROWSERCOMMANDPASSWORD):
    • Purpose: Prevents malicious websites loaded in BrowserGPT from arbitrarily triggering internal PowerShellGPT commands (like running saved scripts) using window.chrome.webview.postMessage().
    • Mechanism: Any postMessage intended to trigger an action within PowerShellGPT (specifically, the [runcommand] prefix) must include the correct, secret Browser Password. Messages without the valid password prefix that attempt [runcommand] will be ignored by the application’s command parser.
    • Management: This password is auto-generated if not set and can be viewed or changed via the padlock icon in BrowserGPT. Keep it secure. Note that JavaScript generated by the AI and executed by the application automatically has the correct password embedded. Before executing JavaScript internally, PowerShellGPT converts [BROWSERCOMMANDPASSWORD] to the actual password, ensuring commands cannot be executed externally without the password.
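
The password mechanism can be pictured with the following sketch. It is an illustration under stated assumptions, not the application's code: both helper functions are hypothetical, the password value is made up, and the message layout (password first, immediately followed by [runcommand]) is assumed for the example.

```javascript
// Hypothetical sketch of the password gate described above.
// Mirrors the documented step of swapping the placeholder for the real password.
function resolvePassword(script, password) {
  return script.split('[BROWSERCOMMANDPASSWORD]').join(password);
}

// Returns true only when a [runcommand] message carries the valid password prefix.
// Assumes the layout: <password>[runcommand]<command name>.
function mayRunCommand(rawMessage, password) {
  if (!password || !rawMessage.startsWith(password)) return false;
  return rawMessage.slice(password.length).startsWith('[runcommand]');
}

const secret = 'example-password-123'; // made-up value for the sketch
const script = 'window.chrome.webview.postMessage("[BROWSERCOMMANDPASSWORD][runcommand]show browser")';

console.log(resolvePassword(script, secret));
console.log(mayRunCommand(secret + '[runcommand]show browser', secret));
console.log(mayRunCommand('[runcommand]show browser', secret));
```

Because the substitution happens only for scripts the application itself executes, a page loaded in BrowserGPT never sees the real password in its own source.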

Granting Permanent Access (Use with Extreme Caution!)

  • PowerShell: Checking this box disables all Allow/Deny prompts for AI-generated PowerShell code. The code will execute immediately upon detection.
  • BrowserGPT: Checking this box disables all Allow/Deny prompts for AI-generated JavaScript code executed in BrowserGPT.

RISK: Granting permanent access means you are placing full trust in the AI model and your prompting skills to never generate harmful or unintended code. A single mistake or unexpected AI response could lead to immediate negative consequences without a chance for you to intervene.

Recommendation: It is strongly recommended to leave permanent access disabled, especially for PowerShell, unless you have a very specific, controlled use case and fully understand the implications. Rely on the Allow/Deny prompts for regular use. If you grant permanent access, it is essential to keep the “Confirm critical commands” setting enabled as a partial safeguard for PowerShell.
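
To give a feel for what a critical-command check involves, the sketch below scans a command for the risky cmdlets and aliases named earlier in this section. It is a hypothetical illustration. The list and matching rules the application actually uses are not documented here, and a keyword scan of this kind is no substitute for reading the command yourself.

```javascript
// Hypothetical keyword scan in the spirit of "Confirm critical commands".
// The list is taken from the commands named in this section; the application's
// actual list and matching rules may differ.
const RISKY_KEYWORDS = [
  'Remove-Item', 'del', 'reg', 'Set-ItemProperty',
  'Stop-Process', 'kill', 'Invoke-WebRequest', 'Start-Process', 'Set-ExecutionPolicy',
];

function findRiskyKeywords(command) {
  return RISKY_KEYWORDS.filter((keyword) => {
    // Match whole tokens only, case-insensitively (PowerShell ignores case).
    const pattern = new RegExp('(^|[^A-Za-z0-9-])' + keyword + '($|[^A-Za-z0-9-])', 'i');
    return pattern.test(command);
  });
}

console.log(findRiskyKeywords('Remove-Item -Recurse "C:\\Temp\\old"'));
console.log(findRiskyKeywords('Get-ChildItem C:\\Models'));
```

Matching whole tokens avoids false alarms on harmless text such as "Models", which contains "del" but is not the del alias.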

Safe Scripting Practices

  • Be Specific with Prompts: Clearly define what you want the AI to do and any constraints. Avoid overly broad or ambiguous requests, especially when dealing with system modifications or file operations.
  • Review AI-Generated Code: Even with prompts enabled, take a moment to read the PowerShell or JavaScript code presented in the Allow/Deny or Preview dialogs. If it looks suspicious or you don’t understand it, deny execution.
  • Start Simple: Begin with basic commands and gradually increase complexity as you gain confidence in the AI’s responses and your ability to oversee it.
  • Backup Your Data: Regularly back up important files. Automation, whether AI-driven or manually scripted, can sometimes have unintended consequences.
  • Understand the Commands: If asking the AI to use complex PowerShell cmdlets or JavaScript APIs, try to have a basic understanding of what they do.
  • Limit Scope (Self-Imposed): When asking for file operations, be specific about paths (e.g., “in my Downloads folder”) rather than generic requests (“delete temporary files”).

PowerShellGPT offers unprecedented power by bridging AI and execution. Use its capabilities wisely and always prioritize careful review and understanding before allowing code execution on your system or in your browser.

18. Troubleshooting

While PowerShellGPT aims for seamless operation, you might occasionally encounter issues. This section provides guidance on diagnosing and resolving common problems.

1. Application Fails to Start or Crashes on Launch

  • Registry Settings Issues:
    • Symptom: Application crashes immediately or behaves erratically on startup, especially after a version update or if registry permissions are problematic.
    • Solution: You may need to clear the application’s registry settings. Warning: This will reset all your custom settings to their defaults.
      1. Open the Registry Editor (regedit.exe).
      2. Navigate to HKEY_CURRENT_USER\Software\PowerShellGPT.
      3. Delete the entire PowerShellGPT key (or specifically the Settings subkey: HKEY_CURRENT_USER\Software\PowerShellGPT\Settings).
      4. Restart PowerShellGPT. It will recreate the key with default values.
  • Antivirus Interference:
    • Symptom: Application is blocked, quarantined, or fails to run certain components (like agent_bridge.exe or PowerShell processes).
    • Solution: Check your antivirus software’s logs or quarantine. You may need to add PowerShellGPT’s executable (PowerShellGPT.exe), its helper utilities (agent_bridge.exe), or potentially its entire installation folder to your antivirus software’s exclusion list. Do this with caution and only if you trust the source of PowerShellGPT.

2. AI Model Browser Issues

  • Page Not Loading / Blank Screen:
    • Symptom: The area where Gemini/Claude/ChatGPT/Grok/LM Studio should appear is blank, white, or shows a connection error.
    • Solution:
      • Check your internet connection.
      • Verify the AI service itself is not down (try accessing it in a standard web browser).
      • Click the corresponding AI model’s refresh icon.
      • Try switching to a different AI model and then back again.
      • For LM Studio, ensure your LM Studio server is running and accessible at the configured URL (default http://127.0.0.1:1234).
  • AI Not Responding to Prompts:
    • Symptom: You type a prompt, but the AI doesn’t generate a response.
    • Solution: Check the AI’s web interface for any errors, login requirements, captchas, or rate limit messages. Ensure you are logged into the respective service if required. Try refreshing the AI browser.
  • AI Not Generating @Tags or Expected Output:
    • Symptom: You ask for a PowerShell command or JavaScript, but the AI responds with plain text code without the required tags, or doesn’t follow other instructions (e.g., for TTS or LM Studio internet search).
    • Solution:
      • The AI may have “forgotten” its instructions. Resend the appropriate system prompt (system prompt for PowerShell, javascript system prompt for general JS, or a model-specific system prompt if you’ve created one) using the Prompt Management area.
      • Ensure your own prompt clearly requests the code within the tags or the specific desired behavior.
      • For LM Studio, check the “System Prompt” in its in-browser control panel to ensure it contains the instructions for the required tags.
  • AI Output Control Panel Not Appearing/Working:
    • Symptom: The floating panel with Output Mode, Recapture, and TTS controls is missing or unresponsive.
    • Solution: This panel is injected via JavaScript. If the AI’s page structure has changed significantly or if there’s a JavaScript error, it might fail to load. Try refreshing the AI Model Browser.

3. PowerShell Execution Issues

  • Commands Not Executing:
    • Symptom: You click “Allow” on the permission prompt, but nothing happens in the Console Output or Console Browser.
    • Solution:
      • Ensure PowerShell itself is functioning correctly on your system (open a standard PowerShell window and try a simple command like Get-Date).
      • Check if antivirus software is blocking the hidden PowerShell process.
  • Errors in Console Output:
    • Symptom: PowerShell errors appear in the Console Browser.
    • Solution: This is often an error in the AI-generated code.
      • If the “Send PowerShell output to model” setting is enabled, the error will be sent back to the AI, which may attempt to self-correct.
      • If self-correction fails, copy the error message and the problematic code, and prompt the AI again, asking it to fix the specific error.
      • You can also edit the command in the “Last Command” view and try running it again manually.
  • Console Browser UI Issues:
    • Symptom: The Console Browser looks strange, buttons don’t work, or output isn’t displayed correctly.
    • Solution: This is likely due to errors in the custom JavaScript UI plugin (PowerShell Console System). Try restoring the default version of this file from the original application package or debug the JavaScript code.

4. BrowserGPT / JavaScript Execution Issues

  • JavaScript Not Executing:
    • Symptom: You click “Run JavaScript” or the AI sends @JsGPT@ code, permission is granted, but nothing happens on the target webpage.
    • Solution:
      • Verify the JavaScript code itself is valid and appropriate for the target page. Use BrowserGPT’s developer tools (if accessible, usually F12 on the active tab) to test snippets.
      • Ensure the correct tab/browser is being targeted (Active Tab vs. specific directives like //run in tab ID...//).
      • Complex websites (React, Angular, etc.) might require more sophisticated JavaScript selectors or waiting mechanisms than the AI initially provides.
      • Check whether the `//silent//` directive is present if you expect a notification but don’t see one.
  • Browser Directives Not Working:
    • Symptom: Commands like //run in tab ID...// or //load page...// don’t seem to function as expected.
    • Solution: Double-check the syntax exactly (see Appendix A). Ensure correct spacing, the presence of the starting // and ending //, and valid parameters (URLs must start with http:// or https://; User IDs are case-sensitive and must match exactly).
  • Messages Not Received from Browser (postMessage):
    • Symptom: JavaScript on the webpage calls window.chrome.webview.postMessage(), but the message doesn’t appear in the AI Feedback / Prompt Input Editor.
    • Solution: Crucially, verify the JavaScript includes the correct [BROWSERCOMMANDPASSWORD] (e.g., postMessage("[BROWSERCOMMANDPASSWORD]My message")). Messages without the valid password prefix are ignored if they attempt [runcommand].
  • Tab Management Issues:
    • Symptom: Cannot close tabs, switching fails, favicons/titles don’t update correctly, loading animation stuck.
  • Browser Notification Pop-ups Not Appearing:
    • Symptom: Expected blue/green/red notification pop-ups in BrowserGPT don’t show.
    • Solution:
      • Check for JavaScript errors in the specific BrowserGPT tab’s developer console (F12, if accessible) that might prevent the notification script from running.
      • Ensure the //silent// directive is not accidentally present in the script being executed.
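
Two of the points above (waiting for content on complex pages, and prefixing postMessage text with the password) can be sketched together. This is a minimal illustration, not PowerShellGPT's own code: the helper names waitFor, withPassword, and report, and the #results selector, are hypothetical. The message format follows the postMessage("[BROWSERCOMMANDPASSWORD]My message") example given above.

```javascript
// Hypothetical helper: poll until check() returns a truthy value,
// or reject once timeoutMs has elapsed.
function waitFor(check, timeoutMs = 5000, intervalMs = 100) {
  return new Promise((resolve, reject) => {
    const start = Date.now();
    const tick = () => {
      let result;
      try {
        result = check();
      } catch (err) {
        reject(err);
        return;
      }
      if (result) {
        resolve(result);
      } else if (Date.now() - start >= timeoutMs) {
        reject(new Error("Timed out waiting for condition"));
      } else {
        setTimeout(tick, intervalMs);
      }
    };
    tick();
  });
}

// Hypothetical helper: prefix the text with the password placeholder.
function withPassword(text) {
  return "[BROWSERCOMMANDPASSWORD]" + text;
}

// Hypothetical helper: send the prefixed message if the WebView host is present.
function report(text) {
  const message = withPassword(text);
  if (typeof window !== "undefined" && window.chrome && window.chrome.webview) {
    window.chrome.webview.postMessage(message);
  }
  return message;
}

// Browser-only usage: wait for an element (assumed selector), then report it.
if (typeof document !== "undefined") {
  waitFor(() => document.querySelector("#results"))
    .then((el) => report("Found: " + el.textContent.trim()))
    .catch((err) => report("Error: " + err.message));
}
```

Polling is used here because it is simple to reason about; on pages that update frequently, a longer timeout or a different selector may be needed.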

5. Voice Recognition & TTS Issues

  • Mic Not Working / Red Mic Icon:
    • Symptom: Clicking the mic does nothing, or the icon turns red.
    • Solution:
      • Ensure your microphone is correctly connected and configured in Windows Sound settings.
      • Make sure PowerShellGPT (specifically the browser component handling speech) has microphone permission in Windows Privacy settings.
      • The application might require a restart if it encountered an error accessing the mic.
      • Check if the micaccessgranted file exists in the application directory.
      • If it doesn’t, the app might need to request permission again, which could involve clearing the browser cache: close the app, then run the Delete BrowserCache.Bat file located in the application directory. The app will then request microphone permission again the next time speech recognition is activated.
  • Poor Recognition Accuracy:
    • Symptom: Spoken words are consistently misinterpreted.
    • Solution:
      • Ensure you have selected the correct Language and Dialect in the Voice Recognition settings window. This is the most common cause.
      • Speak clearly and at a moderate pace.
      • Reduce background noise.
      • Try a different microphone.
  • Wake Word Not Triggering:
    • Symptom: Saying the Agent Name doesn’t activate processing in Wake Word mode.
    • Solution:
      • Ensure the Agent Name is set correctly and is something the speech engine can reliably recognize. Avoid very short or complex names.
      • Make sure you are speaking the name clearly.
      • Check that voice recognition is actually running (green mic icon).
  • TTS Not Working or Wrong Voice:
    • Symptom: AI responses are not spoken, or the wrong voice is used.
    • Solution:
      • Ensure the “Read aloud AI’s responses” setting is checked in the main Settings panel.
      • Verify the BrowserGPT tab with ID systemtts1 is open and successfully navigated to https://lazypy.ro/tts/. Try manually navigating it there via BrowserGPT’s address bar if unsure.
      • Check your internet connection, as LazyPy is an online service.
      • In the LazyPy tab, ensure a voice is selected in its native dropdown.
      • Check the DefaultVoice setting in PowerShellGPT’s Settings panel and ensure it’s a valid voice name from LazyPy.
      • If using ChatGPT, check the “Use ChatGPT TTS” checkbox in its AI Output Control Panel to toggle between ChatGPT’s voice and the LazyPy TTS.
      • For AI vs. AI conversations, ensure the JavaScript commands (like “AI Chat Speak”) are correctly specifying `[SETVOICE][VoiceName]` for the LazyPy tab.

If you encounter persistent issues not covered here, consider seeking help from PowerShellGPT.com.

19. Appendices

Appendix A: Browser Control Directives Reference

These special comment-like directives control where and how JavaScript code is executed within PowerShellGPT’s browser environments. They are typically placed at the beginning of a script block.

Syntax Rules:

  • Start with // and end with //. Keywords are case-insensitive.
  • Placeholders ([UserID], [URL], [Text], [CODE]) are where you insert specific values.
  • [UserID] is case-sensitive. [Text] for title/URL matching is case-insensitive.
  • [URL] for //load page...// and //orcreate...// must start with http:// or https://.
  • [CODE] refers to the JavaScript code following the directive’s closing //.
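
The syntax rules above can be checked mechanically before a script is sent. The following is a rough sketch of such a check, not the application's actual parser; the function name checkDirective is hypothetical, and it covers only the rules on the // delimiters and the URL prefix.

```javascript
// Hypothetical helper (not part of PowerShellGPT): returns a list of
// problems found in a single directive line; an empty list means none found.
function checkDirective(line) {
  const problems = [];
  const text = line.trim();

  // Rule: a directive starts with // and ends with //.
  if (!text.startsWith("//") || !text.endsWith("//") || text.length < 5) {
    problems.push("Directive must start with // and end with //.");
    return problems;
  }

  // Strip the opening and closing slashes.
  const body = text.slice(2, -2).trim();

  // Rule: [URL] for "load page" and "orcreate" must start with http:// or https://.
  // Keywords are matched case-insensitively.
  const urlDirective = body.match(
    /(?:^|\/\/)\s*(load page in new tab|load page|orcreate)\s+(\S+)\s*$/i
  );
  if (urlDirective) {
    const url = urlDirective[2];
    if (!url.startsWith("http://") && !url.startsWith("https://")) {
      problems.push("URL must start with http:// or https://: " + url);
    }
  }
  return problems;
}

console.log(checkDirective("//load page example.com//"));
```

A line such as //load page https://example.com// produces an empty list, while a missing closing // or a URL without the http:// or https:// prefix is reported.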

Core Directives:

  • //show browser//
    • Ensures BrowserGPT is visible. Any subsequent [CODE] runs in the active tab.
  • //plugin//
    • Any JavaScript containing this directive will be automatically run whenever a webpage finishes loading in relevant browser contexts.
  • //run in ai browser//
    • Executes the entire script block in the main AI Model Browser. If used in a plugin, restricts plugin execution to this browser.
  • //run in console browser//
    • Executes the entire script block in the Console Browser (Form5). If used in a plugin, restricts plugin execution to this browser.
  • //load page [URL]//
    [CODE]
    • Navigates the active BrowserGPT tab to [URL]. [CODE] is queued and executed in that tab after the page loads.
  • //load page in new tab [URL]//
    [CODE]
    • Creates a new BrowserGPT tab and navigates it to [URL]; the new tab is not made active. [CODE] is queued and executed in the new tab after the page loads.
  • //set tab ID [UserID]//
    [CODE]
    • Assigns [UserID] to the active BrowserGPT tab. Overwrites previous IDs. [CODE] (if present) runs immediately in the active tab.
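
As an illustration of a core directive followed by its [CODE] block, the sketch below uses //load page [URL]// with https://example.com as a placeholder URL. The function collectHeadings is a hypothetical name chosen for this example, and the h1/h2 selector is only an assumption about the target page.

```javascript
//load page https://example.com//
// Everything below is the [CODE] part: it is queued and runs in the
// active BrowserGPT tab after the page has loaded.

// Hypothetical helper: collect the trimmed text of top-level headings.
function collectHeadings(doc) {
  return Array.from(doc.querySelectorAll("h1, h2")).map((h) =>
    h.textContent.trim()
  );
}

// Only touch the page when a document is actually available.
if (typeof document !== "undefined") {
  console.log(collectHeadings(document).join("\n"));
}
```

Because the directive line is an ordinary JavaScript comment, the same script remains valid JavaScript if pasted into a browser's developer console for testing.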

Targeted Execution & Switching Directives:

  • //run in tab ID [UserID]//
    [CODE]
    • Executes [CODE] in the BrowserGPT tab identified by [UserID] without changing the active tab.
  • //run in tab titlecontains [Text]//
    [CODE]
    • Executes [CODE] in the first BrowserGPT tab whose title contains [Text].
  • //run in tab urlcontains [Text]//
    [CODE]
    • Executes [CODE] in the first BrowserGPT tab whose URL contains [Text].
  • //switch to tab ID [UserID]//
    [CODE]
    • Finds tab by [UserID], makes it active. [CODE] (if present) runs in the now-active tab.
  • //switch to tab titlecontains [Text]//
    [CODE]
    • Finds tab by title containing [Text], makes it active. [CODE] (if present) runs in the now-active tab.
  • //switch to tab urlcontains [Text]//
    [CODE]
    • Finds tab by URL containing [Text], makes it active. [CODE] (if present) runs in the now-active tab.

Modifiers (used in conjunction with other directives):

  • //orcreate [URL]//
    • Placement: Immediately follows the target specifier part of a //run in tab...// directive (e.g., //run in tab ID MyTabID//orcreate https://example.com//
      [CODE]).
    • Behavior: If the target tab (from //run in tab...//) is not found, a new tab is created, navigated to [URL], and [CODE] is queued. If the original command was //run in tab ID [UserID]//orcreate..., the [UserID] is assigned to the new tab. By default, the new tab is created in the background; use //switch to target tab// in the [CODE] block to focus it.
  • //switch to target tab//
    • Placement: Within the final [CODE] block of a //run in tab...// or an //orcreate...// sequence.
    • Behavior: If an existing tab was found by //run in tab...//, it makes that tab active before running the rest of the [CODE]. If a new tab was created by //orcreate...//, it makes that new tab active.
  • //silent//
    • Placement: Anywhere within a JavaScript block.
    • Behavior: Suppresses the blue “Executing script…” or “Loading HTML…” notification pop-up in BrowserGPT for that specific execution.
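
The modifiers can be combined in one script. The sketch below reuses the //run in tab ID MyTabID//orcreate https://example.com// form quoted above and places //switch to target tab// and //silent// inside the [CODE] block, as described. MyTabID and the URL are placeholders, and describePage is a hypothetical helper.

```javascript
//run in tab ID MyTabID//orcreate https://example.com//
//switch to target tab//
//silent//

// Hypothetical helper: summarize the page the script landed on.
function describePage(doc) {
  return doc.title + " (" + doc.location.href + ")";
}

// Only touch the page when a document is actually available.
if (typeof document !== "undefined") {
  console.log(describePage(document));
}
```

If a tab with the ID MyTabID already exists, the code runs there; otherwise, per the behavior described above, a new tab is created, navigated to the URL, and given that ID.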

Appendix B: Built-in Voice Commands Reference

These phrases, when spoken while voice recognition is active, trigger internal application functions directly, overriding saved items or AI prompts with the same name. Many commands controlling settings can be negated by prefixing with “Don’t”.

Settings Toggles:

  • Send system prompt when app starts / Don't send system prompt when app starts
  • Send powershell output to model / Don't send powershell output to model
  • Send browser output to model / Don't send browser output to model
  • Submit speech to model as prompt / Don't submit speech to model as prompt
  • Grant permanent powershell access / Don't grant permanent powershell access
  • Grant permanent browser access / Don't grant permanent browser access
  • Confirm critical commands / Don't confirm critical commands
  • Read aloud AI's responses / Don't read aloud AI's responses (Toggles TTS for all models; for ChatGPT also interacts with its specific “Use ChatGPT TTS” checkbox)
  • Prevent looping / Don't prevent looping

UI Navigation & Display:

  • Show yourself (Brings main PowerShellGPT window to front)
  • Show settings / Hide settings / Don't show settings
  • Show prompts
  • Show commands
  • Show last command
  • Show browser / Hide browser (Toggles BrowserGPT visibility)
  • Browser full screen / Exit full screen (For BrowserGPT)
  • Show browser controls / Hide browser controls (For BrowserGPT)
  • Expand browser / Collapse browser (BrowserGPT view modes)
  • Show javascript / Hide javascript (Toggles BrowserGPT’s JS Scratchpad visibility)
  • Show voice recognition / Hide voice recognition (Voice Recognition settings window)
  • Show command aliases / Hide command aliases (Command Alias manager)

Voice Recognition Control:

  • Set voice recognition mode to click to talk
  • Set voice recognition mode to constant
  • Set voice recognition mode to wake word
  • Stop listening
  • Cancel cancel cancel (Aborts and restarts current speech recognition attempt)

AI/Browser Interaction:

  • Allow access (Clicks “Allow” on permission prompts for PowerShell/JS)
  • Deny access (Clicks “Deny” on permission prompts)
  • Stop talking (Attempts to stop active TTS output from AI model’s native TTS or the external LazyPy TTS via stop tts system command)
  • Read that to me (Triggers TTS of the AI’s last response using the external LazyPy TTS system)

Agent Configuration:

  • Change your name to [New Name] (Updates Agent Name and Wake Word)

AI Model Switching:

  • Switch to Gemini
  • Switch to Claude
  • Switch to ChatGPT
  • Switch to Grok
  • Switch to LM Studio

Command Chaining/Waiting:

  • [Command 1] and then [Command 2] (Default phrase, customizable)
  • [Command 1] and then wait for [1-60] seconds and then [Command 2] and then... (Default phrases, customizable)
  • [Command 1] and then wait for [1-60] minutes and then [Command 2] and then... (Default phrases, customizable)
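
To make the chaining grammar concrete, here is a rough sketch of how a chained phrase could be broken into steps using the default phrases. It illustrates the pattern only and is not the application's actual parser; parseChain and the shape of the returned objects are assumptions made for this example.

```javascript
// Hypothetical sketch: split a chained phrase on the default "and then"
// phrase and recognize "wait for N seconds/minutes" steps.
function parseChain(phrase) {
  const waitPattern = /^wait for (\d+) (seconds?|minutes?)$/i;
  return phrase
    .split(/\s+and then\s+/i)
    .map((part) => part.trim())
    .filter((part) => part.length > 0)
    .map((part) => {
      const wait = part.match(waitPattern);
      if (!wait) {
        return { type: "command", text: part };
      }
      const amount = Number(wait[1]);
      // The manual documents a range of 1-60 for both units.
      if (amount < 1 || amount > 60) {
        throw new RangeError("Wait must be between 1 and 60.");
      }
      const unitSeconds = wait[2].toLowerCase().startsWith("minute") ? 60 : 1;
      return { type: "wait", seconds: amount * unitSeconds };
    });
}

console.log(
  parseChain("Show browser and then wait for 5 seconds and then Hide browser")
);
```

The example phrase yields three steps: the Show browser command, a 5-second wait, and the Hide browser command.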