What is Sentinel Verified
Sentinel Verified is Vellum’s content analyzer. It takes the subject and plain-text body of each received email and estimates the probability that a real spam filter would reject it. It does not apply fixed rules like “if it contains the word ‘free’, add 2 points”; instead, it uses a multinomial Naive Bayes classifier trained on real spam and ham examples. The model learns the relative frequency of each word in each class and decides probabilistically. The result is a score between 0 and 1, where values close to 1 indicate a high probability that the email would be classified as spam.
How the model is built
The model starts from two training datasets: spam examples and ham examples. Training is fundamentally a counting process. For each example, the algorithm converts the text to lowercase and extracts tokens of at least 3 characters composed exclusively of letters and digits. Short words like “of”, “to”, or “an” are automatically discarded for not meeting the minimum length.
For each unique token within a document (counted once even if it appears multiple times in the same text), the counter for that word in the corresponding class is incremented. When training finishes, the model stores how many documents exist per class and, for each token, the number of documents of each class in which it appears. The vocabulary is the union of all unique tokens from both classes.
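The counting process can be sketched as follows. The function names and the model layout are illustrative assumptions, not Vellum’s actual code; only the rules (lowercase, alphanumeric tokens of at least 3 characters, one count per unique token per document) come from the description above.

```python
import re
from collections import Counter

def tokenize(text):
    # Lowercase, then keep maximal runs of letters/digits
    # that are at least 3 characters long.
    return re.findall(r"[a-z0-9]{3,}", text.lower())

def train(spam_docs, ham_docs):
    model = {
        "docs": {"spam": len(spam_docs), "ham": len(ham_docs)},
        "counts": {"spam": Counter(), "ham": Counter()},
    }
    for label, docs in (("spam", spam_docs), ("ham", ham_docs)):
        for doc in docs:
            # Each unique token counts once per document,
            # even if it appears multiple times in the text.
            for token in set(tokenize(doc)):
                model["counts"][label][token] += 1
    model["vocab"] = set(model["counts"]["spam"]) | set(model["counts"]["ham"])
    return model

model = train(
    spam_docs=["WIN a FREE prize now now now"],
    ham_docs=["meeting notes about deploy review"],
)
print(model["counts"]["spam"]["now"])  # counted once despite three occurrences
```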
The result is serialized as JSON and embedded directly in Vellum’s binary via a compile-time embed directive. When the application starts, the model loads into memory and is ready for inference without external files or connections to third-party services.
Inference: from an email to a probability
When an email arrives, Sentinel concatenates the subject and plain text body and executes the following five-step process.
Tokenization
The text is converted to lowercase and split into tokens; only tokens of at least 3 characters composed exclusively of letters or digits pass through, exactly the criterion used during training. A token the model never saw during training has no recorded frequency, but Laplace smoothing guarantees it still contributes a small value instead of producing a division by zero or an undefined logarithm.
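The rule as stated can be expressed as a single regular expression; this is an assumption about the exact implementation, sketched for clarity:

```python
import re

def tokenize(text):
    # Lowercase, then keep runs of letters/digits of length >= 3.
    # Anything shorter, and any punctuation, is dropped.
    return re.findall(r"[a-z0-9]{3,}", text.lower())

print(tokenize("Act NOW: 50% off v2 of our SDK!"))
# → ['act', 'now', 'off', 'our', 'sdk']
```

Note that “50”, “v2”, and “of” are discarded for being shorter than 3 characters, while the tokens that survive are already lowercased.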
Log prior
Before analyzing content, the classifier starts with an initial estimate of how likely any email is to be spam or ham, based on the training set distribution:
logPrior(ham) = log( (docsHam + 1) / (totalDocs + 2) )
logPrior(spam) = log( (docsSpam + 1) / (totalDocs + 2) )
The +1 in the numerator and +2 in the denominator are Laplace smoothing. Without this adjustment, a class with zero documents would produce log(0), which is mathematically undefined. With the adjustment, even a class without examples has a small but valid prior probability.
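The prior formulas translate directly to code; the function name is illustrative:

```python
import math

def log_prior(docs_in_class, total_docs):
    # Laplace smoothing: +1 per class in the numerator, +2 (one per class)
    # in the denominator, so even an empty class gets a finite log prior.
    return math.log((docs_in_class + 1) / (total_docs + 2))

# With 300 ham documents out of 400 total (made-up numbers):
print(log_prior(300, 400))  # close to log(0.75)
print(log_prior(0, 0))      # log(0.5), not a math error
```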
Per-token likelihood
For each unique token in the email, the model calculates how likely that word is to appear in each class:
logP(token | class) = log( (count(token, class) + 1) / (totalTokens(class) + VocabSize) )
The +1 in the numerator ensures a token never seen in that class does not produce log(0). Adding VocabSize to the denominator keeps the distribution normalized: since every one of the VocabSize tokens in the vocabulary receives that +1, the denominator must grow by VocabSize for the probabilities to sum to 1. This is the same Laplace smoothing principle applied at the word level instead of the document level.
Accumulated score
The final score for each class is the sum of the prior and all log-probabilities of the email’s unique tokens:
score(ham) = logPrior(ham) + Σ logP(token | ham)
score(spam) = logPrior(spam) + Σ logP(token | spam)
Working in logarithmic scale has a practical reason: multiplying many small probabilities together produces values so close to zero that floating-point arithmetic rounds them to zero. Summing logarithms produces the same mathematical result without that problem.
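Putting the prior and the per-token likelihoods together, the per-class score can be sketched like this (names and numbers are illustrative; `counts` maps tokens to their training counts for one class):

```python
import math

def class_score(tokens, prior_log, counts, total_tokens, vocab_size):
    # score(class) = logPrior(class) + Σ logP(token | class),
    # summed over the email's unique tokens.
    s = prior_log
    for token in set(tokens):
        s += math.log((counts.get(token, 0) + 1) / (total_tokens + vocab_size))
    return s

# Toy numbers: "free" seen 40 times and "win" 25 times in spam,
# 100 spam token counts in total, a 10-word vocabulary, an even prior.
counts = {"free": 40, "win": 25}
s = class_score(["free", "win", "free"], math.log(0.5), counts, 100, 10)
print(s)  # "free" contributes once despite appearing twice
```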
Conversion to probability
The two scores are converted to a probability in [0, 1] via softmax with numerical overflow correction:
maxScore = max(score(ham), score(spam))
pSpam = exp(score(spam) - maxScore) / (exp(score(ham) - maxScore) + exp(score(spam) - maxScore))
Subtracting maxScore before calculating the exponentials avoids overflow and underflow in 64-bit arithmetic. The result is mathematically equivalent to standard softmax but numerically stable regardless of score magnitude.
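The stable softmax is a few lines; a naive `exp(score)` on realistic log-scores (sums of hundreds of negative log-probabilities) would underflow to 0.0, while the shifted version does not:

```python
import math

def p_spam(score_ham, score_spam):
    # Subtracting the max is a no-op for the ratio, but it keeps
    # both exponentials within the range of 64-bit floats.
    m = max(score_ham, score_spam)
    e_ham = math.exp(score_ham - m)
    e_spam = math.exp(score_spam - m)
    return e_spam / (e_ham + e_spam)

# math.exp(-1200.0) underflows to 0.0, yet the shifted form still works:
print(p_spam(-1205.0, -1200.0))  # about 0.9933, the sigmoid of the 5-point gap
```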
The threshold and trigger tokens
The final decision applies a threshold of 0.75. If pSpam >= 0.75, the email is classified as spam. The threshold is calibrated to reduce false positives in development contexts: templates with technical vocabulary should not be flagged because of words that are only suspicious in the context of mass marketing.
When the classifier exceeds the threshold, it identifies up to 5 trigger tokens: the words whose differential contribution to score(spam) over score(ham) is greatest. These tokens are shown in Vellum’s interface along with the probability percentage. The developer can see precisely which specific part of their template is activating the classifier and act accordingly.
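Selecting trigger tokens amounts to ranking the email’s unique tokens by their differential contribution logP(token | spam) − logP(token | ham). A minimal sketch, assuming dict-backed per-token log-likelihoods (the data layout is an assumption, not Vellum’s actual one):

```python
def trigger_tokens(tokens, spam_ll, ham_ll, top_n=5):
    # Largest positive gaps are the words pushing the score toward spam.
    diffs = {t: spam_ll.get(t, 0.0) - ham_ll.get(t, 0.0) for t in set(tokens)}
    return sorted(diffs, key=diffs.get, reverse=True)[:top_n]

# Made-up per-token log-likelihoods:
spam_ll = {"free": -2.0, "win": -3.0, "meeting": -8.0}
ham_ll = {"free": -7.0, "win": -6.5, "meeting": -2.5}
print(trigger_tokens(["free", "win", "meeting"], spam_ll, ham_ll, top_n=2))
# → ['free', 'win']
```

“meeting” never surfaces as a trigger: it contributes more to the ham score than to the spam score, so its differential is negative.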
Naive Bayes has been the reference classifier for spam detection since Paul Graham’s work in 2002. Its assumption of independence between words is technically incorrect — words in a text are not independent of each other — but it produces solid results in email classification because the per-class vocabulary pattern is sufficiently distinct. Its computational cost is minimal: a sum of logarithms for each token in the message.
Why this approach works within Vellum
A statistical classifier embedded in the binary fulfills exactly the requirements of a testing server: it operates without calls to external services, without network latency, and without additional dependencies. Inference is a microsecond operation that runs in the same process receiving the SMTP email.
The result does not pretend to replicate production spam filters like SpamAssassin or Google’s proprietary models. It aims to detect the most common patterns that cause email templates to fail during development, at the moment when it is still cheap and fast to fix them. That is Sentinel Verified’s goal: make the problem appear in the environment where it costs nothing to resolve.