PhishGuard AI
Model Active — 98.26% Accuracy
Configuration

Settings & Profile

Configure your PhishGuard AI preferences and model parameters.

Model Information
Model Type: Logistic Regression
Vectorizer: TF-IDF (unigrams + bigrams)
Max Features: 40,000
Test Accuracy: 98.26%
ROC-AUC: 0.9969
Training Set: 14,476 samples
Test Set: 3,620 samples
Dataset: Phishing Email Dataset (18K+)
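Test Accuracy and ROC-AUC measure different things. Accuracy scores hard safe/phishing labels obtained by thresholding the predicted probability. ROC-AUC scores the probabilities themselves, so it does not depend on any single threshold. The following minimal sketch uses scikit-learn's metric functions on four hypothetical emails (toy values, not the page's results), assuming 1 = phishing and 0 = safe:

```python
from sklearn.metrics import accuracy_score, roc_auc_score

# Hypothetical toy labels: 1 = phishing, 0 = safe (encoding is an assumption).
y_true = [0, 0, 1, 1]
# Hypothetical predicted phishing probabilities, e.g. predict_proba(...)[:, 1].
y_prob = [0.1, 0.4, 0.35, 0.8]

# Accuracy needs hard labels, so threshold the probabilities at 0.5.
y_pred = [1 if p >= 0.5 else 0 for p in y_prob]
accuracy = accuracy_score(y_true, y_pred)

# ROC-AUC uses the raw probabilities: the chance that a randomly chosen
# phishing email scores higher than a randomly chosen safe one.
auc = roc_auc_score(y_true, y_prob)

print(f"accuracy={accuracy:.2f}, roc_auc={auc:.2f}")  # accuracy=0.75, roc_auc=0.75
```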
Threat Level Thresholds
Critical Threat: ≥ 85%
High Threat: ≥ 65%
Medium Threat: ≥ 40%
Safe: < 40%

Thresholds apply to the model's predicted phishing probability. Adjust them in phishing_app.py for production use.
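A minimal sketch of how a phishing probability could be mapped to the tiers above. The function name threat_level is hypothetical, and treating each boundary as inclusive (≥) is an assumption. The actual logic lives in phishing_app.py, which is not shown here.

```python
def threat_level(phishing_probability: float) -> str:
    """Map a phishing probability in [0, 1] to a threat tier.

    Hypothetical helper: tier boundaries come from the settings page,
    but inclusive (>=) boundaries are an assumption.
    """
    if not 0.0 <= phishing_probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if phishing_probability >= 0.85:
        return "Critical"
    if phishing_probability >= 0.65:
        return "High"
    if phishing_probability >= 0.40:
        return "Medium"
    return "Safe"


for p in (0.92, 0.70, 0.40, 0.12):
    print(p, threat_level(p))
```

Checking the tiers from the highest threshold down means each probability lands in exactly one tier without having to spell out upper bounds.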

Preprocessing Pipeline
Lowercase: Enabled
URL Normalization: Enabled (url token)
Email Normalization: Enabled (email token)
HTML Tag Removal: Enabled
Non-Alpha Removal: Enabled
Min Document Frequency: 3 documents
Max Document Frequency: 90% of documents
Sublinear TF Scaling: Enabled
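The preprocessing steps and vectorizer settings listed above could be expressed roughly as follows. This is a hypothetical reconstruction: the function name clean_email, the regexes, and the literal placeholder strings "url" and "email" are assumptions, since phishing_app.py is not shown. The TfidfVectorizer parameters (min_df, max_df, sublinear_tf, ngram_range, max_features) are real scikit-learn options set to the values on this page.

```python
import re
from sklearn.feature_extraction.text import TfidfVectorizer

# Hypothetical patterns; the app's actual regexes may differ.
HTML_TAG = re.compile(r"<[^>]+>")
URL = re.compile(r"https?://\S+|www\.\S+")
EMAIL = re.compile(r"\S+@\S+\.\S+")
NON_ALPHA = re.compile(r"[^a-z\s]")
SPACES = re.compile(r"\s+")


def clean_email(text: str) -> str:
    text = text.lower()                # Lowercase
    text = HTML_TAG.sub(" ", text)     # HTML tag removal
    text = URL.sub(" url ", text)      # URL normalization (placeholder token)
    text = EMAIL.sub(" email ", text)  # Email normalization (placeholder token)
    text = NON_ALPHA.sub(" ", text)    # Non-alpha removal
    return SPACES.sub(" ", text).strip()


vectorizer = TfidfVectorizer(
    preprocessor=clean_email,  # replaces the built-in lowercasing step
    ngram_range=(1, 2),        # unigrams + bigrams
    max_features=40_000,
    min_df=3,                  # term must appear in at least 3 documents
    max_df=0.9,                # drop terms found in more than 90% of documents
    sublinear_tf=True,         # use 1 + log(tf) instead of raw counts
)

print(clean_email("<p>Click http://evil.example.com NOW or mail me@x.org!</p>"))
# click url now or mail email
```

Replacing every URL and address with one shared token lets the model learn that "contains a link" is a signal without memorizing individual domains.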
Advanced Machine Learning Pipeline Spec
Email Text → TF-IDF → Logistic Reg. → Result
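The four-stage flow can be sketched as a single scikit-learn Pipeline. The hyperparameters mirror the cards on this page. The six training emails are hypothetical toy data, and min_df and max_features are left at their defaults because the toy corpus is far too small for min_df=3.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# Email Text -> TF-IDF -> Logistic Reg. -> Result
pipeline = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), sublinear_tf=True)),
    ("clf", LogisticRegression(C=1.0, solver="lbfgs", max_iter=1000,
                               class_weight="balanced")),
])

# Hypothetical toy training data: 1 = phishing, 0 = safe.
emails = [
    "verify your account now click this link",
    "your password expires click to confirm account",
    "urgent payment failed verify billing details now",
    "meeting notes attached for tomorrow",
    "lunch on friday with the team",
    "quarterly report draft for your review",
]
labels = [1, 1, 1, 0, 0, 0]
pipeline.fit(emails, labels)

# "Result": probability of the phishing class (column 1) for a new email.
proba = pipeline.predict_proba(["click now to verify your account"])[0, 1]
print(f"phishing probability: {proba:.2f}")
```

Bundling the vectorizer and classifier into one object means that raw text goes in and a probability comes out, and that the vectorizer is fitted only on the data passed to fit.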
Core AI Brain

Logistic Regression Engine

The central classifier. It combines the TF-IDF word and word-pair weights of an email into a single linear score, then converts that score into an estimated probability that the email is a phishing attempt.

sklearn.linear_model.LogisticRegression
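For a binary LogisticRegression, the probability comes from passing the linear score w·x + b (what decision_function returns) through the logistic sigmoid 1 / (1 + e^(−z)). The following sketch checks this on synthetic data from make_classification, a hypothetical stand-in for real TF-IDF features:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for TF-IDF features (hypothetical, not the real dataset).
X, y = make_classification(n_samples=200, n_features=20, random_state=0)
model = LogisticRegression(C=1.0, solver="lbfgs", max_iter=1000).fit(X, y)

# Linear score w.x + b for every sample.
scores = model.decision_function(X)
# The logistic sigmoid squashes each score into P(class 1 | x).
manual_proba = 1.0 / (1.0 + np.exp(-scores))

sklearn_proba = model.predict_proba(X)[:, 1]
print(np.allclose(manual_proba, sklearn_proba))  # True
```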
Feature Extraction

TF-IDF Word Frequency Model

Converts raw email text into sparse numeric vectors. Each word (unigram) and each pair of adjacent words (bigram) is weighted by how often it appears in the email, discounted by how common it is across all emails.

TfidfVectorizer(ngram_range=(1,2))
Complexity Tuning

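Inverse Regularization Strength (C = 1.0)

C is the inverse of regularization strength, so smaller values penalize large coefficients more heavily. C = 1.0, scikit-learn's default, applies moderate L2 regularization, which reduces over-fitting and helps the model generalize to emails it has not seen.

C=1.0 (scikit-learn default)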
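The effect of C can be seen directly in the size of the learned coefficients. This sketch fits the same model at three C values on synthetic data (a hypothetical stand-in for TF-IDF features) and reports the L2 norm of the weight vector. The norm grows as C grows, because the penalty weakens.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for TF-IDF features (hypothetical).
X, y = make_classification(n_samples=300, n_features=30, random_state=0)

# Smaller C = stronger L2 penalty = coefficients shrunk toward zero.
norms = {}
for C in (0.01, 1.0, 100.0):
    model = LogisticRegression(C=C, solver="lbfgs", max_iter=1000).fit(X, y)
    norms[C] = np.linalg.norm(model.coef_)
    print(f"C={C:>6}: ||w|| = {norms[C]:.3f}")
```

In practice C is usually chosen by cross-validation. The page does not say whether 1.0 was tuned or kept as the default.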
Optimizer Solver

L-BFGS Numerical Optimizer

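A limited-memory quasi-Newton method. It approximates curvature from recent gradients instead of storing a full Hessian, which keeps memory low even with 40,000 features. It iteratively minimizes the regularized log-loss to find the coefficients of the boundary between safe and phishing emails. max_iter=1000 raises the iteration cap from scikit-learn's default of 100.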

solver='lbfgs' (max_iter=1000)
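If L-BFGS stops at max_iter before converging, scikit-learn only emits a ConvergenceWarning. A minimal sketch of turning that warning into an error and checking how many iterations were actually used, on synthetic data (a hypothetical stand-in):

```python
import warnings
from sklearn.datasets import make_classification
from sklearn.exceptions import ConvergenceWarning
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in data (hypothetical).
X, y = make_classification(n_samples=500, n_features=50, random_state=0)

# Escalate ConvergenceWarning to an exception so a non-converged fit fails loudly.
with warnings.catch_warnings():
    warnings.simplefilter("error", ConvergenceWarning)
    model = LogisticRegression(solver="lbfgs", max_iter=1000).fit(X, y)

# n_iter_ holds the iterations L-BFGS used (a single entry for binary problems).
print(model.n_iter_)
```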
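Class Imbalance Handling

Balanced Class Weights

Weights each class inversely to its frequency in the training data, using n_samples / (n_classes × class_count). The less common class (benign or phishing) then counts as much in the loss as the more common one, so the model does not simply favor the majority label.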

class_weight='balanced'
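The weights that class_weight='balanced' assigns can be computed with scikit-learn's compute_class_weight helper. This example uses a hypothetical imbalanced label vector (6 safe, 2 phishing) and checks the result against the formula n_samples / (n_classes × class_count).

```python
import numpy as np
from sklearn.utils.class_weight import compute_class_weight

# Hypothetical imbalanced labels: 6 safe (0) emails, 2 phishing (1) emails.
y = np.array([0, 0, 0, 0, 0, 0, 1, 1])
classes = np.array([0, 1])

weights = compute_class_weight("balanced", classes=classes, y=y)
# Same formula by hand: n_samples / (n_classes * count_of_each_class).
manual = len(y) / (len(classes) * np.bincount(y))

print(weights)  # safe ~0.667, phishing = 2.0
```

Each phishing email here counts three times as much as a safe one, so both classes contribute equally to the total loss.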
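Train/Test Split

80% Training / 20% Testing

Splits the email dataset while preserving the ratio of safe to phishing emails in both subsets (stratification). 80% (14,476 samples) is used for training, and 20% (3,620 samples) is held out to measure performance on emails the model has never seen.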

stratified_split(ratio=0.2)
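stratified_split(ratio=0.2) is not a scikit-learn function name. The closest scikit-learn equivalent is presumably train_test_split with test_size=0.2 and stratify=y. This sketch uses stand-in arrays sized like the dataset (14,476 + 3,620 = 18,096 emails) and shows that the split reproduces the page's training and test set sizes. The class balance and random_state are hypothetical.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Stand-in arrays sized like the real dataset: 14,476 + 3,620 = 18,096 emails.
n_emails = 18_096
X = np.arange(n_emails).reshape(-1, 1)
# Hypothetical class balance; the real safe/phishing ratio is not shown here.
y = np.array([0] * 11_000 + [1] * (n_emails - 11_000))

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42  # random_state is hypothetical
)

print(len(X_train), len(X_test))  # 14476 3620
# Stratification keeps the phishing share nearly identical in both subsets.
print(round(y.mean(), 3), round(y_test.mean(), 3))
```

scikit-learn rounds the test size up (ceil(0.2 × 18,096) = 3,620) and assigns the remainder to training.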
About PhishGuard AI

PhishGuard AI is an academic cybersecurity project that demonstrates email phishing detection with machine learning. It uses a TF-IDF + Logistic Regression pipeline built on 18,000+ real-world emails (14,476 for training, 3,620 held out for testing) and reaches 98.26% accuracy on the held-out test set.

Version 1.0 • Built with Flask, scikit-learn • Dataset: Phishing Email Dataset
