# URL Risk

> Real-time risk scoring for URLs in user-generated content. Catch phishing, malware, brand impersonation, and credential-harvesting links before they reach your users.

When URL Risk is on, you don't have to pass URLs separately. Anything that looks like a link in the submitted text is extracted and scored. Each URL is checked against threat-intelligence feeds and scored by a model trained on phishing infrastructure. The response gives you a risk score and a list of reason codes per URL.

This page documents those fields and how to interpret them.

## Fields

```json theme={"theme":"nord"}
{
  "url": "https://secure-paypal-verify.xyz/account",
  "risk_score": 0.98,
  "reasons": ["brand_impersonation", "suspicious_keywords", "high_risk_tld"],
  "signals": {
    "brand_impersonation": {"brand": "paypal", "method": "registered_domain_token"},
    "has_suspicious_characters": false,
    "is_link_shortener": false,
    "domain_age_days": null,
    "has_email_setup": null,
    "redirect_count": null,
    "final_url": null,
    "bot_protection": null,
    "is_reported": false
  }
}
```

| Field        | Type             | Meaning                                                                                                                                                                   |
| ------------ | ---------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `url`        | string           | The URL that was evaluated.                                                                                                                                               |
| `risk_score` | number (0.0–1.0) | Risk score. Higher is riskier. Scores at or above `0.5` are treated as malicious by default; you can apply a stricter or looser cutoff for your use case.                 |
| `reasons`    | string\[]        | Stable codes explaining *why* the URL looks risky. Empty when the URL is clean. A non-empty list means something was actually flagged; it is not a full audit of what was checked. |
| `signals`    | object           | Observable properties of the URL, described below.                                                                                                                        |
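
As a minimal sketch of applying the cutoff described in the table, the snippet below classifies one result object shaped like the example above. The names `is_malicious`, `DEFAULT_THRESHOLD`, and the `threshold` parameter are illustrative and not part of the API.

```python
from typing import Any, Mapping

DEFAULT_THRESHOLD = 0.5  # the documented default cutoff


def is_malicious(result: Mapping[str, Any], threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when the risk score is at or above the cutoff."""
    return float(result["risk_score"]) >= threshold


# The example result from this page, trimmed to the fields used here.
example = {
    "url": "https://secure-paypal-verify.xyz/account",
    "risk_score": 0.98,
    "reasons": ["brand_impersonation", "suspicious_keywords", "high_risk_tld"],
}

if is_malicious(example):
    print(f"blocked {example['url']}: {', '.join(example['reasons'])}")
```

Passing a different `threshold` is how a caller would apply a stricter or looser cutoff without changing the rest of the handling code.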

### Signals

Observable properties of the URL. The shape is the same for every analyzed URL; allowlist and blocklist hits skip analysis and return no `signals`. Fields that aren't applicable or weren't checked come back as `null`.

| Field                       | Type                      | Meaning                                                                                                                                                                                                                                                           | Null when                                                                       |
| --------------------------- | ------------------------- | ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------- |
| `brand_impersonation`       | `{brand, method}` \| null | A well-known brand name appears in the URL in a way that doesn't match its legitimate domain, e.g. `paypal-verify.xyz` or `paypal.evil.com`. `brand` is the impersonated brand (e.g. `"paypal"`); `method` is `"registered_domain_token"` or `"subdomain_token"`. | No brand match detected.                                                        |
| `has_suspicious_characters` | boolean                   | Punycode, Unicode lookalike characters, or an unusual ratio of special characters (classic typosquatting and homograph-attack indicators). Flagged if *any* URL in the redirect chain exhibits this.                                                              | Always populated.                                                               |
| `is_link_shortener`         | boolean                   | A known shortener is used anywhere in the redirect chain. This includes first-party (`youtu.be`, `lnkd.in`) and third-party (`bit.ly`, `tinyurl.com`, and others).                                                                                                | Always populated.                                                               |
| `domain_age_days`           | integer \| null           | How many days ago the destination's domain was registered. Freshly registered domains (under 30 days old) are disproportionately used for phishing. Describes the registered domain, not the subdomain.                                                           | The signal isn't informative for this URL, or wasn't needed to reach a verdict. |
| `has_email_setup`           | boolean \| null           | Whether the destination's domain is set up to receive email. Legitimate businesses almost always are; throwaway phishing domains often aren't. Describes the registered domain.                                                                                   | Not needed to reach a verdict.                                                  |
| `redirect_count`            | integer \| null           | Number of redirect hops from the submitted URL to its final destination. `0` means no redirect.                                                                                                                                                                   | Not needed to reach a verdict.                                                  |
| `final_url`                 | string \| null            | The final URL reached after following redirects. Equal to the submitted URL when there's no redirect.                                                                                                                                                             | Not needed to reach a verdict.                                                  |
| `bot_protection`            | boolean \| null           | Whether the destination sits behind a bot challenge or web application firewall. When `true`, some destination-describing signals may be `null` because we can't see past the challenge.                                                                          | Not needed to reach a verdict.                                                  |
| `is_reported`               | boolean                   | The submitted URL matches one of our threat-intelligence feeds. Stays `false` if a redirect destination is reported but the submitted URL itself isn't.                                                                                                           | Always populated.                                                               |

<Note>
  Not every URL is analyzed in full depth. URLs that are clearly clean or clearly malicious from the string alone get a fast verdict, and the network-level signals (`domain_age_days`, `has_email_setup`, `redirect_count`, `final_url`, `bot_protection`) come back `null`. Treat `null` as **"not checked,"** not "not present."
</Note>
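
To make the "not checked" reading concrete, here is a small sketch that keeps `null` (Python `None`) separate from `false` when reading the network-level signals. The helper names `unchecked_network_signals` and `email_setup_label` are hypothetical, chosen only for this example.

```python
from typing import Any, List, Mapping, Optional

# The signals the note above lists as network-level.
NETWORK_SIGNALS = (
    "domain_age_days",
    "has_email_setup",
    "redirect_count",
    "final_url",
    "bot_protection",
)


def unchecked_network_signals(signals: Mapping[str, Any]) -> List[str]:
    """Names of network-level signals that came back null (not checked)."""
    return [name for name in NETWORK_SIGNALS if signals.get(name) is None]


def email_setup_label(value: Optional[bool]) -> str:
    """Keep the three states apart: None is unknown, False is a real finding."""
    if value is None:
        return "not checked"
    return "set up for email" if value else "no email setup"
```

Comparing with `is None` rather than relying on truthiness matters here: `redirect_count: 0` and `has_email_setup: false` are real findings and should not be lumped in with unchecked fields.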

#### How signals describe redirect chains

When a URL redirects across domains (e.g. a shortener resolving to a landing page), signals are assembled from both the submitted URL and the final URL:

* **Describe the destination** (where the user ends up): `brand_impersonation`, `domain_age_days`, `has_email_setup`, `bot_protection`
* **Describe the submitted URL** (what was sent): `redirect_count`, `final_url`, `is_reported`
* **Either URL exhibiting the trait**: `is_link_shortener`, `has_suspicious_characters`

Same-domain redirects (`http://` → `https://`, trailing-slash canonicalization) don't trigger re-analysis.
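
The split above can be sketched as two small helpers that read the redirect-related signals from a result. The function names are hypothetical, the sample result is invented for illustration, and falling back to the submitted URL when `final_url` is `null` is an assumption of this sketch, not documented behavior.

```python
from typing import Any, Mapping, Optional


def was_redirected(result: Mapping[str, Any]) -> Optional[bool]:
    """True/False when redirects were followed, None when they were not checked."""
    signals = result.get("signals") or {}
    count = signals.get("redirect_count")
    if count is None:
        return None
    return count > 0


def destination_url(result: Mapping[str, Any]) -> str:
    """URL that the destination-describing signals refer to.

    Assumption: when final_url is null, fall back to the submitted URL.
    """
    signals = result.get("signals") or {}
    final_url = signals.get("final_url")
    return final_url if final_url is not None else result["url"]


# Invented sample: a shortener resolving to a landing page in one hop.
shortened = {
    "url": "https://bit.ly/abc",
    "signals": {"redirect_count": 1, "final_url": "https://landing.example/offer"},
}
print(was_redirected(shortened), destination_url(shortened))
```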

### Reason codes

`reasons` is an ordered list of stable codes explaining why the URL looks risky. Codes only appear when a signal or rule actually attributed risk to this URL. A field being *present* is not enough; it has to have *driven* the score. Benign URLs return `reasons: []`.

| Code                        | Aligns with signal                  | What it means                                                                                                                                                          |
| --------------------------- | ----------------------------------- | ---------------------------------------------------------------------------------------------------------------------------------------------------------------------- |
| `blocklisted`               | None                                | The URL's registered domain matched your blocklist. Verdict comes from configuration, not from analysis.                                                               |
| `allowlisted`               | None                                | The URL's registered domain matched your allowlist. Verdict comes from configuration, not from analysis.                                                               |
| `brand_impersonation`       | `signals.brand_impersonation`       | A brand name is used in the domain or subdomain in a way that doesn't match its legitimate home.                                                                       |
| `has_suspicious_characters` | `signals.has_suspicious_characters` | Punycode, Unicode lookalikes, or an unusual special-character ratio.                                                                                                   |
| `is_link_shortener`         | `signals.is_link_shortener`         | The URL uses a shortener and that pattern contributed to the risk score.                                                                                               |
| `is_reported`               | `signals.is_reported`               | The URL is on one of our threat-intelligence feeds.                                                                                                                    |
| `new_domain`                | `signals.domain_age_days`           | The destination domain was registered recently and that freshness drove up risk.                                                                                       |
| `missing_email_setup`       | `signals.has_email_setup`           | The destination isn't set up for email, a common characteristic of throwaway phishing domains.                                                                         |
| `high_risk_tld`             | None                                | Top-level domain with disproportionate phishing prevalence.                                                                                                            |
| `suspicious_keywords`       | None                                | URL contains phishing keywords such as `login`, `verify`, `account`, `password`, `secure`.                                                                             |
| `suspicious_url_structure`  | None                                | Structural red flags: `@` in the authority, `//` redirect trick, IP address as host, URL embedded in path, credential-collecting query parameters, and similar tricks. |
| `ssl_invalid`               | None                                | The destination's SSL certificate failed to validate.                                                                                                                  |

Apart from the configuration codes `blocklisted` and `allowlisted`, reasons only describe what *increased* risk. You will not see `has_email_setup` as a reason. It's the *absence* of email setup that's concerning, and that surfaces as `missing_email_setup`.
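
As an illustration of the "Aligns with signal" column, the sketch below pairs each reason code in a result with the signal value behind it. The mapping is copied from the table above; the function name `evidence_for` is hypothetical, and codes with no aligned signal map to `None`.

```python
from typing import Any, Dict, Mapping

# Reason code -> signal field, taken from the "Aligns with signal" column.
REASON_TO_SIGNAL = {
    "brand_impersonation": "brand_impersonation",
    "has_suspicious_characters": "has_suspicious_characters",
    "is_link_shortener": "is_link_shortener",
    "is_reported": "is_reported",
    "new_domain": "domain_age_days",
    "missing_email_setup": "has_email_setup",
}


def evidence_for(result: Mapping[str, Any]) -> Dict[str, Any]:
    """Map each reason in the result to its aligned signal value, if any."""
    signals = result.get("signals") or {}  # absent on allowlist/blocklist hits
    evidence: Dict[str, Any] = {}
    for reason in result.get("reasons", []):
        signal_name = REASON_TO_SIGNAL.get(reason)
        evidence[reason] = signals.get(signal_name) if signal_name else None
    return evidence
```

This kind of lookup is handy for building moderator-facing explanations, where a code such as `new_domain` reads better next to the actual `domain_age_days` value.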

## Allowlists and blocklists

You can configure per-tenant allowlists and blocklists in the dashboard. These are applied before the risk model runs:

* A **blocklist** hit returns `risk_score: 1` and `reasons: ["blocklisted"]`. No `signals` are returned. The verdict comes from your configuration, not from analysis of the URL.
* An **allowlist** hit returns `risk_score: 0` and `reasons: ["allowlisted"]`. Also no `signals`.
* Everything else flows through the risk model.

If a domain is on both lists, the blocklist wins.
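
The precedence described above can be modeled in a few lines. This is a sketch of the documented order only, not the service's implementation; `list_verdict` and its parameters are hypothetical names, and the domain is assumed to be already normalized.

```python
from typing import Any, Dict, Optional, Set


def list_verdict(domain: str, allowlist: Set[str], blocklist: Set[str]) -> Optional[Dict[str, Any]]:
    """Return the list-based verdict, or None when the URL goes to the risk model."""
    # The blocklist is checked first, so it wins when a domain is on both lists.
    if domain in blocklist:
        return {"risk_score": 1, "reasons": ["blocklisted"]}
    if domain in allowlist:
        return {"risk_score": 0, "reasons": ["allowlisted"]}
    return None  # everything else flows through the risk model
```

Neither verdict carries a `signals` object, so client code that reads signals should treat the field as optional.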

### Domain-level matching

Entries match at the **registered domain** level. Subdomains are *not* matched automatically. To allow every subdomain of your service, add each one explicitly.

Given an allowlist entry of `example.com`:

| URL                              | Matches?                                  |
| -------------------------------- | ----------------------------------------- |
| `https://example.com/page`       | Yes                                       |
| `https://www.example.com/page`   | Yes (`www` is normalized away)            |
| `https://login.example.com/page` | No, add `login.example.com` explicitly    |
| `https://api.prod.example.com/`  | No, add `api.prod.example.com` explicitly |
| `https://example.com.evil.xyz/`  | No, the registered domain is `evil.xyz`   |

Matching is case-insensitive. Enter plain domain strings: no scheme, no path, no wildcards. Internationalized domains should be entered in their punycode form (`xn--...`).
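
To check entries before adding them in the dashboard, the matching table above can be approximated locally. This is a sketch that mirrors the table rows, not the service's matcher; the helper names are hypothetical. Note that Python's built-in `idna` codec implements the older IDNA 2003 rules, so its punycode output may differ from other tools for some characters.

```python
from typing import Iterable
from urllib.parse import urlsplit


def normalize_host(host: str) -> str:
    """Lowercase the host and drop a leading 'www.' label."""
    host = host.strip().lower()
    return host[4:] if host.startswith("www.") else host


def matches_entry(url: str, entries: Iterable[str]) -> bool:
    """True when the URL's host equals a list entry after normalization."""
    host = urlsplit(url).hostname or ""
    return normalize_host(host) in {normalize_host(entry) for entry in entries}


def to_punycode(domain: str) -> str:
    """Convert an internationalized domain to its ASCII (xn--) form."""
    return domain.encode("idna").decode("ascii")
```

Because the comparison is an exact match on the normalized host, `login.example.com` only matches when it is listed explicitly, which is the behavior the table describes.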

## FAQ

<AccordionGroup>
  <Accordion title="Why does the score for the same URL change over time?">
    Risk is a moving target. Several inputs change between requests:

    * **Domains age.** A freshly registered domain looks risky today and less risky in six months. `domain_age_days` grows naturally.
    * **Email infrastructure gets added.** Legitimate businesses set up MX, SPF, and DMARC records as they grow; throwaway domains rarely do. `has_email_setup` can flip from `false` to `true` as a domain matures.
    * **Threat-intelligence feeds update constantly.** A URL not on any feed today may be reported tomorrow.
    * **Redirect destinations change.** Shorteners and redirectors can be repointed at any time. The destination is re-resolved on every request.
    * **The model is updated** as the threat landscape shifts.

    If you're caching scores, cache them briefly. Re-evaluate any URL still in active circulation rather than relying on a result that's hours or days old.
  </Accordion>

  <Accordion title="What threshold should I use?">
    `risk_score >= 0.5` is the default cutoff for "treat as malicious," and it's tuned so the false-positive rate at that threshold is low across typical user-generated content. Raise it (e.g. `0.7`) if your audience tolerates riskier links and you want fewer false positives, or lower it (e.g. `0.3`) if you'd rather over-block. The `reasons` array gives you the *why* in either direction.
  </Accordion>

  <Accordion title="A legitimate URL of mine is being flagged. What do I do?">
    Add its domain to your **allowlist**. Allowlist entries override the risk model. This is the right tool for your own product domains, trusted partners, and domains you've manually verified as safe.

    If you think the score is wrong in a way that would also affect other customers (for example, a brand-impersonation false positive on a legitimate brand variant), let us know and we'll look at the model.
  </Accordion>
</AccordionGroup>
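
The caching advice in the FAQ can be sketched as a small time-to-live cache. Everything here is illustrative: the `ScoreCache` class is hypothetical, and the 300-second default is an arbitrary stand-in for "briefly," not a recommended value.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple


class ScoreCache:
    """In-memory cache that forgets each result after ttl_seconds."""

    def __init__(self, ttl_seconds: float = 300.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._entries: Dict[str, Tuple[float, Any]] = {}

    def put(self, url: str, result: Any) -> None:
        """Store a result together with the time it was cached."""
        self._entries[url] = (self._clock(), result)

    def get(self, url: str) -> Optional[Any]:
        """Return a fresh cached result, or None if missing or expired."""
        entry = self._entries.get(url)
        if entry is None:
            return None
        stored_at, result = entry
        if self._clock() - stored_at >= self._ttl:
            del self._entries[url]  # expired: the caller should re-evaluate the URL
            return None
        return result
```

A `None` from `get` is the signal to submit the URL again, which keeps links in active circulation from being judged on a stale score.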
