DataflySignal

About the Datafly Signal Scanner

This page explains what the Datafly Signal Scanner does, how to identify it, and how to block it if you would prefer we did not visit your site.

What the scanner does

When someone submits a URL at scan.dataflysignal.com, our scanner loads that page and a small sample of internal pages (discovered from the site’s sitemap) in a headless Chromium browser. It records the network requests made by third-party tags, measures page weight and load time, and produces a report describing the tag stack. The scanner does not submit forms, log in, or store cookies, localStorage, or page body content.

How to identify us

Every request the scanner makes carries this HTTP header:

X-Datafly-Signal-Scanner: 1 (+https://scan.dataflysignal.com/bot)

This is the identification mechanism recommended by Cloudflare, DataDome, and other bot-management providers for “good bots”. The scanner presents a current Chrome user-agent so that content-level bot checks serve it the real page rather than a stripped shell, but the identifying header makes it easy to allow or block us at the edge.
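If you want to recognise scanner traffic in your own code or log processing, the check amounts to looking for the header name. The following is a minimal Python sketch, not part of any Datafly tooling; the function name is_datafly_scanner is hypothetical, and it assumes your framework exposes request headers as a mapping of names to values.

```python
# Header name taken from this page; everything else here is illustrative.
SCANNER_HEADER = "X-Datafly-Signal-Scanner"


def is_datafly_scanner(headers):
    """Return True if the request headers include the scanner's header.

    `headers` is assumed to be a mapping of header names to values.
    """
    # HTTP header names are case-insensitive, so compare in lower case.
    wanted = SCANNER_HEADER.lower()
    return any(name.lower() == wanted for name in headers)
```

Only the presence of the header is tested, not its value, so the check does not depend on the exact value string the scanner sends.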

How to block us

At your CDN or WAF, match requests where the X-Datafly-Signal-Scanner header is present and deny them.
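If a CDN or WAF rule is not an option, the same match can be made in the application itself. As a rough sketch, assuming a Python application served over WSGI, a small middleware could return 403 whenever the header is present. The name block_datafly_scanner is hypothetical; the only fact taken from this page is the header name, which WSGI exposes as the environ key HTTP_X_DATAFLY_SIGNAL_SCANNER.

```python
def block_datafly_scanner(app):
    """Wrap a WSGI app so that requests carrying the scanner header get a 403."""

    def middleware(environ, start_response):
        # WSGI upper-cases header names, replaces "-" with "_",
        # and prefixes them with "HTTP_".
        if "HTTP_X_DATAFLY_SIGNAL_SCANNER" in environ:
            body = b"Forbidden\n"
            start_response(
                "403 Forbidden",
                [
                    ("Content-Type", "text/plain"),
                    ("Content-Length", str(len(body))),
                ],
            )
            return [body]
        # Any other request passes through to the wrapped application.
        return app(environ, start_response)

    return middleware
```

Wrapping is a single line, for example application = block_datafly_scanner(application). Blocking at the edge remains preferable where available, because the request then never reaches your origin.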

Our crawler does not honour robots.txt because the URLs we visit are explicitly submitted. If you would prefer we did not visit your site regardless, email hello@dataflysignal.com and we will add your hostname to our internal block list.

Request volume

Each submission triggers a small set of page loads, sampled from the site’s sitemap (up to around 40 pages per scan). Submissions are rate-limited to three per IP address per hour and ten per email address per day.
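To illustrate how the two limits combine, here is a sketch of a sliding-window limiter in Python. It is an assumption for illustration only: this page does not say how the limits are enforced (for example, sliding versus fixed windows), and the names SlidingWindowLimiter and submission_allowed are hypothetical. Only the numbers (three per IP per hour, ten per email address per day) come from the text above.

```python
import time
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `max_events` per key within any `window_seconds` span."""

    def __init__(self, max_events, window_seconds):
        self.max_events = max_events
        self.window_seconds = window_seconds
        self._events = defaultdict(deque)  # key -> timestamps, oldest first

    def has_capacity(self, key, now):
        events = self._events[key]
        # Drop timestamps that have aged out of the window.
        while events and now - events[0] >= self.window_seconds:
            events.popleft()
        return len(events) < self.max_events

    def record(self, key, now):
        self._events[key].append(now)


# Limits as stated on this page.
per_ip = SlidingWindowLimiter(max_events=3, window_seconds=60 * 60)
per_email = SlidingWindowLimiter(max_events=10, window_seconds=24 * 60 * 60)


def submission_allowed(ip, email, now=None):
    """Accept a submission only if both the IP and the email limit have room."""
    now = time.monotonic() if now is None else now
    if per_ip.has_capacity(ip, now) and per_email.has_capacity(email, now):
        # Record against both limits only when the submission is accepted,
        # so a rejected attempt does not use up quota.
        per_ip.record(ip, now)
        per_email.record(email, now)
        return True
    return False
```

Under these assumptions, a fourth submission from the same IP address within an hour is refused, and the quota frees up again once the oldest submission is more than an hour old.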