How to Identify and Block Bots in Your Firewall

Integrate Fingerprint Bot Detection with your Web Application Firewall and dynamically block IP addresses linked to past bot visits.

What are bot attacks?

All websites and online applications today face risks from bot attacks. These automated programs can overload servers, scrape content, generate fake user activity, or attempt unauthorized access to sensitive data. Harmful bots can range from simple command-line scripts to full-featured automated web browsers.

On the client side, bot attacks were traditionally mitigated by CAPTCHA-style challenges, which have always been detrimental to user experience. With recent advances in machine learning, some bots can even solve CAPTCHA challenges faster and more accurately than humans.

On the server side, your default Web Application Firewall can offer some level of protection from bots.

What is a Web Application Firewall?

A Web Application Firewall (WAF) protects web applications from various online threats, including bot attacks. It is a software barrier between a web application and the internet, monitoring and filtering HTTP traffic between them. WAFs analyze incoming requests and block or allow traffic based on predefined security rules.

A WAF can detect bots using a handful of simple techniques:

  • Looking for unusual traffic patterns, such as a very large number of requests per second.
  • Looking at the request metadata, such as the User-Agent.
  • Comparing the IP address against a reputation database of IP ranges, countries, and data centers known to host bots.

These rigid rules provide a base level of protection from simple bots. But more sophisticated attackers can mimic human traffic patterns, spoof request metadata, or use proxies to change their IP address.
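
To make the limits of such rules concrete, here is a minimal sketch of the three checks listed above. The thresholds, the user-agent fragments, the blocklist, and the `IncomingRequestSummary` shape are all illustrative assumptions, not the rules of any particular WAF product.

```typescript
// Hypothetical summary of an incoming request; a real WAF sees far more metadata.
interface IncomingRequestSummary {
  ip: string;
  userAgent: string;
  requestsInLastSecond: number;
}

// Illustrative values only, chosen for the example.
const MAX_REQUESTS_PER_SECOND = 20;
const SUSPICIOUS_USER_AGENT_PARTS = ["curl", "python-requests", "headlesschrome"];
const IP_BLOCKLIST = new Set(["203.0.113.7"]);

function looksLikeBot(request: IncomingRequestSummary): boolean {
  const userAgent = request.userAgent.toLowerCase();
  return (
    // 1. Unusual traffic pattern
    request.requestsInLastSecond > MAX_REQUESTS_PER_SECOND ||
    // 2. Request metadata (User-Agent)
    SUSPICIOUS_USER_AGENT_PARTS.some((part) => userAgent.includes(part)) ||
    // 3. IP reputation
    IP_BLOCKLIST.has(request.ip)
  );
}
```

A script announcing itself as `curl/8.0` is caught by the second check, but the same script sending a regular Chrome User-Agent at a modest rate from an unlisted IP passes all three.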

What is Fingerprint Bot Detection?

Fingerprint provides a robust client-side Bot Detection solution. It runs in the browser, collecting vast amounts of data that bots leak (errors, network overrides, browser attribute inconsistencies, API changes, and more) to reliably distinguish real users from headless browsers, automation tools, their derivatives, and plugins. We have covered how to use Fingerprint Bot Detection to protect data endpoints accessible from your website in the Content scraping prevention tutorial.

Flight search website protected from scraping using Fingerprint Bot Detection

  • Fingerprint Bot Detection is much better than a WAF at detecting sophisticated bots, such as headless browsers. However, it relies on collecting signals from the browser. The bot is still able to load the page before it is detected.
  • A WAF, working at a server level, has limited bot detection capabilities. But it can block bots sooner — before they even load the page.

In this article, you will explore how to integrate Fingerprint Bot Detection into your application firewall. By dynamically blocking IP addresses linked to bot visits, you can leverage the strengths of both Fingerprint and WAF to fortify your application against bot attacks.

Integrating Fingerprint Bot Detection with your Web Application Firewall

The Content scraping prevention tutorial already covered the following steps to detect bots on the client side and deny them access to your data:

  1. Sign up for Fingerprint Pro.
  2. Install the JavaScript agent on your website.
  3. Identify each visitor and send the corresponding identification request ID to the server.
  4. Use the request ID to retrieve and validate the full identification event from the Server API.
  5. Return the requested data or deny the request based on the Bot Detection result.

Please refer to the full article for a detailed explanation of each step.
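
As a rough illustration of what "validate" in step 4 can mean, the sketch below checks that an identification event is recent and that its IP address matches the IP of the data request. The `IdentificationSummary` shape, the `validateIdentification` helper, and the five-second limit are assumptions made for this example; the referenced tutorial describes the actual checks.

```typescript
// Hypothetical subset of the identification event used for validation.
interface IdentificationSummary {
  ip: string; // IP address recorded at identification time
  time: string; // ISO 8601 timestamp of the identification
}

type ValidationResult = { ok: true } | { ok: false; reason: string };

// Assumed freshness window; tune it to your own application.
const MAX_EVENT_AGE_MS = 5000;

function validateIdentification(
  event: IdentificationSummary,
  requestIp: string,
  nowMs: number = Date.now(),
): ValidationResult {
  const eventTimeMs = Date.parse(event.time);
  if (Number.isNaN(eventTimeMs)) {
    return { ok: false, reason: "Invalid event timestamp" };
  }
  // An old request ID may have been captured and replayed.
  if (nowMs - eventTimeMs > MAX_EVENT_AGE_MS) {
    return { ok: false, reason: "Identification event is too old" };
  }
  // The identification should come from the same client as the data request.
  if (event.ip !== requestIp) {
    return { ok: false, reason: "Identification IP does not match request IP" };
  }
  return { ok: true };
}
```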

This tutorial builds on that functionality. First, we will save the IP address of each detected bot to a database. Then, we will build a simple dashboard where you can monitor bot visits and manually block the associated IP addresses.

Dashboard showing a table of detected bot visits

You can try the live final result on our website. The example uses Next.js, but the same principles apply to any web application. The code snippets in the article are simplified for readability, but you can find the full source code on GitHub.

1. Save the IP address of each detected bot

Use the Server API to get the Bot Detection result. You can use one of our Server SDKs:

import { 
  FingerprintJsServerApiClient 
} from '@fingerprintjs/fingerprintjs-pro-server-api';

// Get the Bot Detection result from the Fingerprint Server API
// using the `requestId` from the Fingerprint JS Agent
const client = new FingerprintJsServerApiClient({ apiKey: SERVER_API_KEY });
const eventResponse = await client.getEvent(requestId);
const botDetection = eventResponse.products?.botd?.data;

The botDetection result has the following information available:

{
  "bot": {
    "result": "bad", // or "good" or "notDetected"
    "type": "headlessChrome"
  },
  "userAgent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) HeadlessChrome/110.0.5481.177 Safari/537.36",
  "url": "https://yourdomain.com/search",
  "ip": "61.127.217.15",
  "time": "2023-09-08T16:43:23.241Z",
  "requestId": "1234557403227.AbclEC"
}

A "good" bot might be a search engine crawler or a monitoring tool. A "bad" bot means a headless browser or automation tool is accessing the website. In that case, deny the data request and save the bot visit to a database:

// Determine if a bad bot was detected 
if (botDetection?.bot.result === "bad") {
  // Deny data request
  res.status(403).json({
    message: "Malicious bot detected, scraping this data is not allowed.",
  });

  // Save the bot visit to your database
  await BotVisitDbModel.create({
    ip: botDetection.ip,
    requestId: botDetection.requestId,
    timestamp: botDetection.time,
    botResult: botDetection.bot.result,
    botType: botDetection.bot.type,
  });
  return;
}

2. Display detected bot visits in a dashboard (optional)

You can build an internal dashboard that shows all detected bot visits and allows you to manually block the associated IP addresses. Alternatively, you could automatically block bot IPs the moment they are detected (jump to Step 3 if you prefer).

First, build an API endpoint for retrieving bot visits:

// src/api/bot-firewall/get-bot-visits.ts
export default async function handler(_req, res) {
  const botVisits = await BotVisitDbModel.findAll({
    order: [['timestamp', 'DESC']],
  });
  res.status(200).json(botVisits);
}

Use the endpoint to display the bot visits in a table:

import { useMutation, useQuery } from 'react-query';

export default function BotVisitsPage() {
  const { data: botVisits } = useQuery('get-bot-visits', () =>
    fetch('/api/bot-firewall/get-bot-visits').then((res) => res.json()),
  );

  const { mutate: blockIp } = useMutation('block-ip', async (ip) =>
    fetch('/api/bot-firewall/block-ip', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
      },
      body: JSON.stringify({ ip }),
    }).then((res) => res.json()),
  );

  return (
    <table>
      <thead>
        <tr>
          <th>Timestamp</th>
          <th>Bot Type</th>
          <th>IP Address</th>
          <th>Action</th>
        </tr>
      </thead>
      <tbody>
        {botVisits?.map((botVisit) => {
          return (
            <tr key={botVisit.requestId}>
              <td>{botVisit.timestamp}</td>
              <td>
                {botVisit.botResult} ({botVisit.botType})
              </td>
              <td>{botVisit.ip}</td>
              <td>
                <button onClick={() => blockIp(botVisit.ip)}>Block this IP</button>
              </td>
            </tr>
          );
        })}
      </tbody>
    </table>
  );
}

3. Block bot IPs in your web application firewall

In this example, we are proxying the website through Cloudflare, using that as the web application firewall, and updating a custom ruleset via the Cloudflare API. The section below assumes basic familiarity with Cloudflare, but this approach applies to any WAF solution with API-editable access rules.

// src/api/bot-firewall/block-ip.ts
export default async function blockIp(req, res) {
  const { ip } = req.body;
  // Save newly blocked IP to the list of currently blocked IPs
  await BlockedIpDbModel.upsert({
    ip,
    timestamp: new Date().toISOString(),
  });

  // Get the updated list of all blocked IPs
  const blockedIps = (
    await BlockedIpDbModel.findAll({
      order: [['timestamp', 'DESC']],
    })
  ).map((blockedIp) => blockedIp.ip);

  // Construct new firewall rules from the blocked IP database
  const newRules = await buildFirewallRules(blockedIps);

  // Apply new firewall rules to your Cloudflare application
  await updateRulesetUsingCloudflareAPI(newRules);

  // Return success
  return res.status(200).json({ result: 'success' });
}
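
Because the `ip` value comes from the request body and later ends up inside a firewall rule expression, it is worth rejecting anything that is not a plain IP address before saving it. The sketch below is one possible way to do that with Node's built-in `net.isIP`; the `parseBlockableIp` helper is hypothetical and not part of the demo code.

```typescript
import { isIP } from "node:net";

// Returns the trimmed IP string if it is a valid IPv4 or IPv6 address, otherwise null.
// `isIP` returns 4 or 6 for valid addresses and 0 for anything else.
function parseBlockableIp(input: unknown): string | null {
  if (typeof input !== "string") {
    return null;
  }
  const candidate = input.trim();
  return isIP(candidate) !== 0 ? candidate : null;
}

console.log(parseBlockableIp("61.127.217.15")); // "61.127.217.15"
console.log(parseBlockableIp('1.2.3.4"} or {"')); // null
```

In the handler, a `null` result could be answered with a 400 response before any database or Cloudflare call is made.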

Note: In this example, we block IP addresses directly in the Cloudflare Custom Rules. We use our own database to keep track of currently blocked IPs and update the entire ruleset on every change. Alternatively, you could use Lists to store blocked IPs and update just that, bearing in mind the List's limitations around IPv6.

Cloudflare rule expressions are limited to 4096 characters, so you can fit a maximum of 84 IPv6 addresses per rule. The buildFirewallRules function splits the list of blocked IPs into multiple rules if necessary.
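
The article does not spell out where 84 comes from. One plausible derivation, offered here as an assumption, takes the longest textual form of an IPv6 address (45 characters, the IPv4-mapped notation), wraps each address in quotes, separates addresses with a single space, and accounts for the `http.x_forwarded_for in {}` wrapper used by the rule expression:

```typescript
const MAX_EXPRESSION_LENGTH = 4096;
// Longest textual IPv6 form, e.g. "ffff:ffff:ffff:ffff:ffff:ffff:255.255.255.255"
const MAX_IP_TEXT_LENGTH = 45;
const WRAPPER_LENGTH = "http.x_forwarded_for in {}".length; // 26

// Worst-case expression length for n addresses:
// wrapper + n quoted addresses + (n - 1) separating spaces
function worstCaseExpressionLength(ipCount: number): number {
  return WRAPPER_LENGTH + ipCount * (MAX_IP_TEXT_LENGTH + 2) + (ipCount - 1);
}

// Largest n whose worst-case expression still fits into the limit
function maxIpsPerRule(maxExpressionLength: number = MAX_EXPRESSION_LENGTH): number {
  const charactersPerIp = MAX_IP_TEXT_LENGTH + 2 + 1; // quotes plus one separator
  return Math.floor((maxExpressionLength - WRAPPER_LENGTH + 1) / charactersPerIp);
}

console.log(maxIpsPerRule()); // 84
console.log(worstCaseExpressionLength(84)); // 4057
console.log(worstCaseExpressionLength(85)); // 4105
```

Under these assumptions, 84 addresses fit with room to spare and an 85th could push the expression past the limit. IPv4 addresses are much shorter, so the same budget would hold more of them.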

The free Cloudflare plan allows up to five custom rules, with more rules available on higher plans. These limitations will vary depending on your WAF provider, plan, and the chosen blocking approach (Cloudflare Lists can accommodate more IP addresses, for example).

// `_.chunk` below is assumed to come from lodash
import _ from 'lodash';

// Cloudflare rule expressions are limited to 4096 characters (room for 84 IP addresses per rule),
// so split the IP list into multiple rules if necessary
const MAX_IPS_PER_RULE = 84;
const MAX_RULES = 5;
const MAX_BLOCKED_IPS = MAX_IPS_PER_RULE * MAX_RULES; // 420

export const buildFirewallRules = async (
  // Already assumed to be unique IP addresses due to how the database is set up
  blockedIps,
  maxIpsPerRule = MAX_IPS_PER_RULE,
): Promise<CloudflareRule[]> => {
  // Split the list of blocked IPs into chunks of MAX_IPS_PER_RULE length
  const chunks = _.chunk(blockedIps, maxIpsPerRule);

  // Build the rule expression for each chunk
  const ruleExpressions = chunks.map((chunk) => {
    const ipList = chunk.map((ip) => `"${ip}"`).join(' ');
    return `http.x_forwarded_for in {${ipList}}`;
  });

  // Build a rule from each rule expression
  const rules: CloudflareRule[] = ruleExpressions.map((expression, index) => ({
    action: 'block',
    description: `Block Bot IP addresses #${index + 1}`,
    expression,
  }));

  return rules;
};
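
To see what the generated rules look like, here is a dependency-free sketch of the same chunking idea. It is not the demo's implementation: `chunkItems`, `buildRulesSketch`, and the `BlockRule` type are hypothetical names, and the cap of `maxIpsPerRule * maxRules` addresses is an assumed way of applying the `MAX_BLOCKED_IPS` constant, which the snippet above defines but does not use.

```typescript
interface BlockRule {
  action: "block";
  description: string;
  expression: string;
}

// Split an array into consecutive chunks of at most `size` items.
function chunkItems<T>(items: T[], size: number): T[][] {
  const chunks: T[][] = [];
  for (let start = 0; start < items.length; start += size) {
    chunks.push(items.slice(start, start + size));
  }
  return chunks;
}

function buildRulesSketch(blockedIps: string[], maxIpsPerRule: number, maxRules: number): BlockRule[] {
  // Assumes the list is sorted newest first, so the cap drops the oldest blocks.
  const limitedIps = blockedIps.slice(0, maxIpsPerRule * maxRules);
  return chunkItems(limitedIps, maxIpsPerRule).map((ips, index) => ({
    action: "block" as const,
    description: `Block Bot IP addresses #${index + 1}`,
    expression: `http.x_forwarded_for in {${ips.map((ip) => `"${ip}"`).join(" ")}}`,
  }));
}

const exampleRules = buildRulesSketch(["192.0.2.1", "192.0.2.2", "2001:db8::1"], 2, 5);
console.log(exampleRules.map((rule) => rule.expression));
// [
//   'http.x_forwarded_for in {"192.0.2.1" "192.0.2.2"}',
//   'http.x_forwarded_for in {"2001:db8::1"}'
// ]
```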

Finally, use the Cloudflare API to update your custom ruleset with the rules compiled above. You will need:

  • Your Cloudflare API token (create one in the Cloudflare dashboard)
  • Your Cloudflare zone ID (find it in the Cloudflare dashboard)
  • Your custom ruleset ID (find it using the Cloudflare API)

async function updateRulesetUsingCloudflareAPI(rules: CloudflareRule[]) {
  const apiToken = process.env.CLOUDFLARE_API_TOKEN ?? '';
  const zoneId = process.env.CLOUDFLARE_ZONE_ID ?? '';
  const customRulesetId = process.env.CLOUDFLARE_RULESET_ID ?? '';

  const url = `https://api.cloudflare.com/client/v4/zones/${zoneId}/rulesets/${customRulesetId}`;
  const options = {
    method: 'PUT',
    headers: {
      'Content-Type': 'application/json',
      Authorization: `Bearer ${apiToken}`,
    },
    body: JSON.stringify({
      description: 'Custom ruleset for blocking Fingerprint-detected bot IPs',
      kind: 'root',
      name: 'default',
      phase: 'http_request_firewall_custom',
      rules,
    }),
  };

  const response = await fetch(url, options);
  if (!response.ok) {
    console.error(response.statusText, await response.json());
    throw new Error('Updating firewall ruleset failed', { cause: response.statusText });
  }

  return await response.json();
}
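
The custom ruleset ID used above can be looked up by listing the zone's rulesets and picking the one in the `http_request_firewall_custom` phase. The sketch below covers only the selection step and assumes the list has already been fetched from Cloudflare's "list zone rulesets" endpoint (`GET /client/v4/zones/{zone_id}/rulesets`); the `RulesetSummary` shape is a simplified assumption and `findCustomRulesetId` is a hypothetical helper.

```typescript
// Assumed subset of one entry in the `result` array of the rulesets list response.
interface RulesetSummary {
  id: string;
  name: string;
  kind: string;
  phase: string;
}

// Returns the ID of the ruleset that holds the zone's custom firewall rules, if any.
function findCustomRulesetId(rulesets: RulesetSummary[]): string | undefined {
  return rulesets.find((ruleset) => ruleset.phase === "http_request_firewall_custom")?.id;
}

// Made-up sample data for illustration
const sampleRulesets: RulesetSummary[] = [
  { id: "aaa111", name: "Managed ruleset", kind: "managed", phase: "http_request_firewall_managed" },
  { id: "bbb222", name: "default", kind: "zone", phase: "http_request_firewall_custom" },
];

console.log(findCustomRulesetId(sampleRulesets)); // "bbb222"
```

If the function returns `undefined`, the zone may not have a custom ruleset yet, in which case one would have to be created first.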

4. Unblock IP addresses

Unblocking an IP address works the same way as blocking one, except that you delete the IP from the table of blocked IPs instead of adding it. Then, update the Cloudflare ruleset as before.

export default async function unblockIp(req, res) {
  const { ip } = req.body;
  // Delete the unblocked IP from the list of currently blocked IPs
  await BlockedIpDbModel.destroy({
    where: {
      ip,
    },
  });

  // Update the Cloudflare ruleset
  const blockedIps = await getBlockedIps();
  const newRules = await buildFirewallRules(blockedIps);
  await updateRulesetUsingCloudflareAPI(newRules);
  return res.status(200).json({ result: 'success' });
}

5. Define a time limit for blocking IP addresses

You might want to block IP addresses only for a limited time, and then unblock them automatically. Cloudflare does not support defining a time-to-live on its custom rules. But you can create a simple cron job that runs periodically, deletes expired IP blocks, and updates the Cloudflare ruleset:

// `Op.lt` below suggests a Sequelize model, so the import assumes Sequelize
import { Op } from 'sequelize';
import { schedule } from 'node-cron';

// Run every 5 minutes
schedule('*/5 * * * *', () => {
  // Log failures instead of leaving the promise rejection unhandled
  deleteOldIpBlocks().catch(console.error);
});

// 1 hour
const IP_BLOCK_TIME_TO_LIVE_MS = 1000 * 60 * 60;

async function deleteOldIpBlocks() {
  // Remove expired IP blocks
  await BlockedIpDbModel.destroy({
    where: {
      timestamp: {
        [Op.lt]: new Date(Date.now() - IP_BLOCK_TIME_TO_LIVE_MS).toISOString(),
      },
    },
  });

  // Update the Cloudflare ruleset
  const blockedIps = await getBlockedIps();
  const newRules = await buildFirewallRules(blockedIps);
  await updateRulesetUsingCloudflareAPI(newRules);
}
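
The expiry rule itself is easy to test in isolation. The following sketch mirrors the database query above in plain TypeScript, keeping only blocks whose timestamp is not older than the time-to-live. The `BlockedIpRecord` shape and the `removeExpiredBlocks` helper are hypothetical and exist only for this illustration.

```typescript
// Hypothetical in-memory shape of a row in the blocked IP table
interface BlockedIpRecord {
  ip: string;
  timestamp: string; // ISO 8601 time at which the block was created
}

// Keep only blocks created within the last `ttlMs` milliseconds.
// Blocks older than the cutoff are dropped, matching the `Op.lt` condition above.
function removeExpiredBlocks(
  blocks: BlockedIpRecord[],
  ttlMs: number,
  nowMs: number = Date.now(),
): BlockedIpRecord[] {
  const cutoffMs = nowMs - ttlMs;
  return blocks.filter((block) => Date.parse(block.timestamp) >= cutoffMs);
}

const oneHourMs = 1000 * 60 * 60;
const checkTimeMs = Date.parse("2023-09-08T18:00:00.000Z");
const remainingBlocks = removeExpiredBlocks(
  [
    { ip: "61.127.217.15", timestamp: "2023-09-08T16:43:23.241Z" }, // older than 1 hour
    { ip: "192.0.2.1", timestamp: "2023-09-08T17:30:00.000Z" }, // still active
  ],
  oneHourMs,
  checkTimeMs,
);

console.log(remainingBlocks.map((block) => block.ip)); // [ '192.0.2.1' ]
```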

Explore the Bot Firewall Demo

We built a fully open-source Bot Firewall demo to demonstrate the concepts above. Try it to see how you can use Fingerprint Bot Detection in combination with your web application firewall to better protect yourself from malicious bots.

To prevent users from interfering with other people's demo experience, the demo only allows you to block your own IP. To try it out, visit the web scraping demo as a bot using a locally running browser automation tool like Puppeteer or Playwright.

For example, assuming you already have Node.js and npm installed, you can visit the page using Playwright:

  1. Run mkdir bot-firewall-test && cd bot-firewall-test.
  2. Run npm init -y.
  3. Run npm install playwright.
  4. Run npx playwright install.
  5. Create an index.js file like the one below. Note that the bot must spend enough time on the page for the Fingerprint JS agent to load and identify it.

const playwright = require("playwright");

(async () => {
  const browser = await playwright["chromium"].launch();
  const context = await browser.newContext();
  const page = await context.newPage();
  await page.goto("https://demo.fingerprint.com/web-scraping");
  await page.waitForTimeout(3000);
  console.log(await page.getByRole("heading").first().textContent());
  await browser.close();
})();

  6. Run node index.js. Your bot visit will be saved to the database.
  7. Open the demo to see your bot visit.
  8. Click Block this IP.
  9. Run node index.js again or just visit the web scraping page using your regular browser.

The bot's IP address is now blocked at the firewall, so the page does not load at all.

Note: If you use iCloud Private Relay, your Safari will have a different IP address than your local bot, but the bot itself is still blocked.

Access to the web scraping page blocked by Cloudflare

Feel free to jump into the GitHub repo to see the full source code. If you have any questions or want to learn more about Fingerprint Bot Detection, you can join our Discord server, get in touch with our sales team, or start a 14-day free trial.

FAQ

What is bot detection?

Bot detection distinguishes between automated tools or headless browsers and real human visitors.

How do you detect bot traffic?

You can detect bot traffic by analyzing a visitor's behavior, such as how quickly they perform certain actions, as well as by looking at browser attribute inconsistencies, API changes, and unusual traffic patterns.

Can you block IPs with a Web Application Firewall?

Yes, once you have the IP address of a known bot or bad actor, you can configure your Web Application Firewall (WAF) to block requests from that IP.
