Bot and Spam Form Submission Management

What is a bot?

A bot is a software application that is programmed to do certain tasks. Bots are automated, which means they run according to their instructions without a human user needing to manually start them up every time. Bots often imitate or replace a human user's behavior. Typically they do repetitive tasks, and they can do them much faster than human users could. - Cloudflare

Good vs Bad Actors

A good actor is any person or bot that respects the instructions provided to it. For example, robots.txt is a file that tells web crawlers (i.e. bots) which URLs should and should not be crawled. In this scenario the Google web crawler is a good actor, because it respects the rules outlined in a robots.txt file.
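
As a rough illustration of what "respecting robots.txt" means in practice, the sketch below uses Python's standard urllib.robotparser module to check a URL before crawling it. The robots.txt content, the "ExampleBot" user agent, and the may_crawl helper are hypothetical names chosen for this example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content. A real crawler would download this file
# from the site it is about to crawl.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_crawl(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the robots.txt rules allow user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/blog/post", "https://example.com/private/data"):
        print(url, may_crawl("ExampleBot", url))
```

A good actor runs a check like this before every request and skips the URL when the answer is False. A bad actor simply never asks.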

Conversely, bad actors ignore or actively try to circumvent the instructions provided to them. For example, if a bot is rate limited and receives a 429 "Too Many Requests" HTTP response, it may change the IP address it sends traffic from to avoid being flagged as having reached the rate limit.

Additionally, bad actors often try to take advantage of a system in ways that were not intended, which leads us to spam form submissions.

Preventing Spam

At a high level, we will look at two places where spam prevention measures can be implemented: the front-end and the back-end.

Front-end

These are spam prevention solutions that need a front-end (read: browser/client) implementation. Some are exclusive to the front-end; others require a back-end portion to complete the solution.

CAPTCHA / reCAPTCHA

A CAPTCHA test is designed to determine if an online user is really a human and not a bot. - Cloudflare

CAPTCHA is a piece of software (read: code) added to a form. It requires the user to pass a test that is designed to be easy for humans but difficult for bots. If the test is passed, the form is submitted.

CAPTCHA can also be implemented as a back-end service that sits between the client making the request and the server receiving it. If a request is suspected of coming from a bot, it is redirected to a CAPTCHA test. If the test is passed, the request is allowed to complete.

reCAPTCHA is a free service from Google which implements CAPTCHA.
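
A front-end CAPTCHA usually needs a back-end portion: the server should confirm that the token submitted with the form is genuine before accepting the submission. The following is a minimal sketch of that check against Google's reCAPTCHA siteverify endpoint, assuming the form posts the token to your own server. The token_is_valid and _post_form helpers are hypothetical names, and the secret key is a placeholder you would obtain from your own reCAPTCHA account.

```python
import json
from typing import Callable
from urllib import parse, request

VERIFY_URL = "https://www.google.com/recaptcha/api/siteverify"


def _post_form(url: str, fields: dict) -> dict:
    """POST URL-encoded form fields and decode the JSON response."""
    body = parse.urlencode(fields).encode("ascii")
    with request.urlopen(request.Request(url, data=body), timeout=5) as resp:
        return json.loads(resp.read().decode("utf-8"))


def token_is_valid(
    token: str,
    secret_key: str,
    post: Callable[[str, dict], dict] = _post_form,
) -> bool:
    """Return True only if the verification service reports success.

    `post` is injectable so the check can be exercised without network access.
    """
    if not token:
        # No token was submitted, so the CAPTCHA was never completed.
        return False
    result = post(VERIFY_URL, {"secret": secret_key, "response": token})
    return result.get("success") is True
```

The form handler would call token_is_valid with the token from the submitted form and reject the submission when it returns False.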

Honeypots

Form honeypots are hidden input fields. They take advantage of bots' tendency to fill out every form field indiscriminately. A human user does not see the hidden field and therefore leaves it empty, so if a submission arrives with a value in that field, it is rejected as a bot submission. Learn more about implementing honeypot fields on Zesty.io.
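
The server-side half of a honeypot can be sketched in a few lines. This is an illustration only, not the Zesty.io implementation: the field name company_website and the is_bot_submission helper are hypothetical, and the form is assumed to arrive as a simple mapping of field names to values.

```python
from typing import Mapping

# Hypothetical name of the hidden input. It should look like a plausible
# field so that a bot is tempted to fill it in.
HONEYPOT_FIELD = "company_website"


def is_bot_submission(
    form: Mapping[str, str], honeypot_field: str = HONEYPOT_FIELD
) -> bool:
    """Return True when the hidden honeypot field contains a value."""
    value = form.get(honeypot_field, "")
    return value.strip() != ""


if __name__ == "__main__":
    human = {"email": "person@example.com", "company_website": ""}
    bot = {"email": "spam@example.com", "company_website": "http://spam.example"}
    print(is_bot_submission(human), is_bot_submission(bot))
```

A submission that is missing the honeypot field entirely is treated the same as one that left it empty, so a legitimate client that omits empty fields is not rejected.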

JavaScript Placement of Form

Note: This is not a recommended solution. Properly programming this type of solution is complex and requires deep web development experience.

Bots typically do not request and execute the JavaScript included on a page, because they want to run quickly and the additional latency of those steps is undesirable for them. In these situations, adding the form to the page after it has loaded can be effective.

For example, you could use a Zesty.io block to load a form onto a web page.

Back-end

These solutions are exclusive to back-end services. They are typically placed between the client request and the server processing the request.

WAF (Web Application Firewall)

A WAF is one or more servers specifically designed to filter traffic. It is often configured to do so by IP address, by the contents of a request, or by other, more sophisticated means.

At Zesty.io we operate a WAF in front of our platform services. This allows us to rate limit our services so that no single actor can consume enough resources to affect service health.

Additionally, we use our WAF to filter out and block traffic patterns that are either known to be unsupported or clearly identified as malicious. We do so in a handful of ways.

  1. Using pre-defined OWASP rule sets. For example, owasp-crs-v030301-id941320-xss, which is designed to prevent possible XSS attacks that use HTML tags.
  2. Rate limiting or banning requesters that exceed our limit of 10 RPS (requests per second) for more than a minute.
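
To make the rate limiting rule concrete, here is a minimal sliding-window sketch. It is a simplification and not the Zesty.io WAF's actual logic: it treats "more than 10 RPS for a minute" as "more than 600 requests in any 60 second window" and answers with a 429 status once a client is over that budget. The SlidingWindowLimiter class and the client identifier are hypothetical.

```python
from collections import defaultdict, deque

LIMIT_RPS = 10        # requests per second, taken from the rule above
WINDOW_SECONDS = 60   # how long the rate has to be sustained


class SlidingWindowLimiter:
    """Track request timestamps per client and flag clients over budget."""

    def __init__(self, limit_rps: int = LIMIT_RPS,
                 window_seconds: int = WINDOW_SECONDS) -> None:
        self.max_requests = limit_rps * window_seconds
        self.window_seconds = window_seconds
        self._hits = defaultdict(deque)  # client id -> request timestamps

    def status_for(self, client_id: str, now: float) -> int:
        """Record one request at time `now` and return 200 or 429."""
        hits = self._hits[client_id]
        # Forget requests that have fallen out of the window.
        while hits and hits[0] <= now - self.window_seconds:
            hits.popleft()
        hits.append(now)
        return 429 if len(hits) > self.max_requests else 200
```

In a real deployment `now` would come from a monotonic clock and the client identifier would be whatever the WAF uses to tell clients apart. Passing the time in explicitly keeps the sketch easy to test.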

Limitations of the WAF

The Zesty.io WAF cannot participate in preventing bot form submissions. This is due to a technical limitation, which is detailed below.

IP Addresses and GDPR

IP addresses are a mechanism for identifying devices on a computer network. The client you are reading this from has an IP address. The nature of this identifier brings it within the scope of GDPR (General Data Protection Regulation) Article 4, paragraph 1, which defines personal data. When network requests are sent to the Zesty.io origin, they are made GDPR compliant by removing the request header that carries the original requester's IP address and adding an anonymized fingerprint to the request instead. When the origin receives the request, the IP address it sees is that of the CDN (Content Delivery Network) PoP (Point of Presence). This results in two outcomes:

  1. The Zesty.io WAF cannot block based on IP address, because doing so would block an entire CDN server, which serves multiple clients.
  2. The Zesty.io WAF protects the platform from traffic-based attacks by fingerprinting clients.
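
The idea of replacing an IP address with a fingerprint can be sketched as a keyed hash. This is only an assumption about one possible approach, not a description of how Zesty.io derives its fingerprint, and it makes no claim about what is legally sufficient. The fingerprint function, its inputs, and the secret are all hypothetical.

```python
import hashlib
import hmac


def fingerprint(ip_address: str, user_agent: str, secret: bytes) -> str:
    """Derive an opaque, repeatable identifier for a client.

    The same inputs always yield the same value, so traffic from one client
    can still be counted, while the raw IP address is not passed downstream.
    """
    message = f"{ip_address}|{user_agent}".encode("utf-8")
    return hmac.new(secret, message, hashlib.sha256).hexdigest()


if __name__ == "__main__":
    # 203.0.113.7 is an address from a range reserved for documentation.
    print(fingerprint("203.0.113.7", "ExampleBrowser/1.0", b"rotate-this-secret"))
```

Because the output depends on the secret, rotating the secret also resets every fingerprint, which limits how long a client can be tracked.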

While the Zesty.io WAF protects the platform as a whole, it cannot participate in bot spam submission prevention by blocking IP addresses.

Bot Management Services

If you have already exhausted the options described, a bot management service may be your best option. These are services which you would place between your website's DNS resolution and the Zesty.io platform.

Two suggested services would be.

These services are specifically designed to manage bot traffic. They use sophisticated machine learning algorithms to detect it and then enact preventative measures configured to meet your specific needs.

Additionally, because such a service handles traffic before it reaches the Zesty.io origin, it gives you direct control over blocking IP addresses.

In Conclusion

There are a handful of front-end and back-end form spam prevention solutions. We recommend starting with the front-end solutions and, if they do not produce the results you need, exploring the back-end solutions. Solutions increase in complexity and expense going from front-end to back-end.