Getting DDOS:ed by AI bots

Postby joepal » Mon May 19, 2025 5:29 am

Right now the site is rather slow. This is because anonymous crawlers (bots that do not identify as bots) keep hammering every forum topic and every single post multiple times per second. From the news I'm guessing these are poorly written AI scrapers.

Blocking them one by one does not seem to work, as they simply switch IP. And since they neither respect robots.txt nor identify themselves, there's not much I can do about it.
Joel Palmius (LinkedIn)
MakeHuman Infrastructure Manager
http://www.palmius.com/joel
joepal
 
Posts: 4630
Joined: Wed Jun 04, 2008 11:20 am

Re: Getting DDOS:ed by AI bots

Postby RobBaer » Mon May 19, 2025 12:59 pm

Here was my Browser's AI solution to AI crawlers:

AI Overview
When dealing with AI scrapers that ignore robots.txt, the best approach is to use a combination of methods. You can add specific AI crawlers to your robots.txt file as disallowed user agents. For more advanced AI crawlers, consider implementing "tarpit" strategies, such as creating infinite loops of static files or using dynamic content to waste their resources. You can also monitor your logs for IPs accessing a random, bad link in robots.txt and block those IPs.
Detailed Strategies:
1. Block AI Crawlers in robots.txt:
Identify the specific AI crawler user agents you want to block.
Add them to your robots.txt file using the disallow directive.
For example, User-agent: GPTBot followed by Disallow: / tells OpenAI's crawler to stay out of the whole site.
Regularly update your robots.txt with the latest AI crawler user agents.
2. Implement "Tarpits":
Nepenthes and Its Progeny: Create a "maze" of static files with no exit links to waste the crawler's resources.
Iocaine: Use a reverse proxy to trap crawlers in a "garbage" maze.
Nepenthes Quixotic: Create a dynamic "honey pot" with fake pages and links to lure the crawler into an endless loop.
Marko and Markov-tarpit: Use Markov chains to generate a vast, dynamic network of pages with no exit.
3. Monitor Log Files:
Add a random, bad link in your robots.txt file.
Monitor your server logs for requests to that URL.
Identify the IPs associated with those requests, as they likely belong to AI scrapers.
Block those IPs using your server's firewall or configuration.
4. Consider Cloudflare's Bot Management:
Cloudflare offers features to block AI bots and other malicious traffic.
You can enable "Block AI Bots" or use their bot management tools to identify and block misbehaving crawlers.
5. Report Misbehaving Crawlers:
If you encounter an AI crawler that consistently violates your robots.txt file, report it to the respective AI company or platform.
Cloudflare and other bot management services also provide mechanisms for reporting misbehaving crawlers.
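The log-monitoring idea in strategy 3 can be sketched roughly as follows. This is only an illustration under assumptions: the trap path `/trap-3f9a1c/` is a made-up name, the log lines are invented samples using documentation IP ranges, and the server is assumed to write Common/Combined Log Format. Nothing here is what the MakeHuman site actually runs.

```python
import re

# Hypothetical trap path: listed under Disallow in robots.txt and linked nowhere,
# so only clients that read robots.txt and then ignore it should ever request it.
TRAP_PATH = "/trap-3f9a1c/"

# Matches the start of a Common/Combined Log Format line:
# client IP, identity, user, [timestamp], "METHOD path ..."
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "(?:\S+) (\S+)')


def trap_hits(log_lines, trap_path=TRAP_PATH):
    """Return the set of client IPs that requested the trap path."""
    offenders = set()
    for line in log_lines:
        match = LOG_LINE.match(line)
        if match and match.group(2).startswith(trap_path):
            offenders.add(match.group(1))
    return offenders


# Invented sample lines (assumption: not real traffic).
sample_log = [
    '203.0.113.7 - - [19/May/2025:05:29:01 +0000] "GET /trap-3f9a1c/ HTTP/1.1" 200 512',
    '198.51.100.4 - - [19/May/2025:05:29:02 +0000] "GET /forum/viewtopic.php?t=1 HTTP/1.1" 200 9001',
    '203.0.113.9 - - [19/May/2025:05:29:03 +0000] "GET /trap-3f9a1c/page2 HTTP/1.1" 200 512',
]

print(sorted(trap_hits(sample_log)))
```

The resulting set of addresses would then be fed to whatever firewall or server configuration is in use; that step is deliberately left out because it depends on the setup.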
Important Considerations:
robots.txt is not a guaranteed solution:
AI crawlers can often bypass robots.txt restrictions, especially those using advanced techniques.
Combine strategies:
The most effective approach is to combine multiple methods, such as blocking in robots.txt, implementing tarpits, and monitoring log files.
Stay updated:
AI crawlers and their techniques are constantly evolving, so it's important to stay informed about new methods and adapt your defenses accordingly.
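To make the "robots.txt is not a guaranteed solution" point concrete, here is a small sketch of how a well-behaved crawler would consult robots.txt, using Python's standard `urllib.robotparser`. The rules and the `example.org` URLs are assumptions for illustration; `GPTBot` is the user-agent token OpenAI documents for its crawler. The check is entirely voluntary on the crawler's side, so a bot that never runs it is not stopped by anything in the file.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules (assumption: not the actual MakeHuman robots.txt).
robots_txt = """\
User-agent: GPTBot
Disallow: /

User-agent: *
Disallow: /private/
"""

parser = RobotFileParser()
parser.parse(robots_txt.splitlines())


def allowed(user_agent, url):
    """Ask the parsed rules whether this user agent may fetch the URL."""
    return parser.can_fetch(user_agent, url)


# A polite crawler performs this check before every request.
for agent, url in [
    ("GPTBot", "https://example.org/forum/"),
    ("SomeOtherBot", "https://example.org/forum/"),
    ("SomeOtherBot", "https://example.org/private/notes"),
]:
    print(agent, url, allowed(agent, url))
```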
RobBaer
 
Posts: 1236
Joined: Sat Jul 13, 2013 3:30 pm
Location: Kirksville, MO USA

Re: Getting DDOS:ed by AI bots

Postby tomcat » Mon May 19, 2025 1:01 pm

What can be done in a situation like this? What are they trying to accomplish? Would a zip-bomb help?

RobBaer wrote: Here was my Browser's AI solution to AI crawlers:


Use AI against AI? That makes sense. :lol:

Stay updated:
AI crawlers and their techniques are constantly evolving, so it's important to stay informed about new methods and adapt your defenses accordingly.


Unfortunately, it's too time-consuming. :(
Foreigners' reactions to Russian "Bird's Milk" candies
— Are your birds being milked?
— In Russia everyone is milked. Here even the zucchini is used to make caviar.
tomcat
 
Posts: 485
Joined: Sun Sep 27, 2015 7:53 pm
Location: Moscow (Orcish Stun), The Aggressive Evil Empire

Re: Getting DDOS:ed by AI bots

Postby Ricardo2020 » Mon May 19, 2025 1:23 pm

AI is a cancer spread by Big Tech firms. We know who they are, and as such, it will be nearly impossible to deal with them since they have links on the home page here at MakeHuman Community. There is a Facebook link in the "More" area. This is an open door which cannot be closed as long as we have a presence there. I get that we might need that presence, but it is a shame, because therein lies the real problem.

As long as we have roads leading to these cesspools and a presence there, the bots will have free rein on every corner of our domain. Granted, bots will use any method to gain access, but removing the social media crap will go a long way toward at least making the issue more manageable, although it will always be a drain on our time. As the old saying among techs goes, "Facebook - wasting people's time since 2004."

This is why we must not include AI slop in anything we do here. Doing so will certainly ruin everything we are trying to accomplish. The ban hammer must come down on all the rusty bent nails, not just a few of the shiny ones which are easier to hit.
Paddle faster. I hear charango music!
Ricardo2020
 
Posts: 274
Joined: Sat Apr 18, 2020 4:17 pm
Location: Tennessee

Re: Getting DDOS:ed by AI bots

Postby joepal » Fri May 23, 2025 5:05 am

In the end I've set up firewall rules to silently drop any traffic originating from a bunch of entire class B-sized networks (x.x.0.0/16). This was needed as the AI bots use a large set of IPs within each range to avoid being easily blocked. I guess this might block some legitimate users too, although my guess is that the ranges are owned by corporations and not used by actual individuals.

This has caused the situation to improve considerably. But I guess it has to be redone in a month or so.

If the AI crawlers respected robots.txt (which they don't), this would not be a problem.
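For anyone wanting to find candidate ranges the same way, grouping client addresses by /16 can be sketched like this. It is only a rough illustration, not the actual procedure used on this server: the function name, the `min_distinct` threshold and the sample addresses (documentation ranges) are all assumptions, and the firewall rule itself is left out because it depends on the firewall in use.

```python
import ipaddress
from collections import Counter


def busiest_slash16(client_ips, min_distinct=3):
    """Group IPv4 clients by /16 and return (network, distinct_ip_count)
    pairs for networks with at least min_distinct different addresses."""
    counts = Counter()
    for ip in set(client_ips):  # count each address once
        if ipaddress.ip_address(ip).version != 4:
            continue  # this sketch only handles IPv4
        # strict=False masks the host bits, e.g. 198.51.100.1 -> 198.51.0.0/16
        network = ipaddress.ip_network(f"{ip}/16", strict=False)
        counts[str(network)] += 1
    return [(net, n) for net, n in counts.most_common() if n >= min_distinct]


# Invented sample of client addresses pulled from an access log.
sample_ips = [
    "198.51.100.1",
    "198.51.7.2",
    "198.51.200.3",
    "198.51.100.1",  # repeat of the first address, counted once
    "203.0.113.5",
]

print(busiest_slash16(sample_ips))
```

A /16 covers 65,536 addresses, so it is worth checking who owns a range before dropping it entirely.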
Joel Palmius (LinkedIn)
MakeHuman Infrastructure Manager
http://www.palmius.com/joel
joepal
 
Posts: 4630
Joined: Wed Jun 04, 2008 11:20 am

Re: Getting DDOS:ed by AI bots

Postby Ricardo2020 » Fri May 23, 2025 3:10 pm

Well, considering that these bots are the result of the AI created by the mega corps mentioned above, the lack of respect for the robots.txt file is not surprising. The mega corps follow the same behavior in that they do not respect the settings you make to your devices. I will give you two examples:

1. Computers

If you run Microsoft products such as Windows, the settings you make are often changed by updates. The only way around this is to use third-party shell enhancements such as Open Shell. However, updates will put back the ads and app crap you attempt to remove. It's a game of whack-a-mole.

2. Smart Gadgets

If you run Android, social media is baked into the system. Your per app settings do not matter to these cesspools. They will go behind your back and do as they want, all to engage you so they can reach out and get your eyeballs on the ads that follow you around based on the scrape and sell stuff they do.

So, having a fondleslab, chromebook, or PC with big tech on it will almost always lead to a loss of both control and privacy by the user since you are the product making these morally bankrupt outfits more money than the economies of several countries.

The good news is that one has a choice. If you use open source wares such as Linux on your machine, the bots can be given the heave ho over the gunwales. Also, avoiding smart tech and social media in general is a good idea for all the reasons quoted above. How much of that shite do we really need?

To sum up, the horse will bolt if the barn door is open. The bad actors will flood in.

And so, thank you, Joe, for a job well done. My ban hammer arm was beginning to get really tired and was starting to develop charley horse cramps...
Paddle faster. I hear charango music!
Ricardo2020
 
Posts: 274
Joined: Sat Apr 18, 2020 4:17 pm
Location: Tennessee

