Block all bots/crawlers/spiders for a special directory with htaccess

2023-09-02 00:21:33 by 萌二代

I'm trying to block all bots/crawlers/spiders for a special directory. How can I do that with htaccess? I searched a little bit and found a solution by blocking based on the user agent:

RewriteCond %{HTTP_USER_AGENT} googlebot

Now I would need more user agents (for all known bots), and the rule should apply only to my separate directory. I already have a robots.txt, but not all crawlers look at it ... Blocking by IP address is not an option. Are there other solutions? I know about password protection, but I have to ask first whether that would be an option. Nevertheless, I'm looking for a solution based on the user agent.
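
For reference, the robots.txt route mentioned above takes only a couple of lines (using /private/ as a hypothetical directory name), but it only stops crawlers that choose to honor it:

User-agent: *
Disallow: /private/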

Solution

You need to have mod_rewrite enabled. Place this in the .htaccess file in that folder. If it is placed elsewhere (e.g. in a parent folder), the RewriteRule pattern needs to be slightly modified to include that folder name.

RewriteEngine On

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
RewriteRule .* - [R=403,L]
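
If the .htaccess file must live one level up (in the parent folder) instead, a sketch of the adjusted rule could look like this -- "private" here is a hypothetical directory name, and [F] is just shorthand for a 403 Forbidden response:

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider) [NC]
# only requests inside the private/ directory are blocked
RewriteRule ^private/ - [F]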

I have entered only a few bots -- you can add any others yourself (letter case does not matter because of the [NC] flag). This rule will respond with a "403 Forbidden" result code for such requests. You can change it to another HTTP response code if you really want (403 is the most appropriate here considering your requirements).
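
Purely as an illustration of both points, the sketch below widens the pattern with a few more well-known crawler tokens (YandexBot, DuckDuckBot, and Slurp, Yahoo's crawler) and swaps the response to 410 Gone; treat the list as a starting point, not an exhaustive one:

RewriteCond %{HTTP_USER_AGENT} (googlebot|bingbot|Baiduspider|YandexBot|DuckDuckBot|Slurp) [NC]
# 410 tells clients the resource is permanently gone
RewriteRule .* - [R=410,L]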