Using .htaccess on a hosting account for live testing

2023-09-02 10:08:19 Author: 初暖未至夏已凉

I would like to use my hosting account for live testing, but I want to protect access and prevent search engine indexing. For example, the server directory structure within public_html is:

_private
_bin
_cnf
_log
_ ... (more default hosting directories)
testpublic
css
images
index.html

I want index.html to be visible to everyone, and all other directories (except "testpublic") to be hidden, access-protected, and not indexed by search engines.

I would like the "testpublic" directory to be public but not indexed by search engines. I am not sure whether this is possible.

As I understand it, I need two .htaccess files: a general one in "public_html" and a specific one for "testpublic".

I think the general .htaccess (in public_html) should be something like:

AuthUserFile /home/folder../.htpasswd
AuthName "test!"
AuthType Basic
require user admin123
<FilesMatch "index.html">
Satisfy Any
</FilesMatch>

Can anyone help me create the files with the appropriate properties? Thank you!

Recommended answer

You can use a robots.txt file in your root folder. All standards-abiding robots will obey this file and will not crawl or index the files and folders it disallows. Note that robots.txt is only a request: it does not protect access, and badly behaved crawlers can ignore it.

Example robots.txt that tells all (*) crawlers to move on and index nothing:

User-agent: *
Disallow: /
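
For a directory such as "testpublic" that should stay public but out of search results, a response header can complement robots.txt. The following is a minimal sketch, not a definitive setup. It assumes the host has mod_headers enabled and allows the Header directive in .htaccess, and the file location (testpublic/.htaccess) is only an illustration. Crawlers can see this header only on pages they are allowed to fetch, so it has no effect on paths that robots.txt disallows.

```apache
# Hypothetical testpublic/.htaccess; assumes mod_headers is available.
<IfModule mod_headers.c>
  # Ask compliant search engines not to index these responses
  # and not to follow the links they contain.
  Header set X-Robots-Tag "noindex, nofollow"
</IfModule>
```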

You could also use .htaccess files to fine-tune what your server (assuming Apache) serves and which directory indexes are visible. In that case you would add

IndexIgnore *

to your .htaccess file, which hides every file from Apache's auto-generated directory listings.
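
IndexIgnore hides entries from a listing, but the (now empty) listing page is still generated. If the goal is to switch listings off altogether, a common alternative is sketched below. It assumes the host permits the Options directive in .htaccess (AllowOverride Options), which not every shared host does.

```apache
# Assumes the host allows Options in .htaccess (AllowOverride Options).
# Turn off auto-generated directory listings: a request for a directory
# that has no index file is answered with 403 Forbidden instead.
Options -Indexes
```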

Update (thanks to http://stackoverflow.com/users/1714715/samuel-cook):

If you want to block a specific bot/crawler and you know its user agent string, you can do so in your .htaccess:

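<IfModule mod_rewrite.c>
  RewriteEngine on
  # Match requests whose User-Agent header contains "Googlebot".
  RewriteCond %{HTTP_USER_AGENT} Googlebot
  # Refuse every matching request with 403 Forbidden.
  RewriteRule ^.* - [F,L]
</IfModule>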
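
The same pattern can be extended to several crawlers at once. This is a sketch only: the bot names are illustrative placeholders rather than a recommended list, and because a client can send any User-Agent string it likes, this discourages well-behaved bots but is not access protection.

```apache
<IfModule mod_rewrite.c>
  RewriteEngine on
  # Match any of the listed names, case-insensitively ([NC]).
  # The names are examples; substitute the user agents you want to block.
  RewriteCond %{HTTP_USER_AGENT} (Googlebot|bingbot) [NC]
  # Answer every matching request with 403 Forbidden.
  RewriteRule ^ - [F]
</IfModule>
```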

Hope this helps.