
robots.txt: Introduction and Basic Syntax

admin [Servers]

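Introduction

robots.txt is a protocol between a website and web crawlers: a plain-text file that tells each crawler which parts of the site it is allowed to access. It is also the first file a search engine's crawler looks for when it visits a site.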
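As a rough illustration of that "check robots.txt first" step, the sketch below uses Python's standard-library urllib.robotparser. The helper names (robots_url_for, may_crawl), the ExampleBot agent, the www.example.com URLs, and the sample rules are all assumptions made for this example. It shows one way a well-behaved crawler might decide whether to fetch a page, not how any particular search engine does it.

```python
from urllib.parse import urlsplit, urlunsplit
from urllib.robotparser import RobotFileParser


def robots_url_for(page_url: str) -> str:
    """Return the robots.txt location for the site that hosts page_url."""
    parts = urlsplit(page_url)
    # robots.txt lives at the root of the host, whatever page is requested.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def may_crawl(robots_text: str, user_agent: str, page_url: str) -> bool:
    """Parse robots.txt text and ask whether user_agent may fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser.can_fetch(user_agent, page_url)


# Hypothetical rules and URLs, used only for illustration.
sample_rules = "User-agent: *\nDisallow: /admin/\n"
page = "https://www.example.com/blog/post.html"

robots_location = robots_url_for(page)
can_read_post = may_crawl(sample_rules, "ExampleBot", page)
can_read_admin = may_crawl(sample_rules, "ExampleBot", "https://www.example.com/admin/")
```

A real crawler would download the text from robots_location before calling may_crawl. The rules are passed in as a string here so the example runs without network access.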

Syntax tutorial

The most common cases, shown by example:

1. Allow all search engines to index the site: leave robots.txt empty and write nothing in it.

2. Block all search engines from indexing certain directories:

User-agent: *
Disallow: /directory1/
Disallow: /directory2/
Disallow: /directory3/

3. Block one search engine from indexing the site, for example Baidu:

User-agent: Baiduspider
Disallow: /

4. Block all search engines from indexing the site:

User-agent: *
Disallow: /

5. Add the path to sitemap.xml, for example:

Sitemap: https://www.xxxx.com/sitemap.xml
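The five cases above can be checked locally with Python's standard-library urllib.robotparser. This is only a sketch: the parse_rules helper, the www.example.com URLs, and the /private/ and /tmp/ directory names are assumptions for illustration, and site_maps() needs Python 3.8 or later. Each rule group is written on consecutive lines because this parser treats a blank line as the end of a group.

```python
from urllib.robotparser import RobotFileParser


def parse_rules(text: str) -> RobotFileParser:
    """Parse robots.txt text with the standard-library parser."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


# Hypothetical URLs used only to probe the rules.
PRIVATE_PAGE = "https://www.example.com/private/page.html"
HOME_PAGE = "https://www.example.com/index.html"

# Case 1: an empty file places no restrictions.
allow_all = parse_rules("")
# Case 2: block selected directories for every crawler.
block_dirs = parse_rules("User-agent: *\nDisallow: /private/\nDisallow: /tmp/\n")
# Case 3: block only Baiduspider.
block_baidu = parse_rules("User-agent: Baiduspider\nDisallow: /\n")
# Case 4: block every crawler.
block_all = parse_rules("User-agent: *\nDisallow: /\n")
# Case 5: declare the sitemap location.
with_sitemap = parse_rules("Sitemap: https://www.xxxx.com/sitemap.xml\n")

results = {
    "case1": allow_all.can_fetch("Googlebot", PRIVATE_PAGE),
    "case2_private": block_dirs.can_fetch("Googlebot", PRIVATE_PAGE),
    "case2_home": block_dirs.can_fetch("Googlebot", HOME_PAGE),
    "case3_baidu": block_baidu.can_fetch("Baiduspider", HOME_PAGE),
    "case3_google": block_baidu.can_fetch("Googlebot", HOME_PAGE),
    "case4": block_all.can_fetch("Googlebot", HOME_PAGE),
    "case5_sitemaps": with_sitemap.site_maps(),
}
```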

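Directives

User-agent: names the search engine crawler that the rules apply to (* matches every crawler)

Disallow: a path that the crawler is not allowed to index

Allow: a path that the crawler is allowed to index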
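Allow and Disallow can be combined in one group, for instance to open a subdirectory inside a blocked directory. The sketch below tests such a group with Python's urllib.robotparser; the paths, the ExampleBot agent, and the www.example.com host are hypothetical. The Allow line is placed first on purpose: Python's parser applies the first rule that matches, while some search engines pick the most specific match, and with this ordering both approaches give the same answer.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical rule group: block /public/ but keep /public/docs/ open.
rules = """\
User-agent: *
Allow: /public/docs/
Disallow: /public/
"""

parser = RobotFileParser()
parser.parse(rules.splitlines())

base = "https://www.example.com"  # assumed host, for illustration only
docs_allowed = parser.can_fetch("ExampleBot", base + "/public/docs/guide.html")
public_allowed = parser.can_fetch("ExampleBot", base + "/public/other.html")
# A path that no rule mentions stays accessible by default.
unlisted_allowed = parser.can_fetch("ExampleBot", base + "/about.html")
```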

Crawlers of the major search engines

Google: googlebot

Baidu: baiduspider

Yahoo: slurp

Alexa: ia_archiver

MSN: msnbot

AltaVista: scooter

Lycos: lycos_spider_(t-rex)

AllTheWeb: fast-webcrawler

Inktomi: slurp
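These tokens normally appear as a substring of the crawler's full User-Agent header, and they are matched without regard to case. As a small sketch of how the tokens might be used on the server side, for example when reading access logs, the hypothetical identify_crawler function below maps a header value to an engine name. The function, the token table, and the sample header strings are assumptions for illustration and are not part of robots.txt itself.

```python
from typing import Optional

# A subset of the tokens listed above, lowercased for case-insensitive matching.
CRAWLER_TOKENS = {
    "googlebot": "Google",
    "baiduspider": "Baidu",
    "slurp": "Yahoo",
    "ia_archiver": "Alexa",
    "msnbot": "MSN",
}


def identify_crawler(user_agent_header: str) -> Optional[str]:
    """Return the engine whose token occurs in the header, or None."""
    lowered = user_agent_header.lower()
    for token, engine in CRAWLER_TOKENS.items():
        if token in lowered:
            return engine
    return None


# Sample header values, assumed for illustration.
baidu = identify_crawler(
    "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
)
browser = identify_crawler("Mozilla/5.0 (Windows NT 10.0; Win64; x64)")
```

Any client can send any header value, so a match here only suggests which crawler is visiting and does not prove it.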