Generate a script to hit Google once a day and record our SERP positions? (tags: script, position, SERP)

2023-09-06 07:23:39 Author: 聽随風ツ


The need has arisen within our organisation to monitor (on a daily basis) where our site appears (both organic and PPC) on page 1 of Google, and where a key competitor appears, for certain keywords.

In the immediate short term a colleague is doing this by hitting Google manually and jotting down the results. Yep.

It occurs to us we can write a script (e.g. using C#) to do this.

I know Analytics will tell us an awful lot but it doesn't note the competitor's position, plus I don't think it has other data we want.

The question is: is there an existing basic tool that does this (for free, I'd guess)? And if we write it ourselves, where should we start, and are there obvious pitfalls to avoid (for example, can Google detect and block automated requests)?

Edit: To those answers suggesting using the Google API - this post over on Google Groups would appear to rule that out completely:

The Custom Search API requires you to set up a Custom Search Engine (CSE) which must be set to search particular sites rather than the entire web.

The Custom Search API TOS explicitly prohibit you from making automated queries, which would be key to "regularly and accurately" measuring the SERP of a site.

Jeremy R. Geerdes

Solution

Google actually does prohibit scraping of their search results without "human" interaction (see 5.3, and here). I'm not advocating you do so. The concern they state is that too many people doing this could cause problems (how many search terms would you look for?), and it could also be used to game the rankings themselves.

Having said that, you could possibly run the search yourself and iterate through the results as I have done below, using the HTML result page. Or you could try some of the services available to help you do this:

http://www.googlerankings.com/

(Note: I am in no way affiliated with this website, it is only an example.)

I am sure there are plenty of SEO companies that would also provide this as a service. I would recommend exploring those options before getting into scraping.

I went ahead and threw together a quick C# class that will pull basic information from a Google search result. The class uses the HTML Agility Pack mentioned elsewhere, a pretty nifty library for walking web pages that lets you use XPath to find what you are looking for in the page. In this case, "//span//cite" gives you the URL, so this example uses that.

To use, do the following:

GoogleRankScrape.Do(
    "google scraping",
    "C:\\rankings\\",
    "//span//cite",
    new string[] {"stackoverflow.com","wikipedia.org","okeydoke.org"},
    100
);
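The arguments, in order, are the query to search for, the directory where the output files are written (it must already exist, since the code only switches the current directory into it), the XPath used to pick the result URLs out of the page, the domains to look for in those results, and the number of results to request. The ranking of each listed domain is appended to rankings.txt in that directory.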

This could be wrapped into a C# console app and then run daily with the Windows Task Scheduler; a minimal sketch of such a wrapper follows. There are many other ways this could go; this is only an example.
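Here is one way that wrapper might look. This is only a sketch under assumptions, not part of the original answer: the class name Program, the file paths, and the schtasks command line are all placeholders to adapt to your setup.

using System;

class Program
{
    static void Main(string[] args)
    {
        // Run one daily check; the arguments mirror the usage example above.
        GoogleRankScrape.Do(
            "google scraping",
            "C:\\rankings\\",          // existing output directory (placeholder)
            "//span//cite",            // XPath for result URLs
            new string[] { "stackoverflow.com", "wikipedia.org", "okeydoke.org" },
            100                        // number of results to request
        );

        // One possible way to schedule the compiled exe daily at 07:00
        // (run once from a command prompt; paths are placeholders):
        //   schtasks /create /sc DAILY /tn "GoogleRankCheck" /tr "C:\rankings\GoogleRankScrape.exe" /st 07:00
    }
}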

The GoogleRankScrape code follows:

using System;
using System.IO;
using System.Text;
using HtmlAgilityPack;

class GoogleRankScrape
{
    public static void Do(string query, string dest, string path, string[] matches, int depth)
    {
        // All output files (saved page, per-run output, rankings log) are written to dest, which must already exist.
        Directory.SetCurrentDirectory(dest);

        // URL-encode the query so spaces and special characters survive in the request URL.
        string url = "http://www.google.com/search?q=" + Uri.EscapeDataString(query) + "&num=" + depth;

        string rp = "rankings.txt";

        DateTime dt = DateTime.Now;

        // Two timestamp formats: a readable one for the log and a compact one for the filenames.
        string dtf = String.Format("{0:u}", dt);
        string dtfr = String.Format("{0:f}", dt);
        dtf = dtf.Replace("-", "");
        dtf = dtf.Replace(" ", "");
        dtf = dtf.Replace(":", "");

        string wp = "page" + dtf + ".html";
        string op = "output" + dtf + ".txt";

        // Create the rankings log on the first run; later runs append to it.
        FileInfo r = new FileInfo(rp);
        if (!File.Exists(rp))
        {
            StreamWriter rsw = r.CreateText();
            rsw.Close();
        }

        StreamWriter rs = new StreamWriter(r.Name, true);

        rs.WriteLine("Date: " + dtfr);
        rs.WriteLine("Date: " + dtf);
        rs.WriteLine("Depth: " + depth);
        rs.WriteLine("Query: " + query);

        // Download the results page with the HTML Agility Pack and save a copy for the record.
        HtmlWeb hw = new HtmlWeb();
        HtmlDocument d = hw.Load(url);
        d.Save(wp);

        FileInfo o = new FileInfo(op);
        StreamWriter os = o.CreateText();

        // Re-parse the saved copy from disk (the in-memory document d could also be used directly).
        HtmlDocument HD = new HtmlDocument();
        HD.Load(wp);

        string check = "";
        string checkblock = "";

        // Walk each cited URL in document order; the position in the list is the rank.
        var SpanCite = HD.DocumentNode.SelectNodes(path);
        if (SpanCite != null)
        {
            int rank = 1;
            foreach (HtmlNode HN in SpanCite)
            {
                String line = "";
                if (HN.InnerText.ToString().IndexOf("/") > 0)
                {
                    line = HN.InnerText.ToString().Substring(0, HN.InnerText.ToString().IndexOf("/"));
                }
                else if (HN.InnerText.ToString().IndexOf(" ") > 0)
                {
                    line = HN.InnerText.ToString().Substring(0, HN.InnerText.ToString().IndexOf(" "));
                }
                else
                {
                    line = HN.InnerText.ToString();
                }
                os.WriteLine(line);
                os.WriteLine(rs.NewLine);

                for (int i = 0; i < matches.Length; i++)
                {
                    checkblock = "[" + matches[i] + "]";
                    if (line.Contains(matches[i]) && !check.Contains(matches[i]))
                    {
                        rs.WriteLine("Rank: " + rank.ToString() + ", " + matches[i]);
                        check += checkblock;
                    }
                }

                rank++;
            }  

            // Any domain that was never matched is logged as not ranked.
            for (int i = 0; i < matches.Length; i++)
            {
                checkblock = "[" + matches[i] + "]";
                if (!check.Contains(matches[i]))
                {
                    rs.WriteLine("Rank: not ranked" + ", " + matches[i]);
                }
            }
        }

        os.Close();

        rs.WriteLine("==========");
        rs.Close();
    }

}
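
For reference, each run appends a block like the following to rankings.txt (the dates and rank values here are purely illustrative):

Date: Wednesday, September 6, 2023 7:23 AM
Date: 20230906072339Z
Depth: 100
Query: google scraping
Rank: 1, stackoverflow.com
Rank: 4, wikipedia.org
Rank: not ranked, okeydoke.org
==========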