I'm loading data from a .txt file to drive a scraper. However, the URL needs a range built from one of the fields: the value minus 2 and the value plus 2. For example, if the value is 2342, I need 2340 and 2344 in the URL.
I took a guess at how to break it up:
$args{birth_year} = ($args{birth_year} - 2) . '-' . ($args{birth_year} + 2);
How do I then put it in the URL?
Here's the relevant part of the code:
use strict;
use warnings;
use WWW::Mechanize::Firefox;
use Data::Dumper;
use LWP::UserAgent;
use JSON;
use CGI qw/escape/;
use HTML::DOM;
open(my $l, '<', 'locations2.txt') or die "Can't open locations: $!";
while (my $line = <$l>) {
    chomp $line;
    my %args;
    # Split one CSV line into named fields
    @args{qw/givenname surname birth_place birth_year gender race/} = split /,/, $line;
    # Turn a single year such as 1912 into the range "1910-1914"
    $args{birth_year} = ($args{birth_year} - 2) . '-' . ($args{birth_year} + 2);
    my $mech = WWW::Mechanize::Firefox->new(create => 1, activate => 1);
    $mech->get("https://familysearch.org/search/collection/index#count=20&query=%2Bgivenname%3A$args{givenname}20%2Bsurname%3A$args{surname}20%2Bbirth_place%3A$args{birth_place}%20%2Bbirth_year%3A1910-1914~%20%2Bgender%3A$args{gender}20%2Brace%3A$args{race}&collection_id=2000219");
}
Input is:
Benjamin,Schuvlein,Germany,1912,M,White
Desired URL is:
https://familysearch.org/search/collection/index#count=20&query=%2Bgivenname%3ABenjamin%20%2Bsurname%3ASchuvlein%20%2Bbirth_place%3AGermany%20%2Bbirth_year%3A1910-1914~%20%2Bgender%3AM%20%2Brace%3AWhite&collection_id=2000219
Solution: Why can't you just change this line:
$mech->get("https://familysearch.org/search/collection/index#count=20&query=%2Bgivenname%3A$args{givenname}20%2Bsurname%3A$args{surname}20%2Bbirth_place%3A$args{birth_place}%20%2Bbirth_year%3A1910-1914~%20%2Bgender%3A$args{gender}20%2Brace%3A$args{race}&collection_id=2000219");
to this:
$mech->get("https://familysearch.org/search/collection/index#count=20&query=%2Bgivenname%3A$args{givenname}%20%2Bsurname%3A$args{surname}%20%2Bbirth_place%3A$args{birth_place}%20%2Bbirth_year%3A$args{birth_year}~%20%2Bgender%3A$args{gender}%20%2Brace%3A$args{race}&collection_id=2000219");
NOTE: I changed this bit:
%3A1910-1914~%20
to this:
%3A$args{birth_year}~%20
I also added the missing % in front of the 20 after $args{givenname}, $args{surname}, and $args{gender}, so that each space is encoded as %20, as it is in the desired URL.
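As an alternative to hand-writing the percent-encoded string, the URL could be assembled from the parsed fields. The following is only a minimal sketch under a few assumptions: the helper names pct_encode, year_range, and build_search_url are hypothetical (they are not part of WWW::Mechanize::Firefox or any other module), the input is assumed to be plain bytes with exactly six comma-separated fields, and the query layout is copied from the desired URL in the question rather than from any FamilySearch documentation.

```perl
use strict;
use warnings;

# Hypothetical helper: percent-encode every byte that is not an
# RFC 3986 "unreserved" character (letters, digits, - . _ ~).
# Assumes $s holds bytes, e.g. a line read without a decoding layer.
sub pct_encode {
    my ($s) = @_;
    $s =~ s/([^A-Za-z0-9\-._~])/sprintf('%%%02X', ord($1))/ge;
    return $s;
}

# Hypothetical helper: turn a single year such as 1912 into "1910-1914".
sub year_range {
    my ($year, $spread) = @_;
    return ($year - $spread) . '-' . ($year + $spread);
}

# Hypothetical helper: build the search URL from one CSV line.
sub build_search_url {
    my ($line) = @_;
    my @fields = qw/givenname surname birth_place birth_year gender race/;

    my %args;
    @args{@fields} = split /,/, $line;
    $args{birth_year} = year_range($args{birth_year}, 2);

    # Each term is "+field:value"; the birth_year term carries a
    # trailing "~", mirroring the desired URL in the question.
    my @terms = map {
        sprintf('+%s:%s%s', $_, $args{$_}, $_ eq 'birth_year' ? '~' : '')
    } @fields;

    # Encode each term, then join the terms with an encoded space.
    my $query = join '%20', map { pct_encode($_) } @terms;

    return 'https://familysearch.org/search/collection/index#count=20'
         . '&query=' . $query
         . '&collection_id=2000219';
}

print build_search_url('Benjamin,Schuvlein,Germany,1912,M,White'), "\n";
```

For the sample input line from the question, this should produce the same string as the desired URL, and the result can be passed straight to $mech->get(...). Encoding each value also keeps names containing spaces or punctuation from breaking the query, which plain interpolation would not handle.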