Go is another language well suited to writing crawlers. Many newcomers to web scraping face the question of which language to use, and most choose Python or Java to write their crawler programs. In fact, besides Python and Java, many languages are suitable for data collection; Python and Java are simply the more popular choices. Here I use Go to build my own crawler. As long as you understand how crawlers work, you can write a crawler system in essentially any programming language. Whichever language or framework you use, collecting data from a single IP for a long time will eventually get you restricted. At that point, you need a crawler proxy to solve the problem.
Golang crawler steps:
1. Define the crawl target
2. Create the crawler interface
3. Send HTTP requests to fetch data
4. Filter out invalid requests
5. Parse the data content
6. Store the data
7. Use a crawler proxy to keep collecting continuously
The following is a demo of configuring a crawler proxy in Go:
package main

import (
	"fmt"
	"io"
	"net/http"
	"net/url"
)

// Proxy server (product site: www.16yun.cn)
const ProxyServer = "t.16yun.cn:31111"

type ProxyAuth struct {
	Username string
	Password string
}

// ProxyClient builds an http.Client that routes requests through the proxy,
// with credentials embedded in the proxy URL when provided.
func (p ProxyAuth) ProxyClient() http.Client {
	var proxyURL *url.URL
	if p.Username != "" && p.Password != "" {
		proxyURL, _ = url.Parse("http://" + p.Username + ":" + p.Password + "@" + ProxyServer)
	} else {
		proxyURL, _ = url.Parse("http://" + ProxyServer)
	}
	return http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)}}
}

func main() {
	targetURI := "https://httpbin.org/ip"
	// Initialize the proxy HTTP client
	client := ProxyAuth{"username", "password"}.ProxyClient()
	request, _ := http.NewRequest("GET", targetURI, nil)
	// Set the Proxy-Tunnel header (uncomment to rotate tunnels;
	// requires the math/rand, strconv, and time imports)
	// rand.Seed(time.Now().UnixNano())
	// tunnel := rand.Intn(10000)
	// request.Header.Set("Proxy-Tunnel", strconv.Itoa(tunnel))
	response, err := client.Do(request)
	if err != nil {
		panic("failed to connect: " + err.Error())
	}
	defer response.Body.Close()
	bodyByte, err := io.ReadAll(response.Body)
	if err != nil {
		fmt.Println("error reading body:", err)
		return
	}
	body := string(bodyByte)
	fmt.Println("Response Status:", response.Status)
	fmt.Println("Response Header:", response.Header)
	fmt.Println("Response Body:\n", body)
}