Accessing a proxy from a Golang crawler

Go (golang) is also a workable language for writing crawlers. Newcomers to web crawling often have to decide which language to use. Most people write their data-collection programs in Python or Java, mainly because those are the popular choices, but many other languages are just as suitable. Here I use Go for my own crawler. Once you understand how crawlers work, you can write one in essentially any programming language. Whichever language or framework you pick, collecting data from a single IP address for a long time will sooner or later get that address restricted by the target site. A crawler proxy solves this problem.

Steps for a Golang crawler:

1. Define the crawl targets

2. Create the crawler interface

3. Send HTTP requests to fetch the data

4. Filter out invalid requests

5. Parse the returned content

6. Store the data

7. Use a crawler proxy to keep collecting continuously

The following demo shows how to configure a crawler proxy in Go:

        package main

        import (
            "fmt"
            "io"
            "net/http"
            "net/url"
        )

        // Proxy server (product website: www.16yun.cn)
        const ProxyServer = "t.16yun.cn:31111"

        // ProxyAuth holds the credentials for the proxy server.
        type ProxyAuth struct {
            Username string
            Password string
        }

        // ProxyClient returns an HTTP client that sends its requests through ProxyServer.
        func (p ProxyAuth) ProxyClient() *http.Client {
            proxyURL := &url.URL{Scheme: "http", Host: ProxyServer}
            if p.Username != "" && p.Password != "" {
                // url.UserPassword escapes special characters in the credentials
                proxyURL.User = url.UserPassword(p.Username, p.Password)
            }
            return &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)}}
        }

        func main() {
            targetURI := "https://httpbin.org/ip"

            // Initialize the proxy HTTP client
            client := ProxyAuth{"username", "password"}.ProxyClient()

            // A GET request carries no body, so pass nil
            request, err := http.NewRequest("GET", targetURI, nil)
            if err != nil {
                panic("failed to build request: " + err.Error())
            }

            // Set Proxy-Tunnel (to enable, also import math/rand, strconv and time)
            // rand.Seed(time.Now().UnixNano())
            // tunnel := rand.Intn(10000)
            // request.Header.Set("Proxy-Tunnel", strconv.Itoa(tunnel))

            response, err := client.Do(request)
            if err != nil {
                panic("failed to connect: " + err.Error())
            }
            // Close the body on every return path, including read errors
            defer response.Body.Close()

            // io.ReadAll replaces the deprecated ioutil.ReadAll (Go 1.16+)
            bodyByte, err := io.ReadAll(response.Body)
            if err != nil {
                fmt.Println("error reading body:", err)
                return
            }

            fmt.Println("Response Status:", response.Status)
            fmt.Println("Response Header:", response.Header)
            fmt.Println("Response Body:\n", string(bodyByte))
        }
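The commented-out `Proxy-Tunnel` lines in the demo can be tried locally with something like the sketch below. It is an illustration under assumptions, not the provider's documented usage: `newTunnelRequest` and `tunnelSeenByProxy` are made-up names, a local `httptest` server plays the role of the proxy, and the target `http://target.invalid/ip` is a placeholder that is never resolved because the request goes to the stand-in proxy. Note that this covers plain HTTP targets. For HTTPS targets Go first sends a CONNECT request to the proxy and the request headers travel inside the tunnel, so headers meant for the proxy would go in `http.Transport.ProxyConnectHeader` instead. Whether a given proxy service reads the header there is something to check in its documentation.

```go
package main

import (
	"fmt"
	"math/rand"
	"net/http"
	"net/http/httptest"
	"net/url"
	"strconv"
)

// newTunnelRequest (hypothetical helper) builds a GET request that carries
// a Proxy-Tunnel header with the given tunnel number.
func newTunnelRequest(target string, tunnel int) (*http.Request, error) {
	req, err := http.NewRequest(http.MethodGet, target, nil)
	if err != nil {
		return nil, err
	}
	req.Header.Set("Proxy-Tunnel", strconv.Itoa(tunnel))
	return req, nil
}

// tunnelSeenByProxy (hypothetical helper) sends one plain-HTTP request
// through a local stand-in proxy and returns the Proxy-Tunnel value that
// the proxy received.
func tunnelSeenByProxy(tunnel int) (string, error) {
	seen := make(chan string, 1)
	proxy := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		seen <- r.Header.Get("Proxy-Tunnel")
		fmt.Fprint(w, "ok")
	}))
	defer proxy.Close()

	proxyURL, err := url.Parse(proxy.URL)
	if err != nil {
		return "", err
	}
	client := &http.Client{Transport: &http.Transport{Proxy: http.ProxyURL(proxyURL)}}
	defer client.CloseIdleConnections()

	// Placeholder target: the request is delivered to the stand-in proxy.
	req, err := newTunnelRequest("http://target.invalid/ip", tunnel)
	if err != nil {
		return "", err
	}
	resp, err := client.Do(req)
	if err != nil {
		return "", err
	}
	resp.Body.Close()
	return <-seen, nil
}

func main() {
	// Pick a random tunnel number, as in the commented-out demo lines.
	tunnel := rand.Intn(10000)
	got, err := tunnelSeenByProxy(tunnel)
	if err != nil {
		panic(err)
	}
	fmt.Println("sent tunnel:", tunnel, "proxy saw:", got)
}
```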

Latest reply

Golang is developing very fast and is gradually replacing older platforms in many places. Published on 2020-9-10 19:23
 
 

