2
\$\begingroup\$

I've written a script in vba to parse movie names and year from a torrent site. The script is doing just awesome. Although the scraper is leaving no room for complaint, I'm still dubious about how the way I've set the proxy is accurate. Moreover, I've set two proxies in my scraper. Btw, I went through few codes where the proxy has been set using.setProxy 2,"xxxxxxx" whereas I used.setProxy 1 in my below code because the earlier one was throwing errors. Once again, the code is working flawlessly. Thanks in advance.

Here is what I've written:

Sub Torrent_Scraper()    Dim http As New ServerXMLHTTP60, html As New HTMLDocument    Dim post As HTMLHtmlElement    With http        .Open "GET", "https://yts.ag/browse-movies", False        .setRequestHeader "Content-Type", "text/html; charset=utf-8"        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"        .setProxy 1, "61.233.25.166:80", "46.101.27.218:8118"        .send        html.body.innerHTML = .responseText    End With    For Each post In html.getElementsByClassName("browse-movie-bottom")        With post.getElementsByClassName("browse-movie-title")            x = x + 1: Cells(x, 1) = .item(0).innerText        End With        With post.getElementsByClassName("browse-movie-year")            If .Length Then Cells(x, 2) = .item(0).innerText        End With    Next postEnd Sub
askedSep 15, 2017 at 20:16
SIM's user avatar
\$\endgroup\$

1 Answer1

2
\$\begingroup\$

Here are my suggestions:

UseApplication.ScreenUpdating = False to speed up the execution of the code.See here for more details.

Declare elements where you need them. While your code is short (which is great, btw), it's a good habit to get into.

UseOption Explicit and declare thatx variable appropriately.x right now is of typeVariant but you're only using it like aLong, so you should declare it asLong. Also, changex to a more meaningful name likerow.

Don't put two executable statements in the same line.

Havingx = x + 1: Cells... is clever, but it may confuse other people later on (or even yourself if you don't touch this code for months).

Set a reference to the active sheet if that's the sheet you intend to use. When you useCells(...) you are referring to the active sheet. However, while you are looping through your items in theFor loop, if you were to click on another sheet, then your data will get messed up when you could avoid this issue pretty easily. In the code below,Cells(...) becomesactiveWs.Cells(...).

Personally, I would only useWith ... End With if I had many statements I needed it for (like the first use you have of it). In the second and third uses ofWith, I would just remove it. I would also useIf ... Then ... End If so theIf statement isn't so long.

I can tell that you wanted to make the code as short as possible, but sacrificing a few lines for clarity will be more beneficial in the long run, in my opinion.

Lastly, aboutIf post.getElementsByClassName("browse-movie-year").Length Then, I would explicitly state what you really mean, which is that if the length if greater than 0. While this statement returns an array and I get that you're checking for an array length and it will always be>=0, it's best to be explicit for clarity.

EDIT: Thinking about this further, this codecould be considered to do too much. If you were to follow the Single Responsibility Principle, then we could split out the work to other, private functions.

Here is my version of the refactored code (untested):

Option ExplicitSub Torrent_Scraper()    Application.ScreenUpdating = False    Dim url as String    Dim proxyServer As Variant    Dim bypassList as Variant    url = "https://yts.ag/browse-movies"    proxyServer = "61.233.25.166:80"    bypassList = "46.101.27.218:8118"    Dim html As New HTMLDocument    html.body.innerHTML = GetHttpResponseTextByProxy(url, proxyServer, bypassList)    Call SetMovieData(html, ActiveSheet)    Application.ScreenUpdating = TrueEnd SubPrivate Function GetHttpResponseTextByProxy( _    ByRef url as String, _    ByRef proxyServer as Variant, _    ByRef bypassList as Variant) As String    Dim http As New ServerXMLHTTP60    With http        .Open "GET", url, False        .setRequestHeader "Content-Type", "text/html; charset=utf-8"        .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36"        .setProxy 1, proxyServer, bypassList        .send    End With    GetHttpResponseTextByProxy = http.responseTextEnd FunctionPrivate Sub SetMovieData(ByRef html as HTMLDocument, ByRef ws As Worksheet)    Dim post As HTMLHtmlElement    Dim row As Long    For Each post In html.getElementsByClassName("browse-movie-bottom")        row = row + 1        ws.Cells(row, 1) = post.getElementsByClassName("browse-movie-title").item(0).innerText        If post.getElementsByClassName("browse-movie-year").Length > 0 Then            ws.Cells(row, 2) = post.getElementsByClassName("browse-movie-year").item(0).innerText        End If    Next postEnd Sub
answeredSep 18, 2017 at 10:58
Joseph's user avatar
\$\endgroup\$
3
  • \$\begingroup\$Thanks joseph4tw, for such an explicitly detailed review. Your explanation brings the clarity of few things that I really missed so long. However, do you think the usage of proxy is right in my script. Although it is working, I came across few usage where I found it being used assetProxy 2, whereas I usedseProxy 1,. Thanks again.\$\endgroup\$CommentedSep 18, 2017 at 16:06
  • \$\begingroup\$Hi @Shahin, unfortunately I'm not an expert in this area, however, this documentation might help you:msdn.microsoft.com/en-us/library/ms760236(v=vs.85).aspx I wish I could be of further help here.\$\endgroup\$CommentedSep 18, 2017 at 18:20
  • \$\begingroup\$You have already helped me a lot. Thanks.\$\endgroup\$CommentedSep 18, 2017 at 18:21

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.