I've written a script in vba to parse movie names and year from a torrent site. The script is doing just awesome. Although the scraper is leaving no room for complaint, I'm still dubious about how the way I've set the proxy is accurate. Moreover, I've set two proxies in my scraper. Btw, I went through few codes where the proxy has been set using.setProxy 2,"xxxxxxx" whereas I used.setProxy 1 in my below code because the earlier one was throwing errors. Once again, the code is working flawlessly. Thanks in advance.
Here is what I've written:
Sub Torrent_Scraper() Dim http As New ServerXMLHTTP60, html As New HTMLDocument Dim post As HTMLHtmlElement With http .Open "GET", "https://yts.ag/browse-movies", False .setRequestHeader "Content-Type", "text/html; charset=utf-8" .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36" .setProxy 1, "61.233.25.166:80", "46.101.27.218:8118" .send html.body.innerHTML = .responseText End With For Each post In html.getElementsByClassName("browse-movie-bottom") With post.getElementsByClassName("browse-movie-title") x = x + 1: Cells(x, 1) = .item(0).innerText End With With post.getElementsByClassName("browse-movie-year") If .Length Then Cells(x, 2) = .item(0).innerText End With Next postEnd Sub1 Answer1
Here are my suggestions:
UseApplication.ScreenUpdating = False to speed up the execution of the code.See here for more details.
Declare elements where you need them. While your code is short (which is great, btw), it's a good habit to get into.
UseOption Explicit and declare thatx variable appropriately.x right now is of typeVariant but you're only using it like aLong, so you should declare it asLong. Also, changex to a more meaningful name likerow.
Don't put two executable statements in the same line.
Havingx = x + 1: Cells... is clever, but it may confuse other people later on (or even yourself if you don't touch this code for months).
Set a reference to the active sheet if that's the sheet you intend to use. When you useCells(...) you are referring to the active sheet. However, while you are looping through your items in theFor loop, if you were to click on another sheet, then your data will get messed up when you could avoid this issue pretty easily. In the code below,Cells(...) becomesactiveWs.Cells(...).
Personally, I would only useWith ... End With if I had many statements I needed it for (like the first use you have of it). In the second and third uses ofWith, I would just remove it. I would also useIf ... Then ... End If so theIf statement isn't so long.
I can tell that you wanted to make the code as short as possible, but sacrificing a few lines for clarity will be more beneficial in the long run, in my opinion.
Lastly, aboutIf post.getElementsByClassName("browse-movie-year").Length Then, I would explicitly state what you really mean, which is that if the length if greater than 0. While this statement returns an array and I get that you're checking for an array length and it will always be>=0, it's best to be explicit for clarity.
EDIT: Thinking about this further, this codecould be considered to do too much. If you were to follow the Single Responsibility Principle, then we could split out the work to other, private functions.
Here is my version of the refactored code (untested):
Option ExplicitSub Torrent_Scraper() Application.ScreenUpdating = False Dim url as String Dim proxyServer As Variant Dim bypassList as Variant url = "https://yts.ag/browse-movies" proxyServer = "61.233.25.166:80" bypassList = "46.101.27.218:8118" Dim html As New HTMLDocument html.body.innerHTML = GetHttpResponseTextByProxy(url, proxyServer, bypassList) Call SetMovieData(html, ActiveSheet) Application.ScreenUpdating = TrueEnd SubPrivate Function GetHttpResponseTextByProxy( _ ByRef url as String, _ ByRef proxyServer as Variant, _ ByRef bypassList as Variant) As String Dim http As New ServerXMLHTTP60 With http .Open "GET", url, False .setRequestHeader "Content-Type", "text/html; charset=utf-8" .setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/60.0.3112.113 Safari/537.36" .setProxy 1, proxyServer, bypassList .send End With GetHttpResponseTextByProxy = http.responseTextEnd FunctionPrivate Sub SetMovieData(ByRef html as HTMLDocument, ByRef ws As Worksheet) Dim post As HTMLHtmlElement Dim row As Long For Each post In html.getElementsByClassName("browse-movie-bottom") row = row + 1 ws.Cells(row, 1) = post.getElementsByClassName("browse-movie-title").item(0).innerText If post.getElementsByClassName("browse-movie-year").Length > 0 Then ws.Cells(row, 2) = post.getElementsByClassName("browse-movie-year").item(0).innerText End If Next postEnd Sub- \$\begingroup\$Thanks joseph4tw, for such an explicitly detailed review. Your explanation brings the clarity of few things that I really missed so long. However, do you think the usage of proxy is right in my script. Although it is working, I came across few usage where I found it being used as
setProxy 2,whereas I usedseProxy 1,. Thanks again.\$\endgroup\$SIM– SIM2017-09-18 16:06:10 +00:00CommentedSep 18, 2017 at 16:06 - \$\begingroup\$Hi @Shahin, unfortunately I'm not an expert in this area, however, this documentation might help you:msdn.microsoft.com/en-us/library/ms760236(v=vs.85).aspx I wish I could be of further help here.\$\endgroup\$Joseph– Joseph2017-09-18 18:20:32 +00:00CommentedSep 18, 2017 at 18:20
- \$\begingroup\$You have already helped me a lot. Thanks.\$\endgroup\$SIM– SIM2017-09-18 18:21:57 +00:00CommentedSep 18, 2017 at 18:21
You mustlog in to answer this question.
Explore related questions
See similar questions with these tags.

