4
\$\begingroup\$

For the first time I've worked in Excel VBA to find rows in my dataset that contain the same adress as another entry in a cluster. These entries have to be merged and the row then is deleted. I've come up with the following, which works (As far as I can tell from the testing I did on small samples of the set):

Sub Merge_Orders()Application.ScreenUpdating = FalseApplication.DisplayStatusBar = FalseDim lastrow As Longlastrow = Cells(Rows.Count, "A").End(xlUp).RowDim y As LongDim x As LongDim j As LongDim k As LongFor i = 2 To lastrow //for each row, starting below header rowj = 1y = (Cells(i, 9)) //this is the clusternumberDo While y = (Cells(i + j, 9)) //while the next row is in the same cluster x = (Cells(i, 12)) //this is the adresscode k = 1    Do While x = (Cells(i + k, 12)) //while the current adresscode is the same as the next, iterating until another adresscode is hit     Cells(i, 16) = Val(Cells(i, 16)) + Val(Cells(i + k, 16)) //update cell value     Cells(i, 18) = Cells(i, 18) + Cells(i + k, 18)  //update cell value     Cells(i, 19) = Cells(i, 19) + Cells(i + k, 19)  //update cell value     If Cells(i, 20) > Cells(i + k, 20) Then     Cells(i, 20) = Cells(i + k, 20)  //update cell value    End If    If Cells(i, 21) > Cells(i + k, 21) Then     Cells(i, 21) = Cells(i + k, 21)  //update cell value    End If    Cells(i, 22) = Cells(i, 22) + Cells(i + k, 22)  //update cell value    Cells(i, 23) = Cells(i, 23) + Cells(i + k, 23)  //update cell value    Rows(i + 1).EntireRow.Delete //Delete the row from which data was pulled    k = k + 1    Loop j = j + 1LoopNext iApplication.ScreenUpdating = TrueApplication.DisplayStatusBar = TrueEnd Sub

The problem I'm facing is time. Testing this on a small sample of ~50 rows took over 5 minutes. My entries total over 100K rows. I would expect it has to run for close to 10 days. It's been running for over a day with no end in sight.

Is there a way to optimize this so I don't have to wait until I'm grey?

Note: I've added //comments just for CR/SO, they are not in VBA.

Mathieu Guindon's user avatar
Mathieu Guindon
75.6k18 gold badges195 silver badges469 bronze badges
askedJan 19, 2017 at 17:22
Rob's user avatar
\$\endgroup\$

1 Answer1

4
\$\begingroup\$

Correctness

First of all, I have some concerns whether your code actually does what you want it to do. Let me explain why I think this.

  1. In your inner Do While Loop you delete the next row, but still incrementk. This means that you are skipping more and more rows. I think your intention was to always look at the next row, i.e. atk = 1.
  2. In the inner loop you do not check whether you are still in the same cluster. So, you will end up aggregating over different clusters if the addresses match.
  3. The inner loop does not depend on the middle loop.
  4. Since you delete rows, your outer loop will run past the last remaining row.

Performance

Next, let me give some performance tips that do not change your code radically:

  1. Accessing a worksheet in Excel VBA is expensive. You should load as much as possible into the memory at once. To achieve this you can load an entire range into a two dimentional Variant array by assigning theValue property of the range to a variant variable, or better theValue2 property. (See thisblog post for some reason to useValue2.) You can write the array back to a range by assigning to theValue property of the range. If you go down this road, you will have to employ some collection to save the rows to delete for later. (When you remove the rows, you should do it from the bottom to the top as this guarantees that the row numbers of the remaining rows are preserved.)

  2. Your middle loop is redundant. This is the case because the loop body does not depend on the index variablej. I think what you actually wanted to do is determine whetherx = (Cells(i + 1, 12)) And y = (Cells(i + 1, 9)) and only do one loop. (I already setk andj to 1 in accordance with my first comment from above.)

  3. You can save some redundant itterations by decrementinglastrow by 1 whenever you delete a row.

Style

Now, let me come to some general remarks regarding the programming style.

  1. It is a very good practice to give variables meaningful names. This helps the next reader of your code a lot trying to understand what the code is doing. In your case, it would make most comments unnecessary. E.g. you could usecurrentCluster instead ofy andcurrentAddress instead ofx. Moreover, If you load the columns into variant arrays, you can call the array for row 9clusterIds and that for column 12adresses.
  2. You should avoid to use the active sheet implicitely. You are referring to it with your calls toCells. This can lead to very hard to detect errors. Instead you could explicitly use it by writingActiveSheet.Cells or enclosing everything in aWith ActiveSheet block and then using.Cells. Personally, I think the first alternative is better since the period can be overlooked or forgotten easily.
  3. You should also avoid to use default properties, like theValue property of the range object returned byCells. Using default properties does not only hurt the readability, it can also lead to obscure errors. Instead, you should use theValue property directly, or betterValue2 when you are reading the data, especially when querying cells formatted as currency. These get crippled byValue. (Again, see thisblog post.)
  4. Always declare your variables. Otherwise, they are implicitly Variant. In your code,y andi are not declared. To ensure that you always declare variables, you can addOption Explicit. Then you get a compiler error for undeclared variables. This has the added advantage that you cannot inadvertedly introduce a new variable by a typo. (Such bugs are very hard to find.)
  5. Get into the habit not to use underscores in method names unless there is a syntactic reason for it. This would be the case when writing an event handler or implementing an interface. The problem with underscores surfaces when using classes in VBA. You actually cannot implement a method of an interface if it contains an underscore. (This is a bug of the VBE.)

Alternative Approaches

Finally, I would like to present a completely different approach to your problem: to use the SQL support for Excel via ADODB. How to use it, can be found in thisMicrosoft article.

Using it the output you seem to want could be produced with a SQL statement similar to the following.

SELECT Cluster, Address, SUM(Value1), SUM(Value2), SUM(Value3)FROM [ActiveSheetName$]GROUP BY Cluster, Address

Here,ActiveSheetName should be the name of the active sheet, which you can get viaActiveSheet.Name and ValueX should be the names in the header row of the columns you want to sum for matching clusters and addresses.

An alterntive approach would be not to use Excel at all and instead use a real database, e.g. Access for starters. They are made to handle requests like yours efficiently and also provide ways to enforce data integrity.

answeredJan 19, 2017 at 20:30
M.Doerner's user avatar
\$\endgroup\$
3
  • \$\begingroup\$+1 for handing off to ADODB, far quicker and less reinventing the wheel than the vba solution.\$\endgroup\$CommentedJan 19, 2017 at 21:37
  • \$\begingroup\$Hi, thanks for the detailed response. You are right about k skipping more and more rows. Unfortunately it can happen that even though the next row doesn't have a matching adress code, the one after that does. So checking only at value(i+1) doesn't hold. Suggested elsewhere was that I check (i+j), which I'm trying our right now. I'm using all your other comments and writing them into VBA(no underscores, value2, logical names). I have no experience with using ADODB and am on a bit of a clock, so I would like to try to avoid that. Thanks for the suggestion, will do it if this doesn't work.\$\endgroup\$CommentedJan 20, 2017 at 9:53
  • \$\begingroup\$Yes, you can check both conditions for i+j in the inner loop. (You really have to check both in the inner loop to avoid leaving the cluster.) When you do this, you also have to delete row i+j instead of row i+1. As an alternative to the inner loop you can also use an if else block only checking the address. The if block would contain the loop body and the else block would increment j.\$\endgroup\$CommentedJan 20, 2017 at 12:02

You mustlog in to answer this question.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.