2
\$\begingroup\$

I wrote some code that uses the arcpy module to read values from two tables, check whether each value is in the other table, and write the ones that are missing to a text file. I then changed it so I could write more than one field to the text file, and the code slowed down tremendously. Before the change I think it took about 5 minutes to run. After the change I stopped it at about 30 minutes in. Why did my code take so much longer the second time around?

The common field has values that are in one table and potentially not in the other.

Here's what I changed it to:

    import arcpy
    import os
    import sets
    import time

    #script takes longer now.. Wonder why
    #Looks like this just isn't working anymore
    pathOne = os.path.join(real path that exists)
    pathOneFields = ["Common field","OBJECTID","Shape.STArea()","Shape.STLength()"]
    listOne = [row for row in arcpy.da.SearchCursor(pathOne, pathOneFields)]
    print "Done with List One"
    pathTwo = os.path.join(this path is also real...or is it? It is)
    pathTwoFields = ["Common field","OBJECTID","SHAPE.STArea()","SHAPE.STLength()"]
    listTwo = [row for row in arcpy.da.SearchCursor(pathTwo,
                                                    pathTwoFields)]
    print "Done with Path Two List"
    #just counting some stuff to make sure I actually did something
    i = 0
    j = 0
    with open("None Equals IDs.txt", "a") as text:
        for item in pathOneList:
            i += 1
            if (item[0] not in pathTwoList) and (str(item[0]) != "Null"):
                #add month to this file name as well as full path for the file
                j += 1
                text.write("\n{} not in pathTwo".format(item[1]))
            elif str(item) == "Null":
                j += 1
                text.write("\n{} not in pathTwo {} {} {}".format(item,
                                                                 "How should we",
                                                                 "handle Null",
                                                                 "Values?"))
    print "Done with what's in pathOne and not pathTwo"
    print j
    k = 0
    l = 0
    with open("None Equals IDs.txt", "a") as text:
        for item in pathTwoList:
            k += 1
            if (item[0] not in pathOneList) and  (str(item[0]) != "Null"):
                #add month to this file name as well as full path for the file
                l += 1
                text.write("\n{} not in pathOne".format(item[1]))
            elif str(item[0]) == "Null":
                l += 1
                text.write("\n{} not in pathOne {} {} {}".format(item,
                                                                 "How should we",
                                                                 "handle Null",
                                                                 "Values?"))
    print "Done with what's in pathTwo and not in pathOne"
    print l
    print "Finished"

Here's what I changed it from:

    import arcpy
    import os
    import sets
    import time

    #This is the fast script
    pathOne = os.path.join(real path that exists)
    pathOneFields = ["Common field","OBJECTID","Shape.STArea()","Shape.STLength()"]
    listOne = [row[0] for row in arcpy.da.SearchCursor(pathOne, pathOneFields)]
    print "Done with List One"
    pathTwo = os.path.join(this path is also real...or is it? It is)
    pathTwoFields = ["Common field","OBJECTID","SHAPE.STArea()","SHAPE.STLength()"]
    listTwo = [row[0] for row in arcpy.da.SearchCursor(pathTwo,
                                                       pathTwoFields)]
    print "Done with Path Two List"
    #just counting some stuff to make sure I actually did something
    i = 0
    j = 0
    with open("None Equals IDs.txt", "a") as text:
        for item in pathOneList:
            i += 1
            if (item not in pathTwoList) and (str(item) != "Null"):
                #add month to this file name as well as full path for the file
                j += 1
                text.write("\n{} not in pathTwo".format(item))
            elif str(item) == "Null":
                j += 1
                text.write("\n{} not in pathTwo {} {} {}".format(item,
                                                                 "How should we",
                                                                 "handle Null",
                                                                 "Values?"))
    print "Done with what's in pathOne and not pathTwo"
    print j
    k = 0
    l = 0
    with open("None Equals IDs.txt", "a") as text:
        for item in pathTwoList:
            k += 1
            if (item not in pathOneList) and  (str(item) != "Null"):
                #add month to this file name as well as full path for the file
                l += 1
                text.write("\n{} not in pathOne".format(item))
            elif str(item[0]) == "Null":
                l += 1
                text.write("\n{} not in pathOne {} {} {}".format(item,
                                                                 "How should we",
                                                                 "handle Null",
                                                                 "Values?"))
    print "Done with what's in pathTwo and not in pathOne"
    print l
    print "Finished"

I was expecting the former bit of code to take longer, but not over 6 times longer. It wasn't even halfway done when I stopped it. How can this be!?

rolfl
asked Nov 21, 2017 at 22:09
\$\endgroup\$
2
  • Because of the `item[0]` in your 'from' script, I doubt this was really run. Get your 'from' script, run and time it, and post it. (Commented Nov 21, 2017 at 22:22)
  • @stefan Was the logic not working? I thought it might not have been. Can you not use indexing in if statements? (Commented Nov 21, 2017 at 22:29)

1 Answer

4
\$\begingroup\$

Assuming `pathTwoList` is `listTwo` and `pathOneList` is `listOne`...

From what I understand, you've actually broken the initial logic. Look at the `item[0] not in pathTwoList` expression. `pathTwoList` is a list of rows returned by the ArcPy query, while `item[0]` is a value of the "Common field". A single value never equals a whole row, so the membership test never finds a match: `item[0] not in pathTwoList` is always `True`, and only after a full scan of the `pathTwoList` list. In other words, you are hitting the worst case every time, which explains the slowdown.
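To make the mismatch concrete, here is a minimal sketch that needs no arcpy. The rows are made-up tuples shaped like cursor output (common field first, then OBJECTID), and the names `rows_two` and `value` are hypothetical.

```python
# Hypothetical rows shaped like cursor output: (common field, OBJECTID).
rows_two = [("A1", 1), ("B2", 2), ("C3", 3)]

value = "B2"

# A bare value is compared with whole tuples, so it never matches and
# the list is scanned from start to end before the answer comes back.
found_in_rows = value in rows_two

# Comparing against the first column only does find the match.
found_in_column = value in [row[0] for row in rows_two]

print(found_in_rows)    # False
print(found_in_column)  # True
```

With two long tables, that full scan happens once per row of the other table, which is where the time goes.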

A different approach would probably be to make sets of common field values and work with the difference of the sets.
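A rough sketch of the set idea, using made-up rows in place of the cursor results (the data and the names `keys_one`, `keys_two` and `ids_missing_from_two` are assumptions for illustration). It also shows one way to keep a per-row report when the common field contains repeated values: build the sets only for lookup and still iterate over the original rows.

```python
# Hypothetical rows shaped like cursor output: (common field, OBJECTID).
list_one = [("A1", 1), ("B2", 2), ("B2", 3)]
list_two = [("B2", 10), ("C3", 11)]

# Collect the common-field values of each table into sets.
keys_one = {row[0] for row in list_one}
keys_two = {row[0] for row in list_two}

# Set difference gives the values present on one side only.
only_in_one = keys_one - keys_two
only_in_two = keys_two - keys_one

# Per-row report: duplicates in list_one are preserved, and each
# membership test against a set is a hash lookup instead of a scan.
ids_missing_from_two = [row[1] for row in list_one if row[0] not in keys_two]

print(sorted(only_in_one))      # ['A1']
print(sorted(only_in_two))      # ['C3']
print(ids_missing_from_two)     # [1]
```

Set lookups take roughly constant time on average, so the overall work grows with the sum of the table sizes rather than their product.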

answered Nov 21, 2017 at 22:54
alecxe
\$\endgroup\$
6
  • I wanted to use sets, but some of the values that the query returns are repeated. (Commented Nov 21, 2017 at 23:08)
  • I went ahead and corrected some stuff that I transposed wrong. It should be right now. That was wrong in my actual code too. Thanks for pointing it out. It was wrong in both the slow version and the quick version though. (Commented Nov 21, 2017 at 23:10)
  • What's wrong with `elif str(item) == "Null":`? (Commented Nov 22, 2017 at 23:46)
  • @Steve Sorry, missed your comment. What do you mean by "what is wrong"? Please elaborate. (Commented Nov 26, 2017 at 1:27)
  • I think I figured it out. For some reason the `elif` never executes. I think it's because `str(item)` never equals "Null". Is it worth putting my logic in functions to avoid repetition here? (Commented Nov 27, 2017 at 22:00)
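On the last two points, a hedged sketch: it assumes the cursor hands back Python `None` for database nulls, in which case `str(None)` is `"None"` and can never equal `"Null"`. The helper `report_missing` and its sample data are hypothetical, and it takes the comparison logic as a function so both directions can reuse it.

```python
def report_missing(rows, other_keys, other_name):
    """Return report lines for rows whose key is null or absent from other_keys.

    rows       -- iterable of tuples, common field first, OBJECTID second
    other_keys -- set of common-field values from the other table
    other_name -- label used in the message (hypothetical, e.g. "pathTwo")
    """
    lines = []
    for row in rows:
        key, object_id = row[0], row[1]
        if key is None:
            # Assumed: nulls arrive as None, so test identity, not a string.
            lines.append("{} has a null common field".format(object_id))
        elif key not in other_keys:
            lines.append("{} not in {}".format(object_id, other_name))
    return lines


rows_one = [("A1", 1), (None, 2), ("B2", 3)]
keys_two = {"B2", "C3"}

lines = report_missing(rows_one, keys_two, "pathTwo")
print(lines)        # ['1 not in pathTwo', '2 has a null common field']
print(str(None))    # None
```

Calling the same function a second time with the tables swapped would replace the duplicated second loop.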
