0

I'm trying to optimize a query in which I have some requirements where I need to be able to arbitrarily select a rowROWX from a query. Why is it that

SELECT XML_DATA FROM     (Select ROWNUM AS MYROWNUM , EXTRACT(COL1,'expresion').getStringVal() AS XML_DATA    From "Table1" Where Col2 = 'val' AND ROWNUM < ROWX+1)WHERE MYROWNUM = ROWX ;

Runs much faster than

SELECT XML_DATA FROM (Select ROWNUM AS MYROWNUM , EXTRACT(COL1,'expresion').getStringVal() AS XML_DATA    From "Table1" Where Col2 = 'val') AND MYROWNUM = ROWX

Obviously as ROWX grows there is some performance degradation with the first query but the second query has a constant run time (outside of the case where ROWX=1) of about 4 minutes where the base runtime for the first query is non-existent and growth is more acceptable than having to wait 4 minutes per iteration.

Version:Oracle Database 11g Enterprise Edition  11.2.0.3.0  64bit Production

Sorry I'm a software developer (not a DBA!) that primarily works with MySQL/SQL Server so I'm not familiar with Oracle's internal workings feel free to point me in an entirely different direction,

Thanks.


Based on Justin Cave's answer I wrote the query to be deterministic and a bit cleaner (less hacky)

SELECT EXTRACT(COL1,'expresion').getStringVal() AS XML_DATAFROM (Select ROWNUM AS MYROWNUM , CLUSTERED_ID From "Table1" Where Col2 = 'val' ORDER BY CLUSTERED_ID) t1 JOIN table1 ON (t1.CLUSTERED_ID = table1.CLUSTERED_ID)WHERE MYROWNUM = ROWX

This has the benifit of running in constant time (where n = rowx rather than count()) and is much easier to understand.

askedApr 1, 2014 at 18:55
Colton's user avatar
4
  • Sorry, in some cases forROWX<COUNT() (specifically if I don't haveROWNUM in the column list) it still returns no value for the second case. Am I not understanding the purpose ofROWNUM?CommentedApr 1, 2014 at 19:00
  • A predicaterownum = rowx would only return rows ifrowx was 1. If it was any other value, the query will return 0 rows. So the two queries are returning a different result in addition to having different performance, right?CommentedApr 1, 2014 at 19:01
  • Sorry I updated my earlier comment, YES. That is the case. Am I not able to arbitrarily select a row by number with Oracle 11g?CommentedApr 1, 2014 at 19:02
  • Your most recent update does not appear to be deterministic. TheORDER BY is applied after therownum is assigned in your inner query so there is no guarantee that the result will be the same across runs. It likely will be, assuming that the table remains constant, optimizer parameters are unchanged, etc. but it would be perfectly valid for Oracle to return the same value for every value ofrowx that you pass in.CommentedApr 1, 2014 at 19:36

1 Answer1

1

Therownum pseudocolumn is assigned as rows are processed. The predicaterownum = <<x>> will evaluate tofalse if<<x>> is anything other than 1.

Conceptually (and I emphasize that this isn't how Oracle actually works, it's just a useful abstraction), the second query does something like

  1. Read a row fromtable1
  2. Evaluate thecol2 = 'val' expression
  3. Ifcol2 = 'val', evaluate theextract
  4. Assign therownum to the row that has been returned. Since no other rows have satisfied all the criteria yet, therownum will be 1
  5. Evaluate therownum = <<x>> predicate. If<<x>> is anything other than 1, the row isrejected
  6. Go back to step 1 and read the next row

In this loop, every row will be read fromtable1. And every row will be rejected (assuming<<x>> is something other than 1) because of therownum = <<x>> predicate. So you'll incur the cost of reading every row from the table, evaluating theextract for every row, and you'll never return any data.

Your first query appears to be working because it is filtering out<<x>> + 1 rows in the subquery. Of course, since there is noORDER BY, your first query always returns an arbitrary row. It would be entirely legal for Oracle to return the same row for every value of<<x>> in your first query. It probably won't, of course, but it is entirely possible that you'd get different rows over time, that some rows would never be returned, that other rows would be returned multiple times, etc. If you want a deterministic result (prior to 12.1 which has some simpler syntax options), you'd need to do two levels of nesting

SELECT *  FROM (SELECT b.*,               rownum rn          FROM (SELECT a.*                  FROM some_table a                 ORDER BY some_column) b         WHERE rownum <= <<upper_limit>>) c WHERE rn >= <<lower_limit>>

For a pretty thorough discussion of the various issues, take a look throughthis askTom thread.

answeredApr 1, 2014 at 19:07
Justin Cave's user avatar
3
  • So is my work around a valid work around or is there a bet alternative. The use case would be a program is trying to select sample data from in col1 depending on testcase so I would like a scalar-ish solution to arbitrarily select a unique row based on the testcase number.CommentedApr 1, 2014 at 19:12
  • @Sparksis - I've added to my answer to talk about the problem with the initial solution. Depending on what "sample data" means to you, you might be better off using the actualsample clause.CommentedApr 1, 2014 at 19:14
  • Awesome, thanks. After reading your comment the first time I pulled theEXTRACT operation to the outer query and joined the subquery to table1 (18 seconds). After reading your amendment I I ordered the subquery by the clustered id to make it deterministic (0.7 seconds). I think joining is a similar idea to what you have but it is more intuitive to me when reading it.CommentedApr 1, 2014 at 19:22

Your Answer

Sign up orlog in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to ourterms of service and acknowledge you have read ourprivacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.