I'm trying to optimize a query in which I have some requirements where I need to be able to arbitrarily select a rowROWX from a query. Why is it that
SELECT XML_DATA FROM (Select ROWNUM AS MYROWNUM , EXTRACT(COL1,'expresion').getStringVal() AS XML_DATA From "Table1" Where Col2 = 'val' AND ROWNUM < ROWX+1)WHERE MYROWNUM = ROWX ;Runs much faster than
SELECT XML_DATA FROM (Select ROWNUM AS MYROWNUM , EXTRACT(COL1,'expresion').getStringVal() AS XML_DATA From "Table1" Where Col2 = 'val') AND MYROWNUM = ROWXObviously as ROWX grows there is some performance degradation with the first query but the second query has a constant run time (outside of the case where ROWX=1) of about 4 minutes where the base runtime for the first query is non-existent and growth is more acceptable than having to wait 4 minutes per iteration.
Version:Oracle Database 11g Enterprise Edition 11.2.0.3.0 64bit ProductionSorry I'm a software developer (not a DBA!) that primarily works with MySQL/SQL Server so I'm not familiar with Oracle's internal workings feel free to point me in an entirely different direction,
Thanks.
Based on Justin Cave's answer I wrote the query to be deterministic and a bit cleaner (less hacky)
SELECT EXTRACT(COL1,'expresion').getStringVal() AS XML_DATAFROM (Select ROWNUM AS MYROWNUM , CLUSTERED_ID From "Table1" Where Col2 = 'val' ORDER BY CLUSTERED_ID) t1 JOIN table1 ON (t1.CLUSTERED_ID = table1.CLUSTERED_ID)WHERE MYROWNUM = ROWXThis has the benifit of running in constant time (where n = rowx rather than count()) and is much easier to understand.
- Sorry, in some cases for
ROWX<COUNT()(specifically if I don't haveROWNUMin the column list) it still returns no value for the second case. Am I not understanding the purpose ofROWNUM?Colton– Colton2014-04-01 19:00:06 +00:00CommentedApr 1, 2014 at 19:00 - A predicate
rownum = rowxwould only return rows ifrowxwas 1. If it was any other value, the query will return 0 rows. So the two queries are returning a different result in addition to having different performance, right?Justin Cave– Justin Cave2014-04-01 19:01:02 +00:00CommentedApr 1, 2014 at 19:01 - Sorry I updated my earlier comment, YES. That is the case. Am I not able to arbitrarily select a row by number with Oracle 11g?Colton– Colton2014-04-01 19:02:37 +00:00CommentedApr 1, 2014 at 19:02
- Your most recent update does not appear to be deterministic. The
ORDER BYis applied after therownumis assigned in your inner query so there is no guarantee that the result will be the same across runs. It likely will be, assuming that the table remains constant, optimizer parameters are unchanged, etc. but it would be perfectly valid for Oracle to return the same value for every value ofrowxthat you pass in.Justin Cave– Justin Cave2014-04-01 19:36:13 +00:00CommentedApr 1, 2014 at 19:36
1 Answer1
Therownum pseudocolumn is assigned as rows are processed. The predicaterownum = <<x>> will evaluate tofalse if<<x>> is anything other than 1.
Conceptually (and I emphasize that this isn't how Oracle actually works, it's just a useful abstraction), the second query does something like
- Read a row from
table1 - Evaluate the
col2 = 'val'expression - If
col2 = 'val', evaluate theextract - Assign the
rownumto the row that has been returned. Since no other rows have satisfied all the criteria yet, therownumwill be 1 - Evaluate the
rownum = <<x>>predicate. If<<x>>is anything other than 1, the row isrejected - Go back to step 1 and read the next row
In this loop, every row will be read fromtable1. And every row will be rejected (assuming<<x>> is something other than 1) because of therownum = <<x>> predicate. So you'll incur the cost of reading every row from the table, evaluating theextract for every row, and you'll never return any data.
Your first query appears to be working because it is filtering out<<x>> + 1 rows in the subquery. Of course, since there is noORDER BY, your first query always returns an arbitrary row. It would be entirely legal for Oracle to return the same row for every value of<<x>> in your first query. It probably won't, of course, but it is entirely possible that you'd get different rows over time, that some rows would never be returned, that other rows would be returned multiple times, etc. If you want a deterministic result (prior to 12.1 which has some simpler syntax options), you'd need to do two levels of nesting
SELECT * FROM (SELECT b.*, rownum rn FROM (SELECT a.* FROM some_table a ORDER BY some_column) b WHERE rownum <= <<upper_limit>>) c WHERE rn >= <<lower_limit>>For a pretty thorough discussion of the various issues, take a look throughthis askTom thread.
- So is my work around a valid work around or is there a bet alternative. The use case would be a program is trying to select sample data from in col1 depending on testcase so I would like a scalar-ish solution to arbitrarily select a unique row based on the testcase number.Colton– Colton2014-04-01 19:12:28 +00:00CommentedApr 1, 2014 at 19:12
- @Sparksis - I've added to my answer to talk about the problem with the initial solution. Depending on what "sample data" means to you, you might be better off using the actual
sampleclause.Justin Cave– Justin Cave2014-04-01 19:14:29 +00:00CommentedApr 1, 2014 at 19:14 - Awesome, thanks. After reading your comment the first time I pulled the
EXTRACToperation to the outer query and joined the subquery to table1 (18 seconds). After reading your amendment I I ordered the subquery by the clustered id to make it deterministic (0.7 seconds). I think joining is a similar idea to what you have but it is more intuitive to me when reading it.Colton– Colton2014-04-01 19:22:43 +00:00CommentedApr 1, 2014 at 19:22
Explore related questions
See similar questions with these tags.