Database dynamic query

Question 1

I am working withC#,Npgsql,EF Core andPostgres.
I defined an endpoint for a paginated search, where the filters and orderBy column are dynamic. The endpoint accepts aPaginationOptions instance:

public class PaginationOptions{    public int Page { get; set; }    public int ItemsPerPage { get; set; }    public string OrderBy { get; set; }    public bool Desc { get; set; }    public IList<FilterValues> Filters { get; set; }}public class FilterValues{    public string FieldName { get; set; }    public IEnumerable<string> Values { get; set; }}

The following method performs the search and returns a Tuple with the sorted items and a counter for the total items in the table:

public Tuple<IList<T>, int> Search(PaginationOptions paginationOptions){    if (!string.IsNullOrEmpty(paginationOptions.OrderBy))    {        CheckFilterField(paginationOptions.OrderBy);    }    int offset = (paginationOptions.Page - 1) * paginationOptions.ItemsPerPage;    string orderBy = !string.IsNullOrEmpty(paginationOptions.OrderBy) ? paginationOptions.OrderBy : $"{prefix}.title";    string order = paginationOptions.Desc ? "DESC" : "ASC";    using (NpgsqlConnection connection = GetConnection())    {        string query = $"{GetQueryfields()} {GetFromClause()} {BuildWhere(paginationOptions.Filters)}";        string itemsQuery = $"SELECT {query} ORDER BY {orderBy} {order}";        NpgsqlCommand command = BuildCommand(connection, itemsQuery, paginationOptions.Filters);        IDataReader reader = command.ExecuteReader();        ISet<Guid> guids = new HashSet<Guid>(paginationOptions.ItemsPerPage);        while (reader.Read())        {            Guid guid = reader.GetGuid(0);            if (!guids.Contains(guid))            {                guids.Add(guid);            }        }        ISet<Guid> filteredGuids = guids.Skip(offset).Take(paginationOptions.ItemsPerPage).ToHashSet();        IList<T> items = GetItems(filteredGuids);        return Tuple.Create(items, guids.Count);    }}

In words: In each entity there are the query fields and the FROM clause defiend. They are splitted because I need the FROM clause in another method as well.
The WHERE (prepared statement) and ORDER BY are built dynamically using the parameters. TheBuildCommand creates theNpgsqlCommand and sets the parameters. Then I use Dapper for a raw query in order to get the ids of the requested items, then I load them using theEF and at the end ISkip andTake in order to have the right pagination.
The problem ist thatEF does not allow to add an ORDER BY clause for raw queries, it is only available throug theLinq expression:

context.AnEntity.FromSqlRaw("Select * from users ORDER BY id").OrderBy(x => x.Title);

ORDER BY id is ignored, items are sorted by the expression. If no orderby linq expression is used, the framework addsORDER BY entity.id. Otherwise I could have done followings:

string itemsQuery = $"SELECT {query} ORDER BY {orderBy} {order}";context.AnEntity.FromSqlRaw(itemsQuery).Skip(offset).Take(limit)...

It works. Even on a table with 1mil a query takes 2,8sec
Comments? Improvment hints?

Edit:
I ended up with a query which loads the paged data in 2,2sec over a table with 1mil rows. Is it an acceptable result?

Question 2

Is the Guid-column the primary key?

Question 3

Did you look at DynamicLinq or even making your own expression tree for the orderbys? example -codereview.stackexchange.com/a/256123/52662. Also you state its an endpoint if it's a webapi project what about just adding OData support for the pagenation? OData has some flexibility to intercept the iqueryable.

Question 4

yes, theid (uuid) column is the primary key.

Question 5

@CharlesNRice Didn't know aboutDynamicLinq, will take a look. I am not fan ofOData but I will give it a try.

Question 6

[PaginationOptions] RenameDes toOrderByDescending and removeOrderBy, this would be more manageable as you already have a defaultorder by in your query.
I can't see a check for the fields ? if not, try to ensure that every field name exists on your table before building the query.
UseDapper to count and select queries, as they're faster in dapper thanEF.
Since this is a Web API, aTuple would be an overkill for your controller, you might consider returningIEnumerable orIDictionary<int, IList<T>> or any simpler type, just try to simplify the results to the consumer.
you can useFETCH clause instead of LINQ'sTake andSkip, which would make it faster and also would decrease the allocated memory. referrer to thispage to read more about it.(FYI : PostgreSQL, SQL Server, Oracle, and MySQL supportsFETCH clause with some caveats).
guids is already aHashSet<Guid> so checking the guid in the loop is unnecessary, as theguids.Add(guid) would ignore the value if it's already in the collection.

Question 7

For theCount andSelect I am using Npgsql, no EF nor Dapper. ThisSearch method is not in the controller, it is in an abstract class, called by the controller where i manage the returned items.Take andSkip are not used on a IO operation since the Guids are already in memory, so it makes no difference, right? Thanks for the rest!

Question 8

@Emaborsa if you are usingNpgsql you should state that, and edit your post to clearify this, as the first line of your post states that you're usingDapper andEF. for the method, I know it's not the controller. what I meant is the service should always returns simple types, as you don't want to add more data processing inside your controller.

Question 9

not sure how things work on your application, but if you just storing guids to re-query the DB with the full data, unless if it's a requirement, you might loose some performance for that. though usingFetch clause would give you what you need with faster results.

Question 10

About the method: But I need to return the pagend data AND the total items count. About of performance: I agree that is non sense to retrieve all the data and then filter it for the paging, but since the performance is not too bad, I though it was a good solution anyway. I will review the query,.

Question 11

Just a quick shot at

IDataReader reader = command.ExecuteReader();ISet<Guid> guids = new HashSet<Guid>(paginationOptions.ItemsPerPage);while (reader.Read()){    Guid guid = reader.GetGuid(0);    if (!guids.Contains(guid))    {        guids.Add(guid);    }}ISet<Guid> filteredGuids = guids.Skip(offset).Take(paginationOptions.ItemsPerPage).ToHashSet();

Because the Guid-column is your primary-key-column, you should just use aList<Guid>. You won't need to check ifreader.GetGuid(0) exists because that just can't happen.

I wouldn't create the formerHashSet<Guid> by using the capacity ofpaginationOptions.ItemsPerPage but would use a fixed number which should be in the lower range but not too low. Because if you use this overloaded constructor, then you should use it in a way that it scales good. Having a low number for capacity, you may reach the point fast that there aren't any slots left meaning theHashSet needs to be resized. If the number for capacity is too high it may take more time at initialization because its createing two array with the length of the passed capacity.

Take a look at thereference-source forHashSet<T>

This applies for aList<T> as well.

To come to an end I would change the code a bove to

IDataReader reader = command.ExecuteReader();List<Guid> guids = new List<Guid>(2000);while (reader.Read()){    guids.Add(reader.GetGuid(0));}ISet<Guid> filteredGuids = guids.Skip(offset).Take(paginationOptions.ItemsPerPage).ToHashSet();

but would extract the2000 to a meaningful named constant.

Question 12

Thanks for the hint, but If the query has a join with, the result could include two rows with the same id and different joined values which are neede for filtering. That's why I used a ISet. I could use a IList, but in this case Id need the checkif contains...`.

Question 13

I've just tried to change fromISet<Guid> guids = new HashSet<Guid>(paginationOptions.ItemsPerPage); toList<Guid> guids = new List<Guid>(2000);. For the same query the first takes 500ms, the latter 12sec.

iSR5 6,3831 gold badge9 silver badges16 bronze badges · Answer 1 · 2021-07-23 03:13:03Z

[PaginationOptions] RenameDes toOrderByDescending and removeOrderBy, this would be more manageable as you already have a defaultorder by in your query.
I can't see a check for the fields ? if not, try to ensure that every field name exists on your table before building the query.
UseDapper to count and select queries, as they're faster in dapper thanEF.
Since this is a Web API, aTuple would be an overkill for your controller, you might consider returningIEnumerable orIDictionary<int, IList<T>> or any simpler type, just try to simplify the results to the consumer.
you can useFETCH clause instead of LINQ'sTake andSkip, which would make it faster and also would decrease the allocated memory. referrer to thispage to read more about it.(FYI : PostgreSQL, SQL Server, Oracle, and MySQL supportsFETCH clause with some caveats).
guids is already aHashSet<Guid> so checking the guid in the loop is unnecessary, as theguids.Add(guid) would ignore the value if it's already in the collection.

For theCount andSelect I am using Npgsql, no EF nor Dapper. ThisSearch method is not in the controller, it is in an abstract class, called by the controller where i manage the returned items.Take andSkip are not used on a IO operation since the Guids are already in memory, so it makes no difference, right? Thanks for the rest!
@Emaborsa if you are usingNpgsql you should state that, and edit your post to clearify this, as the first line of your post states that you're usingDapper andEF. for the method, I know it's not the controller. what I meant is the service should always returns simple types, as you don't want to add more data processing inside your controller.
not sure how things work on your application, but if you just storing guids to re-query the DB with the full data, unless if it's a requirement, you might loose some performance for that. though usingFetch clause would give you what you need with faster results.
About the method: But I need to return the pagend data AND the total items count. About of performance: I agree that is non sense to retrieve all the data and then filter it for the paging, but since the performance is not too bad, I though it was a good solution anyway. I will review the query,.

Heslacher 51k5 gold badges83 silver badges177 bronze badges · Answer 2 · 2021-07-22 15:12:24Z

Just a quick shot at

IDataReader reader = command.ExecuteReader();ISet<Guid> guids = new HashSet<Guid>(paginationOptions.ItemsPerPage);while (reader.Read()){    Guid guid = reader.GetGuid(0);    if (!guids.Contains(guid))    {        guids.Add(guid);    }}ISet<Guid> filteredGuids = guids.Skip(offset).Take(paginationOptions.ItemsPerPage).ToHashSet();

Because the Guid-column is your primary-key-column, you should just use aList<Guid>. You won't need to check ifreader.GetGuid(0) exists because that just can't happen.

I wouldn't create the formerHashSet<Guid> by using the capacity ofpaginationOptions.ItemsPerPage but would use a fixed number which should be in the lower range but not too low. Because if you use this overloaded constructor, then you should use it in a way that it scales good. Having a low number for capacity, you may reach the point fast that there aren't any slots left meaning theHashSet needs to be resized. If the number for capacity is too high it may take more time at initialization because its createing two array with the length of the passed capacity.

Take a look at thereference-source forHashSet<T>

This applies for aList<T> as well.

To come to an end I would change the code a bove to

IDataReader reader = command.ExecuteReader();List<Guid> guids = new List<Guid>(2000);while (reader.Read()){    guids.Add(reader.GetGuid(0));}ISet<Guid> filteredGuids = guids.Skip(offset).Take(paginationOptions.ItemsPerPage).ToHashSet();

but would extract the2000 to a meaningful named constant.

Thanks for the hint, but If the query has a join with, the result could include two rows with the same id and different joined values which are neede for filtering. That's why I used a ISet. I could use a IList, but in this case Id need the checkif contains...`.
I've just tried to change fromISet<Guid> guids = new HashSet<Guid>(paginationOptions.ItemsPerPage); toList<Guid> guids = new List<Guid>(2000);. For the same query the first takes 500ms, the latter 12sec.

Movatterモバイル変換

Stack Exchange Network

Database dynamic query

2 Answers2

You mustlog in to answer this question.

Linked

Related

Hot Network Questions

Subscribe to RSS