2

We a running an instance ofApache syncope with around 130k users (ca. 33k withactive orpending state). When searching for users by attributes we see long query times often over 50 seconds. One example query might be

select * from user_search_attr where schema_id = 'familyName' and stringvalue = 'Bergmann'

user_search_attr is a view defined as follows:

CREATE OR REPLACEALGORITHM = UNDEFINED VIEW `user_search_attr` ASselect    `u`.`id` AS `any_id`,    `attrs`.`schema_id` AS `schema_id`,    `attrs`.`booleanvalue` AS `booleanvalue`,    `attrs`.`datevalue` AS `datevalue`,    `attrs`.`doublevalue` AS `doublevalue`,    `attrs`.`longvalue` AS `longvalue`,    `attrs`.`stringvalue` AS `stringvalue`from    (`SyncopeUser` `u`left join JSON_TABLE(coalesce(`u`.`plainAttrs`, '[]'), '$[*]'    COLUMNS (`schema_id` varchar(255) PATH '$.schema',    NESTED PATH '$.values[*]'    COLUMNS (`booleanvalue` int(11) PATH '$.booleanValue',             `datevalue` varchar(32) PATH '$.dateValue',             `doublevalue` double PATH '$.doubleValue',             `longvalue` bigint(20) PATH '$.longValue',             `stringvalue` varchar(255) PATH '$.stringValue'))) `attrs` on    (1 = 1))where    `attrs`.`schema_id` is not null    and (`attrs`.`booleanvalue` is not null        or `attrs`.`datevalue` is not null        or `attrs`.`doublevalue` is not null        or `attrs`.`longvalue` is not null        or `attrs`.`stringvalue` is not null);

As you can see the data comes from the tableSyncopeUser which is defined as follows:

CREATE TABLE `SyncopeUser` (  `id` varchar(36) NOT NULL,  `creationContext` varchar(255) DEFAULT NULL,  `creationDate` datetime(3) DEFAULT NULL,  `creator` varchar(255) DEFAULT NULL,  `lastChangeContext` varchar(255) DEFAULT NULL,  `lastChangeDate` datetime(3) DEFAULT NULL,  `lastModifier` varchar(255) DEFAULT NULL,  `status` varchar(255) DEFAULT NULL,  `changePwdDate` datetime(3) DEFAULT NULL,  `cipherAlgorithm` varchar(20) DEFAULT NULL,  `failedLogins` int(11) DEFAULT NULL,  `lastLoginDate` datetime(3) DEFAULT NULL,  `mustChangePassword` int(11) DEFAULT NULL,  `password` varchar(255) DEFAULT NULL,  `passwordHistory` text DEFAULT NULL,  `securityAnswer` varchar(255) DEFAULT NULL,  `suspended` int(11) DEFAULT NULL,  `token` text DEFAULT NULL,  `tokenExpireTime` datetime(3) DEFAULT NULL,  `username` varchar(255) DEFAULT NULL,  `plainAttrs` longtext CHARACTER SET utf8mb4 COLLATE utf8mb4_bin              DEFAULT NULL CHECK (json_valid(`plainAttrs`)),  `REALM_ID` varchar(36) DEFAULT NULL,  `SECURITYQUESTION_ID` varchar(36) DEFAULT NULL,  PRIMARY KEY (`id`),  UNIQUE KEY `U_SYNCPSR_USERNAME` (`username`),  UNIQUE KEY `SyncopeUser_username` (`username`),  KEY `SECURITYQUESTION_ID` (`SECURITYQUESTION_ID`),  KEY `SyncopeUser_realm_id` (`REALM_ID`),  CONSTRAINT `SyncopeUser_ibfk_1` FOREIGN KEY (`REALM_ID`)            REFERENCES `Realm` (`id`),  CONSTRAINT `SyncopeUser_ibfk_2` FOREIGN KEY (`SECURITYQUESTION_ID`)           REFERENCES `SecurityQuestion` (`id`)) ENGINE=InnoDB DEFAULT CHARSET=utf8mb3 COLLATE=utf8mb3_general_ci;

The relevant columnplainAttrs contains the users attibutes as JSON string. One example might be[{"uniqueValue":{"stringValue":"123456789"},"schema":"lockSystemId"},{"values":[{"stringValue":"Bergmann"}],"schema":"familyName"}].

The following indexes are defined onSyncopeUser:

Table      |Non_unique|Key_name            |Seq_in_index|Column_name        |Collation|Cardinality|Sub_part|Packed|Null|Index_type|Comment|Index_comment|Ignored|-----------+----------+--------------------+------------+-------------------+---------+-----------+--------+------+----+----------+-------+-------------+-------+SyncopeUser|         0|PRIMARY             |           1|id                 |A        |     149635|        |      |    |BTREE     |       |             |NO     |SyncopeUser|         0|U_SYNCPSR_USERNAME  |           1|username           |A        |     149635|        |      |YES |BTREE     |       |             |NO     |SyncopeUser|         0|SyncopeUser_username|           1|username           |A        |     149635|        |      |YES |BTREE     |       |             |NO     |SyncopeUser|         1|SECURITYQUESTION_ID |           1|SECURITYQUESTION_ID|A        |          1|        |      |YES |BTREE     |       |             |NO     |SyncopeUser|         1|SyncopeUser_realm_id|           1|REALM_ID           |A        |          1|        |      |YES |BTREE     |       |             |NO     |

As most normal users have over 15 attributes theuser_search_attr view contains over 2M rows.

The servers are VMs with currently 8 GB of RAM and 6 CPUs but this can be changed. We are currently running version 11.4.7 of MariaDB in a galera cluster of 3 Servers. Syncope is only connected to one of the servers as this cluster is also used for other applications and we don't want to influnce these other applications.

Here are some of the IMHO relevant configuration options:

[mysqld]aria_pagecache_buffer_size = 32Mbasedir = /usrbind-address = ::binlog_format = ROWcharacter_set_server = utf8collation_server = utf8_general_ciinnodb_adaptive_hash_index = OFFinnodb_autoinc_lock_mode = 2innodb_buffer_pool_instances = 1innodb_buffer_pool_size = 4096Minnodb_flush_log_at_trx_commit = 0innodb_log_buffer_size = 16MBinnodb_log_file_size = 512Mkey_buffer_size = 1Mlog-error = /var/log/mariadb/mariadb.loglog_queries_not_using_indexes = OFFlong_query_time = 4.0max_allowed_packet = 128Mmax_binlog_size = 100Mmax_connections = 400max_heap_table_size = 256Mperformance_schema = ONquery_cache_limit = 16Mquery_cache_size = 0query_cache_type = OFFskip-external-lockingskip-name-resolve = 1slow_query_log = 0slow_query_log_file = /var/log/mariadb/slow.logsort_buffer_size = 4Mtable_definition_cache = 800table_open_cache = 1000thread_cache_size = 8thread_stack = 256Ktmp_table_size = 256Mwsrep_on = ONwsrep_provider = /usr/lib64/galera-4/libgalera_smm.so

As most queries only search foractive orpending users one option that I have not tried is indexing or partitioning the table for thestatus column.

Currently we are not in the position of changing the design of the database or even switching to another IDM sollution. We hope to find optimazion potential on the DB layer or maybe some improvements on some queries.

Rohit Gupta's user avatar
Rohit Gupta
2,1688 gold badges20 silver badges26 bronze badges
askedAug 3 at 6:44
Clemens Bergmann's user avatar
5
  • Whyvarchar(255) on almost all columns? For example It should be better to use anenum for the column status with possible valuesactive,pending,not-active (and also add this to the index when your only want active or pending users.)CommentedAug 3 at 8:31
  • thanks for the question @Luuk. I am not the designer of the software but only a user. I think that the reason for thevarchar(255) in the case of thestatus column is that the values are not fixed by design. The status of a user is determined by a workflow and the status is the state the workflow ends up in after changes to a user.CommentedAug 3 at 10:38
  • "I am not the designer of the software but only a user" Fro slow queries you should also contact the designer, dis you do that ?CommentedAug 3 at 11:05
  • Hi, yes. I posted a link to this question in the developer mailinglist. Maybe they can contribute here to.CommentedAug 3 at 13:06
  • 1
    refmailing list post.ANALYZE FORMAT=JSON SELECT * from user_search_attr ... will show a table scan. Suggestions for view: 1) JOIN (not LEFT), 2) dateValue is date(time) type 3) WHERE criteria in view - make part of tablecheck constraint.CommentedAug 4 at 3:41

1 Answer1

0

JSON_TABLE unfortunately isn't a Magicly indexed construct that copes well yet with large searches.

Its possible to create a materialised table that is maintained with triggers that will be fast to search.

Step 1: Create the table

CREATE OR REPLACE TABLE `SyncopeAttrs` (`any_id` varchar(36) NOT NULL, `schema_id` varchar(255), `booleanvalue` bool, `datevalue` bigint, `doublevalue` DOUBLE, `longvalue` bigint, `stringvalue` varchar(255), key(id, schema_id), key(schema_id)) ASSELECT u.id,       attrs. schema_id,       attrs.`booleanvalue`,       `attrs`.`datevalue`,       `attrs`.`doublevalue`,       `attrs`.`longvalue`,       `attrs`.`stringvalue`FROM `SyncopeUser` `u`LEFT JOIN JSON_TABLE(coalesce(`u`.`plainAttrs`, '[]'), '$[*]'  COLUMNS (`schema_id` varchar(255) PATH '$.schema',  NESTED PATH '$.values[*]' COLUMNS (    `booleanvalue` bool PATH '$.booleanValue',    `datevalue` bigint PATH '$.dateValue',    `doublevalue` DOUBLE PATH '$.doubleValue',    `longvalue` bigint(20) PATH '$.longValue',    `stringvalue` varchar(255) PATH '$.stringValue'))) `attrs`;

This is a representation of the table at the time of the query:

Step 2: Create Triggers

A DELETE, INSERT and UPDATE triggers are required to keep the table in sync

Step 2a: Create the Update trigger

Note this will only trigger on the applicable fields of the table. I don't expect forid to the the column that changes but let us account for that anyway.

DELIMITER //CREATE TRIGGER trg_syncopeuser_updateAFTER UPDATE ON SyncopeUser OF plainAttrs, idFOR EACH ROWBEGIN    DELETE FROM SyncopeAttrs WHERE id = NEW.id;    DELETE FROM SyncopeAttrs WHERE id = OLD.id;    INSERT INTO SyncopeAttrs (      any_id, schema_id,      booleanvalue, datevalue, doublevalue, longvalue, stringvalue    )    SELECT      NEW.id, jt.schema_id,      jt.booleanvalue,      jt.datevalue,      jt.doublevalue,      jt.longvalue,      jt.stringvalue    FROM JSON_TABLE(      COALESCE(NEW.plainAttrs, '[]'),      '$[*]'      COLUMNS (        schema_id VARCHAR(255) PATH '$.schema',        NESTED PATH '$.values[*]'        COLUMNS (          booleanvalue INT PATH '$.booleanValue',          datevalue BIGINT PATH '$.dateValue',          doublevalue DOUBLE PATH '$.doubleValue',          longvalue BIGINT PATH '$.longValue',          stringvalue VARCHAR(255) PATH '$.stringValue'        )      )    ) AS jt;  END IF;END;//

2.INSERT

CREATE TRIGGER trg_syncopeuser_insertAFTER INSERT ON SyncopeUserFOR EACH ROWBEGINDELETE FROM SyncopeAttrs WHERE id = NEW.id;INSERT INTO SyncopeAttrs (any_id, schema_id, booleanvalue, datevalue, doublevalue, longvalue, stringvalue)  SELECT NEW.id, jt.schema_id, jt.booleanvalue,    jt.datevalue,    jt.doublevalue, jt.longvalue, jt.stringvalue  FROM JSON_TABLE(    COALESCE(NEW.plainAttrs, '[]'),    '$[*]'    COLUMNS (      schema_id VARCHAR(255) PATH '$.schema',      NESTED PATH '$.values[*]'      COLUMNS (        booleanvalue INT PATH '$.booleanValue',        datevalue BIGINT PATH '$.dateValue',        doublevalue DOUBLE PATH '$.doubleValue',        longvalue BIGINT PATH '$.longValue',        stringvalue VARCHAR(255) PATH '$.stringValue'      )    )  ) AS jt;END;

2.DELETE

CREATE TRIGGER trg_syncopeuser_deleteBEFORE DELETE ON SyncopeUserFOR EACH ROW  DELETE FROM SyncopeAttrs WHERE any_id = OLD.id;

Step 3 Replace the view

CREATE OR REPLACEALGORITHM = UNDEFINED VIEW `user_search_attr` ASselect    `any_id` AS `any_id`,    `schema_id` AS `schema_id`,    `booleanvalue` AS `booleanvalue`,    `datevalue` AS `datevalue`,    `doublevalue` AS `doublevalue`,    `longvalue` AS `longvalue`,    `stringvalue` AS `stringvalue`from   SyncopeAttrs attrs;

After this queries on the attributes of the table will be faster because they are using an indexed real table.

Note:booleanvalue change tobool for my compulsive type enforcement disorder :-)

Edit 1:datevalue in Syncope appears to a be a millisecond epoch so used a bigint for a lossless compatible form (seemed better thanvarchar), but date/datetime couldn't be used.

Edit 2: normalised naming so the view is pretty consistent with table name columns. Is possible to make the table as the view nameuser_search_attr

ref:https://sqlize.online/sql/mariadb118/51f5a71cfc6670d06c566f0db6ade4f2/

answeredAug 4 at 5:24
danblack's user avatar
4
  • That looks like a good idea. I try this in the testing environment and report back. This might take until next week as I am on vacation this week. One more question: what is the reason for using another table and not replacing the view with a table?CommentedAug 5 at 4:47
  • quite right, it should be able to use a table instead of a view. Theid is replaced withany_id which might be important for Syncope. Alsoschema_id might need to be nullable if there isn't any attributes for an id. Feedback (edits if able) welcome and I'll incorporate it into the answer. Have a good vacation.CommentedAug 5 at 5:40
  • Hi, I was not happy about the fact that the code contains a variation of the initial view in three places. This makes it hard to maintain should the view change. Also there is not only this JSON_TABLE view but 6 in total so the code could get quite large and repetative. One solution could be to keep the view around (renaming it touser_search_attr_view) and replace the inserts in the triggers with something likeINSERT INTO user_search_attr SELECT * FROM user_search_attr_view where any_id = NEW.id;. Why is this a bad idea?CommentedAug 10 at 7:36
  • I was seeing the construction ofSyncopeAttrs as a way that the behaviour could be validated before replacinguser_search_attr. Renaming the view first and then making the 1 table and 3 triggers sounds perfectly reasonable to allow the removal of the old view simply at the end step. The original view included a reference toSyncopeUser so a trigger on aSyncopeUser could be selecting for a view that is on the same row. I suspect there could be an error when this occurs.CommentedAug 10 at 8:11

Your Answer

Sign up orlog in

Sign up using Google
Sign up using Email and Password

Post as a guest

Required, but never shown

By clicking “Post Your Answer”, you agree to ourterms of service and acknowledge you have read ourprivacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.