- Notifications
You must be signed in to change notification settings - Fork16
Resolve issues #5 and #1: reduce number of collisions in the ptrack map#6
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to ourterms of service andprivacy statement. We’ll occasionally send you account related emails.
Already on GitHub?Sign in to your account
Uh oh!
There was an error while loading.Please reload this page.
Conversation
…slots.Previously we thought that 1 MB can track changes page-to-page in the 1 GB ofdata files. However, recently it became evident that our ptrack map or basichash table behaves more like a Bloom filter with a number of hash functions k = 1.See more here:https://en.wikipedia.org/wiki/Bloom_filter#Probability_of_false_positives.Such filter has naturally more collisions.By storing update_lsn of each block in the additional slot we perform asa Bloom filter with k = 2, which significatly reduces collision rate.
4c32e9e
to38aa439
CompareAlso bump extversion to 2.2
ptrack.c Outdated
update_lsn1 = pg_atomic_read_u64(&ptrack_map->entries[slot1]); | ||
update_lsn2 = pg_atomic_read_u64(&ptrack_map->entries[slot2]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
It is better to fetch and check slot1 first, and only if check passed then fetch and check slot2.
This way you will save TLB and cache misses for slot2 for most of page items.
Note that compiler could not optimize/reorder atomic instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
OK, I hope that I did it
Probe the second slot only if the first one succeded.
ptrack--2.1--2.2.sql Outdated
FROM | ||
(SELECT count(path) AS changed_files, | ||
sum( | ||
length(replace(right((pagemap)::text, -1)::varbit::text, '0', '')) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Если таблицы 8TB, то вот эта строчка потребует выделение 1GB памяти для преобразования::varbit::text
.
Соответственно, таблица 16TB потребует уже 2GB памяти, и постгресс просто сам не позволит этого сделать.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Это очень грустно, что varbit не имеет функции countbits.
funny-falconMay 13, 2021 • edited
Loading Uh oh!
There was an error while loading.Please reload this page.
edited
Uh oh!
There was an error while loading.Please reload this page.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
В любом случае, для ptrack_get_change_stat и ptrack_get_change_file_stat кажется нужно создать ptrack_get_pagecount (ну или другое название).
Или даже просто реализовать ptrack_get_change_file_stat полностью в сишке.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Таблицы же разбиты на сегменты по 1 ГБ дефолтно, а ptrack_get_pagemapset() выдаёт изначально битмапы per file/segment, то есть потребуется максимум в 1000 раз меньше памяти на каждое преобразование. Разве нет?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
А ок. Я ещё не посмотрел ptrack_get_pagemapset() .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Слушай, но я бы всё равно поменял бы ptrack_get_pagemapset, добавив поле count в вывод.
pg_probackup при этом не поломается, т.к. он указывает поля, которые хочет.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Сделал
/* Delete and try again */ | ||
durable_unlink(ptrack_path, LOG); | ||
is_new_map = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Не могу найти, где делается unmap в этом случае?
При этом сразу после меткиptrack_map_reinit
делаетсяdurable_unlink(ptrack_mmap_path)
.
В итоге, этот файл повисает невидимкой в файловой системе, и в адрессном пространстве процесса повисает его mmap.
Наверное есть смысл позвать здесьptrackCleanFilesAndMap
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others.Learn more.
Да, похоже на то. Я сомневался в этом месте, но потом забыл и не разобрался до конца
Everything seems to be working, so I'm merging this one. If the internal QA finds out anything, we will fix it in |
Uh oh!
There was an error while loading.Please reload this page.
Resolve#5
Resolve#1