rachit chokshi
2024-03-04 09:23:39 UTC
Hello,
We have a setup where the Kerberos database (DB2 backend) is hosted on an NFS
server. Multiple KDC servers each mount the NFS share and serve traffic.
To replicate data from an external master KDC into the NFS-hosted database,
we have a sync job that runs "kdb5_util load" against the NFS-hosted
database every few minutes (~5m).
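A rough sketch of what such a sync job might look like follows. It assumes the job simply shells out to kdb5_util on a fixed interval. The dump path `/var/kerberos/dumps/master.dump` and the helper names (`build_load_cmd`, `sync_once`, `sync_forever`) are hypothetical placeholders. The sketch adds no locking or coordination with the KDCs that read the same files over NFS.

```python
import subprocess
import sys
import time
from typing import Callable, List, Sequence, Tuple

# Hypothetical dump location; the database path is the one from this post.
DUMP_PATH = "/var/kerberos/dumps/master.dump"
DB_PATH = "/var/kerberos/krb5kdc_shared/principal"

Runner = Callable[[Sequence[str]], subprocess.CompletedProcess]


def build_load_cmd(dump_path: str, db_path: str) -> List[str]:
    # -d selects the database to operate on; "load" reads the dump file into it.
    return ["kdb5_util", "-d", db_path, "load", dump_path]


def default_runner(cmd: Sequence[str]) -> subprocess.CompletedProcess:
    # Capture stderr as text so kdb5_util's error messages can be logged.
    return subprocess.run(list(cmd), capture_output=True, text=True)


def sync_once(dump_path: str, db_path: str,
              run: Runner = default_runner) -> Tuple[bool, str]:
    """Run one load and return (succeeded, stderr)."""
    result = run(build_load_cmd(dump_path, db_path))
    return result.returncode == 0, result.stderr or ""


def sync_forever(dump_path: str = DUMP_PATH, db_path: str = DB_PATH,
                 interval_s: int = 300, run: Runner = default_runner) -> None:
    # Loop on a fixed interval (300 s, roughly the ~5 minute cadence).
    while True:
        ok, err = sync_once(dump_path, db_path, run)
        if not ok:
            print(f"kdb5_util load failed: {err.strip()}", file=sys.stderr)
        time.sleep(interval_s)
```

The `run` parameter exists only so the load step can be exercised without a real kdb5_util binary, for example by passing a function that returns a canned `subprocess.CompletedProcess`.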
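Approximately once a month, we hit a corruption scenario in which
"kdb5_util load" starts failing with the following errors:
kdb5_util: Cannot open DB2 database '/var/kerberos/krb5kdc_shared/principal': Invalid argument while making newly loaded database live
kdb5_util: Cannot open DB2 database '/var/kerberos/krb5kdc_shared/principal~': Invalid argument while deleting bad database /var/kerberos/krb5kdc_shared/principal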
Once the system enters this state, there is a complete outage. The
already-running KDC processes are unable to access the database ("Cannot
open DB2 database"). The only way to recover is to delete the database and
re-create it from the dump.
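Because the sync job keeps running every few minutes, it may help if the job recognises the two failure stages in kdb5_util's stderr. These are "Cannot open DB2 database ... while making newly loaded database live" and "Cannot open DB2 database ... while deleting bad database". Once either appears, the job could stop retrying. This is a minimal sketch that assumes the messages appear on single lines worded exactly as quoted here; the wording may differ between krb5 versions. The function name and stage labels are hypothetical.

```python
from typing import List

# Phrases taken from the kdb5_util errors quoted in this post; the labels are hypothetical.
_STAGES = [
    ("while making newly loaded database live", "promote"),
    ("while deleting bad database", "cleanup"),
]


def classify_load_error(stderr: str) -> List[str]:
    """Return the failure-stage labels found in kdb5_util stderr, in order."""
    stages = []
    for line in stderr.splitlines():
        # Only consider lines reporting that the DB2 database could not be opened.
        if "Cannot open DB2 database" not in line:
            continue
        for phrase, label in _STAGES:
            if phrase in line:
                stages.append(label)
    return stages
```

A job using this could, for example, halt and alert as soon as the returned list is non-empty, rather than running another load against a database that can no longer be opened.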
It would be a great help if anyone could help us understand where things
are going wrong and what can be done to avoid this situation. We have tried
going through the code but have found no pointers so far.
Thank you,
Rachit