面试题：MongoDB恢复数据时处理数据冲突的策略

备份现有数据
- 在处理冲突前，先对目标数据库中同名集合的现有数据进行备份，以防处理过程中出现意外数据丢失。可以使用mongodump命令对相关集合进行备份。例如：mongodump --uri="mongodb://<host>:<port>/<database>?authSource=admin" --collection=<collection_name> --out=/path/to/backup
覆盖策略
- 如果确定备份数据的准确性更高，可以选择覆盖目标数据库中同名集合的数据。在恢复备份数据时，使用--drop选项，mongorestore命令会先删除目标集合，然后再从备份文件中恢复数据。例如：mongorestore --uri="mongodb://<host>:<port>/<database>?authSource=admin" --drop /path/to/backup/<database>/<collection_name>.bson
合并策略
- 基于唯一标识合并：如果数据集合中有唯一标识字段（如_id ），可以编写脚本（如使用Python的pymongo库）遍历备份数据和目标数据库中的数据。对于备份数据中存在而目标数据库中不存在的文档，直接插入；对于都存在的文档，根据业务逻辑决定如何合并字段值。

import pymongo

# 连接源数据库（备份数据所在数据库）
source_client = pymongo.MongoClient("mongodb://<source_host>:<source_port>/")
source_db = source_client["<source_database>"]
source_collection = source_db["<collection_name>"]

# 连接目标数据库
target_client = pymongo.MongoClient("mongodb://<target_host>:<target_port>/")
target_db = target_client["<target_database>"]
target_collection = target_db["<collection_name>"]

for doc in source_collection.find():
    existing_doc = target_collection.find_one({"_id": doc["_id"]})
    if existing_doc:
        # 这里可以根据业务逻辑合并字段，例如更新特定字段
        target_collection.update_one({"_id": doc["_id"]}, {"$set": {"field_to_update": doc["field_to_update"]}})
    else:
        target_collection.insert_one(doc)

- **使用聚合和批量操作**：可以利用MongoDB的聚合框架对备份数据和目标数据进行分析，然后使用`bulk_write`进行批量插入或更新操作，以提高效率。

import pymongo

source_client = pymongo.MongoClient("mongodb://<source_host>:<source_port>/")
source_db = source_client["<source_database>"]
source_collection = source_db["<collection_name>"]

target_client = pymongo.MongoClient("mongodb://<target_host>:<target_port>/")
target_db = target_client["<target_database>"]
target_collection = target_db["<collection_name>"]

# 聚合获取备份数据中需要插入或更新的文档
pipeline = [
    {
        "$lookup": {
            "from": target_collection.name,
            "localField": "_id",
            "foreignField": "_id",
            "as": "existing_doc"
        }
    },
    {
        "$match": {
            "existing_doc": {"$eq": []}
        }
    }
]
docs_to_insert = list(source_collection.aggregate(pipeline))

# 批量插入新文档
if docs_to_insert:
    target_collection.insert_many(docs_to_insert)

# 处理需要更新的文档
update_ops = []
for doc in source_collection.find():
    existing_doc = target_collection.find_one({"_id": doc["_id"]})
    if existing_doc:
        update_op = pymongo.UpdateOne({"_id": doc["_id"]}, {"$set": {"field_to_update": doc["field_to_update"]}})
        update_ops.append(update_op)

if update_ops:
    target_collection.bulk_write(update_ops)

手动检查和处理
- 如果数据量较小或者业务逻辑复杂，人工难以通过自动化方式处理冲突，可以导出目标数据库和备份数据中的同名集合数据（例如导出为CSV文件），然后使用数据分析工具（如Excel、OpenOffice Calc等）对数据进行比对和处理，最后将处理好的数据重新导入到目标数据库中。

面试题：MongoDB恢复数据时处理数据冲突的策略

知识考点

面试题答案