# MongoDB on Google Cloud Platform: Getting Started and Best Practices
## Introduction
In the world of modern applications, data is king, and how it’s stored and managed can significantly impact performance, scalability, and developer agility. MongoDB, a leading NoSQL document database, offers a flexible, scalable, and high-performance solution for handling diverse data types. Its document-oriented model allows for rapid iteration and schema evolution, making it a favorite for agile development teams.
Running MongoDB on Google Cloud Platform (GCP) combines the power and flexibility of MongoDB with GCP’s robust, global, and highly scalable infrastructure. This synergy provides numerous benefits, including unparalleled scalability, global reach, and seamless integration with a vast ecosystem of GCP services. Whether you’re building a new application or migrating an existing one, leveraging MongoDB on GCP can unlock new levels of performance, reliability, and operational efficiency.
This article will guide you through the various options for deploying MongoDB on GCP, from fully managed services to self-managed setups. We’ll cover the essential steps to get started and dive deep into best practices for optimizing performance, ensuring security, maximizing reliability, and managing costs effectively within the GCP environment.
## MongoDB Deployment Options on GCP
When deploying MongoDB on Google Cloud Platform, you generally have two primary approaches: leveraging the fully managed MongoDB Atlas service or self-managing your MongoDB instances on GCP’s Compute Engine. Each option offers distinct advantages depending on your operational needs, control requirements, and budget.
### Managed Service (MongoDB Atlas on GCP)
MongoDB Atlas is the official database-as-a-service (DBaaS) from MongoDB, offering a fully managed, global cloud database. Running Atlas on GCP means you benefit from MongoDB’s expertise in database operations combined with Google Cloud’s robust infrastructure.
- **Overview:** Atlas handles all the operational heavy lifting, including provisioning, patching, backups, scaling, high availability, and monitoring. It allows developers and operations teams to focus on building applications rather than managing complex database infrastructure.
- **Benefits:**
  - **Fully Managed:** Eliminates the need for manual database administration tasks.
  - **Automated Operations:** Includes automated backups, restores, and scaling operations.
  - **High Availability:** Built-in replica sets and automated failover ensure continuous uptime.
  - **Security Features:** Comprehensive security with network isolation, encryption at rest and in transit, IP access lists, and robust authentication mechanisms.
  - **Global Distribution:** Easily deploy clusters across multiple GCP regions for low-latency access and disaster recovery.
  - **Seamless Integration:** Integrates well with other GCP services and development tools.
- **How to Get Started:**
  1. **Create an Atlas Account:** Sign up on the MongoDB Atlas website.
  2. **Link to GCP:** When creating a new project or cluster in Atlas, select “Google Cloud” as your cloud provider.
  3. **Deploy a Cluster:**
     - Choose your desired GCP region(s) and instance size based on your performance and storage requirements.
     - Select your MongoDB version.
     - Configure features like sharding (for horizontal scalability) and backup policies.
     - Atlas provisions and configures your replica set or sharded cluster automatically.
  4. **Configure Network Access:** Set up IP access list entries or configure VPC Peering/Private Link so your applications can connect securely to the Atlas cluster.
### Self-Managed MongoDB on GCP Compute Engine
For organizations that require a higher degree of control over their database environment, have specific compliance needs, or want to optimize costs by managing resources directly, self-managing MongoDB on GCP Compute Engine is a viable option.
- **Overview:** This approach involves provisioning virtual machines (VMs) on GCP and manually installing, configuring, and maintaining your MongoDB instances. While it offers maximum flexibility, it also shifts the responsibility for operations, patching, backups, and scalability entirely to your team.
- **Steps to Self-Manage MongoDB:**
  1. **Provision Compute Engine Instances (VMs):**
     - Launch a set of N1, N2, E2, or C2 series VMs (e.g., 3 instances for a basic replica set) in the desired GCP region(s) and zones.
     - Choose machine types that align with your expected workload (CPU, RAM). Memory-optimized machine types are often beneficial for MongoDB.
  2. **Choose Appropriate Storage:**
     - Attach Persistent Disks to your VMs. For production databases, use SSD Persistent Disks for better I/O performance.
     - Size the disks (and their resulting IOPS) to meet your throughput requirements.
     - For optimal performance, stripe multiple Persistent Disks together using LVM or similar tools, or use larger, faster disks.
  3. **Networking (VPC, Firewall Rules):**
     - Create a Virtual Private Cloud (VPC) network to isolate your database servers.
     - Configure firewall rules to allow incoming connections only from trusted sources (e.g., your application servers) on the necessary MongoDB ports (default: 27017).
     - Use internal IP addresses for communication between MongoDB nodes and application servers within your VPC.
  4. **Install MongoDB:**
     - SSH into each VM instance.
     - Follow the official MongoDB installation guide for your chosen operating system (e.g., Ubuntu, CentOS).
     - Configure `mongod.conf` with appropriate settings for the data directory, logging, security, and replica set name.
  5. **Configure Replica Sets/Sharding:**
     - **Replica Set:** Initialize a replica set across your instances to ensure high availability and data redundancy. This involves starting `mongod` with the `--replSet` option and then initiating the replica set from the MongoDB shell.
     - **Sharding:** For extremely large datasets or high write throughput, configure a sharded cluster. This involves setting up config servers, `mongos` routers, and shard replica sets, and is a significantly more complex setup.
  6. **Implement Monitoring and Backups:** Set up monitoring agents (e.g., GCP Cloud Monitoring, Prometheus, Grafana) and a robust backup strategy (e.g., disk snapshots, `mongodump` to Cloud Storage).
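As a sketch of the networking step above, the firewall rules could be created with `gcloud`. The network name, source range, and instance tags below are assumptions for illustration:

```shell
# Allow MongoDB traffic (default port 27017) only from the application subnet,
# targeting instances tagged "mongodb". Names and CIDR are illustrative.
gcloud compute firewall-rules create allow-mongodb-internal \
  --network=my-db-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:27017 \
  --source-ranges=10.0.1.0/24 \
  --target-tags=mongodb

# Allow replica set members to talk to each other on the same port.
gcloud compute firewall-rules create allow-mongodb-replication \
  --network=my-db-vpc \
  --direction=INGRESS \
  --action=ALLOW \
  --rules=tcp:27017 \
  --source-tags=mongodb \
  --target-tags=mongodb
```

Using tags rather than IP addresses means new replica set members inherit the rules as soon as they are tagged.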
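Once `mongod` is running on all three VMs with the same `--replSet` name, the replica set can be initiated from `mongosh` on any one member. The hostnames below are assumptions for illustration:

```javascript
// Run once, from mongosh connected to any one member.
// Internal DNS names are illustrative.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongodb-1.internal:27017" },
    { _id: 1, host: "mongodb-2.internal:27017" },
    { _id: 2, host: "mongodb-3.internal:27017" }
  ]
});

// Verify member states (expect one PRIMARY and two SECONDARY after election)
rs.status();
```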
## Getting Started: Common Steps
Regardless of whether you choose MongoDB Atlas or a self-managed deployment on Compute Engine, certain fundamental steps are common for connecting to and interacting with your MongoDB database.
### Connection
Establishing a secure and efficient connection to your MongoDB instance is the first critical step.
- **Connection Strings:** MongoDB clients (drivers) use a connection string (URI) to connect to the database. The string specifies the host(s), port, authentication credentials, and various connection options.
  - **For MongoDB Atlas:** Atlas provides a ready-to-use connection string directly from its UI for your cluster. It usually looks something like this:
    `mongodb+srv://<username>:<password>@<cluster-name>.mongodb.net/<database>?retryWrites=true&w=majority`
    Replace `<username>`, `<password>`, and `<cluster-name>` with your specific details. The `+srv` scheme indicates a DNS seed list connection, which simplifies connecting to replica sets.
  - **For Self-Managed MongoDB:** For a single `mongod` instance, the string might be:
    `mongodb://<host>:<port>/<database>`
    For a replica set (recommended for production), list all members:
    `mongodb://<host1>:<port1>,<host2>:<port2>,<host3>:<port3>/<database>?replicaSet=<replicaSetName>&readPreference=primaryPreferred`
    Remember to include authentication details (`<username>:<password>@`) if authentication is enabled (which it should be!).
- **Security Group/Firewall Considerations:** This is paramount for securing your database.
  - **MongoDB Atlas:** Configure IP Access Lists in the Atlas console to whitelist the specific IP addresses or CIDR blocks (e.g., your application servers’ public IPs, your development machine’s IP) that are permitted to connect to your Atlas cluster. For enhanced security and lower latency, consider VPC Peering or Private Link to establish a private connection between your GCP VPC and your Atlas VPC.
  - **Self-Managed MongoDB on Compute Engine:** Configure GCP firewall rules for the Compute Engine instances running MongoDB.
    - Create a firewall rule that allows ingress traffic on the MongoDB port (default 27017) only from the internal IP addresses of your application servers’ subnets, or from specific external IPs if your applications run outside GCP.
    - Never open port 27017 to the entire internet (0.0.0.0/0) unless absolutely necessary, and then only with extreme caution, strong authentication, and encryption.
    - Ensure internal communication between replica set members is allowed.
### Basic Operations
Once connected, you can perform standard database operations. Here’s a brief overview using the MongoDB Shell (`mongosh`).
- **Connecting via `mongosh`:**

  ```bash
  mongosh "mongodb+srv://<username>:<password>@<cluster-name>.mongodb.net/<database>?retryWrites=true&w=majority"
  # Or for self-managed:
  mongosh "mongodb://<username>:<password>@<host>:<port>/<database>"
  ```
- **Creating Databases and Collections:** In MongoDB, databases and collections are created implicitly when you first insert data into them.

  ```javascript
  // Switch to (or create) a database named 'mydatabase'
  use mydatabase;

  // Insert a document into (or create) a collection named 'users'
  db.users.insertOne({ name: "Alice", age: 30, city: "New York" });

  // You can explicitly create a collection with options if needed
  // db.createCollection("products", { capped: true, size: 1048576 });
  ```
- **CRUD Operations (Create, Read, Update, Delete):**
  - **Create (Insert):**

    ```javascript
    db.users.insertOne({ name: "Bob", age: 25, city: "London" });
    db.users.insertMany([
      { name: "Charlie", age: 35, city: "Paris" },
      { name: "Diana", age: 28, city: "Berlin" }
    ]);
    ```
  - **Read (Query):**

    ```javascript
    // Find all documents in the 'users' collection
    db.users.find();

    // Find documents matching a condition
    db.users.find({ age: { $gt: 29 } }); // Users older than 29

    // Find one document
    db.users.findOne({ name: "Alice" });
    ```
  - **Update:**

    ```javascript
    // Update one document
    db.users.updateOne(
      { name: "Alice" },
      { $set: { age: 31, status: "active" } }
    );

    // Update multiple documents
    db.users.updateMany(
      { city: "New York" },
      { $inc: { age: 1 } } // Increment age by 1
    );
    ```
  - **Delete:**

    ```javascript
    // Delete one document
    db.users.deleteOne({ name: "Bob" });

    // Delete multiple documents
    db.users.deleteMany({ status: "inactive" });

    // Delete all documents in a collection (but keep the collection itself)
    // db.users.deleteMany({});

    // Drop an entire collection
    // db.users.drop();

    // Drop the current database
    // db.dropDatabase();
    ```
## Best Practices for MongoDB on GCP
Optimizing your MongoDB deployment on Google Cloud Platform involves careful consideration of performance, security, reliability, and cost. Adhering to best practices in these areas will ensure your database is robust, efficient, and well-protected.
### Performance & Scalability
Achieving optimal performance and scalability in MongoDB is crucial for responsive applications.
- **Indexing Strategy:**
  - **Identify Query Patterns:** Analyze your application’s read operations (`find`, `sort`, `aggregate`) to determine which fields are frequently queried or used in sorting.
  - **Create Indexes:** Use `db.collection.createIndex()` for single-field, compound, multikey, text, or geospatial indexes.
  - **Monitor Index Usage:** Regularly use `db.collection.getIndexes()` and `explain()` to understand index performance and identify unused or redundant indexes. Over-indexing can hurt write performance.
  - **Partial Indexes:** For collections with varying document structures or common query conditions, partial indexes can reduce index size and improve performance.
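As a sketch of this workflow in `mongosh`, using the `users` collection from earlier (the index choices are illustrative):

```javascript
// Compound index supporting queries that filter on city and sort by age
db.users.createIndex({ city: 1, age: -1 });

// Partial index: only index documents that actually have a status field
db.users.createIndex(
  { status: 1 },
  { partialFilterExpression: { status: { $exists: true } } }
);

// Confirm the query uses the index (look for IXSCAN in the winning plan)
db.users.find({ city: "Paris" }).sort({ age: -1 }).explain("executionStats");

// List existing indexes to spot unused or redundant ones
db.users.getIndexes();
```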
- **Schema Design Considerations:**
  - **Embedded Documents:** Embed related data when it is frequently accessed together and a one-to-few relationship exists. This reduces the number of queries and improves read performance. Example: an `order` document embedding `line_items`.
  - **Referenced Documents:** Use references for large, frequently updated, or many-to-many relationships. This avoids oversized documents and data duplication, but requires additional queries (joins at the application level). Example: `posts` referencing `users`.
  - **Avoid Large Arrays:** Storing excessively large arrays within documents can degrade performance. Consider breaking them out or using application-level pagination.
  - **Document Size Limit:** Keep in mind MongoDB’s 16MB document size limit.
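A minimal `mongosh` sketch of the two modeling styles, with illustrative collection and field names:

```javascript
// Embedded: one-to-few data read together -- one query returns the whole order
db.orders.insertOne({
  _id: 1001,
  customer: "Alice",
  line_items: [
    { sku: "A-1", qty: 2, price: 9.99 },
    { sku: "B-7", qty: 1, price: 24.5 }
  ]
});

// Referenced: frequently updated or many-to-many -- posts point at users
db.authors.insertOne({ _id: 42, name: "Alice" });
db.posts.insertOne({ title: "Hello", author_id: 42 });

// Resolving the reference takes a second query
// (or a $lookup stage in an aggregation pipeline)
db.authors.findOne({ _id: db.posts.findOne({ title: "Hello" }).author_id });
```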
- **Replica Sets for High Availability and Read Scaling:**
  - **Always Use a Replica Set:** For any production workload, deploy MongoDB as a replica set (minimum 3 members). This provides automatic failover and data redundancy.
  - **Read Preference:** Configure your application’s read preference (e.g., `primary`, `primaryPreferred`, `secondaryPreferred`) to distribute read load across secondaries, especially for analytics or less-critical reads.
  - **Dedicated Secondaries:** For heavy analytical queries, consider adding dedicated secondary members that applications can read from, offloading the primary.
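Read preference can also be set per session in `mongosh` (in addition to the `readPreference` URI option shown earlier); a brief sketch:

```javascript
// Route this connection's reads to secondaries when one is available,
// falling back to the primary otherwise
db.getMongo().setReadPref("secondaryPreferred");

// Subsequent reads may be served by a secondary; writes still go to the primary
db.users.find({ city: "Berlin" });
```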
- **Sharding for Horizontal Scalability:**
  - **When to Shard:** Implement sharding when a single replica set can no longer handle your data volume or write throughput. Sharding distributes data across multiple independent replica sets (shards).
  - **Choose a Good Shard Key:** The choice of shard key is critical. A good shard key promotes even data distribution and efficient query routing. Avoid keys that lead to hot spots (e.g., monotonically increasing values). Consider a hashed shard key for uniform distribution, or a compound shard key if queries often filter on multiple fields.
  - **Pre-splitting and Zone Sharding:** For highly predictable workloads, consider pre-splitting chunks or using zone (tag-aware) sharding to control data placement.
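A sketch of enabling sharding with a hashed key in `mongosh`, assuming a connection to a `mongos` router; the database, collection, and key names are illustrative:

```javascript
// Enable sharding for the database, then shard the collection
// on a hashed key for uniform chunk distribution
sh.enableSharding("mydatabase");
db.users.createIndex({ user_id: "hashed" });
sh.shardCollection("mydatabase.users", { user_id: "hashed" });

// Check chunk distribution across shards
sh.status();
```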
- **Monitoring:**
  - **MongoDB Atlas Monitoring:** Atlas provides comprehensive, built-in monitoring dashboards for CPU, memory, connections, query performance, and more. Set up alerts for critical metrics.
  - **GCP Monitoring (Operations Suite):**
    - For self-managed deployments, install the Cloud Monitoring agent on your Compute Engine instances to collect system metrics (CPU, memory, disk I/O, network).
    - Integrate MongoDB logs (`mongod.log`) with Cloud Logging for centralized log management and analysis.
    - Create custom dashboards and alerts for MongoDB-specific metrics (e.g., replication lag, number of open connections, slow query count).
  - **MongoDB Tools:** Use `mongostat` (overview of cluster activity) and `mongotop` (per-collection usage) for real-time insights from the command line.
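Typical invocations of the command-line tools mentioned above (the hostname is illustrative):

```shell
# Cluster-wide activity once per second, for five rows
# (inserts, queries, memory, network, connections)
mongostat --host mongodb-1.internal:27017 --rowcount 5 1

# Read/write time per collection, refreshed every 2 seconds
mongotop --host mongodb-1.internal:27017 2
```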
- **Choosing Correct Compute Engine Instance Types and Persistent Disks (for Self-Managed):**
  - **Instance Types:**
    - **Memory-Optimized:** MongoDB relies heavily on RAM for its working set. Prioritize memory-optimized machine types (e.g., `n2d-highmem`, `e2-highmem`) to keep your data in RAM as much as possible, reducing disk I/O.
    - **Balanced:** For workloads with moderate memory and CPU needs, the `n2-standard` or `e2-standard` series can be cost-effective.
    - **CPU-Optimized:** Less common for core MongoDB workloads unless you run extremely CPU-intensive aggregation pipelines.
  - **Persistent Disks:**
    - **SSD Persistent Disks:** Always use SSD Persistent Disks for production MongoDB deployments due to their superior IOPS and throughput compared to standard HDDs.
    - **Provisioned IOPS:** Consider increasing the disk size or using balanced/performance SSDs to obtain higher IOPS as needed.
    - **Local SSDs:** For extreme I/O requirements (e.g., caching or temporary storage), Local SSDs offer very high IOPS and low latency but are ephemeral (data does not survive the VM stopping or being deleted). Use them carefully and only for data that can be easily rebuilt or replicated.
### Security
Security is paramount for any database. Implement a defense-in-depth strategy.
- **Network Isolation:**
  - **VPC and Firewall Rules:** As discussed, use GCP VPC networks and strict firewall rules to restrict access to MongoDB instances to authorized application servers and administrators only.
  - **Private Link (for Atlas):** Use MongoDB Atlas Private Link (built on GCP Private Service Connect) to establish a private, secure connection between your GCP applications and your Atlas cluster, avoiding the public internet entirely. This is a highly recommended security measure.
- **Authentication:**
  - **Enable Authentication:** Never run MongoDB without authentication. Use SCRAM (Salted Challenge Response Authentication Mechanism) for robust password-based authentication.
  - **x.509 Certificate Authentication:** For enhanced security, especially for inter-node communication in a replica set or sharded cluster, consider x.509 certificate authentication.
- **Authorization (Role-Based Access Control, RBAC):**
  - **Least Privilege:** Grant users and applications only the minimum privileges needed to perform their tasks.
  - **Built-in Roles:** Leverage MongoDB’s rich set of built-in roles (e.g., `read`, `readWrite`, `dbAdmin`, `clusterMonitor`).
  - **Custom Roles:** Create custom roles to define granular permissions tailored to your application’s needs.
- **Encryption:**
  - **Encryption at Rest:**
    - **GCP Persistent Disk Encryption:** All GCP Persistent Disks are encrypted at rest by default, providing a strong baseline.
    - **MongoDB Enterprise Encryption:** For self-managed deployments requiring additional compliance or control, MongoDB Enterprise offers native encryption at rest with Key Management System (KMS) integration (e.g., with GCP Cloud KMS).
    - **Atlas Encryption:** Atlas encrypts data at rest by default and lets you bring your own Cloud KMS keys if desired.
  - **Encryption in Transit (TLS/SSL):** Always enable TLS/SSL for all client-server and inter-node communication. This encrypts data as it travels across the network, preventing eavesdropping. Atlas enables TLS by default; for self-managed deployments, configure TLS in your `mongod.conf`.
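For a self-managed node, the TLS and authentication settings above map to a small `mongod.conf` fragment along these lines (the certificate paths are assumptions for illustration):

```yaml
# mongod.conf fragment -- certificate paths are illustrative
net:
  port: 27017
  tls:
    mode: requireTLS
    certificateKeyFile: /etc/ssl/mongodb.pem   # server certificate + private key
    CAFile: /etc/ssl/ca.pem                    # CA used to validate peers
security:
  authorization: enabled                       # require authentication
```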
- **Auditing:**
  - **MongoDB Enterprise Auditing:** For self-managed deployments, MongoDB Enterprise provides comprehensive auditing to track database events (logins, CRUD operations, etc.), essential for compliance and security monitoring.
  - **Atlas Auditing:** Atlas includes advanced auditing features as part of its enterprise security package.
  - **GCP Cloud Audit Logs:** Integrate MongoDB logs with Cloud Logging to get a centralized audit trail of activity.
### Reliability & Disaster Recovery
Ensuring your data is always available and recoverable is critical.
- **Backups:**
  - **MongoDB Atlas Automated Backups:** Atlas provides continuous, point-in-time recovery backups with configurable retention policies, simplifying disaster recovery.
  - **Self-Managed Backups:**
    - **GCP Persistent Disk Snapshots:** Take regular snapshots of your MongoDB data disks. Snapshots are incremental, cost-effective, and can restore an entire disk.
    - **`mongodump` to Cloud Storage:** For logical backups, periodically run `mongodump` and store the output in a Cloud Storage bucket. This allows granular recovery of specific databases or collections.
    - **Oplog Backup:** For point-in-time recovery with `mongodump`, also back up the `oplog` (operations log).
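A sketch of both backup styles for a self-managed node; the disk, bucket, and database names are assumptions for illustration:

```shell
# Incremental snapshot of the data disk (names and zone are illustrative)
gcloud compute disks snapshot mongodb-data-disk \
  --zone=us-central1-a \
  --snapshot-names=mongodb-data-$(date +%Y%m%d)

# Logical backup of one database, streamed as a compressed archive
# straight into a Cloud Storage bucket
mongodump --uri="mongodb://localhost:27017" --db=mydatabase \
  --archive --gzip \
  | gcloud storage cp - gs://my-backup-bucket/mydatabase-$(date +%Y%m%d).archive.gz
```

Streaming via a pipe avoids staging the dump on the VM's local disk.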
- **Multi-Region/Multi-Zone Deployments:**
  - **High Availability:** Deploy replica set members across different zones within a region (e.g., `us-central1-a`, `us-central1-b`, `us-central1-c`) to protect against single-zone outages.
  - **Disaster Recovery:** For critical applications, consider an Atlas Global Cluster or self-managed replica sets with members in different GCP regions to withstand regional disasters. This adds complexity and cost but provides the highest level of resilience.
- **Testing Failovers:** Regularly test your replica set failover procedures to ensure they work as expected. This includes simulating primary failures and observing secondary promotion.
### Cost Optimization
Managing costs effectively while maintaining performance is a key benefit of cloud deployments.
- **Right-Sizing Instances:**
  - **Monitor Usage:** Continuously monitor CPU, memory, and disk I/O metrics to understand actual resource consumption.
  - **Adjust as Needed:** Scale your Compute Engine instances (for self-managed) or Atlas tiers up or down based on observed usage rather than over-provisioning from the start. GCP’s custom machine types allow fine-grained control.
  - **Autoscaling (for Applications, Not MongoDB Nodes):** While you don’t typically autoscale MongoDB instances, ensure your application layer can autoscale to handle traffic spikes without overwhelming the database.
- **Choosing Appropriate Storage:**
  - **SSD vs. HDD:** Use SSD Persistent Disks for primary MongoDB data volumes, but consider standard HDD Persistent Disks for less critical data or backups if cost is a major concern (for self-managed).
  - **Optimize Disk Size:** Provision only the disk capacity you need to avoid paying for unused space. Remember that the performance of SSD Persistent Disks often scales with size.
- **Leveraging Committed Use Discounts (for Self-Managed):** For stable, long-running self-managed MongoDB deployments, commit to a 1-year or 3-year term for Compute Engine resources (VMs and certain storage types) to receive significant discounts.
- **Monitoring Usage to Avoid Over-Provisioning:** Use GCP’s Billing Reports and cost management tools to track spending, analyze usage patterns, and identify over-provisioned resources. Use the Atlas cost explorer to understand your Atlas spending.
## Integration with GCP Services
One of the significant advantages of running MongoDB on Google Cloud Platform is the ability to seamlessly integrate with GCP’s extensive suite of services. This integration can enhance everything from deployment and observability to security and data management.
- **Google Kubernetes Engine (GKE) for Containerized Deployments:**
  - **Orchestration:** GKE provides a powerful, managed Kubernetes environment for deploying, managing, and scaling containerized applications. MongoDB itself can run in containers, but managing stateful applications like databases in Kubernetes requires careful consideration.
  - **StatefulSets:** Kubernetes `StatefulSets` are ideal for deploying MongoDB replica sets on GKE, ensuring stable network identities and ordered scaling.
  - **Persistent Volumes:** Use GCP Persistent Disks as Persistent Volumes for your MongoDB data, ensuring data persists even if pods restart or are rescheduled.
  - **Cloud Marketplace:** Solutions like the MongoDB Kubernetes Operator (available on Cloud Marketplace) can automate the deployment and management of MongoDB clusters on GKE, handling complex tasks like replica set configuration, sharding, and upgrades.
- **Cloud Storage for Backups:**
  - **Cost-Effective and Highly Durable:** Google Cloud Storage offers highly durable, available, and cost-effective object storage.
  - **Backup Target:** It is an ideal target for storing `mongodump` outputs from your self-managed MongoDB instances.
  - **Lifecycle Management:** Use Cloud Storage lifecycle policies to automatically transition older backups to colder storage classes (e.g., Nearline, Coldline, Archive) to reduce costs, or to delete them after a retention period.
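A sketch of such a lifecycle policy applied with `gcloud storage`; the bucket name and retention periods are illustrative:

```shell
# lifecycle.json: move backups to Coldline after 30 days, delete after 365
cat > lifecycle.json <<'EOF'
{
  "rule": [
    { "action": { "type": "SetStorageClass", "storageClass": "COLDLINE" },
      "condition": { "age": 30 } },
    { "action": { "type": "Delete" },
      "condition": { "age": 365 } }
  ]
}
EOF

# Attach the policy to the backup bucket
gcloud storage buckets update gs://my-backup-bucket --lifecycle-file=lifecycle.json
```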
- **Cloud Monitoring & Logging for Observability:**
  - **Unified Observability:** GCP’s Operations Suite (formerly Stackdriver) provides a unified platform for monitoring, logging, and tracing.
  - **Cloud Monitoring:** Collect metrics from your Compute Engine VMs (CPU, memory, disk I/O, network) and send MongoDB-specific metrics (e.g., slow query count, connection count) to Cloud Monitoring as custom metrics. Create dashboards and alerts for critical database health indicators.
  - **Cloud Logging:** Centralize your `mongod` logs, system logs, and application logs in Cloud Logging. Use the Logs Explorer to search, filter, and analyze logs, and create log-based metrics or alerts for specific events (e.g., authentication failures, error messages).
  - **Cloud Trace:** Integrate your application with Cloud Trace to monitor latency and understand how database interactions affect overall application performance.
- **Identity and Access Management (IAM) for Granular Permissions:**
  - **Fine-Grained Access Control:** GCP IAM lets you define who has what access to which resources.
  - **Service Accounts:** Create GCP service accounts for Compute Engine VMs, GKE pods, or other GCP services that need to interact with MongoDB’s supporting infrastructure (e.g., accessing Cloud Storage for backups). Assign only the necessary IAM roles to these service accounts (e.g., `roles/storage.objectCreator` for backup uploads).
  - **Principle of Least Privilege:** Apply the principle of least privilege, ensuring that users and services have only the permissions they absolutely need. This is crucial for securing your infrastructure and data.
  - **Integrating with MongoDB Encryption:** For self-managed MongoDB, your instances can use GCP IAM service account credentials to authenticate to Cloud KMS if you are using customer-managed encryption keys.
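A sketch of the service-account pattern described above; the project, bucket, and instance names are assumptions for illustration:

```shell
# Dedicated service account for backup uploads
gcloud iam service-accounts create mongodb-backup \
  --display-name="MongoDB backup writer"

# Grant only objectCreator, and only on the backup bucket
gcloud storage buckets add-iam-policy-binding gs://my-backup-bucket \
  --member="serviceAccount:mongodb-backup@my-project.iam.gserviceaccount.com" \
  --role="roles/storage.objectCreator"

# Attach it to the database VM so mongodump uploads need no key files
# (the instance must be stopped to change its service account)
gcloud compute instances set-service-account mongodb-1 \
  --zone=us-central1-a \
  --service-account=mongodb-backup@my-project.iam.gserviceaccount.com \
  --scopes=storage-rw
```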
## Conclusion
Deploying MongoDB on Google Cloud Platform offers a powerful and flexible solution for modern data management, combining MongoDB’s agile, document-oriented database capabilities with GCP’s scalable, secure, and globally distributed infrastructure. Whether you opt for the fully managed ease of MongoDB Atlas or the granular control of a self-managed deployment on Compute Engine, GCP provides the underlying services to support your needs.
MongoDB Atlas on GCP simplifies database operations significantly, abstracting away the complexities of provisioning, patching, scaling, and ensuring high availability. It’s often the preferred choice for teams looking to accelerate development and minimize operational overhead. Conversely, self-managing MongoDB on Compute Engine grants maximum control and customization, ideal for those with specific performance tuning requirements, stringent compliance needs, or a desire for direct infrastructure management.
By adhering to best practices in performance optimization, robust security measures, comprehensive reliability strategies, and diligent cost management, you can unlock the full potential of your MongoDB deployment on GCP. Furthermore, leveraging GCP’s rich ecosystem of services—including GKE for orchestration, Cloud Storage for backups, Cloud Monitoring & Logging for observability, and IAM for access control—can further enhance the efficiency, security, and scalability of your data platform.
Ultimately, the choice between managed and self-managed depends on your team’s expertise, operational capacity, and application requirements. Regardless of your chosen path, Google Cloud Platform offers a solid foundation for running high-performance, scalable, and secure MongoDB workloads, empowering you to build and deploy innovative applications with confidence.