Managing cloud infrastructure can sometimes be tricky, especially when you lose SSH access to your instances. I recently ran into this while updating SSH settings on a server hosted on AWS; the same approach applies to other major public cloud providers such as Azure and GCP. I lost access and found myself unable to connect via SSH. Here, I’ll share the step-by-step process I followed to recover access. This guide can be useful for anyone facing similar issues, or other problems that require detaching, repairing, and reattaching AWS volumes.
Step 1: Create a Snapshot of the Problematic Volume
- Navigate to the AWS Management Console: Go to the EC2 Dashboard.
- Locate Your Volume: Identify the volume attached to the instance that you are unable to access via SSH.
- Create a Snapshot: Select the volume, click on “Actions” and choose “Create Snapshot”. Give the snapshot a name and description for easy identification.
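The same snapshot can be created from the AWS CLI if you have it configured with sufficient permissions; this is only a sketch, and the volume ID below is a placeholder:
# snapshot the problematic volume (replace the volume ID with your own)
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "pre-recovery snapshot of broken instance root volume"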
Step 2: Create and Attach a Recovery Volume
- Create a Volume from Snapshot: Go to the “Snapshots” section, select your snapshot, and click on “Actions” -> “Create Volume”. Choose the appropriate volume type and size, and make sure the volume is created in the same Availability Zone as your recovery instance.
- Attach the Volume to a Recovery Instance: Select the newly created volume and click on “Actions” -> “Attach Volume”. Attach it to a new or existing recovery instance. In the console, specify a device name such as /dev/sdf; on Nitro-based instances the volume will then show up inside the OS as an NVMe device such as /dev/nvme1n1.
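If you prefer the CLI for this step too, a hedged sketch looks like this; the snapshot ID, volume ID, instance ID, Availability Zone, and device name are all placeholders:
# create a volume from the snapshot in the recovery instance's Availability Zone
aws ec2 create-volume --snapshot-id snap-0123456789abcdef0 --availability-zone eu-west-1a --volume-type gp3
# attach it to the recovery instance; /dev/sdf will surface as an NVMe device (e.g. /dev/nvme1n1) on Nitro instances
aws ec2 attach-volume --volume-id vol-0fedcba9876543210 --instance-id i-0123456789abcdef0 --device /dev/sdf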
Step 3: Mount the Volume on the Recovery Instance
- Connect to the Recovery Instance: Use SSH to connect to the recovery instance.
ssh ubuntu@recovery-instance-ip
- Verify the Attached Volume: Use lsblk to list all block devices and confirm the device name of the attached volume.
sudo lsblk
- Create a Mount Point: Create a directory to mount the volume.
sudo mkdir /mnt/recovery
- Mount the Volume: Mount the volume to the created directory.
sudo mount /dev/nvme1n1p1 /mnt/recovery
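Before changing anything, it is worth confirming that the mount succeeded and that this really is the root filesystem you expect; the paths checked here are just the ones this guide touches:
# confirm the mount and peek at the files we are about to modify
df -h /mnt/recovery
ls /mnt/recovery/etc/ssh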
Step 4: Reset SSH Configuration
- Backup Current SSH Configuration: Backup the SSH configuration file to ensure you can restore it if needed.
sudo cp /mnt/recovery/etc/ssh/sshd_config /mnt/recovery/etc/ssh/sshd_config.backup
- Edit SSH Configuration: Open the SSH configuration file to allow root access and enable password authentication.
sudo nano /mnt/recovery/etc/ssh/sshd_config
Make the following changes:
PermitRootLogin yes
PasswordAuthentication yes
- Reset Root Password: Use chroot against the mounted volume to reset the root password.
sudo chroot /mnt/recovery passwd root
Set and confirm the new password for the root user. Because passwd is invoked directly through chroot, the command drops you back into your normal shell when it finishes; if you instead opened an interactive chroot shell (for example with sudo chroot /mnt/recovery /bin/bash), leave it with:
exit
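If you would rather script the two sshd_config changes than edit the file by hand, something like the following works, assuming the directives already exist in the file (commented out or not); double-check the resulting file either way:
# enable root login and password authentication on the mounted volume
sudo sed -E -i 's/^#?PermitRootLogin.*/PermitRootLogin yes/' /mnt/recovery/etc/ssh/sshd_config
sudo sed -E -i 's/^#?PasswordAuthentication.*/PasswordAuthentication yes/' /mnt/recovery/etc/ssh/sshd_config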
Step 5: Reattach the Volume to the Original Instance
- Unmount the Volume: Unmount the volume from the recovery instance.
sudo umount /mnt/recovery
- Detach the Volume: Detach the volume from the recovery instance via the AWS Management Console.
- Attach the Volume to the Original Instance: Stop the original instance and detach its existing (broken) root volume, then attach the repaired volume as the root device, typically /dev/sda1 or /dev/xvda, matching the root device name shown in the instance’s details. On Nitro-based instances it will appear inside the OS as an NVMe device (e.g. /dev/nvme0n1) once the instance boots.
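The same swap can be sketched with the AWS CLI; every ID and the device name below is a placeholder, and you should confirm the original root device name in the instance’s storage details before attaching:
# stop the original instance before touching its root volume
aws ec2 stop-instances --instance-ids i-0aaaa1111bbbb2222c
# detach the broken root volume from the original instance and the repaired volume from the recovery instance
aws ec2 detach-volume --volume-id vol-0bbbb2222cccc3333d
aws ec2 detach-volume --volume-id vol-0cccc3333dddd4444e
# attach the repaired volume as the root device and start the instance again
aws ec2 attach-volume --volume-id vol-0cccc3333dddd4444e --instance-id i-0aaaa1111bbbb2222c --device /dev/sda1
aws ec2 start-instances --instance-ids i-0aaaa1111bbbb2222c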
Step 6: Verify and Access
- Reconnect via SSH: Start the original instance if it is still stopped, then attempt to SSH into it as the root user. Note that the public IP may change after a stop/start unless you use an Elastic IP.
ssh root@original-instance-ip
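If the connection still fails, verbose client output is usually enough to tell an authentication problem from a network or configuration one; this is just a quick diagnostic:
# -v prints the key exchange and each authentication attempt as it happens
ssh -v root@original-instance-ip
Once you are back in, consider reverting PermitRootLogin and PasswordAuthentication to their previous values (you kept a backup of sshd_config in Step 4) and restarting sshd, since both were only relaxed for recovery.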
Additional Use Cases Beyond SSH
This process can be adapted to troubleshoot and resolve other issues beyond SSH access. Here are some scenarios:
- Corrupted Filesystem: Use the recovery instance to check and repair the filesystem. Make sure the partition is unmounted before running the check.
sudo fsck /dev/nvme1n1p1
- Configuration Errors: Identify and correct misconfigurations by editing the necessary configuration files while the volume is mounted on the recovery instance.
- Data Backup and Recovery: Secure important data by copying files from the attached volume to another storage location (see the example below).
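For the backup scenario, a simple copy while the volume is still mounted on the recovery instance might look like this; the source path and the S3 bucket name are purely illustrative:
# copy critical data off the mounted volume (adjust the paths to what you actually need)
sudo rsync -a /mnt/recovery/home/ubuntu/ /tmp/recovered-home/
# optionally push the copy to S3 (bucket name is a placeholder)
aws s3 sync /tmp/recovered-home/ s3://my-recovery-backups/home-ubuntu/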
By following these steps, you can regain access to your AWS instances and address a range of volume-level issues. The process gives you a reliable, repeatable way to maintain the availability and integrity of your server infrastructure.
If you encounter any specific challenges or need further assistance, feel free to reach out.