Earlier today I cloned a VM on one of our vCenter Server Appliances (vCSA). The cloning process completed without a hitch (as it usually does). However, I was not able to power on the VM. When I looked at the “Tasks” view in vCenter, the power-on task had failed with an error.
This is a fairly generic error message. However, the “Connection Refused” part prompted me to take a look at the vCenter services. You can view these services by browsing to Home > Administration > System Configuration > Services in the vSphere Web Client. On the Summary tab, a handy “Services Health” section gives you a high-level overview of the health of the services. On my vCenter appliance, three of the services were in a critical state.
You can hover your mouse pointer over the critical services link to list the critical services, or click on each service to get a more detailed view of its status. The Postgres database service on my appliance was one of the services in a critical state. When I clicked on it, the problem was obvious: the filesystem holding the Postgres database was completely out of space. I was able to identify the log directory as the culprit.
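If you prefer to confirm this from a shell rather than from the Web Client, the check boils down to the same idea as running `df` and then `du` on the suspect mount point. The Python sketch below is only an illustration (not a VMware tool): it reports roughly how full the filesystem containing a path is and ranks that path's subdirectories by size. The function names are hypothetical, and it makes no assumption about the vCSA partition layout; you pass in whichever mount point the Services view flagged.

```python
import os
import shutil
from pathlib import Path


def usage_percent(path: str) -> float:
    """Return roughly how full the filesystem containing `path` is, as a percentage."""
    total, used, _free = shutil.disk_usage(path)
    return 100.0 * used / total


def dir_size(path: Path) -> int:
    """Sum the sizes of all regular files under `path` in bytes (similar to `du -s`)."""
    total = 0
    for root, _dirs, files in os.walk(path):  # os.walk does not follow symlinked dirs
        for name in files:
            file_path = os.path.join(root, name)
            try:
                if not os.path.islink(file_path):
                    total += os.path.getsize(file_path)
            except OSError:
                # The file vanished (e.g. rotated mid-scan) or is unreadable; skip it.
                pass
    return total


def largest_subdirs(path: str, top: int = 5) -> list[tuple[str, int]]:
    """Return the `top` largest immediate subdirectories of `path`, biggest first."""
    sizes = [
        (entry.name, dir_size(Path(entry.path)))
        for entry in os.scandir(path)
        if entry.is_dir(follow_symlinks=False)
    ]
    return sorted(sizes, key=lambda item: item[1], reverse=True)[:top]
```

For example, `largest_subdirs("/storage", top=3)` would rank the directories under a hypothetical `/storage` mount by size; in my case, the log directory would have been at the top of that list.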
It's simple enough to SSH into the appliance and delete the logs. However, this will not prevent the problem from happening again. After doing some Googling (I love that the Merriam-Webster dictionary identifies this as a real word in the English language!), I found these KB articles:
Follow this one to decrease the maximum log file size and the maximum backup index (the number of rotated log files kept) for the SSO logs in the log4j.properties configuration file:
To offload your logs to a syslog server, follow this article:
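For anyone curious about what "offloading to a syslog server" means at the protocol level, here is a small Python sketch using the standard library's `logging.handlers.SysLogHandler`. It only illustrates the idea and is not how the vCSA itself is configured (the KB article covers that). The collector host and port are assumptions: UDP 514 is the conventional syslog port, and the usage block points the handler at a throwaway local socket instead of a real server. The function name `forward_to_syslog` is hypothetical.

```python
import logging
import logging.handlers
import socket


def forward_to_syslog(message: str, host: str, port: int = 514) -> None:
    """Send one WARNING-level record to a syslog collector over UDP."""
    handler = logging.handlers.SysLogHandler(
        address=(host, port),  # a (host, port) tuple selects UDP by default
        facility=logging.handlers.SysLogHandler.LOG_USER,
    )
    logger = logging.getLogger("syslog-demo")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep the demo record out of the root logger
    logger.addHandler(handler)
    try:
        logger.warning(message)
    finally:
        logger.removeHandler(handler)
        handler.close()


if __name__ == "__main__":
    # A throwaway local UDP socket stands in for the remote syslog server.
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    receiver.bind(("127.0.0.1", 0))
    receiver.settimeout(2.0)
    forward_to_syslog("disk usage warning (test)", "127.0.0.1", receiver.getsockname()[1])
    data, _addr = receiver.recvfrom(4096)
    print(data)  # b'<12>disk usage warning (test)\x00'
    receiver.close()
```

The `<12>` prefix is the syslog priority, computed as facility × 8 + severity (user = 1, warning = 4). Once the messages live on a separate collector, the appliance no longer has to keep a long history on its own disk.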
Capping the SSO log size and shipping logs off the appliance should keep the log directory from consuming the entire partition again. After following these steps, I rebooted the vCSA for good measure, and afterward I was able to power on the virtual machine.
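Why does lowering those two log4j settings help? With size-based rotation, one log can take up at most about (max file size) × (backup index + 1) on disk: the live file plus its rotated copies. As a purely illustrative calculation (these are not the KB's values), a 10 MB limit with 10 backups caps that log at roughly 110 MB. The Python sketch below demonstrates the same bound using the standard library's `RotatingFileHandler`, whose `maxBytes` and `backupCount` parameters play the role of log4j's maximum file size and maximum backup index. It is an analogy, not the SSO service's actual logging code, and the `sso.log` file name is hypothetical.

```python
import logging
import logging.handlers
import os
import tempfile


def worst_case_bytes(max_bytes: int, backup_count: int) -> int:
    """Upper bound on disk used by one size-rotated log (live file plus backups),
    assuming no single record is larger than max_bytes."""
    return max_bytes * (backup_count + 1)


def demo_rotation(max_bytes: int, backup_count: int, records: int) -> tuple[list[str], int]:
    """Write `records` lines through a RotatingFileHandler and report what remains on disk."""
    with tempfile.TemporaryDirectory() as log_dir:
        path = os.path.join(log_dir, "sso.log")  # hypothetical file name
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=max_bytes, backupCount=backup_count
        )
        logger = logging.getLogger("rotation-demo")
        logger.setLevel(logging.INFO)
        logger.propagate = False
        logger.addHandler(handler)
        try:
            for i in range(records):
                logger.warning("event %05d", i)
        finally:
            logger.removeHandler(handler)
            handler.close()
        files = sorted(os.listdir(log_dir))
        total = sum(os.path.getsize(os.path.join(log_dir, f)) for f in files)
        return files, total
```

Calling `demo_rotation(100, 3, 100)` writes far more than 400 bytes of log lines, yet only `sso.log` plus three backups survive, and their combined size stays under `worst_case_bytes(100, 3)`. Older entries are simply discarded, which is why pairing tighter rotation with a syslog server makes sense if you still want the history.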
Hopefully this will help someone who is having the same (or similar) problem!