Intermittent Aria Automation Login Issue Resolved by Infrastructure Health Validation
Environment Details
We recently encountered an intermittent login issue in our production environment involving the following VMware components:
VMware Identity Manager (vIDM): 3.3.7
VMware Aria Automation: 8.18.1
Issue Description
After users successfully authenticated via tenant login, the Aria Automation home page intermittently failed to load and displayed the error:
“Page can’t be reached”
This issue did not occur consistently, making it difficult to immediately isolate the root cause.
Initial Validation Checks
During the issue window, we validated the following:
✅ Network connectivity between components was healthy
✅ DNS entries were resolving correctly
✅ No authentication failures were observed in vIDM
Reference screenshots were captured for further analysis.
Troubleshooting Approach
1. Validate Aria Automation & vIDM Services
We first verified that all Aria Automation and vIDM services were up and running across the infrastructure.
No immediate service-level failures were observed.
2. NSX-T Load Balancer Alert Investigation
We then noticed alerts indicating that the NSX-T Load Balancer pool was down.
Further investigation revealed:
NSX-T load balancer pool members (Aria Automation appliances) were returning HTTP 500 errors
This directly impacted access to the Aria Automation UI
This shifted the focus toward Aria Automation appliance health.
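To see what the load balancer monitor sees, each pool member can be probed directly. The following is a minimal Bash sketch under stated assumptions, not the exact procedure we used. The health port (8008) and path (/health) are assumptions; replace them with the values from your own NSX-T active monitor. A monitor that expects HTTP 200 treats any other code, such as the HTTP 500 responses we saw, as a failure.

```shell
#!/usr/bin/env bash
# Minimal sketch: probe Aria Automation pool members the way an HTTP
# load balancer monitor would. Port and path are ASSUMPTIONS; replace them
# with the values from your NSX-T active health monitor.
HEALTH_PORT="${HEALTH_PORT:-8008}"
HEALTH_PATH="${HEALTH_PATH:-/health}"

# Classify an HTTP status code. A monitor expecting 200 treats anything
# else as a failure; "000" is what curl reports when no HTTP response arrived.
member_state() {
  case "$1" in
    200) echo "UP" ;;
    000) echo "UNREACHABLE" ;;
    *) echo "DOWN" ;;
  esac
}

# Fetch only the status code from one node's health endpoint.
probe_member() {
  local node="$1" code
  code="$(curl -s -o /dev/null --max-time 10 -w '%{http_code}' \
    "http://${node}:${HEALTH_PORT}${HEALTH_PATH}")"
  echo "${node}: HTTP ${code} -> $(member_state "$code")"
}
```

For example, `for node in vra-node-01.example.com vra-node-02.example.com; do probe_member "$node"; done` (hypothetical hostnames) prints one status line per appliance. A node answering HTTP 500 is reported as DOWN, which is consistent with what the NSX-T pool showed.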
3. Aria Automation Health Check
We logged in to the Aria Automation appliance and ran the built-in health checks:
/opt/health/run.sh
During execution, the following error was observed:
Running check memory-usage
make: *** [/opt/health/Makefile:73: memory-usage] Error 1
This error matched a known issue documented in the Broadcom knowledge base:
🔗 KB Reference:
https://knowledge.broadcom.com/external/article/394277/aria-automation-node-marked-unhealthy-du.html
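When the health check output is long, it can help to reduce it to just the failing checks. This minimal sketch assumes, based only on the single error we observed, that every failed check surfaces as a make error line of the form shown above. The function name failed_checks is hypothetical.

```shell
#!/usr/bin/env bash
# Minimal sketch: print the names of failed health checks from the output
# of /opt/health/run.sh. ASSUMPTION: failures appear as make errors like
#   make: *** [/opt/health/Makefile:73: memory-usage] Error 1
failed_checks() {
  # Keep only make error lines and print the target (check) name.
  sed -n 's/^make: \*\*\* \[[^]]*: \([^]]*\)] Error [0-9]*$/\1/p'
}
```

Because make writes its errors to stderr, merge both streams: `/opt/health/run.sh 2>&1 | failed_checks`. Applied to the output from our incident, this prints memory-usage.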
4. Memory Utilization Analysis
Using the top command, we analyzed memory usage on the Aria Automation appliance and observed:
Active memory utilization above 96%
Physical RAM was not fully exhausted, but memory pressure was extremely high
As a result, the appliance failed its health checks and was marked unhealthy
The NSX-T Load Balancer then removed the node from the pool, causing intermittent UI failures
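top gives an interactive view of this. The same figure can also be scripted for regular monitoring. The sketch below assumes utilization is measured as (MemTotal - MemAvailable) / MemTotal from /proc/meminfo. That may differ from the exact metric the appliance's memory-usage check evaluates. Its 90% alert threshold is a hypothetical value, not a vendor limit.

```shell
#!/usr/bin/env bash
# Minimal sketch: memory utilization from a /proc/meminfo-style file,
# computed as (MemTotal - MemAvailable) / MemTotal. This metric is an
# ASSUMPTION and may not match the appliance's own memory-usage check.
mem_used_pct() {
  # Both fields are reported in kB, so the ratio is unit-free.
  awk '
    /^MemTotal:/     { total = $2 }
    /^MemAvailable:/ { avail = $2 }
    END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }
  ' "${1:-/proc/meminfo}"
}

# Print a status line; return 1 at or above the threshold, 2 if the file
# could not be parsed. The 90% default threshold is hypothetical.
check_memory() {
  local meminfo="${1:-/proc/meminfo}" threshold="${2:-90}" pct
  pct="$(mem_used_pct "$meminfo")"
  if [ -z "$pct" ]; then
    echo "ERROR: could not read MemTotal/MemAvailable from ${meminfo}" >&2
    return 2
  fi
  if [ "$pct" -ge "$threshold" ]; then
    echo "ALERT: memory utilization ${pct}% (threshold ${threshold}%)"
    return 1
  fi
  echo "OK: memory utilization ${pct}%"
}
```

Running `check_memory` with no arguments reads /proc/meminfo on the appliance. Because it returns non-zero at or above the threshold, it can be wired into cron or an existing monitoring agent.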
Root Cause Identified
🔴 Root Cause:
High memory utilization on the Aria Automation appliance caused health check failures, resulting in:
Appliance marked unhealthy
NSX-T Load Balancer pool member failure
Intermittent Aria Automation UI access issues
A manual vIDM AD sync, used as a workaround, only temporarily masked the problem by re-establishing sessions
Resolution Steps
1. Restart Aria Automation via vLCM
Since memory pressure was the core issue, we performed a controlled restart of Aria Automation using VMware Aria Suite Lifecycle (vLCM).
🔗 KB Reference:
https://knowledge.broadcom.com/external/article/399735/
2. Post-Restart Validation
After the restart:
Memory utilization returned to normal levels
Aria Automation health checks passed successfully
NSX-T Load Balancer pool members returned to UP state
Aria Automation UI became stable with no further intermittent issues
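Post-restart validation can be automated by re-running a check until it passes. This minimal shell sketch retries any command. The attempt count and interval are hypothetical. Using /opt/health/run.sh as the command assumes that script exits non-zero when a check fails, which should be verified on your appliance.

```shell
#!/usr/bin/env bash
# Minimal sketch: re-run a check command until it succeeds or the attempt
# budget is exhausted. Attempt count and interval are supplied by the caller.
# Usage: wait_until_healthy ATTEMPTS INTERVAL_SECONDS COMMAND [ARGS...]
wait_until_healthy() {
  local attempts="$1" interval="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      echo "healthy after ${i} attempt(s)"
      return 0
    fi
    # Sleep between attempts, but not after the final one.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$interval"
    fi
    i=$((i + 1))
  done
  echo "still unhealthy after ${attempts} attempt(s)" >&2
  return 1
}
```

For example, `wait_until_healthy 30 60 /opt/health/run.sh` keeps retrying for roughly 30 minutes before giving up.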
Final Outcome
✅ Issue permanently resolved
✅ No further login or UI access issues observed
✅ NSX-T load balancer stability restored
✅ No further need for the manual vIDM AD sync workaround
Key Takeaways
Intermittent Aria Automation UI issues can be infrastructure health-related, not authentication-related
NSX-T Load Balancer health is a critical dependency for Aria Automation availability
Regular monitoring of memory utilization is essential
Built-in health checks (/opt/health/run.sh) are invaluable for root cause analysis