Intermittent Aria Automation Login Issue Resolved by Infrastructure Health Validation


Environment Details

We recently faced an intermittent login issue in our Production environment involving the following VMware components:

  • VMware Identity Manager (vIDM): 3.3.7

  • VMware Aria Automation: 8.18.1

Issue Description

After users successfully authenticated via tenant login, the Aria Automation home page intermittently failed to load and displayed the error:

“Page can’t be reached”

This issue did not occur consistently, making it difficult to immediately isolate the root cause.


Initial Validation Checks

During the issue window, we validated the following:

  • ✅ Network connectivity between components was healthy

  • ✅ DNS entries were resolving correctly

  • ✅ No authentication failures were observed in vIDM

Reference screenshots were captured for further analysis.
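
The DNS and connectivity checks above can be repeated from any jump host with a few lines of bash. This is only a minimal sketch: the function names are ours, and the host name and port in the usage line are placeholders, not values from our environment.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a host name resolves and that a TCP port accepts connections.
# Function names are illustrative; hosts and ports passed in are up to the caller.

# Succeeds (exit 0) if the name resolves through the system resolver.
resolves() {
    getent hosts "$1" > /dev/null 2>&1
}

# Succeeds if a TCP connection to host:port opens within 5 seconds.
# Uses bash's built-in /dev/tcp redirection, so no extra tools are needed.
port_open() {
    timeout 5 bash -c 'exec 3<>"/dev/tcp/$0/$1"' "$1" "$2" 2> /dev/null
}

# Print one OK/FAIL line per check for the given host and port.
check_endpoint() {
    local host=$1 port=$2
    if resolves "$host"; then
        echo "DNS  OK   $host"
    else
        echo "DNS  FAIL $host"
    fi
    if port_open "$host" "$port"; then
        echo "TCP  OK   $host:$port"
    else
        echo "TCP  FAIL $host:$port"
    fi
}
```

Example usage (placeholder host name): check_endpoint vra.example.com 443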


Troubleshooting Approach

1. Validate Aria Automation & vIDM Services

We first verified that all Aria Automation and vIDM services were up and running across the infrastructure.
No immediate service-level failures were observed.


2. NSX-T Load Balancer Alert Investigation

We then noticed alerts indicating that the NSX-T Load Balancer pool was down.

Further investigation revealed:

  • NSX-T load balancer pool members (Aria Automation appliances) were returning HTTP 500 errors

  • This directly impacted access to the Aria Automation UI

This shifted the focus toward Aria Automation appliance health.
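
To see what the load balancer monitor sees, each pool member can be queried directly, bypassing the VIP. The sketch below assumes the monitor is an HTTP GET against /health on port 8008 expecting a 200 response; that port and path are an assumption on our part, so substitute whatever your own NSX-T monitor is configured with. The function names are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: query each pool member directly and report the state an HTTP monitor
# expecting a 200 response would assign. Port 8008 and /health are assumptions.

# Print the HTTP status code returned by a URL ("000" if the request fails).
http_status() {
    curl --silent --max-time 10 --output /dev/null \
         --write-out '%{http_code}' "$1"
}

# Map a status code to UP (200) or DOWN (anything else).
member_state() {
    if [ "$1" = "200" ]; then
        echo "UP"
    else
        echo "DOWN"
    fi
}

# Probe every node given as an argument and print: node, status code, state.
probe_members() {
    local node code
    for node in "$@"; do
        code=$(http_status "http://${node}:8008/health")
        printf '%s %s %s\n' "$node" "$code" "$(member_state "$code")"
    done
}
```

Example usage (placeholder node names): probe_members vra-node1.example.com vra-node2.example.com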


3. Aria Automation Health Check

We logged in to the Aria Automation appliance and ran the built-in health checks:

/opt/health/run.sh

During execution, the following error was observed:

Running check memory-usage
make: *** [/opt/health/Makefile:73: memory-usage] Error 1

This error matched a known issue documented in the Broadcom knowledge base:

🔗 KB Reference:
https://knowledge.broadcom.com/external/article/394277/aria-automation-node-marked-unhealthy-du.html
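
When the health script prints a lot of output, the failing checks are easy to miss. A small filter such as the one below pulls the check name out of each "make: *** [...] Error N" line. It is a sketch based only on the error format shown above; the function name and the log path in the usage line are our own.

```shell
#!/usr/bin/env bash
# Sketch: list the health checks that failed, based on lines of the form
#   make: *** [/opt/health/Makefile:73: memory-usage] Error 1

# Read health-check output on stdin and print the name of each failed check.
failed_checks() {
    sed -n 's/^make: \*\*\* \[.*: \([^]]*\)] Error [0-9][0-9]*$/\1/p'
}
```

Example usage on the appliance: /opt/health/run.sh 2>&1 | tee /tmp/health.log | failed_checks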


4. Memory Utilization Analysis

Using the top command, we analyzed memory usage on the Aria Automation appliance and observed:

  • Active memory utilization above 96%

  • Physical RAM was available, but memory pressure was extremely high

  • Due to this condition, the appliance was marked unhealthy

  • NSX-T Load Balancer removed the node from the pool, causing intermittent UI failures
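
The same figure can be tracked without an interactive top session by reading /proc/meminfo. The sketch below assumes "in use" means MemTotal minus MemAvailable; the appliance's own memory-usage check may use a different formula or threshold, so treat the number as an approximation. Function names and the threshold in the usage line are illustrative.

```shell
#!/usr/bin/env bash
# Sketch: percentage of memory in use, computed from a meminfo-style file.
# Assumption: used = MemTotal - MemAvailable (the built-in check may differ).

# Print the used-memory percentage as a whole number (truncated).
mem_used_pct() {
    local file=${1:-/proc/meminfo}
    awk '/^MemTotal:/     { total = $2 }
         /^MemAvailable:/ { avail = $2 }
         END { if (total > 0) printf "%d\n", (total - avail) * 100 / total }' "$file"
}

# Exit 0 when usage is at or above the given threshold (percent).
# usage: mem_above <threshold> [meminfo-file]
mem_above() {
    [ "$(mem_used_pct "$2")" -ge "$1" ]
}
```

Example usage: mem_above 90 && echo "memory usage at or above 90%: $(mem_used_pct)%"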


Root Cause Identified

🔴 Root Cause:
High memory utilization on the Aria Automation appliance caused health check failures, resulting in:

  • Appliance marked unhealthy

  • NSX-T Load Balancer pool member failure

  • Intermittent Aria Automation UI access issues

  • A manual vIDM AD sync re-established sessions and temporarily masked the problem


Resolution Steps

1. Restart Aria Automation via Aria Suite Lifecycle

Since memory pressure was the core issue, we performed a controlled restart of Aria Automation using VMware Aria Suite Lifecycle (formerly vRealize Suite Lifecycle Manager).

🔗 KB Reference:
https://knowledge.broadcom.com/external/article/399735/

2. Post-Restart Validation

After the restart:

  • Memory utilization returned to normal levels

  • Aria Automation health checks passed successfully

  • NSX-T Load Balancer pool members returned to UP state

  • Aria Automation UI became stable with no further intermittent issues
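
Services take a while to come back after a restart, so the validation is easier with a retry loop than with repeated manual checks. The helper below is a generic sketch (the name retry_until is ours); it re-runs any command until it succeeds or the attempts run out. Using it with the health script assumes that the script exits non-zero when a check fails.

```shell
#!/usr/bin/env bash
# Sketch: re-run a check until it succeeds or the attempts run out.

# usage: retry_until <attempts> <delay-seconds> <command> [args...]
retry_until() {
    local attempts=$1 delay=$2 i
    shift 2
    for ((i = 1; i <= attempts; i++)); do
        if "$@"; then
            echo "succeeded on attempt $i"
            return 0
        fi
        # Wait before the next attempt, but not after the last one.
        if [ "$i" -lt "$attempts" ]; then
            sleep "$delay"
        fi
    done
    echo "still failing after $attempts attempts" >&2
    return 1
}
```

Example usage on the appliance (30 attempts, 20 seconds apart): retry_until 30 20 /opt/health/run.sh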


Final Outcome

✅ Issue resolved after the controlled restart
✅ No further login or UI access issues observed
✅ NSX-T load balancer stability restored
✅ No need for manual vIDM AD sync workaround


Key Takeaways

  • Intermittent Aria Automation UI issues can stem from infrastructure health rather than authentication

  • NSX-T Load Balancer health is a critical dependency for Aria Automation availability

  • Regular monitoring of memory utilization is essential

  • Built-in health checks (run.sh) are invaluable for root cause analysis

