Hey everyone,
I'm working on a registration script, but I'm running into issues when the app I'm automating has several possible flows or UI layouts during account creation.
For example:
- Sometimes the button says "Get Started",
- Other times it's "Create Account",
- Occasionally a popup appears asking for permissions (e.g. "Don't allow"),
- Then the form with first name, last name, password shows up – but even this screen can vary in layout or order.
Here's how I'm currently handling it:
When the element appears, the bot clicks it.
Then it checks for the next possible button or screen in the same way — block by block.
This approach works functionally, but the problem is that sometimes I have to scroll through 5–6 text blocks (using Check Element Exists) just to get to the correct clickable button, which turns a 3-second flow into a 10-second one.
And this happens on multiple screens across different registration variants.
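To make the slowdown concrete, here is a minimal Python sketch of what my block chain effectively does. It is only a simulation: `run_chain`, the two timing constants, and the last three button labels are hypothetical stand-ins I made up for illustration, not the tool's actual API or measured values.

```python
from typing import Iterable, Optional, Set, Tuple

# Assumed costs, chosen only for illustration (not measured on a device).
CHECK_TIMEOUT_S = 1.5  # time lost when a "Check Element Exists" block finds nothing
CLICK_COST_S = 0.5     # time to find and click the element that is present

# The first three labels come from my flows; the rest are hypothetical variants.
CANDIDATES = ["Get Started", "Create Account", "Don't allow",
              "Continue", "Next", "Sign Up"]


def run_chain(candidates: Iterable[str],
              visible: Set[str]) -> Tuple[Optional[str], float]:
    """Check candidates one by one, the way the block chain does.

    Returns the label that was clicked (or None) and the simulated
    seconds spent getting there.
    """
    elapsed = 0.0
    for label in candidates:
        if label in visible:
            # Element is on screen: click it and stop checking.
            return label, elapsed + CLICK_COST_S
        # Element is missing: the whole timeout is wasted before moving on.
        elapsed += CHECK_TIMEOUT_S
    return None, elapsed


if __name__ == "__main__":
    print(run_chain(CANDIDATES, {"Get Started"}))  # first check hits
    print(run_chain(CANDIDATES, {"Sign Up"}))      # five misses, then a hit
```

With these assumed numbers, a screen that matches the first check costs 0.5 s, while a screen that matches only the sixth check costs 8.0 s, because every miss pays the full timeout. That is the same kind of gap I see on the real devices.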
I also tested Check Text (OCR) briefly, but I'm not using it in my current implementation. The problem was that the layout sometimes loaded too slowly, so OCR matched the same element as on the previous screen and broke the script.
My questions:
- Is it OK to rely on chained XPath checks like this? Or is there a cleaner way to structure multi-path flows?
- Would Get Text give more accurate screen detection?
- How do you structure logic for variable UI flows – long chain or clear stage-based logic?
- What’s the best way to avoid unnecessary processing time (e.g. when checking multiple possible elements)?
I’m currently automating 10 real Android phones in parallel, but I sometimes notice that the script:
- fails at points where it shouldn’t,
- or can’t detect a button that normally works, even when I’ve already prepared 3 layout variants.
Example: Facebook’s post-login menu — I’ve recorded 3 different UI versions, and still, sometimes something new shows up and breaks the logic.
So I'm looking for a way to build scripts that are:
- fast and efficient,
- resistant to layout changes,
- and context-aware (don’t waste time scanning irrelevant blocks).