I managed to send a screenshot to Gemini via the API and get back the x, y coordinates of the button.
But the button location it returns seems to be wrong.
Does anyone have experience using AI to determine click coordinates? (Take a screenshot of the instance, send that screenshot to the AI, and ask it for the x, y coordinates to click.)
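
One thing that may be worth ruling out is a scale mismatch. As far as I understand, Gemini's documented bounding-box output is `[ymin, xmin, ymax, xmax]` normalized to a 0-1000 range, not raw pixels, so the values would need to be converted before clicking. Below is a minimal sketch of that conversion. It assumes the model was asked for a bounding box in that format; the function name and the sample box are hypothetical, not taken from my actual run.

```python
def normalized_box_to_click(box, width, height, scale=1000):
    """Convert a [ymin, xmin, ymax, xmax] box on a 0..scale grid
    into the pixel coordinates of the box center.

    Assumption: the model returns normalized coordinates (0..scale),
    with y before x, rather than pixel coordinates.
    """
    ymin, xmin, ymax, xmax = box
    # Center of the box, rescaled from the normalized grid to pixels.
    x_center = (xmin + xmax) * width / (2 * scale)
    y_center = (ymin + ymax) * height / (2 * scale)
    return round(x_center), round(y_center)


# Hypothetical example: a 1920x1080 screenshot and a made-up box.
click_x, click_y = normalized_box_to_click([400, 250, 500, 350], 1920, 1080)
print(click_x, click_y)
```

If the screenshot is resized before upload, or the display uses DPI scaling, `width` and `height` should be the size of the screen area actually being clicked, not the size of the uploaded image.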