DEV Community

Ruxo Zheng

[F#] Bring Chrome Window to front in Selenium (Windows)

(Note: This article uses the Selenium WebDriver bindings for .NET and .NET interop (P/Invoke) to call the Win32 API.)

In this article, I'll show you how to bring a Chrome browser window to the front, ensuring it becomes the active, focused window during Selenium automation. Usually, Selenium is used to automate web tasks in the background, allowing you to continue working on other tasks. However, some websites 😉 get suspicious if they detect actions happening while the browser is inactive—after all, how could there be interactions if the browser isn't even in focus?

To help address this, let's take a closer look at how browser focus works. In JavaScript, the onfocus and onblur events allow websites to determine if the browser is the active window. The onfocus event triggers when the window gains input focus (meaning it can receive keyboard input and is the active window on the screen), while the onblur event triggers when the browser loses focus. These simple events can give websites clues about whether interactions seem automated.

Many websites also use more sophisticated approaches to detect human-like interaction patterns. Therefore, if you're trying to convincingly simulate human behavior, controlling the browser window—bringing it to the front or switching between windows—can be essential. This is where the Win32 API comes in handy.

Little Note about F#

This article uses F#, so here is a quick note on its syntax. In C#, a method f that takes one argument x is called as var y = f(x). In F#, the same call can be written in any of these ways:

let y = f(x)   // normal call
let y = f x    // parentheses are optional
let y = x |> f // forward pipe call
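
The code later in this article chains several forward pipes and also uses the `_.Property` shorthand, which needs F# 8 or newer. The following is a minimal sketch of both; the `Proc` record and its sample values are hypothetical and exist only to illustrate the syntax.

```fsharp
// Hypothetical record, used only to illustrate the syntax.
type Proc = { Id: int; Name: string }

let procs =
    [ { Id = 1; Name = "chrome" }
      { Id = 2; Name = "code" }
      { Id = 3; Name = "chrome" } ]

// Each |> passes the result on its left as the last argument of the function on its right.
let chromeIds =
    procs
    |> Seq.filter (fun p -> p.Name = "chrome")
    |> Seq.map _.Id      // F# 8 shorthand for (fun p -> p.Id)
    |> Seq.toArray

printfn "%A" chromeIds   // prints [|1; 3|]
```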

Bringing Chrome to the Front with Selenium

On Windows, you can control any application window using the Win32 API, but you need a "window handle" to do so. A window handle (often abbreviated as HWND) is a token provided by the Windows OS that uniquely identifies a window in the system's GUI environment.

When using Selenium, starting ChromeDriver automatically opens the browser, but it doesn't directly provide the window handle. Instead, we can get to it indirectly via the ChromeDriver process ID, which is accessible through ChromeDriverService.

Step 1: Obtain ChromeDriver Process ID

When we create the ChromeDriver instance, we can access its process ID through the given ChromeDriverService object.

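open OpenQA.Selenium.Chrome

let options = ChromeOptions()
// ChromeDriverService has no public constructor; use the factory method.
let service = ChromeDriverService.CreateDefaultService()
// Creating the driver starts chromedriver.exe and opens Chrome.
let driver = new ChromeDriver(service, options)

printfn $"Process ID = {service.ProcessId}"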

Step 2: Find the Browser's Process ID

The ChromeDriver executable (chromedriver.exe) is responsible for launching the Chrome browser instance, but we need the browser's process ID, not the driver's. To find it, we need to iterate through all running Chrome processes on the machine and identify the one spawned by our driver.

open System.Diagnostics

let getBrowserId (driver_process_id: int) =
    let chrome_processes = Process.GetProcessesByName "chrome"
    try
        let candidates = chrome_processes |> Seq.filter (fun p -> getParentId(p.Id) = driver_process_id)
                                          |> Seq.map _.Id
                                          |> Seq.toArray
        assert (candidates.Length = 1)  // expect exactly one browser process started by this driver
        candidates[0]
    finally
        chrome_processes |> Seq.iter _.Dispose()
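
One thing to keep in mind is that F#'s assert is only checked in Debug builds, so in a Release build an unexpected number of candidates would surface later as an index error or a wrong process. As a rough sketch of a more explicit alternative, the selection logic can be separated from the Win32 calls and return a Result. The function name pickBrowserPid and the (process ID, parent ID) pair input are assumptions made for illustration; they are not part of the original code.

```fsharp
// Hypothetical helper: given (processId, parentProcessId) pairs for all
// "chrome" processes, pick the one whose parent is the driver process.
let pickBrowserPid (driverPid: int) (processes: seq<int * int>) : Result<int, string> =
    let candidates =
        processes
        |> Seq.filter (fun (_, parentPid) -> parentPid = driverPid)
        |> Seq.map fst
        |> Seq.toList

    match candidates with
    | [ pid ] -> Ok pid
    | [] -> Error "no Chrome process was started by this driver"
    | many -> Error $"expected one browser process, found {List.length many}"

// Sample data: process 20 is the browser started by driver 10;
// process 21 is one of the browser's own child processes.
let sample = [ (20, 10); (21, 20); (30, 5) ]
let picked = pickBrowserPid 10 sample   // Ok 20
```

In getBrowserId, the pairs could be built from Process.GetProcessesByName "chrome" together with getParentId, and the Error cases handled however suits the script.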

Step 2.1: Get the Parent Process ID

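Getting the parent process ID for a given process ID is a bit tricky: it requires querying detailed process information with the NtQueryInformationProcess function from ntdll.dll. Essentially, we open the process with the known ID, call this function with the ProcessBasicInformation class (value 0), and read the InheritedFromUniqueProcessId field. Because F# resolves definitions from top to bottom, this code must appear above getBrowserId from Step 2 in the source file.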

open System
open System.Runtime.InteropServices

[<Flags>]
type private ProcessAccessFlags = QueryInformation = 0x400u

[<Struct; StructLayout(LayoutKind.Sequential)>]
type private ProcessBasicInformation = {
    Reserved1: IntPtr
    PebBaseAddress: IntPtr
    Reserved2_0: IntPtr
    Reserved2_1: IntPtr
    UniqueProcessId: IntPtr
    InheritedFromUniqueProcessId: IntPtr
}
with
    static member Default = { Reserved1 = IntPtr.Zero; PebBaseAddress = IntPtr.Zero; Reserved2_0 = IntPtr.Zero; Reserved2_1 = IntPtr.Zero; UniqueProcessId = IntPtr.Zero; InheritedFromUniqueProcessId = IntPtr.Zero }

[<DllImport("kernel32.dll")>]
extern IntPtr private OpenProcess(ProcessAccessFlags dwDesiredAccess, bool bInheritHandle, int dwProcessId)

[<DllImport("kernel32.dll", SetLastError = true)>]
extern bool private CloseHandle(IntPtr hObject)

[<DllImport("ntdll.dll")>]
extern int private NtQueryInformationProcess(IntPtr ProcessHandle, int ProcessInformationClass, ProcessBasicInformation& ProcessInformation, int ProcessInformationLength, int& ReturnLength)

let getParentId (process_id: int) =
    let handle = OpenProcess(ProcessAccessFlags.QueryInformation, false, process_id)
    if handle = IntPtr.Zero then
        failwith "OpenProcess failed"
    try
        let mutable pbi = ProcessBasicInformation.Default
        let mutable returnLength = 0
        let size = Marshal.SizeOf pbi
        let status = NtQueryInformationProcess(handle, 0, &pbi, size, &returnLength)
        if status <> 0 then failwithf $"NtQueryInformationProcess failed with status %d{status}"
        pbi.InheritedFromUniqueProcessId.ToInt32()
    finally
        CloseHandle handle |> ignore

Step 3: Locate the Window Handle

Once we have the process ID for the browser, the next step is to enumerate the window handles owned by this process.

type private EnumWindowsProc = delegate of IntPtr * IntPtr -> bool

[<DllImport("user32.dll")>]
extern bool private EnumWindows(EnumWindowsProc lpEnumFunc, IntPtr lParam)

[<DllImport("user32.dll")>]
extern int private GetWindowThreadProcessId(IntPtr hWnd, int& lpdwProcessId)

[<DllImport("user32.dll")>]
extern bool private IsWindowVisible(IntPtr hWnd)

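let findBrowserHandle (browser_pid: int) =
    // A closure cannot capture a `let mutable` local, so collect handles in a ResizeArray.
    let result = ResizeArray<IntPtr>()

    let enumWindowsProc (hWnd: IntPtr) (_: IntPtr) : bool =
        let mutable pid = 0
        GetWindowThreadProcessId(hWnd, &pid) |> ignore
        if pid = browser_pid then result.Add hWnd
        true  // returning true continues the enumeration

    let callback = EnumWindowsProc(enumWindowsProc)
    if not <| EnumWindows(callback, IntPtr.Zero) then failwith "EnumWindows failed"

    // Seq.exactlyOne throws unless exactly one visible window belongs to the process.
    result |> Seq.filter IsWindowVisible
           |> Seq.exactlyOne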

Since there could be multiple windows, we specifically look for the visible one—this is the Chrome window that we want to interact with.

Final Step: Bring Chrome to the Front!

Now that we have the correct window handle, we can bring it to the front with the SetForegroundWindow API. When the call succeeds, Chrome becomes the active window, ready for user input.

// SetLastError = true makes the runtime capture the Win32 error code right after the call.
[<DllImport("user32.dll", SetLastError = true)>]
extern bool private SetForegroundWindow(IntPtr hWnd)

let [<Literal>] private NoError = 0
let [<Literal>] private AccessDeniedError = 5

let setForegroundWindow (hwnd: IntPtr) =
    if SetForegroundWindow hwnd then
        true
    else
        // Use Marshal.GetLastWin32Error rather than P/Invoking GetLastError,
        // because the runtime may overwrite the thread's last error in between.
        match Marshal.GetLastWin32Error() with
        | AccessDeniedError -> printfn "Cannot set foreground, process doesn't own foreground privilege"; false
        | NoError -> true // assumed: window is already on top (not verified)
        | code -> failwithf $"""SetForegroundWindow failed (0x%X{code})"""

One caveat: Windows restricts which processes may change the foreground window. SetForegroundWindow generally succeeds only when the calling process (your Selenium script) is already the foreground process, was started by the foreground process, or received the last input event; otherwise Windows typically just flashes the taskbar button. In practical terms, your script usually needs to be the active application before it can bring the browser to the front, so make sure it holds foreground status at the moment it makes the call.
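
To show how the steps fit together, here is a small sketch that chains them with the forward pipe. The function name bringToFront is hypothetical, and the three step functions are passed in as parameters only so that the sketch compiles on its own; in the article they correspond to getBrowserId, findBrowserHandle, and setForegroundWindow.

```fsharp
open System

// Hypothetical glue function: driver PID -> browser PID -> window handle -> foreground request.
let bringToFront
    (getBrowserId: int -> int)
    (findBrowserHandle: int -> IntPtr)
    (setForegroundWindow: IntPtr -> bool)
    (driverPid: int) : bool =
    driverPid
    |> getBrowserId
    |> findBrowserHandle
    |> setForegroundWindow

// Stand-in functions so the sketch can run without Chrome or Win32.
let fakeResult =
    bringToFront
        (fun driverPid -> driverPid + 1)   // pretend the browser PID is driver PID + 1
        (fun browserPid -> IntPtr browserPid)
        (fun hwnd -> hwnd = IntPtr 43)
        42
```

With the real functions from the steps above, the call would look like bringToFront getBrowserId findBrowserHandle setForegroundWindow service.ProcessId, and a false result means Windows refused the foreground request.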

Conclusion

In this tutorial, we used a combination of Win32 API calls to obtain and manipulate the window handle of a Chrome browser, allowing us to bring it to the front during Selenium automation. While this may seem like a lot of effort for something seemingly simple, it's often necessary if you need to convincingly simulate human interactions. Ideally, Selenium would offer more direct support for this, but until then, this approach ensures your automation scripts remain effective and human-like.
