DEV Community

Cover image for How to tap the Android screen from the underlying Linux system
Philipp
Philipp

Posted on

How to tap the Android screen from the underlying Linux system

In recent years phone screens seem to only have gotten bigger. This is great because it allows you to see more on your screen, but it also has some drawbacks. One of them has been very annoying to me: I can no longer reach buttons at the top left of the screen in a comfortable way.

In a way, I would divide the screen in three areas:

  • Easy to reach: the area can be reached with the thumb while holding the phone.
  • Not comfortable: you can reach the area, but it's not as comfortable as the previously mentioned one.
  • Unreachable: this area is not in the reach of my thumb without repositioning my hand at the edge of the phone.

Here is a screenshot with an overlay that shows which areas are easy to reach with a thumb

So to me the most annoying buttons are those at the top left. While those on the top right can still be reached with a little effort, the ones in the top left corner require more effort.

So how do we solve this problem?

The best way I came up with to solve this problem was a simple idea: What if there was a way to tap the top left corner without leaving the "Easy to reach" category?

My phone has a fingerprint scanner at the back that is very easy to reach. This scanner also doesn't have any functionality when the phone is unlocked.

Detecting a finger on the sensor

So I took a look at the Android system log and found the following lines when putting the finger on and off the sensor:

fpc_fingerprint_hal: report_input_event - Reporting event type: 1, code: 96, value:1
fpc_fingerprint_hal: report_input_event - Reporting event type: 1, code: 96, value:0
Enter fullscreen mode Exit fullscreen mode

The only relevant difference between these lines is the number at the end -- 1 for "finger down", 0 for "finger up".

So that was easy -- just write a program that scans the logcat output, detects these lines and then runs the input tap x y shell command to tap a specific point. Right?

No.

It's so slow

The input command seemed very slow to me. It took quite some time from tapping the sensor to a reaction to the click. While testing it appeared to take at least 300ms, often worse with about 400ms.

According to a lot of anecdotal evidence, actions that take 100ms or less are perceived as instant. So this command definitely fails all expectations of "instant" (it was probably not designed to be fast, anyway). But why is that?

The "input" command

Android comes with a lot of different commands in /system/bin. Most of them are to be expected in a typical Linux environment (like tail, cat etc.) and some of them are specific to Android.

The input command, to my surprise, was just a shell script:

#!/system/bin/sh
# Script to start "input" on the device, which has a very rudimentary
# shell.
#
base=/system
export CLASSPATH=$base/framework/input.jar
exec app_process $base/bin com.android.commands.input.Input "$@"
Enter fullscreen mode Exit fullscreen mode

If I read that correctly, it basically starts a Java program that can simulate a tap. There are also other actions it can do but for this post I don't care.

Reducing the delay

One method to not have the long, noticeable delay is -- quite simply -- not relying on the input command. It just writes some data, that shouldn't be too hard to copy. So instead of starting a script that starts a program that writes a small piece of data, we can just write it ourselves.

But what should we write and where should the data be written?

I don't know exactly why, but I never really looked at the documentation (also this now makes a lot more sense) and started reverse-engineering this... open source protocol. Yea... anyway.

The first step when trying to reproduce a behavior is watching it. So how can we watch taps on the screen as they happen?

The getevent utility allows us to watch certain events happen in real time. It also makes it easy to list device files associated with those events.

Using getevent -pl (in a root shell on the phone) we can get a nice overview of devices, their events and device file paths:

chiron:/ $ getevent -pl
add device 1: /dev/input/event6
name:     "msm8998-tasha-snd-card Button Jack"
events:
    KEY (0001): KEY_VOLUMEDOWN        KEY_VOLUMEUP          KEY_MEDIA             BTN_3
                BTN_4                 BTN_5
input props:
    INPUT_PROP_ACCELEROMETER
add device 2: /dev/input/event5
name:     "msm8998-tasha-snd-card Headset Jack"
events:
    SW  (0005): SW_HEADPHONE_INSERT   SW_MICROPHONE_INSERT  SW_LINEOUT_INSERT     SW_JACK_PHYSICAL_INS
                SW_PEN_INSERTED       0010                  0011                  0012
input props:
    <none>
add device 3: /dev/input/event4
name:     "uinput-fpc"
events:
    KEY (0001): KEY_KPENTER           KEY_UP                KEY_LEFT              KEY_RIGHT
                KEY_DOWN              BTN_GAMEPAD           BTN_EAST              BTN_C
                BTN_NORTH             BTN_WEST
input props:
    <none>
add device 4: /dev/input/event3
name:     "gpio-keys"
events:
    KEY (0001): KEY_VOLUMEUP
    SW  (0005): SW_LID
input props:
    <none>
add device 5: /dev/input/event0
name:     "qpnp_pon"
events:
    KEY (0001): KEY_VOLUMEDOWN        KEY_POWER
input props:
    <none>
add device 6: /dev/input/event2
name:     "uinput-goodix"
events:
    KEY (0001): KEY_HOME
input props:
    <none>
add device 7: /dev/input/event1
name:     "synaptics_dsx"
events:
    KEY (0001): KEY_WAKEUP            BTN_TOOL_FINGER       BTN_TOUCH
    ABS (0003): ABS_X                 : value 0, min 0, max 1079, fuzz 0, flat 0, resolution 0
                ABS_Y                 : value 0, min 0, max 2159, fuzz 0, flat 0, resolution 0
                ABS_MT_SLOT           : value 9, min 0, max 9, fuzz 0, flat 0, resolution 0
                ABS_MT_TOUCH_MAJOR    : value 0, min 0, max 255, fuzz 0, flat 0, resolution 0
                ABS_MT_TOUCH_MINOR    : value 0, min 0, max 255, fuzz 0, flat 0, resolution 0
                ABS_MT_POSITION_X     : value 0, min 0, max 1079, fuzz 0, flat 0, resolution 0
                ABS_MT_POSITION_Y     : value 0, min 0, max 2159, fuzz 0, flat 0, resolution 0
                ABS_MT_TRACKING_ID    : value 0, min 0, max 65535, fuzz 0, flat 0, resolution 0
input props:
    INPUT_PROP_DIRECT
Enter fullscreen mode Exit fullscreen mode

It looks confusing at first, but especially the last device is interesting: It has all kinds of events that are associated with a multitouch device. That's our screen. So now we know where to write data, the device file /dev/input/event1.

The question what we should write can be answered by watching the getevent -l output:

/dev/input/event1: EV_ABS       ABS_MT_TRACKING_ID   0000504c
/dev/input/event1: EV_KEY       BTN_TOUCH            DOWN
/dev/input/event1: EV_KEY       BTN_TOOL_FINGER      DOWN
/dev/input/event1: EV_ABS       ABS_MT_POSITION_X    00000037
/dev/input/event1: EV_ABS       ABS_MT_POSITION_Y    0000008d
/dev/input/event1: EV_SYN       SYN_REPORT           00000000
/dev/input/event1: EV_ABS       ABS_MT_TOUCH_MAJOR   00000006
/dev/input/event1: EV_SYN       SYN_REPORT           00000000
/dev/input/event1: EV_ABS       ABS_MT_TRACKING_ID   ffffffff
/dev/input/event1: EV_KEY       BTN_TOUCH            UP
/dev/input/event1: EV_KEY       BTN_TOOL_FINGER      UP
/dev/input/event1: EV_SYN       SYN_REPORT           00000000
Enter fullscreen mode Exit fullscreen mode

This is the output when doing a single tap in the top left corner of the display. Note that the numbers next to ABS_MT_POSITION_{X,Y} are the coordinates I just tapped. So the question is: how do we translate this? Not at all, we just remove the -l ("label event types and names in plain text") option to get a more "raw" data stream:

/dev/input/event1: 0003 0039 0000504d        # ABS_MT_TRACKING_ID  
/dev/input/event1: 0001 014a 00000001        # BTN_TOUCH           
/dev/input/event1: 0001 0145 00000001        # BTN_TOOL_FINGER     
/dev/input/event1: 0003 0035 00000037        # ABS_MT_POSITION_X   
/dev/input/event1: 0003 0036 0000008d        # ABS_MT_POSITION_Y   
/dev/input/event1: 0000 0000 00000000        # SYN_REPORT          
/dev/input/event1: 0003 0030 00000006        # ABS_MT_TOUCH_MAJOR  
/dev/input/event1: 0000 0000 00000000        # SYN_REPORT          
/dev/input/event1: 0003 0039 ffffffff        # ABS_MT_TRACKING_ID  
/dev/input/event1: 0001 014a 00000000        # BTN_TOUCH           
/dev/input/event1: 0001 0145 00000000        # BTN_TOOL_FINGER     
/dev/input/event1: 0000 0000 00000000        # SYN_REPORT          
Enter fullscreen mode Exit fullscreen mode

OK, so that is the data. And we know where to write it. But still... how?
Let's take a look at the source code of the sendevent command. It seems to basically be a lower-level version of the input command (not really, but still kind of).

The most interesting part is the input_event struct, which is filled with data and then written to a device file:

struct input_event {
    struct timeval time;
    __u16 type;
    __u16 code;
    __s32 value;
};
Enter fullscreen mode Exit fullscreen mode

So before we had three columns with numbers in our output, and now we have three unsigned integers we want to fill with data: type, code and value. The getevent command outputs hex numbers, so we have to make sure we don't accidentally use the wrong number format when specifying them in a program (definitely never happened to me...sure ;)).

Putting it all together

Now all we have to do is write the twelve events we observed previously in sequence to the device file and then test the program.

While implementing this is possible in any language, I chose Go for the task because of the ability to easily cross-compile from Windows to Arm64 Android. It also made it extra easy to define the events needed for a single tap:

// Define the input_event struct, but in Go
type InputEvent struct {
    Time  syscall.Timeval
    Type  EventType
    Code  EventCode
    Value uint32
}

// Some const definitions, names are from the getevent output
type EventType uint16

const (
    EV_ABS EventType = 0x0003
    EV_KEY EventType = 0x0001
    EV_SYN EventType = 0x0000
)

// Known event codes for a touch sequence
type EventCode uint16

const (
    ABS_MT_TRACKING_ID EventCode = 0x0039
    BTN_TOUCH          EventCode = 0x014a
    BTN_TOOL_FINGER    EventCode = 0x0145
    ABS_MT_POSITION_X  EventCode = 0x0035
    ABS_MT_POSITION_Y  EventCode = 0x0036
    ABS_MT_TOUCH_MAJOR EventCode = 0x0030
    SYN_REPORT         EventCode = 0x0000
)

// Value field of BTN_TOUCH, BTN_TOOL_FINGER
const (
    TOUCH_VALUE_DOWN = 0x00000001
    TOUCH_VALUE_UP   = 0x00000000
)

// This event happens more often; marks the start/end of a sequence
var eventSynReport = InputEvent{
    Type:  EV_SYN,
    Code:  SYN_REPORT,
    Value: 0x00000000,
}

// touch is the whole sequence of events that simulates a single tap
// While testing it seemed like not all SYN_REPORT events are necessary,
// but we will just use the same sequence as observed above
var touch = []InputEvent{
    {
        Type:  EV_ABS,
        Code:  ABS_MT_TRACKING_ID,
        Value: 0x0000e800, // Touch tracking ID, seems like we don't need to care about it
    },
    // Pretend to put the finger down
    {
        Type:  EV_KEY,
        Code:  BTN_TOUCH,
        Value: TOUCH_VALUE_DOWN,
    },
    {
        Type:  EV_KEY,
        Code:  BTN_TOOL_FINGER,
        Value: TOUCH_VALUE_DOWN,
    },
    // Top right corner
    {
        Type:  EV_ABS,
        Code:  ABS_MT_POSITION_X,
        Value: 0x00000071,
    },
    {
        Type:  EV_ABS,
        Code:  ABS_MT_POSITION_Y,
        Value: 0x000000a3,
    },
    eventSynReport,
    {
        Type:  EV_ABS,
        Code:  ABS_MT_TOUCH_MAJOR,
        Value: 0x00000005,
    },
    eventSynReport,
    {
        Type:  EV_ABS,
        Code:  ABS_MT_TRACKING_ID,
        Value: 0xffffffff,
    },
    // Now put the finger up again
    {
        Type:  EV_KEY,
        Code:  BTN_TOUCH,
        Value: TOUCH_VALUE_UP,
    },
    {
        Type:  EV_KEY,
        Code:  BTN_TOOL_FINGER,
        Value: TOUCH_VALUE_UP,
    },
    eventSynReport,
}
Enter fullscreen mode Exit fullscreen mode

Now we just write our sequence to the device file f:

// Assumption: f is the opened display device file /dev/input/event1
for _, ievent := range touch {
    err := binary.Write(f, binary.LittleEndian, ievent)
    if err != nil {
        panic("writing input event: " + err.Error())
    }
}```



Now we just write our sequence to the device file `f`:



```go
// Assumption: f is the opened display device file /dev/input/event1
for _, ievent := range touch {
    err := binary.Write(f, binary.LittleEndian, ievent)
    if err != nil {
        panic("writing input event: " + err.Error())
    }
}
Enter fullscreen mode Exit fullscreen mode

You can find the whole program here.

One interesting detail about the sequence is that it doesn't always have to be the same. Sometimes, there are more SYN_REPORT events in a sequence, but interestingly they do not appear to change the result. According to the documentation, if no SYN_REPORT has been sent between two events, they are seen as sent in the same moment of time; so this event type acts as a separator.

Now that we have the code for a single tap, we can of course adjust the code to be able to tap any position by simply changing the x and y values.

In my tests this program has been a lot faster than the method with the input command, which was a nice outcome.

Actually using it

Now that we have done all the work to get a working tap program, we only need to integrate it into a program that detects the fingerprint press, then sends those events. I'll spare you the details on that, you can see the whole program on GitHub.

It's basically a daemon that runs in the background and detects the aforementioned log lines to react with a tap. It also has a few more commands, but they are not as technically interesting as the tap.

I also packaged the program into a Magisk (root solution with addons) module as that allows me to easily run it on boot.

Further ideas

One could use getevent and this method of writing events to create an event recorder that can accurately replay sequences of events. So if you want to automatically input a pin on the lock screen, that should be possible (the screen device file doesn't have any restrictions on when the tap can happen, I think the input command is limited to an unlocked phone only, no lock screen access).

Thanks

If you found this interesting and want to create something like this or adapt the program for your phone, take a look at the repository.

If there are any mistakes in this post please feel free to point them out. Thank you :)

Latest comments (0)