2024
The goal of this project is straightforward: to use hand-gesture recognition to create a more intuitive way of interacting with a desktop computer. This technology has long been widely used in the mobile sector, but desktop users (i.e. most office workers, engineers, and students) are stuck with the same keyboard and trackpad/mouse setup invented decades ago. Hand-gesture sensing has made great strides in recent years thanks to advances in machine learning. However, the need for fairly powerful hardware and the relatively slow response times of an ML-based approach make it unattractive for this type of project.
My current prototype was completed in late 2024. The setup uses an Arduino and two ultrasonic sensors to detect the following gestures: left swipe, right swipe, raising the hand, lowering the hand, and covering (placing a hand so that it covers both sensors). The system offers two modes: media control mode and multitasking mode. Both are explained below.
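As a rough illustration of how two spaced sensors make swipe direction distinguishable, here is a minimal PC-side sketch in Python. It is not the project's actual code: it assumes the Arduino streams comma-separated left/right distances in centimeters over serial, and the port name, trigger distance, and timing window are placeholders. Only the swipe and cover gestures are shown.

```python
import time
import serial  # pyserial

PORT = "COM3"          # placeholder; depends on which port the Arduino appears on
TRIGGER_CM = 20        # a hand is "over" a sensor when it reads closer than this
SWIPE_WINDOW_S = 0.5   # max delay between the two sensors for a swipe

def read_distances(ser):
    """Parse one 'left_cm,right_cm' line streamed by the Arduino."""
    line = ser.readline().decode(errors="ignore").strip()
    try:
        left, right = (float(x) for x in line.split(","))
        return left, right
    except ValueError:
        return None  # malformed or empty line (e.g. a read timeout)

def detect_gesture(ser):
    """Infer 'cover' or a swipe direction from which sensor sees the hand first."""
    first_hit = None  # ("left" | "right", timestamp)
    while True:
        pair = read_distances(ser)
        if pair is None:
            continue
        left, right = pair
        now = time.monotonic()
        if left < TRIGGER_CM and right < TRIGGER_CM:
            return "cover"                      # both sensors blocked at once
        if left < TRIGGER_CM:
            side = "left"
        elif right < TRIGGER_CM:
            side = "right"
        else:
            continue
        if first_hit is None or side == first_hit[0]:
            first_hit = (side, now)
        elif now - first_hit[1] < SWIPE_WINDOW_S:
            # the hand travelled from the first sensor to the second
            return "swipe_right" if first_hit[0] == "left" else "swipe_left"

if __name__ == "__main__":
    with serial.Serial(PORT, 9600, timeout=1) as ser:
        print(detect_gesture(ser))
```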
The media control mode allows a user to play/pause and skip/rewind media, as well as raise or lower the volume. When a gesture is detected, the software emulates a keypress (such as the play/pause or skip key), which Windows recognizes as a media control command. I'm currently experimenting with integrating the Spotify API into my program to allow control of remotely connected audio devices, such as a home speaker system.
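To illustrate the keypress-emulation step, here is a minimal sketch that maps detected gestures to Windows media keys using the pyautogui library. This is an assumption about how the step could be implemented, not the project's code, and the gesture-to-key mapping is hypothetical.

```python
import pyautogui

# Hypothetical mapping from detected gestures to Windows media keys;
# the gesture names follow the detector sketch above.
GESTURE_TO_KEY = {
    "swipe_right": "nexttrack",   # skip forward
    "swipe_left":  "prevtrack",   # rewind / previous track
    "raise":       "volumeup",
    "lower":       "volumedown",
    "cover":       "playpause",
}

def handle_gesture(gesture: str) -> None:
    """Emulate the media keypress that Windows maps to this gesture."""
    key = GESTURE_TO_KEY.get(gesture)
    if key is not None:
        pyautogui.press(key)

if __name__ == "__main__":
    handle_gesture("cover")  # toggles play/pause in the active media app
```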
This mode, currently in the early stages of development, allows a user to access Windows' multitasking features without keyboard shortcuts. Activating it opens the "exploded desktop" view, where all open windows are visible; the user can then move to their desired window and select it. In future updates, I hope to improve the accuracy and speed of this mode. I'm also experimenting with integrating other multitasking features, such as virtual desktops and window snapping, into this mode.
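As a sketch of how this mode could drive Windows' built-in multitasking shortcuts (an assumption; the project may emulate different keys), the snippet below opens the all-windows view with Win+Tab, moves the selection with the arrow keys, and confirms with Enter, again via pyautogui.

```python
import pyautogui

def open_task_view() -> None:
    """Open the view showing all open windows (Win+Tab on Windows)."""
    pyautogui.hotkey("win", "tab")

def move_selection(direction: str) -> None:
    """Move the highlighted window left or right with the arrow keys."""
    if direction in ("left", "right"):
        pyautogui.press(direction)

def select_window() -> None:
    """Switch to the currently highlighted window."""
    pyautogui.press("enter")

# Hypothetical gesture sequence: open the view, step right twice, select.
if __name__ == "__main__":
    open_task_view()
    move_selection("right")
    move_selection("right")
    select_window()
```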
My first prototype was completed in July 2024 and used one Raspberry Pi, one ultrasonic sensor, and a camera. This system used OpenCV for camera control and MediaPipe for finger identification (a sketch of that detection loop follows the list below). Trial and error revealed two major disadvantages of a vision-based system:
while visual detection worked well in test conditions, accuracy dropped sharply whenever lighting, background, or hand position changed
slow reaction times hindered performing multiple gesture actions in sequence (such as those required for multitasking support)
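For reference, here is a minimal sketch of the kind of OpenCV + MediaPipe hand-detection loop the first prototype relied on. It is illustrative rather than the original code; the camera index, frame count, and confidence threshold are placeholders.

```python
import cv2
import mediapipe as mp

mp_hands = mp.solutions.hands

def count_detected_hands(camera_index: int = 0, frames: int = 30) -> int:
    """Grab a few frames and report how often a hand was found."""
    cap = cv2.VideoCapture(camera_index)
    hits = 0
    with mp_hands.Hands(max_num_hands=1,
                        min_detection_confidence=0.5) as hands:
        for _ in range(frames):
            ok, frame = cap.read()
            if not ok:
                break
            # MediaPipe expects RGB; OpenCV delivers BGR frames.
            results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            if results.multi_hand_landmarks:
                hits += 1
    cap.release()
    return hits

if __name__ == "__main__":
    print(count_detected_hands())
```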
The dual-sensor setup performed better in several important ways, including:
near-instant response time
the ability to run on much less powerful hardware
the ability to detect left/right swipes
insensitivity to changing light levels
First functioning dual-sensor setup, implemented on a breadboard
Current design, with circuitry based on the prototype shown to the left