Sign Language Detection - LSTM TFOD

Ritik Vaidande
4 min read · Dec 25, 2021

Real-time Sign Language Detection using sequences.

  1. Extract holistic keypoints
  2. Train an LSTM DL model
  3. Make real time predictions using sequences

The model will be trained on signs like Hello, Thank You, and I Love You, and will then predict them in real time.

Import Dependencies

We will use TensorFlow as our deep learning framework.

OpenCV handles the webcam feed.

MediaPipe is responsible for extracting the face, hand, and body keypoints from each frame, and also for drawing them on the image.

Finally, sklearn provides the train/test split and matplotlib lets us plot some test images.
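A plausible import cell covering the libraries named above (the TensorFlow and Keras imports appear later, next to the code that uses them):

import cv2                       # webcam capture and on-frame drawing
import numpy as np               # keypoint arrays
import os                        # folders for collected data
import mediapipe as mp           # face, hand, and body keypoint extraction
import matplotlib.pyplot as plt  # plotting some test images
from sklearn.model_selection import train_test_split  # train/test split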

Detect Face, Hand, Pose Landmarks

Create two variables, mp_holistic and mp_drawing.

mp_holistic will be the detection model and mp_drawing will hold the drawing utilities.
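These map directly onto MediaPipe's solutions API:

mp_holistic = mp.solutions.holistic      # holistic detection model (face + pose + hands)
mp_drawing = mp.solutions.drawing_utils  # landmark drawing utilities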

Create a function mediapipe_detection for the detection step.

Here image is the frame coming from OpenCV and model is the MediaPipe Holistic model.

The frame is converted from BGR to RGB, the model runs, and the frame is converted back from RGB to BGR. Around the model call the image is marked non-writeable and then writeable again, which saves some memory by letting MediaPipe avoid a copy. Finally, return the image and the results.
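A minimal sketch of that function:

def mediapipe_detection(image, model):
    image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)  # OpenCV frames are BGR; MediaPipe expects RGB
    image.flags.writeable = False                   # read-only lets MediaPipe skip a copy
    results = model.process(image)                  # run the holistic model
    image.flags.writeable = True
    image = cv2.cvtColor(image, cv2.COLOR_RGB2BGR)  # back to BGR for OpenCV rendering
    return image, results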

Create a draw_styled_landmarks function to actually draw the detected landmarks on the image with some styling.

It takes the image from OpenCV and the results returned by mediapipe_detection.

Using MediaPipe's draw_landmarks function, we pass in the values we got from the detection model, such as results.face_landmarks, results.pose_landmarks, and so on.

Pass the connection map: FACE_CONNECTIONS (facial keypoints), POSE_CONNECTIONS (torso), and HAND_CONNECTIONS (left and right hands).

And specify color, thickness, and circle_radius via the DrawingSpec function.
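A sketch of the drawing function; the colors and thicknesses here are arbitrary examples, and note that newer MediaPipe releases renamed FACE_CONNECTIONS to FACEMESH_CONTOURS:

def draw_styled_landmarks(image, results):
    # Face mesh (FACE_CONNECTIONS in older mediapipe; FACEMESH_CONTOURS in newer releases)
    mp_drawing.draw_landmarks(image, results.face_landmarks, mp_holistic.FACE_CONNECTIONS,
                              mp_drawing.DrawingSpec(color=(80, 110, 10), thickness=1, circle_radius=1),
                              mp_drawing.DrawingSpec(color=(80, 255, 121), thickness=1, circle_radius=1))
    # Torso / body pose
    mp_drawing.draw_landmarks(image, results.pose_landmarks, mp_holistic.POSE_CONNECTIONS,
                              mp_drawing.DrawingSpec(color=(80, 22, 10), thickness=2, circle_radius=4),
                              mp_drawing.DrawingSpec(color=(80, 44, 121), thickness=2, circle_radius=2))
    # Left hand
    mp_drawing.draw_landmarks(image, results.left_hand_landmarks, mp_holistic.HAND_CONNECTIONS,
                              mp_drawing.DrawingSpec(color=(121, 22, 76), thickness=2, circle_radius=4),
                              mp_drawing.DrawingSpec(color=(121, 44, 250), thickness=2, circle_radius=2))
    # Right hand
    mp_drawing.draw_landmarks(image, results.right_hand_landmarks, mp_holistic.HAND_CONNECTIONS,
                              mp_drawing.DrawingSpec(color=(245, 117, 66), thickness=2, circle_radius=4),
                              mp_drawing.DrawingSpec(color=(245, 66, 230), thickness=2, circle_radius=2))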

Instantiate the MediaPipe Holistic model with minimum detection and tracking confidence values. Then add the two functions, mediapipe_detection and draw_styled_landmarks, to the OpenCV video capture loop.

The webcam captures frames in real time and passes them to the detection model and the drawing utilities.
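A minimal capture loop, assuming 0.5 confidence thresholds and q to quit:

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        image, results = mediapipe_detection(frame, holistic)  # detect
        draw_styled_landmarks(image, results)                  # draw
        cv2.imshow('OpenCV Feed', image)
        if cv2.waitKey(10) & 0xFF == ord('q'):                 # press q to quit
            break
cap.release()
cv2.destroyAllWindows()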

Extract Keypoint Values

def extract_keypoints(results):
    # Flatten each landmark set to a 1-D array; use zeros when that part was not detected
    pose = np.array([[res.x, res.y, res.z, res.visibility] for res in results.pose_landmarks.landmark]).flatten() if results.pose_landmarks else np.zeros(33*4)
    face = np.array([[res.x, res.y, res.z] for res in results.face_landmarks.landmark]).flatten() if results.face_landmarks else np.zeros(468*3)
    lh = np.array([[res.x, res.y, res.z] for res in results.left_hand_landmarks.landmark]).flatten() if results.left_hand_landmarks else np.zeros(21*3)
    rh = np.array([[res.x, res.y, res.z] for res in results.right_hand_landmarks.landmark]).flatten() if results.right_hand_landmarks else np.zeros(21*3)
    return np.concatenate([pose, face, lh, rh])  # 33*4 + 468*3 + 21*3 + 21*3 = 1662 values

The extract_keypoints function converts the landmark values into flat arrays, and when a set of keypoints is absent it substitutes a zero array of the matching size.

Setup Folders for Collection

Specify the path for the exported data. Create an array with the names of the signs we want in our sign language recognition model. Set no_sequences = 30, meaning thirty videos of data per sign, and sequence_length = 30, meaning 30 frames per video.

The for-loop block then creates a folder for each of our signs (actions). Inside each action folder there will be 30 folders containing that sign's video data, as sketched below.
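A sketch of that setup; the folder name MP_Data and the exact label strings are assumptions:

DATA_PATH = os.path.join('MP_Data')                  # assumed export folder
actions = np.array(['hello', 'thanks', 'iloveyou'])  # assumed label strings for the three signs
no_sequences = 30                                    # 30 videos per sign
sequence_length = 30                                 # 30 frames per video

for action in actions:
    for sequence in range(no_sequences):
        # e.g. MP_Data/hello/0, MP_Data/hello/1, ...
        os.makedirs(os.path.join(DATA_PATH, action, str(sequence)), exist_ok=True)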

Collect Keypoints for Train and Test

Looping over each action, sequence, and frame, and overlaying some status text with OpenCV, we collect the keypoints for training and testing, as in the sketch below.
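A sketch of the collection loop; the on-screen messages, the two-second pause between videos, and the text positions are assumptions:

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    for action in actions:
        for sequence in range(no_sequences):
            for frame_num in range(sequence_length):
                ret, frame = cap.read()
                image, results = mediapipe_detection(frame, holistic)
                draw_styled_landmarks(image, results)
                if frame_num == 0:
                    cv2.putText(image, 'STARTING COLLECTION', (120, 200),
                                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 4)
                    cv2.imshow('OpenCV Feed', image)
                    cv2.waitKey(2000)  # assumed two-second pause before each new video
                else:
                    cv2.putText(image, 'Collecting {} video {}'.format(action, sequence), (15, 12),
                                cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 0, 255), 1)
                    cv2.imshow('OpenCV Feed', image)
                # Save this frame's keypoints to disk as a .npy file
                keypoints = extract_keypoints(results)
                np.save(os.path.join(DATA_PATH, action, str(sequence), str(frame_num)), keypoints)
                if cv2.waitKey(10) & 0xFF == ord('q'):
                    break
cap.release()
cv2.destroyAllWindows()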

Preprocess Data and Create Labels and Features

Create a label map, load the saved keypoint sequences, and split the data into train and test sets.
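A sketch of the preprocessing step; the 5% test split is an assumption:

from sklearn.model_selection import train_test_split
from tensorflow.keras.utils import to_categorical

label_map = {label: num for num, label in enumerate(actions)}  # e.g. {'hello': 0, ...}

sequences, labels = [], []
for action in actions:
    for sequence in range(no_sequences):
        # Load the 30 saved frames of one video into a single window
        window = [np.load(os.path.join(DATA_PATH, action, str(sequence), f'{frame_num}.npy'))
                  for frame_num in range(sequence_length)]
        sequences.append(window)
        labels.append(label_map[action])

X = np.array(sequences)                 # shape: (videos, 30 frames, 1662 keypoint values)
y = to_categorical(labels).astype(int)  # one-hot encoded labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.05)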

Build and Train LSTM Deep Learning Model

Using TensorFlow, create an LSTM model, then compile it and fit it to the training data.
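One plausible architecture: a stack of LSTM layers over the 30-frame, 1662-value sequences, feeding dense layers and a softmax over the actions. The layer sizes and epoch count here are assumptions:

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

model = Sequential([
    LSTM(64, return_sequences=True, activation='relu', input_shape=(30, 1662)),
    LSTM(128, return_sequences=True, activation='relu'),
    LSTM(64, return_sequences=False, activation='relu'),  # last LSTM returns a single vector
    Dense(64, activation='relu'),
    Dense(32, activation='relu'),
    Dense(actions.shape[0], activation='softmax'),        # one probability per sign
])
model.compile(optimizer='Adam', loss='categorical_crossentropy',
              metrics=['categorical_accuracy'])
model.fit(X_train, y_train, epochs=200)                   # assumed epoch count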

Make Prediction, Save Weights, Evaluate
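A sketch of these three steps; the filename action.h5 is an assumption:

from sklearn.metrics import multilabel_confusion_matrix, accuracy_score

# Predict on the test set and compare one sample against its ground truth
res = model.predict(X_test)
print(actions[np.argmax(res[0])], actions[np.argmax(y_test[0])])

# Save (and later reload) the trained weights
model.save('action.h5')

# Evaluate with a confusion matrix and accuracy score
yhat = np.argmax(model.predict(X_test), axis=1).tolist()
ytrue = np.argmax(y_test, axis=1).tolist()
print(multilabel_confusion_matrix(ytrue, yhat))
print(accuracy_score(ytrue, yhat))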

Test in Real Time

This runs the model in real time: it captures frames, detects keypoints, draws them on the image, and displays the result, as in the sketch below.
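A sketch of the real-time loop, maintaining a rolling window of the last 30 frames and only accepting predictions above a confidence threshold; the 0.7 threshold is an assumption:

sequence, sentence = [], []
threshold = 0.7  # assumed minimum confidence to accept a prediction

cap = cv2.VideoCapture(0)
with mp_holistic.Holistic(min_detection_confidence=0.5,
                          min_tracking_confidence=0.5) as holistic:
    while cap.isOpened():
        ret, frame = cap.read()
        if not ret:
            break
        image, results = mediapipe_detection(frame, holistic)
        draw_styled_landmarks(image, results)

        sequence.append(extract_keypoints(results))
        sequence = sequence[-30:]  # rolling window of the last 30 frames

        if len(sequence) == 30:
            res = model.predict(np.expand_dims(sequence, axis=0))[0]
            if res[np.argmax(res)] > threshold:
                predicted = actions[np.argmax(res)]
                # Only append when the sign changes, to avoid repeats
                if not sentence or predicted != sentence[-1]:
                    sentence.append(predicted)
            cv2.putText(image, ' '.join(sentence[-5:]), (3, 30),
                        cv2.FONT_HERSHEY_SIMPLEX, 1, (255, 255, 255), 2)

        cv2.imshow('OpenCV Feed', image)
        if cv2.waitKey(10) & 0xFF == ord('q'):
            break
cap.release()
cv2.destroyAllWindows()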

In the image above you can see it detecting the Hello sign using the face, pose, and hand keypoints.

Project Link - Sign-Language-Detection-TFOD-LSTM/Sign Language Detection — TFOD LSTM.ipynb at main · vaidande/Sign-Language-Detection-TFOD-LSTM (github.com)
