Motion capture, also called mocap or mo-cap, is the process of recording the detailed movements of people or objects and storing them in a computer. It is used in the military, entertainment, sports, and medicine, and to validate computer vision and robotics systems.
In movies, TV shows, and video games, motion capture means recording the movements of actors and using that data to animate digital characters in 2D or 3D. When it includes the face, fingers, or subtle expressions, it is often called performance capture. In many fields, motion capture is also called motion tracking, but in filmmaking and games, motion tracking usually refers to matching camera movements to scenes.
During a motion capture session, the movements of one or more actors are sampled many times per second. Early methods used images from several cameras to calculate 3D positions, but often the goal is to record only the movement, not the actor’s appearance. This movement data is then mapped onto a 3D model so that the model performs the same actions as the actor. This differs from the older technique of rotoscoping, in which animators trace over filmed footage frame by frame.
Camera movements can also be captured, so that a virtual camera in the scene pans, tilts, or moves around the stage under the control of a camera operator while the actor performs. The motion capture system can record the camera, the props, and the actor’s performance at the same time. This lets computer-generated characters, images, and sets share the perspective of the video images from the camera. A computer processes the data and displays the actor’s movements, with the camera positioned correctly relative to the objects in the scene. Recovering camera movement from previously recorded footage is called match moving or camera tracking.
The first virtual actor created using motion capture was made in 1993 by Didier Pourcel and his team at Gribouille. They copied the body and face of French comedian Richard Bohringer and used early motion-capture tools to animate the character.
Applications
Motion capture technology has many uses. It is most commonly used in video games, movies, and robotics research. For example, Purdue University uses motion capture to help develop robots.
In video games, motion capture is used to create realistic movements for characters such as athletes and martial artists. In 1988, an early form of motion capture was used to animate characters in the games Vixen and Last Apostle Puppet Show. In the 1990s, it was used to animate 3D characters in games such as Virtua Fighter and Virtua Fighter 2. By 1995, companies such as Acclaim Entertainment and Namco were using motion capture in their games. Other games that used it include Crash Bandicoot, Spyro the Dragon, and Dinosaur Planet.
Motion capture is also used for indoor positioning. Robotics researchers use it to develop and evaluate control and perception systems. Outdoors, satellite positioning such as GPS combined with real-time kinematic (RTK) correction can track movement accurately, but this does not work well indoors, where there is no line of sight to the satellites. Many motion capture systems integrate with the Robot Operating System (ROS) to help developers test robots.
In aerial robotics, motion capture helps track drones indoors, avoiding rules that limit outdoor testing. Purdue University has the largest indoor motion capture system in the world, called PURT. It uses 60 cameras to track objects with millimeter accuracy, which is important for testing new technologies.
Movies use motion capture to create special effects and digital characters. Examples include Gollum from The Lord of the Rings, King Kong, and Avatar. In Total Recall (1990), motion capture was used for scenes where actors walked through an X-ray machine. Batman Forever (1995) and Star Wars: Episode I – The Phantom Menace (1999) also used motion capture. Sinbad: Beyond the Veil of Mists (2000) and Final Fantasy: The Spirits Within (2001) were among the first films to use motion capture widely.
Motion capture is also used in television shows like Laflaque, Sprookjesboom, and Headcases. In virtual reality and augmented reality, motion capture helps users interact with digital content by tracking hand movements. This is useful for training and simulations.
In medicine, motion capture helps analyze how people move, which is important for physical therapy. It can also help patients recover after surgery by providing real-time feedback.
During the making of Avatar, directors used motion capture software to see how actors’ movements would look in the film in real time. This helped create more realistic scenes. In The Avengers, motion capture was used to create digital characters.
Methods and systems
Motion tracking, or motion capture, began as a tool for analyzing movement in biomechanics research in the 1970s and 1980s. As the technology matured, it spread to education, training, and sports, and later to computer animation for television, movies, and video games.
Since the 20th century, performers have worn markers near each joint so that movement can be identified from the positions of, or angles between, the markers. Markers may be acoustic, inertial, LED, magnetic, or reflective, or a combination of these. They should be sampled at a rate at least twice the highest frequency of the motion being recorded. Resolution matters both in space (how finely a position can be resolved) and in time (how often it is sampled), because motion blur causes much the same problems as low spatial resolution.
Since the beginning of the 21st century, faster technology has made new methods possible. Many modern systems can extract a performer’s silhouette from the background and then calculate joint angles by fitting a mathematical model to the silhouette. For movements that do not change the silhouette, hybrid systems combine markers with the silhouette and need fewer markers. In robotics, some motion capture systems are based on simultaneous localization and mapping (SLAM).
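The sampling rule mentioned above (track markers at least twice as fast as the motion being recorded) can be written as a small check. This is only a sketch: the function names are hypothetical, and the 30 Hz figure in the usage lines is an illustrative assumption, not a value from any particular system.

```python
def min_sample_rate_hz(max_motion_freq_hz: float) -> float:
    """Lowest capture rate that satisfies the 'at least twice' rule."""
    if max_motion_freq_hz <= 0:
        raise ValueError("motion frequency must be positive")
    return 2.0 * max_motion_freq_hz


def is_rate_adequate(capture_fps: float, max_motion_freq_hz: float) -> bool:
    """True if the capture rate is at least twice the motion frequency."""
    return capture_fps >= min_sample_rate_hz(max_motion_freq_hz)


# Illustrative assumption: the fastest component of the motion is 30 Hz.
print(min_sample_rate_hz(30.0))
print(is_rate_adequate(120.0, 30.0))
```

In practice systems sample well above this minimum, since the rule only sets a lower bound.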
Optical systems
Optical systems use data from image sensors to triangulate the 3D position of a subject between two or more cameras calibrated to provide overlapping views. Data is usually captured with special markers attached to the performer, but newer systems can track features on the performer’s body without markers. Adding cameras makes it possible to track more performers or a larger area. These systems produce three degrees of freedom (a position) for each marker, so rotation must be inferred from the relative positions of several markers, such as those on the shoulder, elbow, and wrist, which together give the angle of the elbow. Some systems now combine optical sensors with other types of sensors to reduce occlusion, track more performers, and cut down on manual data cleanup.
Passive optical systems use markers coated with a retroreflective material that reflects light back toward the camera. The camera’s threshold can be adjusted so that only the bright markers are sampled, ignoring other surfaces such as skin or clothing.
The centroid of each marker is estimated as a position within the 2D image. The grayscale value of each pixel can be used to locate that centroid with sub-pixel accuracy.
To calibrate the cameras, an object with markers attached at known positions is used. This yields each camera’s position and a measurement of its lens distortion. Once at least two calibrated cameras see the same marker, its 3D position can be triangulated. Most systems use between 2 and 48 cameras, though some use more than 300 to reduce marker swap, in which the system confuses one marker with another. Extra cameras are also needed for full coverage around a performer or for several performers.
Because all passive markers look identical, software constraints are used to reduce marker swapping. Unlike systems that use wires or electronics, passive systems use reflective markers, which need to be replaced regularly. The markers are attached directly to the skin or to a special suit. These systems can track many markers at 120 to 160 frames per second, and faster if the tracked region is made smaller.
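As a minimal sketch of how a turning movement can be derived from marker positions, the angle at the elbow is the angle between the vector from elbow to shoulder and the vector from elbow to wrist. The function name and the coordinates in the usage line are hypothetical, and real systems fit a full skeleton rather than a single joint.

```python
import math


def joint_angle_deg(shoulder, elbow, wrist):
    """Angle at the elbow, in degrees, from three 3D marker positions."""
    upper_arm = [s - e for s, e in zip(shoulder, elbow)]
    forearm = [w - e for w, e in zip(wrist, elbow)]
    dot = sum(a * b for a, b in zip(upper_arm, forearm))
    norm_u = math.sqrt(sum(a * a for a in upper_arm))
    norm_f = math.sqrt(sum(a * a for a in forearm))
    if norm_u == 0 or norm_f == 0:
        raise ValueError("markers must not coincide with the elbow marker")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (norm_u * norm_f)))
    return math.degrees(math.acos(cos_theta))


# Hypothetical marker positions in meters: arm bent at a right angle.
print(joint_angle_deg((0.0, 0.3, 0.0), (0.0, 0.0, 0.0), (0.25, 0.0, 0.0)))
```

Three position-only markers are enough for this one angle, which is why per-marker rotation does not need to be measured directly.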
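One simple way to use pixel brightness for a sub-pixel estimate is an intensity-weighted centroid, sketched below. This is a stand-in chosen for brevity, not necessarily what any given system does, and the function name, threshold parameter, and sample image are all assumptions.

```python
def weighted_centroid(image, threshold=0):
    """Intensity-weighted centroid (x, y) of pixels brighter than threshold.

    `image` is a list of rows of grayscale values.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for y, row in enumerate(image):
        for x, value in enumerate(row):
            if value > threshold:
                total += value
                sum_x += x * value
                sum_y += y * value
    if total == 0:
        raise ValueError("no pixel above threshold")
    return sum_x / total, sum_y / total


# Hypothetical 4x4 blob: the right-hand column is brighter, so the
# centroid lands between pixel columns 1 and 2, closer to column 2.
blob = [
    [0, 0, 0, 0],
    [0, 100, 200, 0],
    [0, 100, 200, 0],
    [0, 0, 0, 0],
]
print(weighted_centroid(blob))
```

Because brighter pixels pull the estimate toward themselves, the result is not restricted to whole pixel coordinates.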
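The step where two cameras locate one marker can be sketched with midpoint triangulation: each camera contributes a ray from its center toward the marker, and the marker is estimated as the midpoint of the shortest segment between the two rays. The sketch assumes calibration has already produced each camera's center and ray direction. All names and coordinates are hypothetical, and production systems typically use more cameras and a least-squares solution.

```python
def _sub(a, b):
    return tuple(x - y for x, y in zip(a, b))


def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def _point_on_ray(origin, direction, t):
    return tuple(o + t * d for o, d in zip(origin, direction))


def triangulate_midpoint(center1, dir1, center2, dir2):
    """Estimate a 3D point from two camera rays (center + direction)."""
    w = _sub(center1, center2)
    a = _dot(dir1, dir1)
    b = _dot(dir1, dir2)
    c = _dot(dir2, dir2)
    d = _dot(dir1, w)
    e = _dot(dir2, w)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; cannot triangulate")
    s = (b * e - c * d) / denom  # parameter along ray 1
    t = (a * e - b * d) / denom  # parameter along ray 2
    p1 = _point_on_ray(center1, dir1, s)
    p2 = _point_on_ray(center2, dir2, t)
    return tuple((x + y) / 2 for x, y in zip(p1, p2))


# Two hypothetical cameras 2 m apart, both seeing a marker at (1, 1, 5).
print(triangulate_midpoint((0, 0, 0), (1, 1, 5), (2, 0, 0), (-1, 1, 5)))
```

With noisy measurements the two rays do not quite intersect, which is why the midpoint of the closest approach is used rather than an exact intersection.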
Active optical systems use markers that emit their own light, such as LEDs. The markers can be lit one at a time, or several at once in patterns that identify each one. Because the markers emit light rather than reflect it, they can be seen from farther away, and the higher signal-to-noise ratio gives clearer data. The resulting measurement resolution is often as fine as 0.1 mm.
In the TV series Stargate SG-1, an active system was used to capture actors moving around props that would have been hard to track with other methods. Similar systems were used on Van Helsing and Rise of the Planet of the Apes to capture complex movements on large sets. Because each marker can be identified individually, active systems suit real-time applications. The alternative is to identify markers algorithmically, which requires extra processing.
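Identifying markers by their light pattern can be sketched as a lookup: each marker blinks a unique on/off code over a few frames, and the observed sequence for a tracked blob is matched against a table of codes. The codebook, marker names, and error tolerance below are invented for illustration and do not describe any vendor's encoding.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def identify_marker(observed, codebook, max_errors=0):
    """Return the ID whose blink code best matches `observed`, or None."""
    best_id = None
    best_dist = None
    for marker_id, code in codebook.items():
        if len(code) != len(observed):
            continue
        dist = hamming_distance(observed, code)
        if best_dist is None or dist < best_dist:
            best_id, best_dist = marker_id, dist
    if best_dist is not None and best_dist <= max_errors:
        return best_id
    return None


# Hypothetical 4-frame codes: 1 = LED lit in that frame, 0 = dark.
CODEBOOK = {
    "left_wrist": (1, 0, 1, 1),
    "right_wrist": (1, 1, 0, 1),
    "head": (0, 1, 1, 1),
}
print(identify_marker((1, 1, 0, 1), CODEBOOK))
```

Because the ID comes from the light itself, two markers that cross paths keep their identities, which avoids the swapping problem of identical passive markers.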
Some systems use colored LEDs to track specific body parts. One of the first active systems in the 1980s used a mix of passive and active markers with mirrors and sensors.
Active systems can be refined further by strobing one marker at a time, or by using light patterns that give each marker a unique ID. Higher-resolution systems capture subtler movements. Directors can watch an actor’s performance in real time on the computer-generated character. These systems work well outdoors and can capture up to 960 frames per second. A system with eight cameras costs about $20,000.
Some systems use inexpensive projectors instead of high-speed cameras. The projectors encode the space with light, and photosensitive tags on the tracked objects decode it. Each tag can compute its own position and orientation and measure the lighting that falls on it. The tags work in natural light and can be hidden in clothing or objects. This method produces less data and suits real-time applications, though it is still experimental.
Motion capture has been available to researchers for many years and has provided new insight in many fields, including movement under water.
Underwater cameras are protected by waterproof housings that resist corrosion and chlorine, which makes them suitable for pools. Unlike typical motion capture cameras, which illuminate markers with infrared light, underwater cameras usually use cyan light, which is absorbed far less by water. Their measurement range is typically 15–20 meters, depending on water clarity, the type of markers, and the number of cameras. Different types of underwater markers are available for various uses.
Non-optical systems
Inertial motion capture uses miniature inertial sensors, biomechanical models, and sensor fusion algorithms. The motion data is usually sent wirelessly to a computer, where it is recorded or displayed. Most systems use inertial measurement units (IMUs), which combine a gyroscope, a magnetometer, and an accelerometer to measure rotation. These rotations are mapped onto a digital skeleton in the software. As with optical markers, more IMUs give more detailed data. No cameras, lights, or markers are needed to capture relative movement, but cameras or other tools are needed if the performer’s absolute position is required.
Inertial systems capture all six degrees of freedom of body motion in real time, and give limited heading information if they include a magnetometer, although magnetometers are less precise and are susceptible to electromagnetic interference. Benefits include capture in many environments, little or no setup, portability, and large capture areas. Disadvantages include lower positional accuracy and positional drift that compounds over time.
The sensors resemble those in Wii controllers but are more sensitive and update faster. They can measure the direction to the ground to within one degree. Inertial systems are increasingly popular with game developers because they are quick and easy to set up. Prices range from $1,000 to $80,000.
Mechanical motion capture systems track joint angles directly by attaching a skeleton-like structure to the body. As the performer moves, the articulated parts of the structure move with them and measure the relative motion. These systems work in real time, are relatively low-cost, do not suffer from occlusion, and are wireless with an unlimited capture volume. They usually consist of rigid metal or plastic rods linked by joints that measure the angles. They cost between $25,000 and $75,000, plus an external system if the performer’s absolute position is needed. Some also provide limited force or touch feedback.
Magnetic systems calculate position and orientation from three mutually perpendicular coils on both the transmitter and each receiver. The relative strength of the voltage or current in the three coils lets the system work out both the range and the orientation of the receiver. These systems provide six degrees of freedom per sensor and therefore need fewer markers than optical systems: one on the upper arm and one on the lower arm are enough to track the position and angle of the elbow. However, the magnetic field can be distorted by metal objects such as steel reinforcing bars or wiring, and by electrical devices such as monitors or lights.
The sensor response is nonlinear, especially near the edges of the capture area. Wires from the sensors can restrict large movements. Magnetic systems allow motion data to be monitored in real time, but their capture volume is much smaller than that of optical systems. They are divided into alternating-current (AC) systems, which use sine waves, and direct-current (DC) systems, which use square pulses.
Stretch sensors are flexible devices, typically made of silicone, that measure stretching, bending, or pressure. When the sensor is stretched or compressed, its electrical properties change. This data can be sent via Bluetooth or directly to a computer to detect small body movements. These sensors are not affected by magnetic interference and do not require markers. They also do not suffer from the positional drift that is common in inertial systems. However, they have a relatively low signal-to-noise ratio, so filtering or other software processing is needed to make the data usable. This processing adds latency.
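The growth of position error over time can be illustrated by integrating a small constant accelerometer bias twice, once to get velocity and again to get position. The bias value, sample rate, and function name are assumptions picked for illustration. Real systems reduce this drift with sensor fusion and body models, which the sketch leaves out.

```python
def integrate_position(accels_m_s2, dt_s):
    """Integrate acceleration samples twice; return the position history."""
    velocity = 0.0
    position = 0.0
    positions = []
    for accel in accels_m_s2:
        velocity += accel * dt_s      # first integration: velocity
        position += velocity * dt_s   # second integration: position
        positions.append(position)
    return positions


# Hypothetical stationary sensor with a 0.01 m/s^2 bias, sampled at
# 100 Hz for 10 seconds. The true position never changes.
bias = 0.01
dt = 0.01
drift = integrate_position([bias] * 1000, dt)
print(f"error after 5 s:  {drift[499]:.3f} m")
print(f"error after 10 s: {drift[-1]:.3f} m")
```

Doubling the elapsed time roughly quadruples the error, which is why an external reference is needed when absolute position matters.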
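The trade-off between filtering and response time can be seen with a basic exponential low-pass filter, used here as a simple stand-in for whatever filtering a real stretch-sensor system applies. The smoothing factor and the step-shaped input are assumptions.

```python
def low_pass(samples, alpha):
    """Exponential smoothing: smaller alpha removes more noise but lags more."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    if not samples:
        return []
    state = samples[0]
    filtered = []
    for sample in samples:
        state = alpha * sample + (1 - alpha) * state
        filtered.append(state)
    return filtered


# Hypothetical sensor reading that jumps from 0 to 1 at sample 5.
step = [0.0] * 5 + [1.0] * 20
smoothed = low_pass(step, alpha=0.2)
# Count how many samples after the jump the output needs to pass 0.9.
lag = next(i for i, v in enumerate(smoothed) if v > 0.9) - 5 + 1
print(f"samples needed to reach 90% of the new value: {lag}")
```

A heavier filter (smaller alpha) would suppress more noise but take even longer to follow a real movement.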
Related techniques
Many companies that make traditional motion capture equipment offer some type of low-resolution facial capture. This uses between 32 and 300 markers, which can be either active or passive. These methods are limited by the time needed to place the markers, adjust their positions, and process the data. The technology also limits the quality and detail of the final results.
High-fidelity facial motion capture, also called performance capture, is a newer method that records more complex facial movements to capture detailed expressions. Facial capture systems are now divided into several groups, including traditional motion capture data, systems that use pre-designed face shapes, methods that record the actual shape of an actor’s face, and special systems developed by specific companies.
The two main techniques for facial capture are stationary systems with an array of cameras that photograph the face from several angles, and systems that use light arrays. In the first, software such as the stereo mesh solver from OpenCV is used to create a 3D model of the face. In the second, surface details are calculated from the changes in brightness as the light source or the camera moves. Both are limited by the camera’s resolution, the size of the face in the camera’s view, and the number of cameras used. If the face takes up half of the camera’s view and the camera has a high resolution, small facial movements can be detected by comparing frames. Recent research focuses on increasing the frame rate and on using optical flow to transfer movements to other digital faces, rather than only creating a 3D model of the actor’s face and expressions.
Radio frequency positioning systems are becoming more viable because higher-frequency devices allow more precise tracking than older technologies such as radar. Light travels about 30 centimeters per nanosecond, so a 10 gigahertz signal has a wavelength of about 3 centimeters, which is roughly the accuracy it can achieve. By measuring signal amplitude to a quarter of a wavelength, resolution can improve to about 8 millimeters. Matching the precision of optical systems requires frequencies of 50 gigahertz or higher, at which point radio systems face similar limits to optical ones, such as being blocked by objects and requiring a clear line of sight. Radio systems are therefore better suited to tracking large areas, where lower accuracy is acceptable at greater distances. Many scientists believe radio frequency systems will not reach the precision needed for motion capture.
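The figures above follow from the relation between wavelength and frequency (wavelength equals the speed of light divided by frequency). The short sketch below reproduces them. The function names are hypothetical, and the quarter-wavelength rule is taken from the text rather than derived.

```python
SPEED_OF_LIGHT_M_PER_S = 299_792_458.0


def wavelength_m(frequency_hz: float) -> float:
    """Free-space wavelength of a radio signal."""
    if frequency_hz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT_M_PER_S / frequency_hz


def quarter_wavelength_m(frequency_hz: float) -> float:
    """Resolution when the signal is resolved to a quarter wavelength."""
    return wavelength_m(frequency_hz) / 4.0


for ghz in (10, 50):
    f = ghz * 1e9
    print(f"{ghz} GHz: wavelength {wavelength_m(f) * 100:.2f} cm, "
          f"quarter wavelength {quarter_wavelength_m(f) * 1000:.2f} mm")
```

At 10 GHz this gives a wavelength of about 3 cm and a quarter wavelength of about 7.5 mm, in line with the rounded figure of 8 mm above.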
In 2015, researchers at the Massachusetts Institute of Technology developed a system that tracks movement using radio frequency signals.
Another method gives the actor an unlimited walking area inside a rotating sphere, similar to a hamster ball, that contains internal sensors to record the actor’s movements. This removes the need for external cameras and other equipment. While this could reduce the cost of motion capture, the basic sphere can only record movement in one continuous direction. Additional sensors worn by the actor would be needed to capture more complex movements.
A different method uses a 6DOF motion platform with an omni-directional treadmill and high-resolution optical motion capture. This allows the person being tracked to move in any direction and navigate uneven surfaces. This technology is used for medical rehabilitation, biomechanical research, and virtual reality.
In 3D pose estimation, an actor’s pose can be reconstructed from an image or a depth map.