Motion capture

Motion capture, also called mocap or mo-cap, is a method used to record very detailed movements of objects or people and store them in a computer system. It is used in the military, entertainment, sports, medicine, and to help test computer vision and robots.

In movies, TV shows, and video games, motion capture involves recording the actions of human actors and using that data to create animations for digital characters in 2D or 3D. When it includes movements of the face, fingers, or small expressions, it is often called performance capture. In many areas, motion capture is also called motion tracking, but in filmmaking and games, motion tracking usually means matching camera movements to real-life footage.

During a motion capture session, the movements of one or more actors are sampled many times each second. Early methods used images from multiple cameras to calculate 3D positions, but the goal is often to record only the actor’s movement, not the actor’s appearance. This movement data is then mapped onto a 3D model so that the model performs the same actions as the actor. This process is different from the older technique of rotoscoping, in which animators trace over filmed footage.

Camera movements can also be captured so that a virtual camera in a scene can move, such as panning, tilting, or moving closer to the action, as directed by a camera operator. At the same time, the motion capture system can record the camera, props, and the actor’s performance. This helps computer-generated characters, images, and sets match the perspective of the camera’s real footage. A computer processes the data and shows the actor’s movements, allowing the virtual camera to be positioned correctly within the scene. Using camera movement data from recorded footage to recreate camera positions is called match moving or camera tracking.

The first virtual actor created using motion capture was made in 1993 by Didier Pourcel and his team at Gribouille. They copied the body and face of French comedian Richard Bohringer and used early motion-capture tools to animate the character.

Advantages

Motion capture has several benefits compared to traditional methods of creating 3D computer animation:

  • Motion capture can produce results quickly, almost in real time. This helps save money in entertainment projects that usually rely on setting key positions for animation. An example of this is the Hand Over technique.
  • The amount of work needed does not increase as much when the performance becomes more complex or longer, unlike traditional methods. This makes it easier to try different styles or ways of acting, allowing the performer’s unique qualities to shine through.
  • Motion capture can recreate detailed movements and realistic physical actions, such as how objects move naturally, how weight affects motion, and how forces are shared between objects.
  • Motion capture can create a much larger amount of animation data in the same amount of time compared to traditional animation. This helps reduce costs and meet project deadlines.
  • The use of free software and tools from other companies can lower the overall cost of motion capture.

Disadvantages

  • Special tools and software are needed to collect and use the data.
  • The cost of the software, tools, and people needed can be too high for small projects.
  • The system might need certain space conditions, such as specific camera angles or avoiding magnetic interference.
  • If problems happen, it is easier to reshoot the scene than to fix the data later. Only a few systems let users watch the data as it is being recorded to check if the scene needs to be redone.
  • The initial results are limited to what can be performed inside the recording area, unless the data is edited afterwards.
  • Movements that break physics rules, like floating without touching the ground, cannot be recorded.
  • Traditional animation methods, such as showing character movement before an action (anticipation), smooth motion after an action (follow through), or changing a character’s shape (like squash and stretch), must be added after the data is collected.
  • If the computer model has a different size or shape than the person being recorded, errors might happen. For example, if a cartoon character has large hands, they might overlap the character’s body if the person being recorded moves carelessly.

Applications

Motion capture technology has many uses. It is commonly used in video games, movies, and robotics research. For example, Purdue University uses motion capture to help develop robots.

Video games often use motion capture to create realistic movements for characters. In 1988, motion capture was used to animate characters in games like Vixen and Last Apostle Puppet Show. Later, in the 1990s, motion capture helped create 3D characters in games like Virtua Fighter. Companies like Acclaim Entertainment and Namco also used motion capture for their games. Examples include Crash Bandicoot, Spyro the Dragon, and Dinosaur Planet.

Motion capture is also used for indoor positioning. Outdoors, GPS and other systems can track movement accurately, but indoors this is harder, so researchers use motion capture to test how robots move and sense their environment. Many motion capture systems integrate with the Robot Operating System (ROS), a software framework that helps researchers test robots.

In aerial robotics, motion capture is used to track drones indoors, where flight conditions are easier to control than outdoors. Purdue University houses the largest indoor motion capture system, in its PURT (Purdue UAS Research and Test) facility. It uses 60 cameras to track objects with millimeter accuracy, providing a "ground truth" reference for testing other technologies.

Movies use motion capture to create special effects and digital characters. Examples include Gollum in The Lord of the Rings, King Kong, Avatar, and Tron: Legacy. The 1990 film Total Recall used motion capture for a scene with actors walking through an X-ray machine. Other films like Batman Forever and Star Wars: Episode I – The Phantom Menace also used motion capture.

Some animated films, like Happy Feet and Monster House, used motion capture, while others, like Cars, did not. The film Avatar used motion capture to create the Na'vi characters.

Motion capture is also used in television shows, such as Laflaque and Headcases. In virtual reality, it helps users interact with digital content by tracking hand movements.

In medicine, motion capture helps analyze how people move, which is useful for physical therapy. It can track recovery progress and provide personalized treatment plans.

During the making of Avatar, directors used motion capture software to see how actors’ movements looked in real time. This helped them direct scenes more effectively. In The Avengers, motion capture was used to create realistic character movements.

Methods and systems

Motion tracking, also called motion capture, began as an analysis tool in biomechanics research during the 1970s and 1980s. Over time, the technology expanded into education, training, sports, and computer animation for television, movies, and video games.

Since the 20th century, performers have worn markers near their joints so that movement can be identified from the positions or angles between the markers. Markers may be acoustic, inertial, LED-based, magnetic, or reflective, or a combination of these. Tracking works best when the system samples at least twice as fast as the motion being studied. The accuracy of a system depends on both its spatial resolution (how finely it can resolve positions) and its temporal resolution (how often it samples), because motion blur causes much the same problems as low resolution.

Since the early 2000s, rapid technological progress has produced new methods. Many modern systems can extract the performer’s silhouette from the background and then calculate joint angles by fitting a mathematical model to the silhouette. For movements that do not change the silhouette, hybrid systems combine silhouette analysis with a smaller set of markers. In robotics, some motion capture systems are based on simultaneous localization and mapping (SLAM), which tracks movement and maps the environment at the same time.
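The rule of sampling at least twice as fast as the motion being studied can be made concrete with a small calculation. The sketch below is only an illustration: the function names and the example frequencies are assumptions, not values taken from any particular system.

```python
def min_capture_rate_hz(motion_frequency_hz: float) -> float:
    """Lowest sampling rate that satisfies the 'at least twice as fast' rule."""
    if motion_frequency_hz <= 0:
        raise ValueError("motion frequency must be positive")
    return 2.0 * motion_frequency_hz


def is_fast_enough(capture_rate_hz: float, motion_frequency_hz: float) -> bool:
    """True if the capture rate is at least twice the motion frequency."""
    return capture_rate_hz >= min_capture_rate_hz(motion_frequency_hz)


# Hypothetical example: a movement that repeats 30 times per second.
required = min_capture_rate_hz(30.0)
```

Under these assumptions, a 120 frames-per-second system would be fast enough for that movement, while a 50 frames-per-second system would not.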

Optical systems

Optical systems use data from image sensors to triangulate the 3D position of a subject between two or more cameras set up to see the same area. Data is usually collected from special markers attached to people or objects, but newer systems can track surface features on the subject without markers. Adding more cameras makes it possible to track more people or cover a larger area. These systems give three degrees of freedom, a position along three axes, for each marker. Rotation, such as the angle of an elbow, has to be inferred from the relative positions of three or more markers, such as those on the shoulder, elbow, and wrist. Some systems now combine optical sensors with inertial sensors (which measure movement and rotation) to cope with markers being blocked from view, track more people, and reduce the need for manual data editing.
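As a minimal sketch of how an elbow angle could be derived from shoulder, elbow, and wrist markers, the angle between the two arm segments can be computed from their 3D positions. The function name and the marker coordinates below are made up for illustration; real systems fit a full skeleton rather than one joint at a time.

```python
import math


def joint_angle_deg(proximal, joint, distal):
    """Angle at `joint` between the segments joint->proximal and joint->distal."""
    u = [p - j for p, j in zip(proximal, joint)]
    v = [d - j for d, j in zip(distal, joint)]
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("outer markers must not coincide with the joint marker")
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_angle = max(-1.0, min(1.0, dot / (norm_u * norm_v)))
    return math.degrees(math.acos(cos_angle))


# Hypothetical marker positions in meters (x, y, z).
shoulder = (0.0, 0.0, 1.4)
elbow = (0.0, 0.0, 1.1)
wrist = (0.3, 0.0, 1.1)
bent_arm = joint_angle_deg(shoulder, elbow, wrist)
```

With these made-up positions the forearm is perpendicular to the upper arm, so the computed angle is a right angle; a fully straight arm would give 180 degrees.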

Passive optical systems use markers made of a material that reflects light back toward the camera. The camera can be set to only detect these bright markers, ignoring other parts like skin or clothing.

The center of each marker is estimated as a point in the 2D image captured by the camera. Because the brightness of every pixel is recorded, the system can locate the center more finely than a single pixel (sub-pixel precision) by computing the centroid of the marker’s bright spot.
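One simple way to estimate a marker center from pixel brightness is a brightness-weighted centroid. The sketch below assumes the marker has already been isolated in a small grayscale patch; the function name, the threshold parameter, and the pixel values are hypothetical, and commercial systems may use a different estimator.

```python
def weighted_centroid(patch, threshold=0):
    """Return (x, y) of the brightness-weighted center of a 2D pixel patch.

    `patch` is a list of rows; pixels at or below `threshold` are ignored.
    """
    total = 0.0
    sum_x = 0.0
    sum_y = 0.0
    for y, row in enumerate(patch):
        for x, value in enumerate(row):
            if value > threshold:
                total += value
                sum_x += x * value
                sum_y += y * value
    if total == 0:
        raise ValueError("no pixel above threshold")
    return sum_x / total, sum_y / total


# Made-up patch: the brightest pixel is in column 2, with some light in column 1.
blob = [
    [0, 10, 10, 0],
    [0, 50, 150, 0],
    [0, 10, 10, 0],
]
cx, cy = weighted_centroid(blob)
```

Here the estimated center falls between columns 1 and 2, closer to the brighter column, which is the sense in which the result is finer than one pixel.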

To calibrate the system, an object with markers at known positions is used to find the position of each camera and to measure how its lens distorts images. When two calibrated cameras see the same marker, its 3D position can be calculated by triangulation. A typical system uses between 2 and 48 cameras, and systems with more than 300 cameras exist to reduce marker swapping, where one marker is mistaken for another. More cameras are also needed to fully surround one or more tracked people.
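The idea of locating a marker from two camera views can be sketched with the midpoint method: each camera contributes a ray from its center toward the marker, and the estimate is the point halfway between the two rays where they pass closest to each other. This is a simplified illustration that assumes the ray directions are already known from calibration; the names and numbers are hypothetical, and production systems typically combine many cameras with a least-squares solve.

```python
def _dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def triangulate_midpoint(center_a, dir_a, center_b, dir_b):
    """Midpoint of the shortest segment between two camera rays.

    Each ray is given by a camera center and a direction toward the marker.
    """
    w0 = [p - q for p, q in zip(center_a, center_b)]
    a = _dot(dir_a, dir_a)
    b = _dot(dir_a, dir_b)
    c = _dot(dir_b, dir_b)
    d = _dot(dir_a, w0)
    e = _dot(dir_b, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        raise ValueError("rays are parallel; the marker cannot be triangulated")
    s = (b * e - c * d) / denom  # distance parameter along ray A
    t = (a * e - b * d) / denom  # distance parameter along ray B
    point_a = [p + s * u for p, u in zip(center_a, dir_a)]
    point_b = [p + t * u for p, u in zip(center_b, dir_b)]
    return tuple((p + q) / 2.0 for p, q in zip(point_a, point_b))


# Two hypothetical cameras 2 m apart, both looking at a marker at (1, 1, 3).
marker = triangulate_midpoint(
    (0.0, 0.0, 0.0), (1.0, 1.0, 3.0),
    (2.0, 0.0, 0.0), (-1.0, 1.0, 3.0),
)
```

When the two rays intersect exactly, as in this made-up case, the midpoint is the intersection itself; with measurement noise the rays miss each other slightly and the midpoint serves as a compromise.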

Because all passive markers look identical, one marker can be mistaken for another, and some software helps reduce this marker swapping. Unlike systems that use wires or electronic devices, passive systems do not require the performer to wear any electronics. Instead, the markers are small rubber balls covered in reflective tape, which must be replaced over time. The markers are attached directly to the skin (as in medical studies) or fastened to a special suit made of stretchy fabric. These systems can track many markers at about 120 to 160 frames per second. If the resolution is lowered and only a small area is tracked, the rate can rise as high as 10,000 frames per second.

Active optical systems use LEDs (light-emitting diodes) to find positions. Either one LED is lit at a time in rapid sequence, or several LEDs are lit together and software identifies them by their relative positions, much as stars are used for navigation. Instead of reflecting light, the markers produce their own light. Brightness falls with the square of the distance (one quarter of the power at twice the distance), so a marker that emits light directly stays detectable farther from the camera, which allows larger capture areas. The stronger signal also keeps marker jitter low, giving very precise tracking, often down to 0.1 millimeters within the system's range.
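The way brightness falls off with distance can be sketched with the inverse-square law. The example treats a marker as an idealized point source, which is an assumption made for illustration; the function name and distances are hypothetical.

```python
def relative_brightness(distance_m, reference_distance_m=1.0):
    """Brightness at distance_m relative to brightness at reference_distance_m."""
    if distance_m <= 0 or reference_distance_m <= 0:
        raise ValueError("distances must be positive")
    return (reference_distance_m / distance_m) ** 2


# Doubling the distance leaves one quarter of the light.
at_two_meters = relative_brightness(2.0)
at_four_meters = relative_brightness(4.0)
```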

The TV show Stargate SG-1 used an active optical system for visual effects, which let actors move around props that would be hard to track with other optical systems.

In Van Helsing, active markers were used to capture the movements of Dracula's flying brides on large sets, similar to how they were used in Rise of the Planet of the Apes. Each marker is powered one at a time, which helps identify them uniquely but may reduce the frame rate. This method is useful for real-time applications. Another way to identify markers is through software that processes data, which requires more computer power.
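The trade-off between unique identification and frame rate can be sketched with a simple model in which exactly one marker is lit per camera frame, in a fixed repeating order. This model and the names below are assumptions for illustration, not a description of any vendor's scheme.

```python
def lit_marker(frame_index, marker_count):
    """Index of the marker that is powered during the given camera frame."""
    if marker_count < 1:
        raise ValueError("at least one marker is required")
    return frame_index % marker_count


def per_marker_rate_hz(camera_rate_hz, marker_count):
    """How often each individual marker is sampled under this model."""
    if marker_count < 1:
        raise ValueError("at least one marker is required")
    return camera_rate_hz / marker_count


# With 4 markers, the lit marker cycles 0, 1, 2, 3 and then repeats.
schedule = [lit_marker(i, 4) for i in range(8)]
rate = per_marker_rate_hz(960.0, 8)
```

Under this model a camera running at 960 frames per second with eight markers would sample each marker 120 times per second, which shows why identifying markers in software instead can preserve the full frame rate at the cost of more computation.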

Some systems use colored LEDs as markers, with each color assigned to a specific body part.

One of the first active marker systems in the 1980s used a mix of passive and active markers, including rotating mirrors and colored glass, with special sensors to detect light.

Active systems can be refined by strobing one marker at a time, or by modulating the brightness or pulse width of each marker over time so that it can be identified. Modulated systems with 12-megapixel resolution capture subtler movements than 4-megapixel optical systems because they have higher spatial and temporal resolution. Directors can see actors' performances in real time on computer-generated characters. Unique marker IDs reduce marker swapping and provide cleaner data. LEDs with built-in processing and radio synchronization allow motion capture in bright sunlight, with frame rates from 120 to 960 per second. Such a system costs about $20,000 for a setup with eight cameras, 12-megapixel resolution, and a frame rate of 120 hertz for one actor.

Some systems reverse the usual setup by using inexpensive, high-speed projectors with many LEDs instead of high-speed cameras. These projectors use light to mark positions in space. Instead of using reflective or glowing markers, the system uses small sensors attached to objects to decode the light patterns. These sensors can calculate their own position, direction, and how light interacts with them.

These sensors work in normal light and can be hidden in clothing or objects. They can be used in unlimited numbers without confusing each other. Since they don’t need high-speed cameras, they use less data and are easier to handle. They also record how light behaves in the scene, which helps match lighting when adding digital effects. This method is good for real-time motion capture on film sets or virtual sets but has not yet been widely tested.

Motion capture technology has been used by researchers and scientists for many years, helping them learn about movement in fields like biology and engineering.

Underwater motion capture systems use cameras in waterproof cases that resist rust and chlorine, making them suitable for use in pools and basins. There are two types of underwater cameras: industrial high-speed cameras that can also work as infrared cameras, and infrared cameras that use a blue light flash instead of regular infrared light for better visibility underwater. Some high-speed cameras use LED lights or have options for image processing.

Underwater cameras can typically measure distances of 15 to 20 meters, depending on water clarity, the camera type, and the markers used. Clear water allows for the best range, and the area covered also depends on the number of cameras. Many types of underwater markers are available for different uses.

Non-optical systems

Inertial motion capture systems combine small inertial sensors, models of body movement, and sensor fusion algorithms. The data from the inertial sensors (an inertial guidance system) is often sent wirelessly to a computer, where it is stored or viewed. Most systems use inertial measurement units (IMUs), which contain a gyroscope, a magnetometer, and an accelerometer to measure rotation, direction, and acceleration. These measurements are mapped onto a digital skeleton in software. More sensors usually make the motion data look more natural.

These systems need no cameras or markers to capture relative movement, but they need an external reference to find the user's exact position when that is required. Inertial systems capture all six degrees of freedom of body motion in real time. If they include a magnetic sensor they can also give limited heading information, though this is lower in resolution and affected by electromagnetic noise.

Benefits include working in many places, no setup needed, portability, and large areas for movement. Drawbacks include less accurate position tracking and positional drift, an error that grows over time.

These systems are similar to Wii controllers but are more precise and faster. They can measure the direction to the ground to within one degree. Game developers are using these systems more often because they are easy to set up and work quickly. Prices range from $1,000 to $80,000.
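Why position errors grow over time in inertial systems can be illustrated by integrating accelerometer readings twice. In the sketch below the sensor is actually at rest, but every reading carries a small constant bias; the bias value, the sample interval, and the function name are made-up assumptions, and real systems use sensor fusion to limit this effect.

```python
def integrate_position(accelerations, dt):
    """Integrate acceleration samples twice (simple Euler steps) to get position."""
    velocity = 0.0
    position = 0.0
    for a in accelerations:
        velocity += a * dt
        position += velocity * dt
    return position


bias = 0.05  # m/s^2, hypothetical constant accelerometer error
dt = 0.01    # seconds between samples (100 Hz, assumed)

drift_10s = integrate_position([bias] * 1000, dt)  # error after 10 seconds
drift_20s = integrate_position([bias] * 2000, dt)  # error after 20 seconds
```

In this toy model the position error after 10 seconds is roughly 2.5 meters, and doubling the duration roughly quadruples it, because a constant acceleration error grows with the square of time once it has been integrated twice.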

Mechanical motion capture systems track the angles of body joints and are often called exoskeleton systems because they attach to the body like a skeleton. A performer wears a structure that moves with their body, and the parts of the structure measure their movement. These systems work in real time, are relatively low-cost, do not need cameras, and are wireless with no limits on movement space. They usually have rigid metal or plastic parts connected by joints and sensors that measure movement. These systems cost between $25,000 and $75,000, plus an extra system to find the user's exact position. Some systems also provide limited feedback through touch or pressure.

Magnetic motion capture systems calculate position and orientation from the relative magnetic flux of three orthogonal coils on the transmitter and on each receiver. By comparing the strength of the voltage or current in the three coils, the system can work out both range and orientation. Each sensor reports six degrees of freedom, so fewer sensors are needed than markers in an optical system: one on the upper arm and one on the lower arm are enough to measure elbow movement. However, metal objects like steel bars or wires, and electrical devices like monitors or lights, can distort the magnetic field and reduce accuracy. The sensor response is nonlinear, especially near the edges of the capture area. Wires from the sensors can limit large movements. Magnetic systems allow real-time viewing of motion data. Their capture area is much smaller than that of optical systems. Magnetic systems are divided into two types: direct-current (DC) systems, which use square pulses, and alternating-current (AC) systems, which use sine-wave pulses.

Stretch sensors are flexible parallel-plate capacitors that measure stretching, bending, shear, or pressure and are usually made of silicone. When the sensor stretches or compresses, its capacitance changes. This data can be sent through Bluetooth or directly to a computer to detect small body movements. Stretch sensors are not affected by magnetic interference and do not need cameras. They also do not suffer from positional drift, which is a problem for inertial systems. However, because of the materials used, stretch sensors have a poor signal-to-noise ratio and need extra processing, such as filtering or machine learning, to be useful. This processing adds latency compared to other types of sensors.
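The link between filtering and delay can be sketched with an exponential moving average, one of the simplest smoothing filters. It is used here only as an example; the text does not say which filters stretch-sensor systems actually use, and the smoothing factor and signal values are hypothetical.

```python
def exponential_smooth(samples, alpha):
    """Smooth a sequence with an exponential moving average (0 < alpha <= 1)."""
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    if not samples:
        return []
    smoothed = []
    state = samples[0]
    for sample in samples:
        state = alpha * sample + (1.0 - alpha) * state
        smoothed.append(state)
    return smoothed


# A sudden movement: the true signal jumps from 0 to 1 at sample index 5.
step = [0.0] * 5 + [1.0] * 20
filtered = exponential_smooth(step, 0.2)

# First sample at which the filtered signal has covered 90% of the jump.
settle_index = next(i for i, v in enumerate(filtered) if v >= 0.9)
```

In this example the raw signal changes at index 5, but the smoothed signal needs until index 15 to reach 90% of the new value. Stronger smoothing (a smaller alpha) removes more noise but lengthens that delay.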

Related techniques

Traditional motion capture systems often use basic facial capture methods that rely on markers placed on the face. These systems use between 32 and 300 markers, which can be either active (using light) or passive (reflecting light). However, these methods are limited by the time needed to apply the markers, adjust their positions, and process the data. The technology also limits the quality and detail of the results.

High-fidelity facial motion capture, also called performance capture, is a more advanced method used to record detailed facial movements that show complex emotions. Current facial capture systems are divided into different types, including traditional motion capture data, systems that use pre-defined facial shapes, systems that map the actual shape of an actor’s face, and specialized systems.

Two main techniques are used in facial capture. One involves using multiple cameras to record facial expressions from different angles and software like the stereo mesh solver from OpenCV to create a 3D model of the face. Another method uses light sources to calculate the surface details of the face based on changes in brightness as the light or camera moves. These methods are limited by the camera’s resolution, the size of the face in the camera’s view, and the number of cameras used. If the face takes up half the camera’s view and the camera has high resolution, very small facial movements can be detected by comparing images. Recent efforts focus on increasing the speed of recording and using optical flow to transfer these movements to other digital faces, rather than just creating a 3D model of the actor’s face.

Radio frequency (RF) systems are becoming more useful because higher-frequency signals allow more precise tracking than older technologies like radar. Light travels about 30 centimeters per nanosecond, so a 10 gigahertz signal, which completes ten cycles per nanosecond, has a wavelength of about 3 centimeters and can achieve accuracy on that scale. By measuring the signal's amplitude to a quarter of a wavelength, the resolution can improve to about 8 millimeters. To match the precision of optical systems, frequencies of 50 gigahertz or higher are needed. However, these systems face challenges like signal interference and reflections. They may be well suited to tracking over large areas, where very fine resolution is not required. Many scientists believe radio frequency systems will not achieve the accuracy needed for motion capture.
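The arithmetic behind these figures can be checked directly. The sketch below uses the same rounded speed of light as the text (30 centimeters per nanosecond); the function names are illustrative.

```python
SPEED_OF_LIGHT_CM_PER_NS = 30.0  # rounded value used in the text


def wavelength_cm(frequency_ghz):
    """Wavelength in centimeters for a signal of the given frequency.

    One gigahertz is one cycle per nanosecond, so cm/ns divided by GHz gives cm.
    """
    if frequency_ghz <= 0:
        raise ValueError("frequency must be positive")
    return SPEED_OF_LIGHT_CM_PER_NS / frequency_ghz


def quarter_wave_resolution_mm(frequency_ghz):
    """A quarter of the wavelength, converted to millimeters."""
    return wavelength_cm(frequency_ghz) / 4.0 * 10.0


wavelength_10ghz = wavelength_cm(10.0)
quarter_10ghz = quarter_wave_resolution_mm(10.0)
quarter_50ghz = quarter_wave_resolution_mm(50.0)
```

At 10 gigahertz the wavelength is 3 centimeters and a quarter wavelength is 7.5 millimeters, consistent with the figure of about 8 millimeters. At 50 gigahertz a quarter wavelength is about 1.5 millimeters.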

In 2015, researchers at the Massachusetts Institute of Technology developed a system that tracks motion using radio frequency signals.

Another method involves placing the actor inside a rotating sphere, similar to a hamster ball, that contains sensors to record movement without external cameras or equipment. While this could reduce costs, the sphere can only track movement in one direction. Additional sensors would be needed to capture more complex movements.

An alternative is using a 6DOF motion platform with an omni-directional treadmill and high-resolution optical motion capture. This allows the actor to move freely, even on uneven surfaces. This technology is used in medical rehabilitation, biomechanical research, and virtual reality.

In 3D pose estimation, an actor’s body position can be determined using an image or depth map.
