Before discussing the future of panoramic video, let’s first understand how panoramic video is achieved.

Last updated: 2015-12-17


[Editor's Note] The author of this article is Wang Rui, a senior practitioner in the VR industry.

Over the past year or two, as the VR craze has surged, the word "panorama" has been brought up again and again and given various names such as "virtual reality", "3D reality", "360 degrees", and "720 degrees", to the point that many people treat it as the main synonym for the way virtual reality content is presented.

Indeed, the shortage of VR content has drawn the attention of more and more developers and business groups. Panoramic pictures and videos, along with VR movies that are still in their infancy, are undoubtedly a good way to deliver content. They give viewers a fully immersive experience without requiring many interaction methods or the learning cost that comes with them, and they can achieve all kinds of striking effects through offline rendering and photography.

So, what is a panorama, how is it produced, and how can people construct panoramic content? This article tries to explain it from several key angles, in the hope of helping the creators who come later.

1. Projection method

Panoramic photography is not a new concept. In fact, it can be traced back to the 12th century "Night Banquet of Han Xizai":

Of course, this is not a truly immersive experience. Even if we roll this long painting into a cylinder and stand in the center to watch it, we will still feel that something is missing. Yes, there is an obvious seam, and blank areas above and below the head.

The reason is simple: the people of the Song Dynasty never intended this painting to be an immersive experience. That is a joke, of course. The real reason is that the field of view the picture covers in physical space does not fully enclose the viewer, that is, 360 degrees in the horizontal direction (longitude) and 180 degrees in the vertical direction (latitude). Speaking of which, you have probably already thought of this picture:

A world map like this may have hung on a wall at home for years, and you may not have looked at it since you went to college, but it meets all the requirements of a panoramic picture. If you view it in any VR glasses, it is like being surrounded by the whole world.

This mathematical process, which correctly unfolds the real scene across the entire physical field of view onto a 2D picture so that it can be restored in VR glasses for immersive viewing, is called projection.

The seemingly ordinary world map uses a common projection method called Equirectangular. Its characteristic is that scale is well preserved near the equator (the horizontal center line of the image), but the closer a region is to the poles, the more it is stretched, and at the poles themselves the stretching is unbounded.
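As a minimal sketch of how an Equirectangular image relates to viewing directions, the functions below map a 3D direction to pixel coordinates and estimate how strongly a row of the image is stretched along its length. The axis convention (+y up, -z forward) and the function names are assumptions made for illustration; they do not come from the article or from any particular VR SDK.

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a non-zero view direction to (u, v) pixel coordinates.

    Assumed convention: +y is up, -z is forward, and longitude 0
    sits at the horizontal center of the image.
    """
    length = math.sqrt(x * x + y * y + z * z)
    lon = math.atan2(x, -z)            # longitude in [-pi, pi]
    lat = math.asin(y / length)        # latitude in [-pi/2, pi/2]
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v

def row_stretch(lat_degrees):
    """How much a row at this latitude is stretched relative to the equator."""
    return 1.0 / math.cos(math.radians(lat_degrees))
```

Looking straight ahead lands in the middle of the image, looking straight up lands on the top row, and `row_stretch` grows without bound as the latitude approaches 90 degrees, which is the polar stretching described above.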

The stretching phenomenon of this projection method is more obvious in the picture below. Pay attention to the changes in the texture on the dome. The closer to the top of the picture, the more severe the distortion. Fortunately, the significance of VR helmets and application software is to restore these obviously deformed pictures to full-view content, so that users can have an immersive sense of envelopment.

However, there is more than one way to project panoramic images. For example, the recently released Ricoh Theta S and Insta360 panoramic cameras use another, simpler and more effective projection strategy:

Each of the two fisheye cameras outputs an image that covers a 180-degree horizontal and vertical field of view. Putting the two outputs together yields an immersive surround with a full field of view.

Of course, the 2D images produced by this projection method, called Fisheye, are actually more heavily distorted. When the images are reprojected for display in VR glasses, the limited image sampling frequency (in layman's terms, the pixel size) means these distortions can only be undone to a certain extent, which may also lower the quality of the panoramic content itself.

From this point of view, the projected image (or video), as an important carrier of panoramic content, should not only contain everything that was shot but also avoid excessive distortion, so that quality is not lost when it is reprojected in VR glasses.
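To make the fisheye mapping concrete, here is a small sketch that maps a view direction to a pixel in one circular fisheye image. It assumes the equidistant fisheye model (distance from the image center proportional to the angle from the optical axis); the article does not say which model these cameras actually use, and the function name and axis convention are hypothetical.

```python
import math

def direction_to_fisheye(x, y, z, size, fov_degrees=180.0):
    """Map a non-zero direction to (u, v) in a size x size fisheye image.

    Assumptions: equidistant lens model, lens looking along -z, +y up.
    Returns None when the direction is outside the lens field of view.
    """
    length = math.sqrt(x * x + y * y + z * z)
    theta = math.acos(-z / length)             # angle from the optical axis
    half_fov = math.radians(fov_degrees) / 2
    if theta > half_fov:
        return None
    r = (theta / half_fov) * (size / 2)        # radius inside the image circle
    phi = math.atan2(y, x)                     # angle around the optical axis
    u = size / 2 + r * math.cos(phi)
    v = size / 2 - r * math.sin(phi)           # image rows grow downward
    return u, v
```

A second lens facing the opposite way would cover the directions for which this function returns `None`, which is how two such images add up to a full field of view.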

So, in addition to the two projection methods mentioned above, are there more options to choose from? The answer is yes, of course, and there are plenty!

For example, the Mercator projection has smaller stretching deformation along the axis than the Equirectangular projection, and the proportion of the actual scene is more realistic, but it can only express about 140 degrees of content in the vertical direction.
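The limited vertical coverage follows from the Mercator formula itself: the vertical coordinate grows without bound toward the poles, so any finite image has to cut the latitude range off somewhere. The sketch below uses the standard spherical Mercator formula; the helper names and the idea of deriving the cutoff from the image's aspect ratio are illustrative assumptions, not something the article specifies.

```python
import math

def mercator_y(lat_degrees):
    """Vertical Mercator coordinate on a unit sphere for a latitude in degrees."""
    lat = math.radians(lat_degrees)
    return math.log(math.tan(math.pi / 4 + lat / 2))

def max_latitude(width, height):
    """Largest latitude (degrees) that fits in a width x height Mercator image
    whose width spans the full 360 degrees of longitude."""
    y_max = math.pi * height / width           # half the image height in map units
    return math.degrees(2 * math.atan(math.exp(y_max)) - math.pi / 2)
```

Under these assumptions a 2:1 image reaches roughly 66 to 67 degrees of latitude on each side, that is, a little over 130 degrees in total, which is in the same range as the figure of about 140 degrees quoted above.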

Another example is the Equisolid projection, which is also called the "asteroid" or "720-degree" panorama. It can even display a 360-degree vertical field of view, but the premise is that the user does not care about the quality loss caused by the huge distortion:

So, is there any projection method that can generate an image that can cover at least 360 degrees horizontally and 180 degrees vertically without any distortion of the image?

The answer is: there is no way to project a single image without distortion. However, if the resulting image is not a single image, there are ways:

If you happen to be a graphics developer or virtual reality software developer, this picture should be very familiar to you. This is Cubemap.

It is equivalent to a cubic box made of six images. If the observer is at the center of the cube, each image corresponds to one face of the cube and, in physical space, covers a field of view of 90 degrees both horizontally and vertically. Surrounded by these six images, the observer again gets a full field of view of 360 degrees horizontally and 180 degrees vertically, and the image shows no distortion or deformation at all, as shown below:

This is an ideal projection result, and if you know how to use offline rendering software or plug-ins to produce and output panoramic content, it is surely the most suitable choice. In actual shooting, however, it is almost impossible to record in this cube map form, for a very simple reason: existing shooting equipment can hardly do it.
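For readers who want to see how a view direction picks one of the six images, here is a minimal sketch. The face names and the orientation of the in-face coordinates are illustrative assumptions and do not follow the convention of any particular graphics API.

```python
def direction_to_cube_face(x, y, z):
    """Return (face, s, t) for a non-zero direction, with s and t in [-1, 1].

    The face is chosen by the axis with the largest absolute component;
    s and t are the other two components divided by that magnitude.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        face = "+x" if x > 0 else "-x"
        s, t = y / ax, z / ax
    elif ay >= az:
        face = "+y" if y > 0 else "-y"
        s, t = x / ay, z / ay
    else:
        face = "+z" if z > 0 else "-z"
        s, t = x / az, y / az
    return face, s, t
```

A coordinate of 1 or -1 sits on a face edge, 45 degrees away from the face center, which is why each face covers exactly 90 degrees in both directions.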

2. Splicing and fusion

Suppose there are six cameras whose FOV angles are strictly limited to 90 degrees both horizontally and vertically. Suppose we then build a precise bracket and mount the six cameras on it firmly and stably, so that their center points strictly coincide and each one faces its own side of the cube. The output images might then just meet the standard of the cube map and could be used directly.

However, whether it is the size of the camera's photosensitive area, the focal length (and the FOV angle calculated from it), or the design and manufacture of the steel bracket, none of these parameters can be guaranteed to be met exactly. An optical or mechanical error of a few millimeters may seem harmless, but for a tightly fitted cube map it will inevitably leave one or more obvious cracks in the final immersive scene. On top of that come vibration caused by moving the bracket and focus drift caused by aging lenses. These seemingly minor troubles are enough to bring the ideal physical model we just built to nothing.

The gap between ideal and reality is huge, but fortunately there is a solution. If we leave enough redundancy at the stitching seams and then correctly identify and process the regions where two camera images overlap, we can still output six pictures and compose panoramic content. This is the other magic weapon of panoramic content production: image stitching and edge fusion.
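The relationship between focal length and FOV mentioned above can be sketched with the usual pinhole (rectilinear) formula. This is an idealization that ignores lens distortion, and the sensor and focal length values in the usage note are made-up numbers chosen only to show the effect.

```python
import math

def fov_degrees(sensor_size_mm, focal_length_mm):
    """Field of view along one sensor dimension for an ideal pinhole lens."""
    return math.degrees(2 * math.atan(sensor_size_mm / (2 * focal_length_mm)))
```

With a hypothetical 36 mm sensor dimension, a focal length of exactly 18 mm gives 90 degrees, while 18.5 mm gives roughly 88.4 degrees. That half-millimeter difference is already enough to leave a gap between neighboring cube faces.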

The picture below is the 360Heros series panoramic camera.

It uses six GoPro sports cameras and a bracket to assist in shooting. The six cameras face different directions. With the 4x3 wide viewing angle setting, the horizontal and vertical FOV angles are approximately 122 degrees and 94 degrees.

In the panoramic video stitching and output software, read the input streams or video files of the six cameras and set their actual position information on the bracket (or directly obtain the posture information recorded by the digital camera itself). In this way, we can obtain video content that is sufficient to cover the entire field of view.

As we described before, precise alignment is impossible, so each camera's field of view must include some redundancy. The resulting video images therefore overlap to a certain extent, and when the panoramic image is output directly, there may be obvious overlapping areas or incorrect edges.
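One simple way to hide such an overlap is a linear cross-fade (often called feathering) across the shared region. The sketch below works on a single row of grayscale values and assumes the two images are already aligned; real stitching tools do considerably more, and this is not a description of how any of the tools named in this article work.

```python
def blend_rows(left, right, overlap):
    """Join two pixel rows whose last/first `overlap` samples show the same scene.

    Inside the overlap the weight of the right image rises linearly,
    so the result fades from one image into the other.
    """
    if not 0 < overlap <= min(len(left), len(right)):
        raise ValueError("overlap must fit inside both rows")
    out = list(left[:len(left) - overlap])
    for i in range(overlap):
        w = (i + 0.5) / overlap                      # weight of the right image
        a = left[len(left) - overlap + i]
        out.append((1 - w) * a + w * right[i])
    out.extend(right[overlap:])
    return out
```

For example, joining a row of 10s and a row of 20s with a two-pixel overlap produces a short ramp between the two brightness levels instead of a hard edge.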

Although several common panoramic video processing tools, such as VideoStitch and Kolor, have a certain degree of automatic edge blending function, we still have to manually crop and adjust these edge areas (for example, PTGui is used in the figure below to correct the seams of each picture), select edge areas with higher image quality or less distortion, and ensure that the pictures are strictly aligned.

This work is time-consuming and labor-intensive, and there is an important prerequisite, that is, the image as the input source must be able to cover the entire 360-degree field of view and have redundancy.

As we calculated before, with a six-camera rig the FOV angle of each camera should be no less than 90 degrees. For GoPro Hero3 series cameras, this means the 4x3 wide field of view mode must be used. With a 16x9 aspect ratio, the vertical FOV angle may fall short of the required value, so the images cannot be stitched no matter what. Of course, we can avoid this by adjusting the orientation of each camera on the bracket or by adding more cameras, but from any perspective a wide field of view camera with an aspect ratio close to 1x1 is the more ideal choice.

If you just want to output a panoramic picture, the above steps are usually more than enough. However, still pictures are unlikely to make people wearing VR helmets scream; dynamic scenes of war or ghosts all around are far more exciting. If you are considering how to make such a VR movie, there is one question that must be raised:

Synchronization. Put simply, it is how to make all your cameras start at the same time and keep consistent frame rates during recording.

This may not seem like a big deal, but if the start times of two cameras are inconsistent, their alignment and stitching results are directly affected. If the scene contains many dynamic elements or the camera position changes during the shot, the result may not align at all. Therefore, for panoramic shooting that requires many cameras working at once, synchronized start and synchronized recording become particularly important.
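The arithmetic behind this requirement can be written down in a few lines. The sketch assumes the simplified six-camera cube layout, where each camera must cover a 90-degree face; the function names, the `min_overlap` parameter, and the 70-degree value in the usage note are hypothetical.

```python
def overlap_margins(hfov, vfov, face_angle=90.0):
    """Degrees of redundancy left over (horizontal, vertical) after covering one face."""
    return hfov - face_angle, vfov - face_angle

def can_cover_face(hfov, vfov, min_overlap=0.0, face_angle=90.0):
    """True when both margins leave at least `min_overlap` degrees for stitching."""
    return min(overlap_margins(hfov, vfov, face_angle)) >= min_overlap
```

With the approximate 4x3 figures quoted earlier (122 and 94 degrees), the margins are 32 and 4 degrees, so the vertical direction leaves very little room for blending. A camera whose vertical FOV were only a hypothetical 70 degrees would fail the check outright.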

To solve this problem fundamentally in hardware, you can use "genlock" technology, that is, an external device transmits time code to keep each camera synchronized (a typical example is the Red One professional movie camera). Of course, not all cameras have a dedicated genlock interface. In that case you can consider some traditional or slightly "copycat" synchronization methods, such as the old "roar when you see injustice on the road"...

At the beginning of filming, the actor roars or claps loudly. During stitching, the moment of the roar is located in each video and used as the synchronization start position, and the panoramic video is then stitched from there. This method is not very accurate, but it costs nothing extra, and once a basic synchronization start position is secured, fine adjustment and stitching of the video become considerably easier in post-production.
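The clap method can be sketched as follows: find the loudest audio sample in each camera's recording and convert the differences into a number of video frames to trim. This assumes the clap is the loudest event in every track and that all cameras share one audio sample rate and one frame rate; the function names are hypothetical.

```python
def clap_index(samples):
    """Index of the loudest sample, taken to be the clap."""
    return max(range(len(samples)), key=lambda i: abs(samples[i]))

def sync_offsets(tracks, sample_rate, fps):
    """Frames to trim from the start of each camera so that the claps line up.

    A camera that started earlier hears the clap later in its own recording,
    so it needs more frames trimmed.
    """
    claps = [clap_index(track) for track in tracks]
    earliest = min(claps)
    return [round((c - earliest) / sample_rate * fps) for c in claps]
```

The result is only accurate to about one frame, which matches the point above: it gives a rough starting position that is then fine-tuned by hand.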

A similar method is to cover all cameras with black cloth, and then quickly remove it when filming begins, etc. In short, when the hardware conditions are not fully met, it is time for everyone to show their magical powers.

3. Stereoscopic and pseudo-stereoscopic

You may have noticed that the panoramic video shooting processes discussed so far all overlook a key point: whatever projection method is used, the generated content is just a single 360-degree panorama. Watching it on a PC or web page is no problem, but if such content is fed to a VR helmet display, the result may be incorrect. To give the picture a sense of depth, the content must be shown as horizontally separated images for the left and right eyes:

This seems to be just a copy of the original panoramic picture, but if you observe carefully, you will find that there is a certain offset between the left and right pictures near the border of the picture. Because there is a certain difference in the perspective of human eyes, the images seen by each eye are somewhat different, and then the brain can get a three-dimensional feeling through calculation. The closer the scenery is to the human eye, the more obvious the parallax is, and the scenery in the distance has relatively weak three-dimensional feeling.

Any existing VR glasses need to ensure through structural design that the wearer's left and right eyes can only see half of the actual screen, that is, they can see the separated left and right eye images respectively, thereby simulating the real operating mechanism of the human eye.

In this case, the panoramic content shooting equipment also needs to make some corresponding changes, such as changing the original 6 cameras to 12 cameras, that is, there are two cameras for the left and right eyes responsible for shooting in each direction; the construction form of the bracket is therefore very different from the original design (the picture shows 360 Heros3 Pro12, which uses 12 GoPro sports cameras).

The stitching and fusion software does not need to do anything special. It simply reads six video streams twice, processes them, and outputs two different panoramic videos, one for the left eye and one for the right eye. These can then be merged into one picture with post-production tools or applications.

Of course, there are many alternatives, such as Panono, which caused a sensation on Kickstarter in 2011 but has still not shipped on schedule, even now that VR panoramic applications are popular. Its design uses 36 cameras evenly distributed over a sphere, whose images are then stitched into panoramic images for the left and right eyes.

Although this design looks very fancy, it is essentially the same: the images taken by 36 cameras facing different directions are superimposed to cover the horizontal and vertical 360-degree field of view, and they can certainly cover it twice! Combined with its precise structural design and mounting orientation, the device can compute the stitched panoramic image internally and directly output a video stream or file in the left and right eye format. The actual output resolution is also quite impressive.

Similar products include Bublcam (four ultra-wide-angle lenses spread over the sphere), Nokia's OZO (eight wide-angle lenses spread over the sphere), and Jaunt's products under development. They can all directly output panoramic content in stereoscopic form.
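Merging the two eye images into one picture is mostly a matter of choosing a layout. The sketch below represents an image as a list of pixel rows and supports two common arrangements; the layout names and the function are illustrative assumptions rather than the format of any specific player.

```python
def pack_stereo(left, right, layout="side-by-side"):
    """Combine left-eye and right-eye images (lists of rows) into one frame."""
    same_shape = len(left) == len(right) and all(
        len(a) == len(b) for a, b in zip(left, right)
    )
    if not same_shape:
        raise ValueError("both eye images must have the same size")
    if layout == "side-by-side":
        return [list(a) + list(b) for a, b in zip(left, right)]
    if layout == "top-bottom":
        return [list(row) for row in left] + [list(row) for row in right]
    raise ValueError("unknown layout: " + layout)
```

Whichever layout is used, the playback side has to know it in order to split the frame back into the two eye images.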

Of course, in the worst case we still have one option left, which is to fake the stereoscopic effect ourselves...

The original panoramic image is duplicated, with one copy offset slightly to the left and the other slightly to the right, and a slight perspective transformation is then applied to each (to simulate the deflection of the line of sight). The resulting "stereo" image is convincing enough in most cases, but for nearby objects, or when the left and right eye images differ in what is occluded (for example, a face pressed against a door with one eye blocked by the latch), there will be obvious flaws. Of course, for enthusiasts who are still new to VR panoramic content, this may not be a serious problem for the time being.
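The offset step of this trick can be sketched on an Equirectangular image, where a horizontal shift simply wraps around the 360-degree seam. The sketch leaves out the perspective transformation entirely, treats an image as a list of non-empty pixel rows, and picks arbitrarily which copy goes to which eye; all names are hypothetical.

```python
def shift_row(row, pixels):
    """Shift a non-empty row to the right by `pixels`, wrapping around the seam.

    Negative values shift to the left.
    """
    k = pixels % len(row)
    return list(row[-k:]) + list(row[:-k]) if k else list(row)

def pseudo_stereo(image, shift_pixels):
    """Return (left_eye, right_eye) copies shifted in opposite directions."""
    left_eye = [shift_row(row, -shift_pixels) for row in image]
    right_eye = [shift_row(row, shift_pixels) for row in image]
    return left_eye, right_eye
```

Because every pixel is shifted by the same amount regardless of its distance from the camera, the parallax is uniform, which is exactly why nearby objects and occlusions give the trick away.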
