22/1/15 11:00 Updated with specs from , , , , a comment on resolution vs FOV, and an update on the HPU location from .
HoloLens blows me away with its possibilities. I love my Oculus Rift DK2, and Virtual Reality is perfect for when you want to just concentrate on the computer world, but I’ve always been keen to see a good Augmented Reality solution. HoloLens may be it – check it out at .
There had been rumours of MS working on something like this for a while – for example patent applications have been filed. But no-one seemed to expect such a mature offering to be announced already, with a possible early release in July 2015 and wider availability in Autumn 2015 when Windows 10 is released. If the HoloLens, with the Windows 10 Holographic UI, delivers as announced then I’ll be buying.
Speaking of which, for all Microsoft’s talk of “Hologram” this and “Hologram” that, as far as I can see no holograms are actually being used. Instead, “Hologram” here is MS marketing speak for Augmented Reality. Their use of the word is inaccurate and misleading, but it’s also far more meaningful to normal consumers, so it’s an understandable choice.
With that out of the way, here’s a bit of analysis of the HoloLens and the Windows Holographic UI. Note that I haven’t seen or touched one of these in person, so take everything with a big pinch of salt….
There are two sets of output supported – a “Holographic” display, and Spatial Audio.
The most stand-out feature is the “Holographic” display. This appears to be using an optical HMD with some kind of waveguide combiner. That’s what those thick lenses are. This is also touched on in the MS patent filing .
An important question is what the focal length is set to – and does it vary? To explain the importance of this, let’s do a quick experiment. Put your hand out in front of you. Look at it, and you’ll notice the background gets blurry. Look at the background behind your hand – now your hand gets blurry. That’s because the lenses of your eyes are changing shape to focus on whatever you’re looking at.
If the focal length on the display is fixed, then the display will be out of focus some of the time. Looking at write-ups, people appear to have used the display at ranges from 50cm up to several metres – and with no comments about blurry visuals. It appears therefore that the optics are somehow either changing the focal length of the display, or are “flattening” the world at large, so that your eyes don’t need to change focal length between short and long ranges.
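One way to put numbers on this: opticians measure focusing effort in dioptres, the reciprocal of the distance in metres. A fixed-focus display only looks sharp when the gap between its focal plane and the content’s apparent distance is small. A quick sketch – the 2m focal plane is purely my assumption, not a published HoloLens figure:

```python
# Accommodation demand (dioptres) is the reciprocal of viewing distance
# in metres.  The 2 m fixed focal plane below is an illustrative guess.
def dioptres(distance_m):
    return 1.0 / distance_m

FIXED_FOCUS = dioptres(2.0)  # hypothetical fixed focal plane at 2 m

for d in (0.5, 1.0, 3.0, 10.0):
    mismatch = abs(dioptres(d) - FIXED_FOCUS)
    print(f"content at {d:4.1f} m -> focus mismatch {mismatch:.2f} D")
```

Content at arm’s length against a 2m focal plane gives a 1.5 dioptre mismatch – easily enough to notice – while everything from 1m to infinity stays within about half a dioptre, which is why fixed-focus headsets tend to push the focal plane out rather than in.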
The waveguide is a way to shine light into your eyes, but if the world outside is too bright then you would have problems seeing the display. Therefore the front screen is tinted. A question is how much it is tinted – too little and you won’t be able to see the display in bright conditions, and too much and you won’t be able to see the outside world in darker conditions. It’s possible they’re photochromic and get darker when exposed to bright light.
I’ve attempted to estimate the dimensions of the display, but these should be taken with a massive pinch of salt. See the Maths section below for where I got the numbers from. My estimate is that the display, per eye, is around 5.6cm wide and 4cm high, and sits 1-2.1cm away from the user’s eyes. That equates to approximately 80-120 degrees vertical Field of View, and 100-140 degrees horizontal Field of View. If accurate, that’s pretty impressive, and broadly on par with the Oculus Rift.
Since the initial presentation, other write-ups have implied my initial estimate was wildly optimistic.  asserts 40×22 degrees, whereas  provides two estimates of 23 degrees and 44 degrees diagonal. Descriptions state that the display appears to be quite small – much smaller than that of the Oculus Rift DK2.
I don’t have any information on the resolution of the display. Microsoft have stated “HD”, however that can mean many things – for example, is that HD per eye, or HD split between the two eyes? It should be noted as well that HD is a pretty poor resolution for a display with a large field of view – put your face right next to your laptop or tablet screen and see how pixellated things suddenly look. There are some tricks that could be done if an eye tracker is being used (see the Eye tracker section) to greatly improve the apparent resolution.
The write-ups I’ve seen implied that resolution wasn’t bad at all, so this will be something to keep an eye on.  asserts somewhere between 4Mpx (2.5k) and 8Mpx (4k).
It should be noted that the human eye has around a 0.3-0.6 arc-minute pixel spacing, which equates to 100-200 pixels per degree. The “Retina” display initially touted by Apple was around 53 pixels per degree. 
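Angular resolution is just pixels divided by degrees, which makes the trade-off easy to sketch. The 1280-pixel “HD” width and the two FOVs below are my illustrative assumptions, not confirmed specs:

```python
# Pixels per degree = horizontal resolution / horizontal FOV.
# For reference: the eye resolves roughly 100-200 ppd; Apple's
# "Retina" marketing figure was around 53 ppd.
def pixels_per_degree(px, fov_deg):
    return px / fov_deg

print(pixels_per_degree(1280, 40))   # narrow-FOV estimate: 32 ppd
print(pixels_per_degree(1280, 120))  # Rift-like wide FOV: under 11 ppd
```

The same panel that looks acceptable over a narrow field of view becomes visibly blocky when stretched across a wide one – which is why the FOV and resolution questions are really one question.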
The audio aspect of gaming and computers in general has been quite poor for a while now. The standard is stereo, maybe with a subwoofer in a laptop or PC. Stereo can give you some hints about location, but full 5.1 surround sound has been a rarity for most PC users. There are some expensive headphones which support it, but these don’t work properly when you turn your head away from the screen – not ideal with a head-mounted display. It’s notable therefore that HoloLens supports spatial audio right out of the box.
With spatial audio, DSPs are used to simulate surround sound, taking into account the direction you’re facing. It’s amazing how useful this is for understanding your surroundings – a lesson that Oculus has also learnt with their latest prototypes of the Oculus Rift.
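As a flavour of what such a DSP computes: one of the main localisation cues is the interaural time difference (ITD) – the tiny delay between a sound reaching each ear – often approximated with Woodworth’s formula. The head radius here is a typical textbook value, nothing to do with HoloLens specifically:

```python
import math

# Woodworth's approximation of interaural time difference:
# ITD = (r / c) * (theta + sin(theta)),
# where r is head radius, c the speed of sound, theta the source azimuth.
HEAD_RADIUS = 0.0875    # metres, a typical adult value
SPEED_OF_SOUND = 343.0  # m/s

def itd_seconds(azimuth_deg):
    theta = math.radians(azimuth_deg)
    return (HEAD_RADIUS / SPEED_OF_SOUND) * (theta + math.sin(theta))

for az in (0, 45, 90):
    print(f"source at {az:3d} deg -> ITD {itd_seconds(az) * 1e6:.0f} us")
```

The delays are well under a millisecond, which is why this has to be recomputed every time the head tracker reports a new orientation – stale cues are what makes fixed headphone “surround” fall apart when you turn your head.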
Reports from the HoloLens imply it’s using some kind of speaker(s) rather than headphones. Questions remain about how directional the sound is (i.e. can other people hear what you’re hearing), how loud the audio is, and how good the fidelity is.
The HoloLens appears to be festooned with sensors, which makes sense given that it is supposed to be a largely standalone device.
Outward facing cameras
On either side of the headset are what look like two cameras. Alternatively, they may be a camera and an LED transmitter, as used by the MS Kinect. Either way, these cameras provide two sets of information to the computer. Firstly, they detect the background and must provide a depth map of some kind – likely using similar techniques and APIs to the Kinect. Secondly, they detect hand movement and so are one of the sources of user input.
The background detection is used for ‘pinning’ augmented reality to the real world – when you turn your head you expect items in the virtual world to remain in a fixed location in the real world. That’s really hard to do, and vital to do well. The simplest way to do it is through the use of markers/glyphs – bits of paper with specific patterns that can be easily recognized by the computer. HoloLens is doing this marker-less, which is much harder. Techniques I’ve seen use algorithms such as PTAMM to build a ‘constellation’ of edges and corners, and then lock virtual objects to these.
Reports seem pretty positive about how this works, which is great news. A big question though is how it works in non-ideal lighting – how well does it track when it’s dark/dim or very bright, there’s moving shadows, etc. For example, what if you’re in a dim room with a bright TV running in the background, casting a constantly changing mix of light and dark around the room?
As mentioned, the cameras are also used for hand tracking. The cameras are apparently very wide angle, to be able to watch hands across a wide range of movement; however, many questions remain. These include how well the tracking works when hands cross over, become fists, or turn. Some finger tracking must be performed, judging by the click movement used in many of the demos – are all fingers tracked? And how is this information made available to developers?
Eye tracker
During some of the demos the demonstrators have said that the HoloLens can tell where you’re “looking” – indeed that is used extensively to interface with the UI. This may be based on just the orientation of the head, or some reports seem to imply that there’s actual eye tracking.
If there is eye tracking, then there’s likely cameras (possibly in that protuberance in the center) tracking where the user’s pupils are. That would be very cool if so, as it provides another valuable interface for user input, but it could also provide even more.
When tracking the pupil, if the optics can ‘move’ the display to different parts of the waveguide, then the system could always provide a higher resolution at the location you’re looking at, without wasting the processing power of rendering high resolution across the whole display. Thus you could get an apparently high resolution over a broad field of view, from a display that only actually renders high resolution over a small field of view.
Also, by analysing how the pupils have converged, the computer can judge how far away you’re looking. For example, put your hand out in front of you and focus on one finger. Move the hand towards and away from your face, and you’ll feel your eyes converging as the finger gets closer – watch someone else’s eyes and you’ll see this clearly. If the computer can judge how far away you’re looking then it could change the focal length of the display itself, so that the display still appears in focus. It could also provide this information to any APIs – allowing a program to know, for example, which object the user is looking at when several semi-transparent objects are stacked behind each other.
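The geometry behind that is simple: with a known interpupillary distance (IPD), each eye rotates inward by an angle whose tangent is half the IPD over the fixation distance, so measuring the angle gives you the distance. A sketch, using a typical adult IPD rather than anything HoloLens-specific:

```python
import math

# Depth from vergence: eyes fixating a point at distance d each rotate
# inward by theta, where tan(theta) = (IPD / 2) / d.  Invert to recover
# d from a measured vergence half-angle.
IPD = 0.063  # metres, a typical adult interpupillary distance

def fixation_distance(vergence_half_angle_deg):
    return (IPD / 2) / math.tan(math.radians(vergence_half_angle_deg))

print(f"{fixation_distance(3.6):.2f} m")  # large angle: hand at ~0.5 m
print(f"{fixation_distance(0.6):.2f} m")  # small angle: across the room
```

Note how quickly the angle shrinks with distance – beyond a few metres the vergence signal becomes too small to measure reliably, so depth-from-vergence only works at close range.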
A microphone is built-in, which can be used both for VoIP such as Skype, and also as a source of user input using Cortana or similar. Questions include quality, and directionality – will the microphone pick up background noise?
The headset obviously detects when you move your head. This could be detected by the cameras, but the latency would likely be too large – Oculus have found a latency of 20ms is the target, and anything over 50ms is absolutely unacceptable. Therefore there are likely gyros and accelerometers to quickly detect movement. Gyros drift over time, and while accelerometers can detect movement they become inaccurate when trying to estimate the net movement after several moves. Therefore it’s likely the external cameras are periodically used to recalibrate these sensors.
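A common way to do that recalibration is a complementary filter: trust the fast-but-drifting gyro short-term, and let the slow-but-absolute camera reference pull the estimate back. This is a generic sensor-fusion sketch with made-up numbers, not anything Microsoft has described:

```python
# Complementary filter: alpha close to 1 trusts the integrated gyro
# short-term; the (1 - alpha) camera term slowly corrects its drift.
def complementary_filter(angle, gyro_rate, camera_angle, dt, alpha=0.98):
    return alpha * (angle + gyro_rate * dt) + (1 - alpha) * camera_angle

angle = 0.0
# Simulate a gyro wrongly reporting 0.5 deg/s while the head is still
# and the camera keeps reporting the true orientation of 0 degrees.
for _ in range(100):
    angle = complementary_filter(angle, 0.5, 0.0, dt=0.01)
print(f"fused estimate after 1 s: {angle:.3f} deg")
```

Pure integration of that faulty gyro would have drifted 0.5 degrees in the second simulated here and kept growing without limit; the fused estimate instead converges to a small bounded error of around 0.2 degrees.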
Given that this headset is supposed to be standalone, it’s possible the headset also includes GPS and WiFi for geolocation as well.
I would be amazed if HoloLens doesn’t include Bluetooth support, which would then allow you to use other input devices, most notably a keyboard and mouse. Using a mouse may be more problematic – you need to map a two dimensional movement into a three dimensional world, however mice are vastly more precise for certain things.
One surprise in the launch was that no connection to a PC/laptop was needed. Instead, the HoloLens is supposed to be standalone. That said, not all the computing is done in the headset alone. According to  there’s also a box you wear around your neck, which contains the processor. Exactly what is done where – in the headset or the box – hasn’t been described, but we can make some educated guesses. And all this is directly related to the new Holographic Processor Unit (HPU).
Latency is king in VR/AR – head movement and other inputs need to be rapidly digested by the computer and updated on the display. If this takes longer than 50ms, you’re going to feel ill. Using a general-purpose CPU and graphics processing unit (GPU), this is achievable but not easy. If your CPU is also busy trying to understand the world – tracking hand movements, backgrounds, cameras, etc – then that gets harder.
Therefore the HPU seems to be being used to offload some of this processing – the HPU can combine all the different data inputs and provide them to applications and the CPU as a simple, low bandwidth, data stream. For example, rather than the CPU having to parse a frame from a camera, detect where hands are, then identify finger locations, orientation, etc, the HPU can do all this and supply the CPU with a basic set of co-ordinates for each of the joints in the hands.
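The scale of that reduction is worth a back-of-the-envelope calculation. All the figures below are my assumptions for illustration – camera count, resolution, and joint count are guesses, not published HoloLens numbers:

```python
# Rough bandwidth comparison: raw depth-camera frames vs the
# post-processed hand-skeleton stream an HPU could emit instead.
CAMERAS = 4
WIDTH, HEIGHT, BYTES_PER_PX, FPS = 640, 480, 2, 60

raw_bps = CAMERAS * WIDTH * HEIGHT * BYTES_PER_PX * FPS * 8

# Assume 2 hands x 26 joints x 3 float32 coordinates, per frame.
skeleton_bps = 2 * 26 * 3 * 4 * FPS * 8

print(f"raw cameras : {raw_bps / 1e6:8.1f} Mbit/s")
print(f"skeletons   : {skeleton_bps / 1e6:8.3f} Mbit/s")
print(f"reduction   : {raw_bps // skeleton_bps}x")
```

Even with these modest camera specs the post-processed stream is thousands of times smaller – which is exactly the kind of ratio that decides what can cross a wireless link and what must stay on-device.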
Using a specialist ASIC (chip) allows this to be done fast, and in a power-efficient manner. The HPU does a small number of things, but does them very very well.
I mentioned bandwidth a moment ago, and this provides a hint of where the HPU is. Multiple (possibly 4-6) cameras at sufficiently high frame rates generate vast amounts of data every second. This could be streamed wirelessly to the control box, but that would require very high frequency wireless, which would be wasteful of power. If, however, the HPU is in the headset then it could stream the post-processed low-bandwidth data to/from the control box instead.
Where to put the GPU is a harder question – a lot of data needs to be sent to the graphics memory for processing, so it’s likely that the GPU is in the control box, which then wirelessly streams the video output to the headset.
Since my writeup,  has come out which states that in the demo/dev unit they used the HPU was actually worn around the neck, with the headset tethered to a PC. It’s unknown what this means for the final, release, version, but it sounds like there’s a lot of miniaturisation and optimisation needed at the moment.
While the HoloLens has been designed to be standalone (albeit with the control/processor box around your neck), a big question is whether it will support other control/processor boxes – for example will it be possible to interface HoloLens with a laptop or PC. This would allow power users willing to forego some flexibility of movement (depending on wireless ranges) to use the higher processor/GPU power in their non-portable boxes. This may require some kind of dongle to handle the wireless communication – assuming some non-standard wireless protocols are being used, possibly at a non-standard frequency – e.g. the 24GHz ISM band instead of the 2.4GHz used for WiFi and Bluetooth, or the 5.8GHz used for newer WiFi. My hope is that this will be supported.
Windows 10 Holographic UI
Windows 10 will support HoloLens natively – apparently all UIs will support it. This could actually be a lot simpler to implement than you’d imagine. Currently, each window on Windows has a location (X,Y) and a size (width, height). In a 3D display, the location gains a new Z co-ordinate (X,Y,Z), plus a rotation around each axis (rX, rY, rZ). That provides sufficient information to display windows in a 3D world. Optionally you could also add warps to allow windows to be curved – that’s just a couple of other variables.
Importantly, all of this can be hidden from applications unless they want the information. An application just paints into a window, which Windows warps/transforms into the world. An application detects user input by mouse clicks in a 2D world, which Windows can provide by finding the intersection between the line of your gaze and the plane of the window.
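That gaze-to-click mapping is a standard ray-plane intersection. A sketch of how a compositor might do it – plain-tuple vector maths for clarity, where a real implementation would use matrices; all coordinates are invented:

```python
# Map a 3D gaze ray onto a 2D window: intersect the ray with the
# window's plane, then express the hit point in the window's own
# (u, v) axes -- the 2D coordinate an unmodified app expects.
def dot(a, b): return sum(x * y for x, y in zip(a, b))
def sub(a, b): return tuple(x - y for x, y in zip(a, b))
def add(a, b): return tuple(x + y for x, y in zip(a, b))
def scale(a, s): return tuple(x * s for x in a)
def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def gaze_to_window(eye, gaze_dir, win_origin, win_u, win_v):
    normal = cross(win_u, win_v)
    denom = dot(gaze_dir, normal)
    if abs(denom) < 1e-9:
        return None                      # gaze parallel to the window
    t = dot(sub(win_origin, eye), normal) / denom
    hit = add(eye, scale(gaze_dir, t))   # 3D point on the window plane
    rel = sub(hit, win_origin)
    # Project onto the window's axes for normalised 2D coordinates.
    return (dot(rel, win_u) / dot(win_u, win_u),
            dot(rel, win_v) / dot(win_v, win_v))

# Window 2 m ahead, 1 m wide, 0.6 m tall; gaze slightly up and right.
print(gaze_to_window((0, 0, 0), (0.1, 0.05, 1.0),
                     (-0.5, -0.3, 2.0), (1.0, 0.0, 0.0), (0.0, 0.6, 0.0)))
```

The returned (u, v) pair is in the window’s own units, so the shell can hand it straight to the app as an ordinary 2D cursor position.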
So most applications should just work.
Furthermore, as the HoloLens will be using Windows 10, maybe it’s more likely that other platforms (e.g. laptops) also running Windows 10 will be able to interface with the headset.
That said, many developers will be excited to operate in a 3D world, and that’s where the APIs come in. The Kinect libraries were a bit of a pain to work with, so hopefully MS have learnt some lessons there. The key thing will be to provide a couple of different layers of abstraction for developers, to allow devs the flexibility to do what they want, but have MS libraries do the heavy lifting when possible. MS hasn’t a great history of this – with many APIs not providing easy access to lower level abstractions, so this will be something to watch.
It will also be interesting to see how the APIs and Holographic UI interoperate with other head mounted displays such as the Oculus Rift. Hopefully some standards can be defined to allow people to pick and choose their headset – there are some use cases that VR is better for than AR, and vice versa.
As ever with an announcement like this, there are many questions. However it’s impressive that Microsoft felt the product mature enough to provide journalists with interactive (albeit tightly scripted) demonstrations. Some of the questions, and things to look out for, include:-
– What is the actual resolution, Field of View, and refresh rate?
– Is there really eye tracking?
– How well does the AR tracking work, especially in non-ideal lighting?
– What is the battery life like?
– How well does the Holographic Interface actually work?
– What is the API, and how easy is it to code against?
– What is the performance like, playing videos and games for example – given that games are very reliant on powerful GPUs?
– Can the headset be used with other Windows 10 platforms?
– Can other headsets be used with the Windows 10 Holographic UI?
– Patent arsiness: MS has filed several recent patents in this space, are they going to use these against other players in this space, or are they primarily for defensive use?
Maths
You may wonder how I came up with the estimate of Field of View. For source material I used several photos, some information on Head Geometry, and a bit of trigonometry.
Firstly, by looking at the photos in figures 2, 3, and 4 I estimated the following:-
– The display (per eye) was around 110×80 pixels
– The display runs horizontally from roughly level with the outside of the eye, and is symmetrical around the pupil when looking dead ahead
– The display sits somewhere between the halfway point between the depression of the nose between the eyes (sellion) and the tip of the nose, and the tip itself.
From this, we can get the following information, using the 50th percentile for women:-
– Eye width: 5.6cm (#5-#2 in , assuming symmetry of the eye around the pupil)
– Screen distance: 1cm to 2.1cm (#12-#11 in )
Given the 110×80 pixel ratio, that gives a height of around 4cm. Using the simple trig formula from figure 5, where tan X = (A/2)/B we can punch in some numbers.
Horizontally: A = 5.6cm, B=1 to 2.1cm, therefore C=70.3 to 53 degrees
Vertically: A=4cm, B=1 to 2.1cm, therefore C=63.4 to 43.6 degrees
Note that the field of view is 2 x C.
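The calculation above is easy to reproduce – the same half-angle formula, fed with the width, height, and eye-distance estimates from the anthropometric data:

```python
import math

# Half-angle estimate: tan(C) = (A / 2) / B, and FOV = 2 * C,
# where A is the display width/height and B the eye-to-display
# distance, both in cm (from the estimates above).
def fov_degrees(a_cm, b_cm):
    return 2 * math.degrees(math.atan((a_cm / 2) / b_cm))

for label, a in (("horizontal", 5.6), ("vertical", 4.0)):
    near, far = fov_degrees(a, 1.0), fov_degrees(a, 2.1)
    print(f"{label:10s}: {far:5.1f} to {near:5.1f} degrees")
```

This reproduces the roughly 100-140 degree horizontal and 80-120 degree vertical ranges quoted earlier, and shows how sensitive the estimate is to the assumed eye-to-display distance.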
 provides a different estimate of the size of the display: “A frame appears in front of me, about the size of a 50-inch HDTV at 10 feet, or perhaps an iPad at half arm’s length.” This results in the following estimates:-
– 50-inch (127cm) (A) at 10 feet (305cm) (B) => C = 11.8 degrees diagonal
– iPad (9.7 inch, 24.6cm) (A) at half arm’s length (60cm/2) (B) => C = 22.3 degrees diagonal
 estimates 40×22 degrees, 60Hz, and 4Mpx (2.5k) to 8Mpx (4k)
 Head Dimensions http://upload.wikimedia.org/wikipedia/commons/6/61/HeadAnthropometry.JPG