Apps should be able to see and, with that, to understand the world. In the sixth blog post in the series, we cover exactly that: how to build UWP apps that take advantage of the camera found on the majority of devices (including the Xbox One with the Kinect) to build a compelling and intelligent experience for the phone, the desktop and the Xbox One. As with the previous blog posts, we are also open sourcing Adventure Works, a photo capture UWP sample app that uses native and cloud APIs to capture, modify and understand images. The source code is available on GitHub right now, so make sure to check it out.
If you missed the previous blog post from last week on the Internet of Things, make sure to check it out. We covered how to build a cross-device IoT fitness experience that shines on all device form factors and how to use client and cloud APIs to make a real-time, connected IoT experience. To read the other blog posts and watch the recordings from the App Dev on Xbox live event that started it all, visit the App Dev on Xbox landing page.
Adventure Works
Adventure Works is a photo capture UWP sample app that takes advantage of the built-in UWP camera APIs for capturing and previewing the camera stream. Using Win2D, an open source library for 2D graphics rendering with GPU acceleration, the app can enhance any photo by applying rich effects or filters, and by using the intelligent Cognitive Services APIs it can analyze any photo to automatically tag and caption it and, more importantly, detect people and emotion.
Camera APIs

Camera and MediaCapture API
The first thing we need to implement is a way to get images into the app. This can be done via a variety of devices: a phone's front-facing camera, a laptop's integrated webcam, a USB webcam and even the Kinect's camera. Fortunately, when using the Universal Windows Platform, we don't have to worry about the low-level details of a camera because of the MediaCapture API. Let's dig into some code and see how to get the live camera stream regardless of the Windows 10 device you're using.
To get started, we'll need to check what cameras are available to the application and check if any of them are front facing cameras:
var allVideoDevices = await DeviceInformation.FindAllAsync(DeviceClass.VideoCapture);

var desiredDevice = allVideoDevices.FirstOrDefault(device => device.EnclosureLocation != null
    && device.EnclosureLocation.Panel == Windows.Devices.Enumeration.Panel.Front);

var cameraDevice = desiredDevice ?? allVideoDevices.FirstOrDefault();
We can query for devices using DeviceInformation.FindAllAsync to get a list of all devices that support video capture. What we get back from that task is a DeviceInformationCollection object. From there, we can use LINQ to get the first device in the list that reports being in the front panel.
The next line of code covers the scenario where the device doesn't have a front-facing camera; in that case, it just gets the first camera in the list. This is a good fallback for devices whose cameras don't report a panel location, or that simply don't have a front-facing camera.
Now it's time to initialize MediaCapture APIs using the selected camera.
_mediaCapture = new MediaCapture();

var settings = new MediaCaptureInitializationSettings { VideoDeviceId = _cameraDevice.Id };

await _mediaCapture.InitializeAsync(settings);
To start this stage, instantiate a MediaCapture object (be sure to keep the MediaCapture reference as a class field, because you must dispose of it when you're done using it later on). Next, create a MediaCaptureInitializationSettings object and use the camera's Id to set the VideoDeviceId property. Finally, initialize the MediaCapture by passing the settings to the InitializeAsync method.
At this point we can start previewing the camera, but before we do, we'll need a place for the video stream to be shown in the UI. This is done with a CaptureElement:
<CaptureElement Name="PreviewControl" Stretch="UniformToFill"></CaptureElement>
The CaptureElement has a Source property; we set that using the MediaCapture and then start the preview:
PreviewControl.Source = _mediaCapture;
await _mediaCapture.StartPreviewAsync();
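As noted above, the MediaCapture must be disposed of when you're done with it. A minimal cleanup sketch could look like the following; the CleanupCameraAsync helper name is our own and not taken from the sample, which has its own shutdown logic in the Camera class:

// Hypothetical cleanup helper (not from the sample): stop the preview,
// detach the CaptureElement and dispose the MediaCapture when done.
private async Task CleanupCameraAsync()
{
    if (_mediaCapture == null)
        return;

    await _mediaCapture.StopPreviewAsync();
    PreviewControl.Source = null;
    _mediaCapture.Dispose();
    _mediaCapture = null;
}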
There are other considerations, such as device rotation and resolution; the MediaCapture API makes it easy to access and modify those properties of the device and stream. Take a look at the Camera class in Adventure Works for a full implementation.
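For instance, here is a rough sketch (not the sample's exact code) of picking the highest-resolution preview format the camera reports and rotating the preview for a portrait-oriented device, assuming the MediaCapture has already been initialized:

// Pick the highest-resolution preview format the camera supports.
var previewProperties = _mediaCapture.VideoDeviceController
    .GetAvailableMediaStreamProperties(MediaStreamType.VideoPreview)
    .OfType<VideoEncodingProperties>()
    .OrderByDescending(p => p.Width * p.Height)
    .FirstOrDefault();

if (previewProperties != null)
{
    await _mediaCapture.VideoDeviceController.SetMediaStreamPropertiesAsync(
        MediaStreamType.VideoPreview, previewProperties);
}

// Rotate the preview so it appears upright on a portrait-first device.
_mediaCapture.SetPreviewRotation(VideoRotation.Clockwise90Degrees);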
Effects
Now that we have a video stream, we can do a number of things above and beyond just taking a photo or recording video. Today, we'll discuss a few possibilities: applying a photo effect with Win2D, applying a real-time video effect using Win2D, and real-time face detection.
Win2D
Win2D is an easy-to-use Windows Runtime API for immediate mode 2D graphics rendering with GPU acceleration. It can be used to apply effects to photos, which is what we do in the Adventure Works demo application after a photo is taken. Let's take a look at how we accomplish this.
At this point in the app, the user has already taken a photo, the photo is saved in the app's LocalFolder, and the PhotoPreviewView is shown. The user has chosen to apply some filters by clicking the "Filters" AppBarButton, which shows a GridView with a list of photo effects they can apply.
Okay, now let's get to the code (note that the code is summarized; check out the sample app for the full code in context). The PhotoPreviewView has a Win2D CanvasControl in the main section of the view:
<win2d:CanvasControl x:Name="ImageCanvas" Draw="ImageCanvas_Draw"/>
When the preview is initially shown, we load the image from the file into that canvas. Take note that Invalidate() forces the bitmap to be redrawn:
_file = await StorageFile.GetFileFromPathAsync(photo.Uri);
var stream = await _file.OpenReadAsync();
_canvasImage = await CanvasBitmap.LoadAsync(ImageCanvas, stream);
ImageCanvas.Invalidate();
Now that the UI shows the photo, the user can select an effect from the list. This fires the GridView's SelectionChanged event and in the event handler we take the user's selection and set it to a _selectedEffectType field:
private void Collection_SelectionChanged(object sender, SelectionChangedEventArgs e)
{
    _selectedEffectType = (EffectType)e.AddedItems.FirstOrDefault();
    ImageCanvas.Invalidate();
}
Since calling Invalidate forces a redraw, it will hit the following event handler and use the selected effect:
private void ImageCanvas_Draw(CanvasControl sender, CanvasDrawEventArgs args)
{
    var ds = args.DrawingSession;
    var size = sender.Size;

    ds.DrawImageWithEffect(
        _canvasImage,
        new Rect(0, 0, size.Width, size.Height),
        _canvasImage.GetBounds(sender),
        _selectedEffectType);
}
The DrawImageWithEffect method is an extension method found in EffectsGenerator.cs that takes in a specific EffectType (also defined in EffectsGenerator.cs) and draws the image to the canvas with that effect.
public static void DrawImageWithEffect(this CanvasDrawingSession ds,
    ICanvasImage canvasImage, Rect destinationRect, Rect sourceRect, EffectType effectType)
{
    ICanvasImage effect = canvasImage;

    switch (effectType)
    {
        case EffectType.none:
            effect = canvasImage;
            break;
        case EffectType.amet:
            effect = CreateGrayscaleEffect(canvasImage);
            break;
        // ...
    }

    ds.DrawImage(effect, destinationRect, sourceRect);
}

private static ICanvasImage CreateGrayscaleEffect(ICanvasImage canvasImage)
{
    var ef = new GrayscaleEffect();
    ef.Source = canvasImage;
    return ef;
}
Win2D provides many different effects that can be applied as input to the built-in Draw methods. A simple example is the GrayscaleEffect, which simply changes the color of each pixel, but there are also effects that can do transforms and much more.
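Effects can also be chained by feeding one effect's output into another effect's Source, following the same pattern as CreateGrayscaleEffect above. Here is a hedged sketch of a custom "faded" look built from two built-in Win2D effects; the CreateFadedEffect helper name is our own and not one from the sample:

// Chain Win2D effects: desaturate the image, then blur it slightly.
// Each effect's Source takes the previous image in the chain.
private static ICanvasImage CreateFadedEffect(ICanvasImage canvasImage)
{
    var saturation = new SaturationEffect
    {
        Source = canvasImage,
        Saturation = 0.4f
    };

    return new GaussianBlurEffect
    {
        Source = saturation,
        BlurAmount = 1.5f
    };
}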
Win2D Video Effects
You can do a lot with Win2D and the camera. One more advanced scenario is to use Win2D to apply real time video effects to any video stream, including the camera preview stream so that the user can see what the effect looks like before they take the photo. We don't do this in Adventure Works, but it's worth touching on. Let's take a quick look.
Applying a video effect to a video stream starts with a VideoEffectDefinition object. This is passed to the MediaCapture by calling mediaCapture.AddVideoEffectAsync() with that VideoEffectDefinition. Let's take a simple example: applying a grayscale effect.
First, create a class in a UWP Windows Runtime Component project and add a public sealed class GrayscaleVideoEffect that implements IBasicVideoEffect.
public sealed class GrayscaleVideoEffect : IBasicVideoEffect
The interface requires several methods (you can see all of them here); the one we'll focus on now is ProcessFrame(), where each frame is passed in and an output frame is expected. This is where you can use Win2D to apply the same effects to each frame (or analyze the frame for information).
Here's the code:
public void ProcessFrame(ProcessVideoFrameContext context)
{
    using (CanvasBitmap inputBitmap = CanvasBitmap.CreateFromDirect3D11Surface(_canvasDevice, context.InputFrame.Direct3DSurface))
    using (CanvasRenderTarget renderTarget = CanvasRenderTarget.CreateFromDirect3D11Surface(_canvasDevice, context.OutputFrame.Direct3DSurface))
    using (CanvasDrawingSession ds = renderTarget.CreateDrawingSession())
    {
        var grayscale = new GrayscaleEffect() { Source = inputBitmap };
        ds.DrawImage(grayscale);
    }
}
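The _canvasDevice used above has to come from somewhere: the remaining IBasicVideoEffect members are mostly boilerplate, and the important one is SetEncodingProperties, where the Direct3D device handed to the effect is wrapped in a Win2D CanvasDevice. A minimal sketch of those members, modeled on the Win2D effect samples rather than the Adventure Works code, looks like this:

// Wrap the Direct3D device so Win2D can draw on the GPU frames above.
public void SetEncodingProperties(VideoEncodingProperties encodingProperties, IDirect3DDevice device)
{
    _canvasDevice = CanvasDevice.CreateFromDirect3D11Device(device);
}

// Ask for GPU-backed frames, since we draw them with Win2D.
public MediaMemoryTypes SupportedMemoryTypes => MediaMemoryTypes.Gpu;

// An empty list means any encoding is acceptable.
public IReadOnlyList<VideoEncodingProperties> SupportedEncodingProperties =>
    new List<VideoEncodingProperties>();

public bool IsReadOnly => false;
public bool TimeIndependent => true;
public void SetProperties(IPropertySet configuration) { }
public void DiscardQueuedFrames() { }
public void Close(MediaEffectClosedReason reason) => _canvasDevice?.Dispose();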
Back in the MediaCapture code, to add this effect to the camera preview stream, you need to call AddVideoEffectAsync:
await _mediaCapture.AddVideoEffectAsync(
    new VideoEffectDefinition(typeof(GrayscaleVideoEffect).FullName),
    MediaStreamType.VideoPreview);
That's all there is to the effect. You can see a more complete demo of applying Win2D video effects in the official Win2D samples on GitHub, and you can install the Win2D demo app from the Windows Store.
Face Detection
The VideoEffectDefinition can be used for much more than just applying beautiful image effects. You can also use it to process the frame for information; you can even detect faces with one! Luckily, this VideoEffectDefinition has already been created for you: the FaceDetectionEffectDefinition!
Here's how to use it (see the full implementation here):
var definition = new Windows.Media.Core.FaceDetectionEffectDefinition();
definition.SynchronousDetectionEnabled = false;
definition.DetectionMode = FaceDetectionMode.HighPerformance;

_faceDetectionEffect = (await _mediaCapture.AddVideoEffectAsync(definition, MediaStreamType.VideoPreview)) as FaceDetectionEffect;
You only need to instantiate the FaceDetectionEffectDefinition, set some of its properties to suit your needs and then add it to the initialized MediaCapture. The reason we take the extra step of setting the _faceDetectionEffect private field is so that we can spice things up a little more by hooking into the FaceDetected event:
_faceDetectionEffect.FaceDetected += FaceDetectionEffect_FaceDetected;
_faceDetectionEffect.DesiredDetectionInterval = TimeSpan.FromMilliseconds(100);
_faceDetectionEffect.Enabled = true;
Now, whenever that event is fired, we can, for example, snap a photo, start recording, or even process the video for more information, like detecting when someone is smiling! We can use the Microsoft Cognitive Services Emotion API to detect a smile; let's take a look at this a little further.
Cognitive Services
Microsoft Cognitive Services let you build apps with powerful algorithms based on Machine Learning using just a few lines of code. To use these APIs, you could use the official NuGet packages, or call the REST endpoints directly. In the Adventure Works demo we use three of these to analyze photos: the Emotion API, Face API and Computer Vision API.
Emotion API
Let's take a look at how we can detect a smile using the Microsoft Cognitive Services Emotion API. As mentioned above, where we showed how to use the FaceDetectionEffectDefinition, we hooked into the FaceDetected event. This is a good spot to check in real time whether the people in the preview are smiling, and then take the photo at just the right moment.
When the FaceDetected event is fired, it is passed two parameters: a FaceDetectionEffect sender and a FaceDetectedEventArgs args. We can determine whether any faces are present by checking the ResultFrame.DetectedFaces property on the args.
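A minimal sketch of what such a handler can look like follows; it is simplified, and the real handler in Adventure Works does more work, including grabbing a preview frame to send to the Emotion API:

// Fired off the UI thread each time face detection runs on the preview stream.
private void FaceDetectionEffect_FaceDetected(FaceDetectionEffect sender, FaceDetectedEventArgs args)
{
    var faces = args.ResultFrame.DetectedFaces;
    if (faces.Count == 0)
        return;

    // Each DetectedFace exposes a FaceBox with the face's position in the frame,
    // which can later be scaled and passed along to the Emotion API.
    foreach (DetectedFace face in faces)
    {
        Debug.WriteLine($"Face at {face.FaceBox.X},{face.FaceBox.Y}");
    }
}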
In Adventure Works, when the handler is called (see here for the full event handler), we first check whether there are any DetectedFaces in the image; if so, we grab the location of each face within the frame and call the Emotion API through our custom method, CheckIfEveryoneIsSmiling:
public async Task<bool> CheckIfEveryoneIsSmiling(IRandomAccessStream stream, IEnumerable<DetectedFace> faces, double scale)
{
    List<Rectangle> rectangles = new List<Rectangle>();

    foreach (var face in faces)
    {
        var box = face.FaceBox;
        rectangles.Add(new Rectangle()
        {
            Top = (int)((double)box.Y * scale),
            Left = (int)((double)box.X * scale),
            Height = (int)((double)box.Height * scale),
            Width = (int)((double)box.Width * scale)
        });
    }

    var emotions = await _client.RecognizeAsync(stream.AsStream(), rectangles.ToArray());

    return emotions.Where(emotion => GetEmotionType(emotion) == EmotionType.Happiness).Count() == emotions.Count();
}
We use the RecognizeAsync method of the EmotionServiceClient to analyze the emotion of each face in the preview frame. We make the assumption that if everyone is happy in the photo they must be smiling.
Face API
Microsoft Cognitive Services Face API allows you to detect, identify, analyze, organize, and tag faces in photos. More specifically, it allows you to detect one or more human faces in an image and get back face rectangles for where in the image the faces are.
We use the API to identify faces in the photo so we can tag each person. When the photo is captured, we analyze the faces by calling our own FindPeople method and passing it the photo file stream:
public async Task<IEnumerable<PhotoFace>> FindPeople(IRandomAccessStream stream)
{
    Face[] faces = null;
    IdentifyResult[] results = null;
    List<PhotoFace> photoFaces = new List<PhotoFace>();

    try
    {
        // find all faces
        faces = await _client.DetectAsync(stream.AsStream());
        results = await _client.IdentifyAsync(_groupId, faces.Select(f => f.FaceId).ToArray());

        for (var i = 0; i < faces.Length; i++)
        {
            var face = faces[i];
            var photoFace = new PhotoFace()
            {
                Rect = face.FaceRectangle,
                Identified = false
            };

            if (results != null)
            {
                var result = results[i];
                if (result.Candidates.Length > 0)
                {
                    photoFace.PersonId = result.Candidates[0].PersonId;
                    photoFace.Name = _personList.Where(p => p.PersonId == result.Candidates[0].PersonId).FirstOrDefault()?.Name;
                    photoFace.Identified = true;
                }
            }

            photoFaces.Add(photoFace);
        }
    }
    catch (FaceAPIException ex)
    {
    }

    return photoFaces;
}
The FaceServiceClient API contains several methods that allow us to easily call into the Face API in Cognitive Services. DetectAsync allows us to see if there are any faces in the captured frame, as well as their bounding box within the image. This is great for locating the face of a person in the image so you can draw their name (or something else more fun). The IdentifyAsync method can use the faces found in the DetectAsync method to identify known faces and get their name (or id for more unique identification).
Not shown here is the AddPersonFaceAsync method of the FaceServiceClient API, which can be used to improve the recognition of a specific person by sending another image of that person to train the model. If a person has not yet been added to the model, we can create them with the CreatePersonAsync method. To see how all of these methods work together in the Adventure Works sample, take a look at FaceAPI.cs on GitHub.
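As a rough sketch of how those calls fit together, building on the same _client, _groupId and image stream used in FindPeople above; the person name here is made up, and the exact signatures may vary between versions of the Face client library:

// Add a new person to the person group, give the model one face image for them,
// then retrain the group so IdentifyAsync can recognize them later.
var person = await _client.CreatePersonAsync(_groupId, "Contoso User");
await _client.AddPersonFaceAsync(_groupId, person.PersonId, stream.AsStream());
await _client.TrainPersonGroupAsync(_groupId);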
Computer Vision API
You can take this much further by using the Microsoft Cognitive Services Computer Vision API to get information from the photo. Again, let's go back to the PhotoPreviewView in the Adventure Works demo app. If the user clicks the Details button, we call the AnalyzeImage method, where we pass the photo's file stream to the VisionServiceClient AnalyzeImageAsync method and specify the VisualFeatures we expect in return. It analyzes the image and returns a list of tags describing what the API detected in the photo, a short description of the image, detected faces and more (see the full implementation on GitHub).
private async Task AnalyzeImage()
{
    var stream = await _file.OpenReadAsync();

    var imageResults = await _visionServiceClient.AnalyzeImageAsync(stream.AsStream(),
        new[] { VisualFeature.Tags, VisualFeature.Description, VisualFeature.Faces, VisualFeature.ImageType });

    foreach (var tag in imageResults.Tags)
    {
        // Take first item and use it as the main photo description
        // and add the rest to a list to show in the UI
    }
}
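For instance, here is a hedged sketch of pulling the machine-generated caption out of the result; the property names come from the Vision client library of that era, and the formatting is our own rather than the sample's:

// The Description part of the result carries generated captions, each with a
// confidence score; take the most confident one and show it to the user.
var caption = imageResults.Description?.Captions?
    .OrderByDescending(c => c.Confidence)
    .FirstOrDefault();

if (caption != null)
{
    Debug.WriteLine($"Caption: {caption.Text} ({caption.Confidence:P0})");
}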
Wrap up

Now that you are familiar with the general use of the APIs, make sure to check out the app source on our official GitHub repository, read through some of the resources provided, watch the event if you missed it, and let us know what you think through the comments below or on Twitter.
And come back next week for another blog post in the series, where we will extend the Adventure Works example with some social features by enabling Facebook and Twitter login and sharing, integrating Project Rome, and adding maps and location.
Until then, happy coding!
Resources

Previous Xbox Series Posts
Download Visual Studio to get started.
The Windows team would love to hear your feedback. Please keep the feedback coming using our Windows Developer UserVoice site. If you have a direct bug, please use the Windows Feedback tool built directly into Windows 10.
Source:
Camera APIs with a dash of cloud intelligence in a UWP app (App Dev on Xbox series)