I built a bot that comments on an image when you give it the image URL

I've been playing around with Bot Framework and Microsoft Azure.
As the title says, this time I built a bot that returns a comment when you give it the URL of an image.

Here is what I made. You give it the URL of an image.

To start, I gave it an image of Claudia, and this is what came back.
It is right that it's a person, but she is not riding a surfboard...

When I gave it a picture of my dog, it answered that it was a dog looking at the camera. Correct!!

For images that show a person's face, it also guesses the emotion.
This one is from a trip to Aomori last year. It was a blast!!

It also comments when there are several people in the image.

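When I gave it one more image URL, this time of AKB48, it even told me the names.
It probably knows more than I do... The image shows a fair amount of skin, so RacyContent is flagged.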

It can also detect AdultContent, but that would take us off topic, so I'll leave it there.

A person can tell all of this at a glance, but I find it interesting that a system can read something close to that from an image and tell you about it.

How it works

I did not implement all of this by hand.
The image analysis part uses the Cognitive Services APIs on Microsoft Azure.

I used the following two: the Computer Vision API and the Emotion API.

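Computer Vision, put simply, analyzes an image and returns what the image shows as its analysis result.
For example, a person or a dog.
The categories it can return are shown in the image below. There are a lot of them.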

Besides these categories, it returns plenty of other information.
For example, "Is Racy Content" tells you as True/False whether the image is on the revealing side.

There is probably ML running behind the scenes.

The Emotion API, when it recognizes a face, analyzes the facial expression and returns the emotion.
It returns scores for the following 8 emotions:

anger
contempt
disgust
fear
happiness
neutral
sadness
surprise

This one probably has ML running behind it too.

As for the bot itself, an example is described in the Bot Framework documentation:
https://docs.botframework.com/en-us/bot-intelligence/vision/#example-vision-bot

Source code and setup

I thought about putting this on Git, but I would probably end up uploading the Computer Vision API and Emotion API keys as they are, so once again I'm pasting the code here.
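One way around the key problem is to keep the keys out of the source altogether and read them at runtime. The following is only a minimal sketch, not what this post's code does: the class `ApiKeys` and the variable names `VISION_API_KEY` and `EMOTION_API_KEY` are hypothetical names I made up for illustration.

```csharp
using System;

public static class ApiKeys
{
    // Hypothetical variable names; use whatever fits your own deployment.
    public const string VisionKeyVariable = "VISION_API_KEY";
    public const string EmotionKeyVariable = "EMOTION_API_KEY";

    // Returns the trimmed value of the named environment variable,
    // or throws with a clear message when it is missing or blank.
    public static string Require(string variableName)
    {
        string value = Environment.GetEnvironmentVariable(variableName);
        if (string.IsNullOrWhiteSpace(value))
        {
            throw new InvalidOperationException(
                "Environment variable '" + variableName + "' is not set.");
        }
        return value.Trim();
    }
}
```

With something like this in place, the client could be created as `new VisionServiceClient(ApiKeys.Require(ApiKeys.VisionKeyVariable))`, and the source could be pushed to a repository without the keys in it.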

The environment this time is Visual Studio 2015 + Bot Framework v3.
From the Microsoft Azure portal, create a Vision API resource and an Emotion API resource.

First, the Vision API.

Nice that there is a free tier!

Next, the Emotion API.

This one has a free tier too.

Next, start Visual Studio. The steps below pick up right after creating a project from the Bot Application template.

The first thing I did was update Microsoft.Bot.Builder to v3.1.0.

Install Microsoft.ProjectOxford.Vision.

Install Microsoft.ProjectOxford.Emotion.

And now the code.
Add using directives for the packages just installed from NuGet.

using System;
using System.Linq;
using System.Net.Http;
using System.Threading.Tasks;
using System.Web.Http;
using Microsoft.Bot.Connector;
// For the Vision API
using Microsoft.ProjectOxford.Vision;
using Microsoft.ProjectOxford.Vision.Contract;
// For the Emotion API
using Microsoft.ProjectOxford.Emotion;
using Microsoft.ProjectOxford.Emotion.Contract;
using System.Collections.Generic;

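Next, the Post method.

HandleSystemMessage is used exactly as it comes in the template, so I'll omit it.

Exceptions came up in surprisingly many places, so the code ended up full of try blocks.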

        public async Task<HttpResponseMessage> Post([FromBody]Activity activity)
        {

            if (activity == null || activity.GetActivityType() != ActivityTypes.Message)
            {
                //add code to handle errors, or non-messaging activities
                HandleSystemMessage(activity);
            }
            else
            {
                ConnectorClient connector = new ConnectorClient(new Uri(activity.ServiceUrl));
                Activity reply = null;
                String ReplyText="";
                try
                {
                    // For the Vision API processing
                    const string visionApiKey = "<Vision API key>";

                    //Vision SDK classes
                    VisionServiceClient visionClient = new VisionServiceClient(visionApiKey);
                    VisualFeature[] visualFeatures = new VisualFeature[] {
                                        VisualFeature.Adult, //recognize adult content
                                        VisualFeature.Categories, //recognize image features
                                        VisualFeature.Description //generate image caption
                                        };
                    AnalysisResult analysisResult = null;

                    //Treat the message text as an image URL and send it to the Vision API
                    try
                    {
                        analysisResult = await visionClient.AnalyzeImageAsync(activity.Text, visualFeatures);
                    }
                    catch (Exception e)
                    {
                        analysisResult = null; //on error, reset analysis result to null
                    }

                    if (analysisResult != null)
                    {
                        string imageCaption = "I think it's " + analysisResult.Description.Captions[0].Text +".";
                        if (analysisResult.Adult.IsRacyContent == true)
                        {
                            imageCaption += "[RacyContent]";
                        }
                        if (analysisResult.Adult.IsAdultContent == true)
                        {
                            imageCaption += "[AdultContent]";
                        }

                        ReplyText += imageCaption;
                    }

                    // For the Emotion API processing
                    const string emotionApiKey = "<Emotion API key>";

                    //Emotion SDK objects that take care of the hard work
                    EmotionServiceClient emotionServiceClient = new EmotionServiceClient(emotionApiKey);
                    Emotion[] emotionResult = null;

                    try
                    {
                        emotionResult = await emotionServiceClient.RecognizeAsync(activity.Text);
                    }
                    catch (Exception e)
                    {
                        emotionResult = null;
                    }

                    if (emotionResult != null)
                    {
                        string emotiontext;
                        try
                        {
                            Scores emotionScores = emotionResult[0].Scores;

                            //Retrieve list of emotions for first face detected and sort by emotion score (desc)
                            IEnumerable<KeyValuePair<string, float>> emotionList = new Dictionary<string, float>()
                            {
                                { "angry", emotionScores.Anger},
                                { "contemptuous", emotionScores.Contempt },
                                { "disgusted", emotionScores.Disgust },
                                { "frightened", emotionScores.Fear },
                                { "happy", emotionScores.Happiness},
                                { "neutral", emotionScores.Neutral},
                                { "sad", emotionScores.Sadness },
                                { "surprised", emotionScores.Surprise}
                            }
                            .OrderByDescending(kv => kv.Value)
                            .ThenBy(kv => kv.Key)
                            .ToList();

                            KeyValuePair<string, float> topEmotion = emotionList.ElementAt(0);
                            string topEmotionKey = topEmotion.Key;
                            float topEmotionScore = topEmotion.Value;

                            emotiontext = "I found a face! I am " + (int)(topEmotionScore * 100) +
                                                             "% sure the person seems " + topEmotionKey + ".";

                        }
                        catch
                        {
                            emotiontext = "";
                        }

                        ReplyText += emotiontext;
                    }

                    if (ReplyText.Length != 0)
                    {
                        reply = activity.CreateReply();
                        reply.Recipient = activity.From;
                        reply.Type = ActivityTypes.Message;
                        reply.Text = ReplyText;
                        reply.Attachments = new System.Collections.Generic.List<Attachment>();
                        reply.Attachments.Add(new Attachment()
                        {
                            ContentUrl = activity.Text,
                            ContentType = "image/png"
                        });
                    }
                    else
                    {
                        reply = activity.CreateReply("???");
                    }
                }
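                catch (Exception e1)
                {
                    // Log the failure and fall back to a default reply,
                    // so that reply is never null when it is sent below
                    System.Diagnostics.Trace.TraceError(e1.Message);
                    reply = activity.CreateReply("???");
                }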

                await connector.Conversations.ReplyToActivityAsync(reply);

            }

            return new HttpResponseMessage(System.Net.HttpStatusCode.Accepted);
        }
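
The Post method above passes activity.Text straight to both APIs and relies on the exceptions to notice when the text is not a usable URL. It also always attaches the image as "image/png". As a possible refinement, and only as a sketch, the text could be checked first and the content type guessed from the file extension. The class `ImageUrlHelper` and its two methods are hypothetical names of my own, not part of Bot Framework or the Project Oxford SDKs.

```csharp
using System;
using System.IO;

public static class ImageUrlHelper
{
    // Returns true, and the parsed Uri, when text is an absolute http or https URL.
    public static bool TryParseImageUrl(string text, out Uri url)
    {
        url = null;
        if (string.IsNullOrWhiteSpace(text))
        {
            return false;
        }

        Uri candidate;
        if (!Uri.TryCreate(text.Trim(), UriKind.Absolute, out candidate))
        {
            return false;
        }

        if (candidate.Scheme != Uri.UriSchemeHttp &&
            candidate.Scheme != Uri.UriSchemeHttps)
        {
            return false;
        }

        url = candidate;
        return true;
    }

    // Guesses a MIME type from the file extension in the URL path.
    // Falls back to "image/png", the value the code above hardcodes.
    public static string GuessContentType(Uri url)
    {
        string extension = Path.GetExtension(url.AbsolutePath).ToLowerInvariant();
        switch (extension)
        {
            case ".jpg":
            case ".jpeg":
                return "image/jpeg";
            case ".gif":
                return "image/gif";
            case ".bmp":
                return "image/bmp";
            default:
                return "image/png";
        }
    }
}
```

In Post, this would let the bot answer "???" right away when TryParseImageUrl returns false, without calling either API, and set ContentType from GuessContentType instead of a fixed value. The extension is only a guess, since a URL does not have to end in the real file type.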