How Is Face Recognition Implemented on the Amazon Doorbell System?

12-07 23:46

TAG: face recognition, Amazon doorbell system, Python, OpenCV, FaceNet

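As a new owner of an Amazon Ring doorbell, I love the features it offers. Still, I thought I could improve on it. What I wanted was a doorbell customized for the people who live in my house: one that can recognize who is at the door. Seeing how popular these doorbells are, I decided that the best way to help most households was to let them customize their own doorbell with minimal effort. I built an application that tells you who is at your door; all it needs is the username and password of your doorbell account. Knowing who is at the door without waiting for the video to appear on your smartphone is very convenient. It can also improve security, and it could even be connected to a system that opens the door automatically. In the deep learning era, I think every home could benefit from a system like this. The figure below illustrates how my system works.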

The complete code is available in this Git repository: https://github.com/dude123studios/SmarterRingV2

The requirements are as follows:

    tensorflow==2.4.1
    opencv-python==4.5.1.48
    mtcnn==0.1.0
    ring_doorbell==0.7.0
    oauthlib~=3.1.0
    numpy~=1.19.5
    scipy~=1.6.1
    scikit-learn==0.24.1
    gtts==2.2.2
    playsound~=1.2.2

Let's break down what is happening. With your username and password supplied as environment variables, the Ring API can connect to your account. The API exposes the doorbell's features to Python. The API repository, with brief documentation, is here: https://github.com/tchellomello/python-ring-doorbell

Here is a snippet from ring.py that sets up a connection to your doorbell:

    import os
    import json
    from pathlib import Path

    from ring_doorbell import Ring, Auth
    from oauthlib.oauth2 import MissingTokenError

    # Cached OAuth token, so the password is only needed on the first run
    cache_file = Path("test_token.cache")


    def token_updated(token):
        # Called by Auth whenever the token is refreshed
        cache_file.write_text(json.dumps(token))


    def otp_callback():
        auth_code = input("[INPUT] 2FA code:")
        return auth_code


    def main(download_only=False):
        if cache_file.is_file():
            auth = Auth("MyProject/1.0", json.loads(cache_file.read_text()), token_updated)
        else:
            username = os.environ.get('USERNAME')
            password = os.environ.get('PASSWORD')
            auth = Auth("MyProject/1.0", None, token_updated)
            try:
                auth.fetch_token(username, password)
            except MissingTokenError:
                # The account uses two-factor authentication: retry with a code
                auth.fetch_token(username, password, otp_callback())
        ring = Ring(auth)
        ring.update_data()
        wait_for_update(ring, download_only=download_only)
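Because os.environ.get returns None when a variable is missing, a login attempt with unset credentials only fails later, inside fetch_token. The following is a minimal sketch of an up-front check. read_credentials is a hypothetical helper that is not part of the repository, and the variable names USERNAME and PASSWORD are simply the ones used in the snippet above.

```python
import os


def read_credentials(env=None, user_key="USERNAME", password_key="PASSWORD"):
    """Return (username, password) from a mapping of environment variables.

    Hypothetical helper, not part of the original ring.py.
    """
    env = os.environ if env is None else env
    missing = [key for key in (user_key, password_key) if not env.get(key)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return env[user_key], env[password_key]


if __name__ == "__main__":
    try:
        username, _ = read_credentials()
        print("credentials found for", username)
    except RuntimeError as err:
        print("[ERROR]", err)
```

On Windows, USERNAME is normally predefined by the system as the login name, so passing different key names (for example a made-up RING_USERNAME) may be safer there.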

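The wait_for_update function runs continuously, acting as a handler that waits for activity. It keeps refreshing until it finds a new entry in Ring's stored history. When that happens, it checks whether the doorbell was pressed. If so, it downloads the whole video to your device. To speed this up, use the Ring app on your smartphone to shorten the recording length. When your doorbell rings, the latest video ends up on your computer. From there, we grab several frames from the video, so that a face hidden in one frame can still be seen in another. I defined this method in utils.py; it is shown later. Below is another snippet from ring.py, which handles the main loop:

    import time


    def wait_for_update(ring, download_only=False):
        last_id = -1  # id of the most recent ding already handled
        start = time.time()
        while True:
            try:
                ring.update_data()
            except Exception:
                # Temporary network or API error: wait and retry
                time.sleep(1)
                continue
            doorbell = ring.devices()['authorized_doorbots'][0]
            # Only the first event is read; the code assumes it is the newest one
            current_id = last_id
            for event in doorbell.history(limit=20, kind='ding'):
                current_id = event['id']
                break
            if current_id != last_id:
                last_id = current_id
                print('[INFO] finished search in:', str(time.time() - start))
                start = time.time()
                if download_only:
                    handle_video(ring, True)
                    return
                # handle_video is not shown in this article
                handle = handle_video(ring)
                if handle:
                    text_to_speech(handle)
                else:
                    text_to_speech('The person at the door is not very clear')
            time.sleep(1)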
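The core of the loop is the comparison between the newest event id and the last id already handled. To make that step easier to test in isolation, here is a small sketch that works on plain dictionaries instead of a live Ring connection. newest_unseen_event is a hypothetical helper, and the assumption that the history list is ordered newest first mirrors what the loop above relies on.

```python
def newest_unseen_event(history, last_id):
    """Return the newest event if its id differs from last_id, else None.

    history: list of dicts with an 'id' key, assumed to be ordered newest first.
    Hypothetical helper for illustration; not part of the original ring.py.
    """
    if not history:
        return None  # no dings recorded yet
    newest = history[0]
    if newest["id"] == last_id:
        return None  # already handled
    return newest


if __name__ == "__main__":
    fake_history = [{"id": 103, "kind": "ding"}, {"id": 102, "kind": "ding"}]
    last_id = 102
    event = newest_unseen_event(fake_history, last_id)
    if event is not None:
        last_id = event["id"]
        print("new ding:", last_id)
```

Keeping this check in a pure function also makes the empty-history case explicit, which a polling loop has to handle on an account with no recorded dings.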

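If the identify, get_first_frame, and text_to_speech method calls seem a little confusing, don't worry, we are about to get to them. Now that our handler is in place, let's start on the face recognition.

FaceNet

FaceNet is a model developed by Google in 2015. It relies on a form of clustering: it learns embeddings in which faces of the same person group together.

The goal is to create an embedding, much like a word embedding. The only difference is that, instead of learning a vector for each token id, the model compresses an image into a small latent space. Specifically, given an image of shape (160, 160, 3), the FaceNet model produces a vector of shape (128,), called its embedding. The model is trained so that faces of different people are far apart in the embedding space, while faces of the same person are close together. This way, a person can usually be recognized regardless of lighting conditions, angle, or makeup.

FaceNet architecture

FaceNet resembles ResNet and InceptionV3. The architecture is shown below. The input image passes through 1x1 Conv and 2x2 Pooling layers, then goes down a deep ResNet made up of pairs of Inception layers and residual connections. The final layers contain several 3x3 Conv, Concat, and 2x2 Pooling layers.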
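To make the idea of "close for the same person, far for different people" concrete: the FaceNet paper trains the embedding with a triplet loss over an anchor, a positive (same person), and a negative (different person). The sketch below evaluates that loss on made-up 3-dimensional vectors rather than real 128-dimensional FaceNet outputs. The function names and the toy values are illustrative assumptions.

```python
import math


def l2_normalize(vec):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec]


def squared_distance(a, b):
    """Squared Euclidean distance between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def triplet_loss(anchor, positive, negative, margin=0.2):
    """max(d(anchor, positive) - d(anchor, negative) + margin, 0)."""
    gap = squared_distance(anchor, positive) - squared_distance(anchor, negative)
    return max(gap + margin, 0.0)


if __name__ == "__main__":
    # Toy embeddings: two views of one person and one view of someone else
    anchor = l2_normalize([1.0, 0.1, 0.0])
    positive = l2_normalize([0.9, 0.2, 0.0])
    negative = l2_normalize([0.0, 1.0, 0.3])
    print("well separated:", triplet_loss(anchor, positive, negative))
    print("violating triplet:", triplet_loss(anchor, negative, positive))
```

The loss is zero once the negative is farther from the anchor than the positive by at least the margin, so training only pushes on triplets that still violate that condition.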

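The code to load the model is simple. The model is stored in the directory model/files/. Judging from the import used later (from model.facenet_loader import model), this code lives in model/facenet_loader.py.

    from tensorflow.keras.models import load_model

    # The path must point to wherever facenet_keras.h5 is actually stored
    model = load_model('model/facenet_keras.h5')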

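Developing a model that generalizes to faces it has never seen before is hard. The FaceNet model used here was trained on the MS-Celeb-1M dataset, which contains roughly ten million photographs of about 100,000 celebrities. By L2-normalizing the combined embeddings of each person's images and comparing them with a cosine similarity function, FaceNet can achieve very high recognition accuracy.

I built a convenient way to enroll your family's faces: run submit_face.py and pass the argument "name" (the name of the person to register). In addition, to improve accuracy and match the lighting conditions, you can use the boolean argument "from_door"; if true, it saves images directly from the last video recorded by your doorbell. These images are stored in the directory data/faces/ and are pre-cropped with MTCNN face detection. The detection method, shown later, is part of face_recognition.py. For recorded videos, I grabbed specific frames and tested which ones worked.

We will need some image preprocessing, along with a few other small functions, which I define in utils.py:

    import cv2


    def normalize(img):
        # Standardize pixel values to zero mean and unit variance
        mean, std = img.mean(), img.std()
        return (img - mean) / (std + 1e-7)


    def preprocess(cv2_img):
        cv2_img = normalize(cv2_img)
        # FaceNet expects 160x160 inputs
        cv2_img = cv2.resize(cv2_img, (160, 160))
        return cv2_img


    def get_specific_frames(video_path, times):
        # times are in seconds; the video is assumed to run at 15 frames per second
        vidcap = cv2.VideoCapture(video_path)
        frames = []
        for t in times:
            vidcap.set(cv2.CAP_PROP_POS_FRAMES, t * 15)
            success, image = vidcap.read()
            if success:
                frames.append(image)
        vidcap.release()
        return frames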
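get_specific_frames converts seconds to frame numbers with a fixed factor of 15. As a sketch of how that conversion could be made explicit and safe against timestamps beyond the end of the clip, the helper below takes the frame rate and frame count as parameters. frame_indices is a hypothetical name. With OpenCV, the two values could be read from the capture through cv2.CAP_PROP_FPS and cv2.CAP_PROP_FRAME_COUNT.

```python
def frame_indices(times, fps, frame_count):
    """Map timestamps in seconds to valid, de-duplicated frame indices.

    Hypothetical helper for illustration; not part of the original utils.py.
    """
    indices = []
    for t in times:
        index = int(t * fps)  # truncate to a whole frame number
        if 0 <= index < frame_count and index not in indices:
            indices.append(index)
    return indices


if __name__ == "__main__":
    # A 4-second clip at 15 frames per second has 60 frames (indices 0..59)
    print(frame_indices([0.5, 1, 2, 10], fps=15, frame_count=60))  # [7, 15, 30]
```

The timestamp at 10 seconds is dropped because it lies past the end of the 4-second clip.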

Once images of everyone you want to recognize are in the directory data/faces/, we can convert them into encodings. We do this as a separate step because we L2-normalize the combined encoding of all the images belonging to each person.

    import os
    import pickle

    import cv2
    import numpy as np
    from sklearn.preprocessing import Normalizer

    import face_recognition  # the project's own face_recognition.py
    from utils import preprocess

    encoding_dict = {}
    l2_normalizer = Normalizer('l2')

    # data/faces/ holds one sub-directory per person
    for face_names in os.listdir('data/faces/'):
        person_dir = os.path.join('data/faces/', face_names)
        encodes = []
        for image_name in os.listdir(person_dir):
            image_path = os.path.join(person_dir, image_name)
            face = cv2.imread(image_path)
            face = preprocess(face)
            encoding = face_recognition.encode(face)
            encodes.append(encoding)
        if encodes:
            # Sum the per-image encodings, then scale the result to unit length
            encoding = np.sum(encodes, axis=0)
            encoding = l2_normalizer.transform(np.expand_dims(encoding, axis=0))[0]
            encoding_dict[face_names] = encoding

    path = 'data/encodings/encoding.pkl'
    with open(path, 'wb') as file:
        pickle.dump(encoding_dict, file)
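The enrollment script sums a person's encodings and then L2-normalizes the result. The sketch below reproduces that step in plain Python on made-up 2-dimensional vectors, to show that the stored vector has unit length and points in the average direction of the inputs. combine_encodings is a hypothetical name, and real FaceNet encodings would have 128 dimensions.

```python
import math


def combine_encodings(encodings):
    """Sum equally sized vectors and scale the sum to unit L2 norm."""
    summed = [sum(values) for values in zip(*encodings)]
    norm = math.sqrt(sum(x * x for x in summed))
    if norm == 0:
        raise ValueError("encodings cancel out; cannot normalize")
    return [x / norm for x in summed]


if __name__ == "__main__":
    # Two toy "encodings" of the same person
    person = combine_encodings([[3.0, 0.0], [0.0, 4.0]])
    print(person)
```

Because of the final normalization, summing and averaging the encodings give the same stored vector.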

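preprocess is the function I use to standardize an image and resize it to (160, 160, 3), and face_recognition is the module that provides the encode function. As you may have noticed, I save the encodings as a dictionary. This dictionary comes in handy during real-time recognition, because it is a simple way to store people's names alongside their encodings.

Real-time face recognition

Now that we have images of the people we want to recognize, how does the real-time recognition process work? It is shown in the figure below: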

When the doorbell rings, a video is downloaded and several frames are selected. On these frames, the detect_faces method detects every face present. Below is a snippet from face_recognition.py:

    import cv2
    import mtcnn

    face_detector = mtcnn.MTCNN()
    conf_t = 0.99  # minimum detection confidence


    def detect_faces(cv2_img):
        # MTCNN expects RGB, while OpenCV loads images as BGR
        img_rgb = cv2.cvtColor(cv2_img, cv2.COLOR_BGR2RGB)
        results = face_detector.detect_faces(img_rgb)
        faces = []
        for res in results:
            x1, y1, width, height = res['box']
            # MTCNN can return slightly negative coordinates
            x1, y1 = abs(x1), abs(y1)
            x2, y2 = x1 + width, y1 + height
            confidence = res['confidence']
            if confidence < conf_t:
                continue
            faces.append(cv2_img[y1:y2, x1:x2])
        return faces


    def detect_face(cv2_img):
        img_rgb = cv2.cvtColor(cv2_img, cv2.COLOR_BGR2RGB)
        results = face_detector.detect_faces(img_rgb)
        if not results:
            return None  # no face found
        x1, y1, width, height = results[0]['box']
        x1, y1 = abs(x1), abs(y1)
        x2, y2 = x1 + width, y1 + height
        confidence = results[0]['confidence']
        if confidence < conf_t:
            return None
        return cv2_img[y1:y2, x1:x2]
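The snippet handles negative box coordinates with abs(). A common alternative is to clamp the box to the image bounds, which also covers boxes that extend past the right or bottom edge. The following is a sketch of that alternative, not the repository's method. clamp_box is a hypothetical helper, and the box format (x, y, width, height) follows the MTCNN result used above.

```python
def clamp_box(box, img_width, img_height):
    """Convert (x, y, width, height) to corner coordinates inside the image.

    Returns (x1, y1, x2, y2), or None if nothing of the box lies in the image.
    Hypothetical helper for illustration.
    """
    x, y, width, height = box
    x1, y1 = max(0, x), max(0, y)
    x2, y2 = min(img_width, x + width), min(img_height, y + height)
    if x1 >= x2 or y1 >= y2:
        return None
    return x1, y1, x2, y2


if __name__ == "__main__":
    # A box that starts 5 pixels left of a 100x80 image
    print(clamp_box((-5, 10, 40, 30), img_width=100, img_height=80))  # (0, 10, 35, 40)
```

With a NumPy image, the crop would then be cv2_img[y1:y2, x1:x2], using img_height, img_width = cv2_img.shape[:2].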

The cropped faces are preprocessed and fed into FaceNet, which outputs a 128-dimensional embedding for each face. These vectors are then compared, using cosine distance, with the vectors stored in encoding.pkl. The person whose stored encoding is closest to the input face is returned. If the distance to the closest known face exceeds a given threshold, "UNKNOWN" is returned, indicating that the face does not resemble any known face. Here is the rest of face_recognition.py:

    import pickle

    import numpy as np
    from scipy.spatial.distance import cosine
    from sklearn.preprocessing import Normalizer

    from model.facenet_loader import model
    from utils import preprocess

    l2_normalizer = Normalizer('l2')


    def encode(img):
        # Add a batch dimension, run FaceNet, and return the single embedding
        img = np.expand_dims(img, axis=0)
        out = model.predict(img)[0]
        return out


    def load_database():
        with open('data/encodings/encoding.pkl', 'rb') as f:
            database = pickle.load(f)
        return database


    recog_t = 0.35  # maximum cosine distance for a match


    def recognize(img):
        people = detect_faces(img)
        if len(people) == 0:
            return None
        best_people = []
        people = [preprocess(person) for person in people]
        encoded = [encode(person) for person in people]
        encoded = [l2_normalizer.transform(encoding.reshape(1, -1))[0]
                   for encoding in encoded]
        database = load_database()
        for person in encoded:
            best = 1
            best_name = ''
            # Find the enrolled person with the smallest cosine distance
            for k, v in database.items():
                dist = cosine(person, v)
                if dist < best:
                    best = dist
                    best_name = k
            if best > recog_t:
                best_name = 'UNKNOWN'
            best_people.append(best_name)
        return best_people
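The matching rule in recognize can be tried out without FaceNet. The sketch below implements the same kind of nearest-neighbor search with a distance threshold in plain Python, on a made-up two-person database of 2-dimensional vectors. best_match and cosine_distance are hypothetical names, and the 0.35 threshold is copied from recog_t above.

```python
import math


def cosine_distance(a, b):
    """1 - cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def best_match(embedding, database, threshold=0.35):
    """Return the closest name in database, or 'UNKNOWN' beyond threshold."""
    best_name, best_dist = "UNKNOWN", float("inf")
    for name, known in database.items():
        dist = cosine_distance(embedding, known)
        if dist < best_dist:
            best_name, best_dist = name, dist
    if best_dist > threshold:
        return "UNKNOWN"
    return best_name


if __name__ == "__main__":
    database = {"alice": [1.0, 0.0], "bob": [0.0, 1.0]}  # toy encodings
    print(best_match([0.9, 0.1], database))    # close to alice's direction
    print(best_match([-1.0, -1.0], database))  # far from both entries
```

The threshold is what keeps a stranger from being reported as the nearest family member; its value trades false matches against missed ones and would need tuning on real images.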

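That completes most of the recognition work.

Speech synthesis

I wanted to be told who is at the door. At first I thought playing a sound on the Ring device itself would be the best approach, but Amazon does not allow that; only the default sound that accompanies a ring can be played. Converting text to speech therefore seemed a more suitable approach. Two packages make this simple: gTTS and playsound. gTTS calls Google Translate's text-to-speech service, which I understand to be based on Google's Tacotron 2 model. Although it is not essential to fully understand how it works, for interested readers the figure illustrates its architecture.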

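Tacotron is very similar to Seq2Seq, but it uses bidirectional LSTMs, convolutional layers, pre-net layers, and, most importantly, a 2D representation (a spectrogram) fed into the decoder. If you want to learn more about Tacotron 2, here is a video on the topic by CodeEmporium: https://www.youtube.com/watch?v=le1LH4nPfmE&ab_channel=CodeEmporium

Although Tacotron 2 is not the best model available, especially compared with transformer models, it gets the job done. The gTTS Python API is used as follows:

    from gtts import gTTS
    from playsound import playsound

    language = 'en'
    slow_audio_speed = False
    filename = 'tts_file.mp3'


    def text_to_speech(text):
        # Synthesize the sentence, save it as an MP3, and play it
        audio_created = gTTS(text=text, lang=language,
                             slow=slow_audio_speed)
        audio_created.save(filename)
        playsound(filename)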
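text_to_speech expects a finished sentence, while recognize returns a list of names that may include 'UNKNOWN'. How the repository turns one into the other is not shown in this article, so the following is only a sketch of one possible approach. build_announcement is a hypothetical helper, and the wording of the sentences is made up apart from the fallback message, which is taken from wait_for_update.

```python
def build_announcement(names):
    """Turn a list of recognized names into one sentence for text-to-speech.

    Hypothetical helper for illustration; 'UNKNOWN' marks unrecognized faces.
    """
    known = sorted({name for name in names if name != "UNKNOWN"})
    unknown_count = sum(1 for name in names if name == "UNKNOWN")
    parts = list(known)
    if unknown_count == 1:
        parts.append("an unknown person")
    elif unknown_count > 1:
        parts.append(f"{unknown_count} unknown people")
    if not parts:
        return "The person at the door is not very clear"
    if len(parts) == 1:
        subject = parts[0]
    else:
        subject = ", ".join(parts[:-1]) + " and " + parts[-1]
    verb = "are" if len(parts) > 1 or unknown_count > 1 else "is"
    return f"{subject} {verb} at the door"


if __name__ == "__main__":
    print(build_announcement(["alice", "UNKNOWN"]))
```

Names are de-duplicated because the same person may be recognized in several frames of one video.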

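Simple enough. The reason I use playsound rather than os.system is that os.system opens the default audio player application, whereas playsound does not pop up any window. That completes the last step of the project.

Summary and Git repository

Please check out my Git repository for the complete code and to customize your own doorbell easily: https://github.com/dude123studios/SmarterRingV2

See README.md for instructions explaining the exact steps for using this system in your own home. Installation takes only about 5 minutes. Amazon, put this in your next doorbell!

Further exploration and questions

FaceNet is a fairly dated model. Over the past five years there have been major advances in transformer models, such as ViT, and GPT-3 generalizes remarkably well. Would a transformer along the lines of GPT-3 do better at creating general-purpose embeddings? Convolutional neural networks may not be the best choice for face recognition, because capturing long-range dependencies (such as between the ears and the jawline) requires very large networks. Transformer models, on the other hand, can take self-similarity into account, and might perform real-time face recognition much faster.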
