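{ "cells": [ { "cell_type": "markdown", "metadata": { "collapsed": false }, "source": [ "# Project Background - BOTBAY Robot Harbor\n", "\n", "BotBay aims to give every user a dedicated, human-like bot. Ideally it can connect to different platforms (WeChat, 5G) and act as a personal assistant for each person's daily work and life. You can give it a name so that it stays with you for life, and we hope that however your work and life change, it can keep serving you in a general-purpose way. In the current version BotBay can organize work messages and send to-do reminders, for example:\n", "1. Add the bot to a group chat, and it records the text, images, and files posted there, automatically stores the files on a cloud drive, and filters the text into minutes;\n", "2. @-mention the bot in a group or message it privately to view the day's information with “日报” (daily report) or “纪要” (minutes); “纪要发送邮箱” (send the minutes by email) is also supported;\n", "3. Simulate a work task and see how the bot reminds you.\n", "\n", "# Demo\n", "## Video\n", "[Bilibili link](https://www.bilibili.com/video/BV1q64y127Vd/)\n", "\n", "## Selected Screenshots\n", "### Account Binding and Naming the Bot\n", "> When a new user starts interacting with the chatbot, it does not know you yet, so it asks you to confirm your account (based on a user system our team developed earlier) and to give the bot its own name\n", "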
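\n", "\n", "\n", "### Pausing and Resuming Bot Replies\n", "> Because the bot runs on a personal WeChat account, we implemented an on/off switch so that it does not interfere with everyday messaging\n", "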
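\n", "\n", "\n", "### Automatic Minutes Generation\n", "> A keyword-extraction algorithm judges which group-chat messages are more likely to be important; the minutes can also be sent by email (simulating how meeting minutes are produced)\n", "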
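\n", "\n", "\n", "### Automatic Archiving of Group Files, Images, Audio, and Video - Mobile\n", "> A small engineering-oriented mechanism that archives group-chat files, guarding against files expiring or being lost when you change phones or computers\n", "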
\n", "\n", "\n", "> This is in fact a full-fledged cloud drive system (based on a cloud drive our team developed earlier)\n", "
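\n", "\n", "\n", "### To-Do Reminders and Acting on Your Behalf\n", "> If BOTBAY is connected to a business office system, it can help you handle pending work by asking you questions; in the screenshots below we simulate submitting a request form for approval\n", "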
\n", "\n", "
\n", "\n", "\n", "### Archived Information Daily Report\n", "> The collected Text/Audio/Video/Attachment/Image messages, together with Room/Contact/mentionList information, are classified, counted, and analyzed\n", "
\n", "\n", "
\n", "\n", "
\n", "\n", "\n", "> There is also a PC view\n", "
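\n", "\n", "\n", "# Platform Architecture\n", "The project is designed and developed around a one-entry, one-platform, multiple-supports model:\n", "\n", "* One entry - the WeChat entry point, where users interact with the system through a chatbot.\n", "\n", "* One platform - botPlatform: hosts the chatbot, starts the wechaty instances, receives messages, and uses a state machine to handle basic message replies and dispatch logic.\n", "\n", "* Multiple supports - paddleWorkers: supporting services built on paddleHub. This project uses paddle's image OCR to extract the text from images in WeChat messages; other paddle services can be added later to give the chatbot more features.\n", "\n", "# Core Logic\n", "## botPlatform - Hosting the chatbot\n", "> The stack is Node.js + Express + MongoDB; the key techniques are a state machine, word segmentation, and keyword extraction\n", "Because the codebase as a whole is very large, only the key parts are uploaded here\n", "\n", "| No. | Module | Function | Code included |\n", "| -------- | -------- | -------- | -------- |\n", "| 1 | CMX-CoreHandler | User authentication, user management, roles and permissions | No |\n", "| 1.1 | user.js | User-related functions | No |\n", "| 1.2 | bot.js | chatbot-related functions | **Yes** |\n", "| 1.3 | application.js | Application-related functions | No |\n", "| 2 | CMX-FileHandler | File processing, automatic archiving, cloud drive | No |\n", "| 3 | CMX-ResourceHandler | Workflow processing and form data processing | No |\n", "\n", "---\n", "### Configuration\n", "> The key item here is botWechatMap, which the later login step uses as a whitelist of the WeChat accounts allowed to log in by scanning the QR code\n", "```\n", "var _LANG = 'ch'; // default language: Chinese\n", "const BOTCONFIG = {\n", "    autoregistHello: 'hello bot',\n", "    botWechatMap: {\n", "        porbello: 'https://u.wechat.com/MG3oDlaSML_iJ3AN6me3Uv4' // whitelist: not every WeChat account that scans the code is accepted\n", "    },\n", "    language: {\n", "        ch: {\n", "            hello: '您好,我是您的专属助手', // greeting: Hello, I am your personal assistant\n", "            // ...and so on for other languages\n", "        }\n", "    }\n", "};\n", "```\n", "config/index.js\n", "```\n", "const config = {\n", "    bot: {\n", "        enable: true, // enable the bot service\n", "        tokens: ['puppet_padlocal_e55XXXXXXXX55906d2'] // with several tokens, several instances can be started\n", "    },\n", "}\n", "```\n", "\n", "### Starting the wechaty Instances\n", "```\n", "const { Wechaty } = require('wechaty');\n", "const { PuppetPadlocal } = require('wechaty-puppet-padlocal');\n", "\n", "var bots = [];\n", "if (config.bot && config.bot.enable) {\n", "    for (let i = 0; i < config.bot.tokens.length; i++) { // one instance per token\n", "        const bot = new Wechaty({\n", "            puppet: new PuppetPadlocal({\n", "                token: config.bot.tokens[i]\n", "            }),\n", "            name: 'BotBay'\n", "        });\n", "        bot.cmx = bot.cmx || {};\n", "        bot.cmx.use = false;\n", "        bots.push(bot);\n", "        bot\n", "            .on('scan', (qrcode, status) => {\n", "                bot.cmx.qrcode = qrcode; // store the QR code whenever one is issued\n", "            })\n", "            .on('login', async (user) => {\n", "                console.log(`User ${user} logged 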
in`);\n", "                const contact = bot.userSelf();\n", "                if (BOTCONFIG.botWechatMap[contact.id]) {\n", "                    console.log(`check pass`);\n", "                    bot.cmx.use = true; // standby flag: this bot is now in use\n", "                    bot.cmx.qrcode = ''; // QR code exposed to the admin front end; cleared after login\n", "                    bot.cmx.wechatqr = BOTCONFIG.botWechatMap[contact.id]; // this bot's WeChat QR code, shown on the admin front end\n", "                } else {\n", "                    console.log(`check fail`); // not on the whitelist: force a logout; in our tests the scan event did not seem to fire again right after logout\n", "                    await bot.logout();\n", "                    // TODO: not sure whether wechaty needs to be restarted explicitly\n", "                }\n", "\n", "            })\n", "            .on('logout', user => {\n", "                console.log(`User ${user} log out`);\n", "                bot.cmx.use = false; // put the bot back on standby\n", "            })\n", "            .on('error', e => console.error('Bot error: %s', e))\n", "            .on('message', message => onMessage(message, bot))\n", "            .on('friendship', friendship => onFriendship(friendship, bot))\n", "            .on('room-join', (room, inviteeList, inviter) => onRoomJoin(room, inviteeList, inviter, bot))\n", "            .start();\n", "    }\n", "}\n", "```\n", "### Looking Up My Own Bot\n", "```\n", "async function getMyBot(wechatid) {\n", "    return new Promise((resolve, reject) => {\n", "        Models.Botlists.findOne({ // stores each user's bot record\n", "            wechatid: wechatid\n", "        }).lean().exec((err, data) => {\n", "            if (err || !data) resolve(false);\n", "            else {\n", "                if (data.owner) {\n", "                    Models.Userworkspacelinks.findOne({\n", "                        user: data.owner\n", "                    }).lean().exec((dmErr, dmData) => {\n", "                        if (dmErr || !dmData) resolve(false);\n", "                        else resolve(Object.assign(data, {\n", "                            workspace: dmData.workspace\n", "                        }));\n", "                    });\n", "                } else {\n", "                    resolve(false);\n", "                }\n", "            }\n", "        });\n", "    });\n", "}\n", "```\n", "The bot record has roughly the following fields\n", "```\n", "const botlistsScheMa = new Schema({\n", "    nickname: String, // nickname\n", "    owner: String, // owner\n", "    expires: { type: Date }, // expiry time\n", "    state: { type: Number, default: 1 }, // state\n", "    birthday: { type: Date, default: Date.now }, // birthday\n", "    desc: String, // profile description\n", "    worldranking: Number, // ranking\n", "    level: Number, // level\n", "    wechatid: String, // linked WeChat ID\n", "    hello: String // custom trigger phrase\n", "});\n", "\n", "```\n", "### State Machine States\n", 
"1. HELLO - initial state, entered when the bot is newly added as a friend or when “变身机器人” (transform into bot) is sent\n", "2. WAITUSERNAME - the user's account could not be determined; waiting for account information\n", "3. WAITNICKNAME - the user has not named this bot yet; waiting for a nickname\n", "4. FREE - the basic information is complete; the bot responds to interaction\n", "\n", "The code below changes the state at different points\n", "\n", "### Adding a Friend\n", "```\n", "async function onFriendship(friendship, bot) {\n", "    const contact = friendship.contact();\n", "    if (friendship.type() === bot.Friendship.Type.Receive) { // 1. receive new friendship request from new contact\n", "        let hasbotinfo = await ((key, wechatid) => {\n", "            return new Promise((resolve, reject) => {\n", "                Models.Botlists.findOne({\n", "                    $or: [{ hello: key }, { wechatid: wechatid }] // look up an existing bot by trigger phrase or WeChat ID\n", "                }).lean().exec((err, data) => {\n", "                    if (err) resolve(false);\n", "                    else resolve(data || false);\n", "                });\n", "            });\n", "        })(friendship.hello(), contact.id);\n", "        if (hasbotinfo === false && friendship.hello() == BOTCONFIG.autoregistHello) { // autoregistHello is the default, generic trigger phrase\n", "            hasbotinfo = 'new wechat user';\n", "        }\n", "        if (hasbotinfo !== false) {\n", "            await friendship.accept(); // accept the friend request\n", "            console.log(`Request from ${contact.name()} is accepted successfully!`);\n", "            if (hasbotinfo == 'new wechat user') {\n", "                RedisClient.set('BOT-' + contact.id, 'WAITUSERNAME'); // state: wait for the account name\n", "                await fsmJob(bot, contact);\n", "            } else {\n", "                if (!isEmpty(hasbotinfo.nickname)) {\n", "                    RedisClient.set('BOT-' + contact.id, 'HELLO'); // state: greet\n", "                    await fsmJob(bot, contact, hasbotinfo.nickname);\n", "                } else {\n", "                    RedisClient.set('BOT-' + contact.id, 'WAITNICKNAME'); // state: wait for a nickname\n", "                    await fsmJob(bot, contact);\n", "                }\n", "                await ((_query, _updatedata) => {\n", "                    return new Promise((resolve, reject) => {\n", "                        Models.Botlists.updateOne(_query, _updatedata, (err, data) => {\n", "                            if (err) {\n", "                                console.error(err);\n", "                                resolve(false);\n", "                            } else resolve(data);\n", "                        });\n", "                    });\n", "                })({\n", "                    _id: hasbotinfo._id\n", "                }, {\n", "                    wechatid: contact.id // refresh the stored WeChat ID\n", "                });\n", "            }\n", "        } else {\n", "            RedisClient.del('BOT-' + contact.id);\n", " 
console.log(`no exist botinfo from ${friendship.hello()}`);\n", "        }\n", "    } else if (friendship.type() === bot.Friendship.Type.Confirm) { // 2. confirm friendship\n", "        console.log(`New friendship confirmed with ${contact.name()}`);\n", "    }\n", "}\n", "```\n", "\n", "### Receiving Messages\n", "```\n", "async function onMessage(msg, bot) {\n", "    const contact = msg.talker();\n", "    if (contact.id == 'weixin' || msg.self()) { // skip the official WeChat account and the bot's own messages\n", "        return;\n", "    }\n", "    await fsmJob(bot, contact, msg, true); // hand the message straight to the state machine\n", "}\n", "```\n", "\n", "### State Machine Actions\n", "```\n", "async function fsmJob(bot, contact, msg, reply) {\n", "    let FSM = await (() => { // look up the current state of this user\n", "        return new Promise((resolve, reject) => {\n", "            RedisClient.get('BOT-' + contact.id, function (err, result) {\n", "                if (err) {\n", "                    resolve(false);\n", "                } else {\n", "                    resolve(result || '');\n", "                }\n", "            });\n", "        });\n", "    })();\n", "    if (FSM)\n", "        await botDoProcess[FSM](bot, contact, msg, reply); // run the action for the current state\n", "    else { // no stored state: a new user\n", "        if (reply && msg) { // the user sent the bot a message on their own initiative\n", "            // ...code omitted; the idea is to handle the user's message according to its content\n", "        }\n", "    }\n", "}\n", "```\n", "The main logic lives in the botDoProcess object:\n", "```\n", "const botDoProcess = {\n", "    WAITUSERNAME: async (bot, contact, msg, reply) => {\n", "        // msg is the user's account name: check that it exists in the database and, if so, bind it to the bot\n", "    },\n", "    WAITNICKNAME: async (bot, contact, msg, reply) => {\n", "        // msg is the bot's nickname: update the database\n", "    },\n", "    FREE: async (bot, contact, msg, reply) => {\n", "        // handle commands (纪要, 纪要发送邮箱, 日报, 帮助, etc.) and process, archive, count, and analyze text, audio, video, attachments, and images\n", "    },\n", "    HELLO: async (bot, contact, nickname) => {\n", "        await contact.say(BOTCONFIG.language[_LANG].hello + nickname);\n", "        RedisClient.set('BOT-' + contact.id, 'FREE'); // idle: ready for interaction\n", "        await fsmJob(bot, contact);\n", "    },\n", "}\n", "```\n", "\n", "### Word Segmentation and Keyword Extraction\n", "CppJieba provides the underlying word-segmentation algorithm\n", "\n", "## paddleWorkers - Supporting Services for the chatbot\n", "> Only image OCR is used this time. It is wrapped as an HTTP endpoint (because paddleWorker is written in Python and botPlatform in Node.js) and exposed to botPlatform. Thanks to the maturity of paddlehub's components, very little code is needed; kudos to paddlehub\n", "\n", "```\n", "from flask import request, Flask\n", "import json\n", "import paddlehub as hub\n", "import 
cv2\n", "import requests\n", "import os\n", "\n", "\n", "app = Flask(__name__)\n", "ocr = None  # loaded once at startup by load_model()\n", "\n", "\n", "@app.route('/imageOcr', methods=['GET'])\n", "def image_ocr():\n", "    # URL of the image to recognize, passed as the imagePath query parameter\n", "    path = request.args.get('imagePath')\n", "    print(path)\n", "    file_name = os.path.basename(path)\n", "    # Download the image to a temporary file under /mnt/\n", "    file_down = requests.get(path)\n", "    with open('/mnt/'+file_name,'wb') as f:\n", "        f.write(file_down.content)\n", "    ocr_res = ocr.recognize_text(images=[cv2.imread('/mnt/'+file_name)])\n", "    data = ocr_res[0]['data']\n", "    res_data = {}\n", "    text = []\n", "    # Collect the recognized text of each detected region\n", "    for item in data:\n", "        text.append(item['text'])\n", "    res_data['msg'] = '请求成功'  # request succeeded\n", "    res_data['code'] = 200\n", "    res_data['data'] = text\n", "    os.remove('/mnt/'+file_name)  # delete the temporary file\n", "    return json.dumps(res_data,ensure_ascii=False)\n", "\n", "\n", "\n", "def load_model():\n", "    global ocr\n", "    ocr = hub.Module(name=\"chinese_ocr_db_crnn_server\")\n", "\n", "\n", "if __name__ == \"__main__\":\n", "    load_model()\n", "    app.run(host=\"0.0.0.0\", port=9000)\n", "\n", "```\n", "## webpages - The Front-End Pages\n", "> Front-end work may look less glamorous than machine learning, deep learning, artificial intelligence, or natural language processing, but having a \"slightly friendlier\" interface is still a good thing.\n", "\n", "The pages are written as components, roughly like this\n", "```\n", "\n", "
<!-- component template markup not recoverable; surviving labels: 群聊文件 (group chat files), 暂无数据~ (no data yet) -->
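\n", "```\n", "The pages are developed in Vue with Echarts charts. We will not go into how they were built; they simply follow the design mockups.\n", "\n", "# Open Issues\n", "1. The current state-machine-based message handling cannot respond to non-standard commands; natural language processing and multi-turn dialogue techniques could be introduced to help trigger state changes;\n", "2. Because wechaty works by sending and receiving messages through a WeChat account, there is an upper limit on the number of friends that can be added. The current solution supports interaction by @-mentioning the bot in a WeChat group without adding it as a friend, but complex scenarios still require adding it as a friend. With this in mind, the botPlatform configuration reserves a token array for multiple wechaty instances from the start; the instances are created in a loop, so several of them can be started on the server on standby and dispatched to user interactions according to a policy (not yet mature);\n", "3. With wechaty as the message hub, the high-availability requirements of a production environment cannot be met: if one instance goes down, the other instances currently have no way to take over smoothly, so for now we can only hope for the best.\n", "\n" ] } ], "metadata": { "kernelspec": { "display_name": "PaddlePaddle 2.0.0b0 (Python 3.5)", "language": "python", "name": "py35-paddle1.2.0" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.7.4" } }, "nbformat": 4, "nbformat_minor": 1 }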