{
"cells": [
{
"cell_type": "markdown",
"metadata": {
"collapsed": false
},
"source": [
"# 项目背景-BOTBAY机器人港湾\n",
"\n",
"BotBay致力于创建专属的拟人机器人我们理想中它可以接入不同的平台【微信、5G】作为每一人完成日常工作生活专属助理你可以给它起一个名字这样就可以伴随终身我们希望无论你今后的工作生活如何变化它都可以普适的服务能力目前版本我们赋能BotBay工作消息整理和待办提醒功能例如\n",
"1.把机器人拉进群,帮助我记录群里面的文字、图片、文件,并自动将文件存储到云盘,文字经过过滤后形成纪要;\n",
"2.在群里面@我或者私聊我,要求查看当日信息“日报”、“纪要”、并支持将“纪要发送邮箱”;\n",
"3.模拟一个工作任务,看看机器人如何提醒我的。\n",
"\n",
"# 作品演示\n",
"## 视频\n",
"[B站链接](https://www.bilibili.com/video/BV1q64y127Vd/)\n",
"\n",
"## 部分截图\n",
"### 账号绑定和给机器人起名字\n",
"> 新用户启动chatbot交互时由于它还不认识你所以需要向你确认账户【基于本团队之前开发过的一套用户体系】和机器人它自己的姓名\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/26294f201ccc4a26a27e954653a920190d991798392b47218bc8a4412b363c31\"/>\n",
"\n",
"### 停止与启动机器人应答\n",
"> 由于我们使用的是本人微信号,考虑到不影响日常收发消息,所以实现了开关\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/99c5993e00314b27a657cce8811a48852d549d16a003444da8a9fd0b4b5ee7b3\"/>\n",
"\n",
"### 自动纪要生成\n",
"> 根据关键词提取算法,判断群聊消息中那些内容更加有可能属于重要信息,支持纪要发送邮箱【模拟会议纪要的过程】\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/88c174b364254c76992ac6395d87a7ba2d72fcffa02845169936ec2df15d89da\"/>\n",
"\n",
"### 群文件、图片、音频、视频自动归档-移动端\n",
"> 一个工程向的小机制,帮助归档群聊文件,防止文件过期、手机电脑更换等问题\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/118241d57082479fb446e4559db2add8e46b2d8179d043cf8ebcc5bbbda5cf61\"/>\n",
"\n",
 当然这个其实是一个正儿八经的网盘系统">
"> This is in fact a full-fledged cloud drive system (based on a cloud drive our team built earlier).\n",
"<br/>\n",
"<img src=\"https://ai-studio-static-online.cdn.bcebos.com/3bab69230c9e4772ab55362d737f2f628128ef5b8b8c448ba80286c7511ecce1\"/>\n",
"\n",
"### 待办提醒与代操作\n",
"> 如果BOTBAY接入了业务办公系统的话那它就可以采用询问的方式协助你处理待办工作如下图我们模拟了一个申请单提交审批流程\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/07d16b6d0aeb4fb6ba610993f422b10413050d41881d4d3c9dd6153baf3a6b9e\"/>\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/7aeae982dc3245cab77befdca611e095c292f869043d4ed087c6d7ff70ef9edf\"/>\n",
"\n",
"### 信息归档日报\n",
"> 根据收集到的Text/Audio/Video/Attachement/Image以及Room/Contact/mentionList等信息进行归类、统计、分析\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/b0400068e3184cb58cd06c00857dffaad874fbc7168542bab1076f076432e94c\"/>\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/7b257d289fb34168ad4caa21d0a2aca83fdbd6f37c4944d5994f18cf4b6ed632\"/>\n",
"<br/>\n",
"<img width=\"300\" src=\"https://ai-studio-static-online.cdn.bcebos.com/a6434c93e2ca4d64be28aef5a48899e746d84c4534714e5ea3e66464bba7ba04\"/>\n",
"\n",
 当然也有PC端的展现">
"> There is also a PC view.\n",
"<br/>\n",
"<img src=\"https://ai-studio-static-online.cdn.bcebos.com/b2c9693c682047aea0d000a70cf97b7426d1995c642745a2b4b5b18a6a9870fd\"/>\n",
"\n",
"# 平台架构\n",
"本项目采用一入口,一平台,多支撑的模式进行设计与开发,其中:\n",
"\n",
"* 一入口 - 微信入口采用chatbot模式实现用户与系统的交互与应答。\n",
"\n",
"* 一平台 - botPlatform托管chatbot启动wechaty实例接收消息按状态机模式处理基础消息响应与逻辑分发。\n",
"\n",
"* 多支撑 - paddleWorkers使用paddleHub提供的支撑服务本项目中使用paddle提供的图片OCR解析微信消息中的图片文字今后可拓展不同的paddle服务支撑chatbot实现更多功能。\n",
"\n",
"# 核心逻辑\n",
"## botPlatform-托管chatbot\n",
"> 技术路线为NodeJs+Express+MongoDB主要关键技术为状态机、分词与关键词提取\n",
"由于整体代码量巨大,因此本次只上传了关键部分代码\n",
"\n",
"| 序号 | 模块名称 | 功能 | 代码 |\n",
"| -------- | -------- | -------- | -------- |\n",
"| 1 | CMX-CoreHandler | 实现用户认证、用户管理、角色权限等功能 | 无 |\n",
"| 1.1 | user.js | 用户相关功能 | 无 |\n",
"| 1.2 | bot.js | chatbot相关功能 | **有** |\n",
"| 1.3 | application.js | 应用相关功能 | 无 |\n",
"| 2 | CMX-FileHandler | 实现文件处理、自动归档,网盘功能 | 无 |\n",
"| 3 | CMX-ResourceHandler | 实现流程处理、表单数据处理功能 | 无 |\n",
"\n",
"---\n",
"### 配置信息\n",
"> 这里主要是botWechatMap变量在后面的login过程限制了可以扫码的微信用户白名单\n",
"```\n",
"var _LANG = 'ch';//默认中文\n",
"const BOTCONFIG = {\n",
" autoregistHello: 'hello bot',\n",
" botWechatMap: {\n",
" porbello: 'https://u.wechat.com/MG3oDlaSML_iJ3AN6me3Uv4'//不是随意的微信扫码都有效,这里配置了白名单\n",
" },\n",
" language: {\n",
" ch: {\n",
" hello: '您好,我是您的专属助手',\n",
" //...等等其它语言\n",
" }\n",
" }\n",
"};\n",
"```\n",
"config/index.js\n",
"```\n",
"const config = {\n",
"\tbot: {\n",
" enable: true,//开启机器人服务\n",
" tokens: ['puppet_padlocal_e55XXXXXXXX55906d2']//如果有多个token可以启动多个实例\n",
" \t},\n",
"}\n",
"```\n",
"\n",
"### 启动wechaty实例\n",
"```\n",
"var bots = [];\n",
"if (config.bot && config.bot.enable) {\n",
" for (let i = 0; i < config.bot.tokens.length; i++) {//如果有多个token则循环运行实例\n",
" const bot = new Wechaty({\n",
" puppet: new PuppetPadlocal({\n",
" token: config.bot.tokens[i]\n",
" }),\n",
" name: 'BotBay'\n",
" });\n",
" bot.cmx = bot.cmx || {};\n",
" bot.cmx.use = false;\n",
" bots.push(bot);\n",
" bot\n",
" .on('scan', (qrcode, status) => {\n",
" bot.cmx.qrcode = qrcode;//有二维码时赋值\n",
" })\n",
" .on('login', async (user) => {\n",
" console.log(`User ${user} logged in`);\n",
" const contact = bot.userSelf();\n",
" if (BOTCONFIG.botWechatMap[contact.id]) {\n",
" console.log(`check pass`);\n",
" bot.cmx.use = true;//本bot状态是否为待机\n",
" bot.cmx.qrcode = '';//把二维码输出到前端管理页面用\n",
" bot.cmx.wechatqr = BOTCONFIG.botWechatMap[contact.id];//给前端管理页面显示本bot对应的微信二维码\n",
" } else {\n",
" console.log(`check fail`);//如果不是白名单微信扫码则强制登出这里复现貌似登出后不能立刻回调到scan事件\n",
" await bot.logout();\n",
" //TOOD 不知道是否需要主动重启wechaty\n",
" }\n",
"\n",
" })\n",
" .on('logout', user => {\n",
" console.log(`User ${user} log out`);\n",
" bot.cmx.use = false;//bot状态置为待机\n",
" })\n",
" .on('error', e => console.info('Bot', 'error: %s', e))\n",
" .on('message', message => onMessage(message, bot))\n",
" .on('friendship', friendship => onFriendship(friendship, bot))\n",
" .on('room-join', (room, inviteeList, inviter) => onRoomJoin(room, inviteeList, inviter, bot))\n",
" .start();\n",
" }\n",
"}\n",
"```\n",
"### 获取我自己的bot\n",
"```\n",
"async function getMyBot(wechatid) {\n",
" return new Promise((resolve, reject) => {\n",
" Models.Botlists.findOne({//里面存储了每个人bot的信息\n",
" wechatid: wechatid\n",
" }).lean().exec((err, data) => {\n",
" if (err || !data) resolve(false);\n",
" else {\n",
" if (data.owner) {\n",
" Models.Userworkspacelinks.findOne({\n",
" user: data.owner\n",
" }).lean().exec((dmErr, dmData) => {\n",
" if (dmErr || !dmData) resolve(false);\n",
" else resolve(Object.assign(data, {\n",
" workspace: dmData.workspace\n",
" }));\n",
" });\n",
" } else {\n",
" resolve(false);\n",
" }\n",
" }\n",
" });\n",
" });\n",
"}\n",
"```\n",
"其中bot的字段大致如下\n",
"```\n",
"const botlistsScheMa = new Schema({\n",
" nickname: String,//昵称\n",
" owner: String,//所属人\n",
" expires: { type: Date },//过期时间\n",
" state: { type: Number, default: 1 },//状态\n",
" birthday: { type: Date, default: Date.now },//生日\n",
" desc: String,//个人简介\n",
" worldranking: Number,//排名\n",
" level: Number,//等级\n",
" wechatid: String,//关联微信号\n",
" hello: String//自定义触发语\n",
"});\n",
"\n",
"```\n",
"### 状态机状态枚举\n",
"1. HELLO - 初始化状态,新添加机器人为好友或使用“变身机器人”触发\n",
"2. WAITUSERNAME - 检查发现不明确用户账户,等待账户信息\n",
"3. WAITNICKNAME - 检查用户尚未给本机器人起名,等待昵称信息\n",
"4. FREE - 目前基础信息完整,响应交互\n",
"\n",
"在下文中状态机在不同时机发生状态变化\n",
"\n",
"### 添加好友\n",
"```\n",
"async function onFriendship(friendship, bot) {\n",
" const contact = friendship.contact();\n",
" if (friendship.type() === bot.Friendship.Type.Receive) { // 1. receive new friendship request from new contact\n",
" let hasbotinfo = await ((key, wechatid) => {\n",
" return new Promise((resolve, reject) => {\n",
" Models.Botlists.findOne({\n",
" $or: [{ hello: key }, { wechatid: wechatid }]//根据触发语或微信号检查是否已有机器人\n",
" }).lean().exec((err, data) => {\n",
" if (err) resolve(false);\n",
" else resolve(data || false);\n",
" });\n",
" });\n",
" })(friendship.hello(), contact.id);\n",
" if (hasbotinfo === false && friendship.hello() == BOTCONFIG.autoregistHello) {//autoregistHello是默认通用的触发语\n",
" hasbotinfo = 'new wechat user';\n",
" }\n",
" if (hasbotinfo !== false) {\n",
" await friendship.accept();//接收好友申请\n",
" console.log(`Request from ${contact.name()} is accept succesfully!`);\n",
" if (hasbotinfo == 'new wechat user') {\n",
" RedisClient.set('BOT-' + contact.id, 'WAITUSERNAME');//状态机置为等待账户名\n",
" await fsmJob(bot, contact);\n",
" } else {\n",
" if (!isEmpty(hasbotinfo.nickname)) {\n",
" RedisClient.set('BOT-' + contact.id, 'HELLO');//状态机置为打招呼\n",
" await fsmJob(bot, contact, hasbotinfo.nickname);\n",
" } else {\n",
" RedisClient.set('BOT-' + contact.id, 'WAITNICKNAME');//状态机置为等待昵称\n",
" await fsmJob(bot, contact);\n",
" }\n",
" await ((_query, _updatedata) => {\n",
" return new Promise((resolve, reject) => {\n",
" Models.Botlists.updateOne(_query, _updatedata, (err, data) => {\n",
" if (err) {\n",
" console.error(err);\n",
" resolve(false);\n",
" } else resolve(data);\n",
" });\n",
" });\n",
" })({\n",
" _id: hasbotinfo._id\n",
" }, {\n",
" wechatid: contact.id//更新一下微信号\n",
" });\n",
" }\n",
" } else {\n",
" RedisClient.del('BOT-' + contact.id);\n",
" console.log(`no exist botinfo from ${friendship.hello()}`);\n",
" }\n",
" } else if (friendship.type() === bot.Friendship.Type.Confirm) { // 2. confirm friendship\n",
" console.log(`New friendship confirmed with ${contact.name()}`);\n",
" }\n",
"}\n",
"```\n",
"\n",
"### 接收消息\n",
"```\n",
"async function onMessage(msg, bot) {\n",
" const contact = msg.talker();\n",
" if (contact.id == 'wexin' || msg.self()) {\n",
" return;\n",
" }\n",
" await fsmJob(bot, contact, msg, true);//直接调用状态机动作\n",
"}\n",
"```\n",
"\n",
"### 状态机动作\n",
"```\n",
"async function fsmJob(bot, contact, msg, reply) {\n",
" let FSM = await (() => {//查询当前用户状态\n",
" return new Promise((resolve, reject) => {\n",
" RedisClient.get('BOT-' + contact.id, function (err, result) {\n",
" if (err) {\n",
" resolve(false);\n",
" } else {\n",
" resolve(result || '');\n",
" }\n",
" });\n",
" });\n",
" })();\n",
" if (FSM)\n",
" await botDoProcess[FSM](bot, contact, msg, reply);//直接执行对应动作\n",
" else {//说明是新用户\n",
" if (reply && msg) {//说明用户主动发消息给bot\n",
" //...有若干代码,主要思想就是根据用户发的消息,进行相应处理\n",
" }\n",
" }\n",
"}\n",
"```\n",
"主要逻辑在botDoProcess变量中实现\n",
"```\n",
"const botDoProcess = {\n",
" WAITUSERNAME: async (bot, contact, msg, reply)=>{//接收到的是用户账户名检查数据库是否存在存在则与bot绑定},\n",
" WAITNICKNAME: async (bot, contact, msg, reply)=>{//接收到的是bot昵称更新数据库},\n",
" FREE: async (bot, contact, msg, reply)=>{//处理指令【纪要、纪要发送邮箱、日报、帮助等】,对文本、音频、视频、附件、图片进行处理、归档、统计、分析},\n",
" HELLO: async (bot, contact, nickname)=>{\n",
" await contact.say(BOTCONFIG.language[_LANG].hello + nickname);\n",
" RedisClient.set('BOT-' + contact.id, 'FREE');//空闲\n",
" await fsmJob(bot, contact);\n",
" },\n",
"}\n",
"```\n",
"\n",
"### 分词与关键词提取\n",
"使用CppJieba提供底层分词算法实现\n",
"\n",
"## paddleWorkers-提供chatbot支撑服务\n",
"> 本次只使用了图片OCR这一个功能并且封装为http接口【因为pyton实现的paddleWorkernodejs实现的botPlatform】暴露给botPlatform使用得力于paddlehub的组件成熟度所以代码量很少这里给paddlehub点个赞\n",
"\n",
"```\n",
"from flask import request, Flask\n",
"import json\n",
"import paddlehub as hub\n",
"import cv2\n",
"import requests\n",
"import os\n",
"\n",
"\n",
"app = Flask(__name__)\n",
"ocr = None\n",
"\n",
"\n",
"@app.route('/imageOcr', methods=['GET'])\n",
"def image_ocr():\n",
" path = request.args.get('imagePath')\n",
" print(path)\n",
" file_name = os.path.basename(path)\n",
" file_down = requests.get(path)\n",
" with open('/mnt/'+file_name,'wb') as f:\n",
" f.write(file_down.content)\n",
" ocr_res = ocr.recognize_text(images=[cv2.imread('/mnt/'+file_name)])\n",
" data = ocr_res[0]['data']\n",
" res_data = {}\n",
" text = []\n",
" for item in data:\n",
" text.append(item['text'])\n",
" res_data['msg'] = '请求成功'\n",
" res_data['code'] = 200\n",
" res_data['data'] = text\n",
" os.remove('/mnt/'+file_name)\n",
" return json.dumps(res_data,ensure_ascii=False)\n",
"\n",
"\n",
"\n",
"def load_model():\n",
" global ocr\n",
" ocr = hub.Module(name=\"chinese_ocr_db_crnn_server\")\n",
"\n",
"\n",
"if __name__ == \"__main__\":\n",
" load_model()\n",
" app.run(host=\"0.0.0.0\", port=9000)\n",
"\n",
"```\n",
"## webpages-实现前端页面\n",
"> 虽然写前端页面的工作相比于高大上的机器学习、深度学习、人工智能、自然语言处理这些门类,显得不上档次,但是有一个\"友好一点点\"的界面总还算是件好事。\n",
"\n",
"基本上就是这个样子按组件化编写的页面\n",
"```\n",
"<a-col :xs=\"24\" :sm=\"12\" :md=\"12\" :lg=\"12\" :xl=\"12\">\n",
" <div class=\"duplicate-file-item border-size p-size dark-bg9\">\n",
" <p class=\"item-title\">\n",
" <img src=\"~assets/img/information-archiving/icon@2x.png\" alt />群聊文件\n",
" </p>\n",
" <duplicate-file v-if=\"fileBot.length\" :data=\"fileBot\"></duplicate-file>\n",
" <a-empty\n",
" v-else\n",
" :image=\"require('~/static/images/error/no-data@2x.png')\"\n",
" :image-style=\"{\n",
" height: '95px',\n",
" }\"\n",
" >\n",
" <span slot=\"description\">暂无数据~</span>\n",
" </a-empty>\n",
" </div>\n",
"</a-col>\n",
"```\n",
"开发语言为VUE使用Echarts的图表这部分就不赘述如何开发的了按设计稿实现就好了。\n",
"\n",
"# 尚未解决问题\n",
"1. 目前版本基于状态机的消息处理逻辑是不能应答非标准化的指令的,可以通过引入自然语言处理和多轮对话技术辅助触发状态变化;\n",
"2. 由于wechaty原理基于微信号的消息收发所以存在添加好友人数上线目前本方案可支持不添加好友情况下在微信群中@机器人的方式进行交互但复杂场景下还是需要添加好友的。考虑到这点botPlatform在最开始的配置信息中预留了多个wechaty实例使用的token数组并且通过循环创建的方式可以在服务器端启动多个wechaty实例待机并且根据策略派发实例响应用户交互ps.不成熟);\n",
"3. 基于wechaty作为消息收发中枢的模式无法满足生产环境下高可用的要求如果一个实例宕机其它实例目前没有平滑无缝接管服务的方式所以只能多拜拜大神了。\n",
"\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "PaddlePaddle 2.0.0b0 (Python 3.5)",
"language": "python",
"name": "py35-paddle1.2.0"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.4"
}
},
"nbformat": 4,
"nbformat_minor": 1
}