Introduction to Big Data (MOOC Edition)

Changxin Series (常信系列)
He Ning (author) ISBN 978-7-115-66435-8

For any questions about the content of this book, please contact Liu Jia.

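1. Produced by Changzhou College of Information Technology, a "Double High-Level Plan" institution.
2. Incorporates elements of curriculum-based ideological and political education.
3. Accompanied by a rich set of videos and teaching resources.
4. Planned as a new-form, loose-leaf textbook.
5. Suitable as a foundational textbook for big data technology, cloud computing technology, and related majors at higher vocational colleges, as well as for public elective courses; also usable as self-study material for data science and big data development enthusiasts.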
List price ¥49.80; discounted price ¥42.33 (15% off)

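Summary

This book addresses the development needs of students in big data, cloud computing, software technology, information management, and other related majors. It gives a systematic and comprehensive introduction to the basic knowledge and skills of data science and big data technology, covering the foundations of data science, an overview of big data, industry applications of big data, basic storage methods for big data, the fundamentals of big data technology, the Hadoop distributed platform, big data analysis, data visualization, and the value of big data. Through accessible examples, it simplifies otherwise dry technical and mathematical material to a level suited to higher vocational college students, so that readers gain a sense of achievement as they learn, which in turn stimulates their interest in learning.

The book covers the basic knowledge of data science and big data and gives a first look at each stage of big data technology. Each chapter also covers concrete applications of big data theory and technology in typical industries, helping readers form an initial big data mindset and lay a solid foundation for later in-depth study of big data technology.

The book is intended for first- and second-year students in big data-related majors at higher vocational colleges, and for students with a strong interest in big data.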

Table of Contents

Chapter 1 Introduction and Data Science
Case description: Which class performed better on the exam?
1.1 A Brief Overview of Data Science
1.1.1 Origins of Data Science
1.1.2 Basic Content of Data Science
1.2 Application Areas of Data Science
1.2.1 In Computer Programming
1.2.2 In Databases
1.2.3 Data Processing Workflow
1.2.4 Industries Where Data Science Is Applied
1.3 Data Science and Statistics
1.3.1 Statistics
1.3.2 Probability Theory
1.3.3 Data
Advanced case: Olympic Games data analysis
Chapter summary
Exercises
Chapter 2 Overview of Big Data
Case description: Data governance practice at State Grid Corporation of China
2.1 A First Look at Big Data
2.2 The Concept of Big Data
2.3 Characteristics of Big Data
2.3.1 Volume
2.3.2 Variety
2.3.3 Velocity
2.3.4 Value
2.3.5 Veracity
2.4 Big Data Storage
2.4.1 The File Exchange Period
2.4.2 Scalability Solutions
2.4.3 Breakthroughs in Capacity and Performance
2.4.4 The Rise of Cloud File Systems
2.5 Data Types
2.5.1 Structured Data
2.5.2 Semi-Structured Data
2.5.3 Unstructured Data
2.6 Big Data Technologies and Applications
2.6.1 Big Data Technologies
2.6.2 Applications of Big Data
2.6.3 Challenges of Big Data
2.7 The Value of Big Data
Advanced case: Big data in smart agriculture
Chapter summary
Exercises
Chapter 3 Big Data Collection and Preprocessing
Case description: How to reinforce aircraft armor plating to improve soldiers' survival rates
3.1 Sources of Big Data
3.1.1 Transaction Data
3.1.2 Mobile Communication Data
3.1.3 Human-Generated Data
3.1.4 Machine and Sensor Data
3.1.5 Open Internet Data
3.1.6 Common Data Platforms
3.2 Big Data Collection
3.2.1 Data Collection Devices
3.2.2 Log Collection and User Behavior Path Analysis
3.2.3 Big Data Collection Technologies
3.3 Overview of Data Preprocessing
3.3.1 Data Cleaning
3.3.2 Data Integration
3.3.3 Data Reduction
Advanced case: Data collection with web crawlers
Chapter summary
Exercises
Chapter 4 Big Data Storage
Case description: HBase in practice at Alibaba Group
4.1 Traditional Storage Technologies
4.1.1 The Concept and Role of Storage
4.1.2 Storage Architecture
4.1.3 Classification of Storage Solutions
4.2 Database Technology
4.2.1 The Concept of a Database
4.2.2 The Development of Database Technology
4.2.3 Database Classification
4.2.4 Database Architecture
4.3 Cloud Storage
4.3.1 Concept and Characteristics of Cloud Storage
4.3.2 Structural Model of Cloud Storage
4.3.3 Application Modes of Cloud Storage
4.4 Emerging Data Storage Technologies
4.4.1 Emerging Database Technologies
4.4.2 Future Trends in Databases
4.4.3 Big Data Storage
4.4.4 Data Centers and Data Warehouses
Advanced case: Analysis of personal cloud storage products in China and abroad
Chapter summary
Exercises
Chapter 5 Big Data Computing Platforms
Case description: Amazon's big data computing platform
5.1 Cloud Computing Basics
5.1.1 Definition and Concepts of Cloud Computing
5.1.2 Types of Cloud Computing Platforms
5.1.3 Basic Architecture of Cloud Computing
5.1.4 Service Types of Cloud Platforms
5.1.5 Open-Source Projects and Commercial Cloud Platforms
5.2 Big Data Storage and Management Technologies
5.2.1 Diversity of Big Data Storage
5.2.2 Big Data Management Technologies
5.2.3 Key Technologies for Big Data Processing
5.3 The Hadoop Distributed Platform
5.3.1 History of Hadoop
5.3.2 The Hadoop Ecosystem
5.3.3 HDFS
5.3.4 MapReduce
5.3.5 Other Hadoop Components
5.3.6 Setting Up a Hadoop Platform
5.4 Spark
5.4.1 Spark Platform Architecture
5.4.2 Advantages of Spark
Advanced case: Computing temperature changes for a region with a big data cluster
Chapter summary
Exercises
Chapter 6 Big Data Analysis and Mining
Case description: Analyzing job-site data with Excel
6.1 Concepts of Data Analysis and Data Mining
6.1.1 Data Analysis
6.1.2 Data Mining
6.2 Big Data Analysis Methods
6.2.1 Big Data Collection Technologies
6.2.2 Big Data Preprocessing
6.2.3 Big Data Storage and Management Technologies
6.2.4 Big Data Analysis and Mining Technologies
6.2.5 Big Data Visualization Technologies
6.3 Big Data Analysis Tools
6.3.1 Traditional Statistical Analysis Tools
6.3.2 Newer Practical Tools for Big Data Analysis
6.4 Data Analysis with pandas
6.4.1 Data Objects
6.4.2 Reading Files
6.4.3 Saving Files
6.4.4 Grouping and Aggregation
6.5 Data Mining
6.5.1 Classification
6.5.2 Clustering
6.5.3 Association Rules
6.6 Machine Learning Based on Big Data
Advanced case: Airline customer analysis using the K-Means clustering algorithm
Chapter summary
Exercises
Chapter 7 Data Visualization
Case description: Data visualization with Excel
7.1 The Beauty of Data Visualization
7.2 The Role of Data Visualization
7.2.1 Types of Data Visualization
7.2.2 The Data Visualization Process
7.2.3 Principles of Data Visualization
7.2.4 The Role of Data Visualization
7.3 Classic Charts for Data Visualization
7.3.1 Line Charts and Bar Charts
7.3.2 Pie Charts and Donut Charts
7.3.3 Radar Charts and Bubble Charts
7.3.4 Word Clouds and Maps
7.4 A Collection of Practical Data Visualization Tools
7.4.1 Excel
7.4.2 Tableau
7.4.3 ECharts
7.4.4 Programming Tools for Data Visualization
7.5 Implementing Data Visualization with Matplotlib
7.5.1 Drawing Bar Charts with Matplotlib
7.5.2 Drawing Line Charts with Matplotlib
7.5.3 Drawing Pie Charts with Matplotlib
7.5.4 Drawing Scatter Plots with Matplotlib
7.5.5 Drawing Subplots with Matplotlib
Advanced case: Analysis of 2020 gross domestic product (GDP)
Chapter summary
Exercises

About the Author

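He Ning teaches and conducts research in the field of big data technology and applications. He Ning led the development of one of the first Big Data Technology and Application majors in China; led the writing of the beginner-level textbook for the Alibaba Big Data Analysis and Application 1+X certificate (in press); served as principal investigator on one general project of the Natural Science Foundation of the Jiangsu Higher Education Institutions and on one project in the eighth batch of Changzhou's Longcheng Talent program; participated in developing two courses for the national teaching resource repository; and participated in the national 13th Five-Year Plan research project "Research and Development of Key Products and Equipment for Upgrading Community Parking Facilities" (《社区停车设施升级改造重点产品与装备研发》). He Ning is the chief editor of one published book, 《大数据可视化技术》 (Big Data Visualization Technology), has personally applied for 13 software copyrights, and has supervised students in applying for 2 utility model patents and 15 software copyrights.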
