Curriculum

  • 1

    INTRODUCTION TO THE COURSE: The Key Concepts and Software Tools

  • 2

    Reading in Data from Different Sources

    • Read in CSV & Excel Data

    • Read in Data from Online CSV

    • Read in Zipped File

    • Read Data from a Database

    • Read in JSON Data

    • Read in Data from PDF Documents

    • Read in Tables from PDF Documents

    • Conclusion to Section 2

  • 3

    Webscraping: Extract Data from Webpages

    • Read in Data From Online Google Sheets

    • Read in Data from Online HTML Tables-Part 1

    • Read in Data from Online HTML Tables-Part 2

    • Get and Clean Data from HTML Tables

    • Read Text Data from an HTML Page

    • Introduction to Selector Gadget

    • More Webscraping With rvest-IMDB Webpage

    • Another Way of Accessing Webpage Elements

    • Conclusions to Section 3

  • 4

    Introduction to APIs

    • What is an API?

    • Extract Text Data from Guardian Newspaper

  • 5

    Text Data Mining from Social Media

    • Extract Data from Facebook

    • Get More Out of Facebook

    • Set up a Twitter App for Mining Data from Twitter

    • Extract Tweets Using R

    • More Twitter Data Extraction Using R

    • Get Tweet Locations

    • Get Location Specific Trends

    • Learn More About the Followers of a Twitter Handle

    • Another Way of Extracting Information From Twitter: the rtweet Package

    • Geolocation Specific Tweets With "rtweet"

    • More Data Extraction Using rtweet

    • Locations of Tweets

    • Mining Github Using R

    • Set up the FourSquare App

    • Extract Reviews for Venues on FourSquare

    • Conclusions to Section 5

  • 6

    Exploring Text Data For Preliminary Ideas

    • Explore Tweet Data

    • A Brief Explanation

    • EDA With Text Data

    • Examine Multiple Document Corpus of Text

    • Brief Introduction to tidytext

    • Text Exploration & Visualization with tidytext

    • Explore Multiple Texts with tidytext

    • Count Unique Words in Tweets

    • Visualizing Text Data as TF-IDF

    • TF-IDF in Graphical Form

    • Conclusions to Section 6

  • 7

    Natural Language Processing: Sentiment Analysis

    • Wordclouds for Visualizing Tweet Sentiments: India's Demonetization Policy

    • Wordclouds for Visualizing Reviews

    • Tidy Wordclouds

    • Quanteda Wordcloud

    • Word Frequency in Text Data

    • Tweet Sentiments: Mugabe's Ouster

    • Tidy Sentiments: Sentiment Analysis Using tidytext

    • Examine the Polarity of Text

    • Examine the Polarity of Tweets

    • Topic Modelling a Document

    • Topic Modelling Multiple Documents

    • Topic Modelling Tweets Using Quanteda

    • Conclusions to Section 7

  • 8

    Text Data and Machine Learning

    • Clustering for Text Data

    • Clustering Tweets with Quanteda

    • Regression on Text Data

    • Identify Spam Emails with Supervised Classification

    • Introduction to RTextTools

    • More on RTextTools

    • The Doc2Vec Approach

    • Doc2Vec Approach For Predicting a Binary Outcome

    • Doc2Vec Approach for Multi-class Classification

  • 9

    Network Analysis

    • A Small (Social) Network

    • A More Theoretical Explanation

    • Build & Visualize a Network

    • Network of Emails

    • More on Network Visualization

    • Analysis of Tweet Network

    • Identify Word Pair Networks

    • Network of Words

About Your Instructor

Bestselling Instructor & Data Scientist (Cambridge University)

Minerva Singh

Hello. I am a PhD graduate from Cambridge University, where I specialized in Tropical Ecology. I am also a Data Scientist on the side. As a part of my research I have to carry out extensive data analysis, including spatial data analysis. For this purpose I prefer to use a combination of freeware tools: R, QGIS and Python. I do most of my spatial data analysis work using R and QGIS. Apart from being free, these are very powerful tools for data visualization, processing and analysis. I also hold an MPhil degree in Geography and Environment from Oxford University. I have honed my statistical and data analysis skills through a number of MOOCs, including The Analytics Edge (an R-based statistics and machine learning course offered by edX) and Statistical Learning (an R-based machine learning course offered by Stanford Online). In addition to spatial data analysis, I am also proficient in statistical analysis, machine learning and data mining. I also enjoy general programming, data visualization and web development. In addition to being a scientist and number cruncher, I am an avid traveler.

Pricing