Play with Data on your Terminal
Introduction to the dataset
Vivek Vijayan
Sep 3, 2022

Movie poster collage

The What, Where and Meaning of datasets

So you have been asked to analyse a dataset. The first three questions you must ask are:

  1. What is the data about?
  2. Where is the data?
  3. What is the meaning of the dataset?

Let us get into the data that we are concerned about in this course.

What is the data about?

Movies and shows: the cast, the crew, the ratings, the production dates and the works. IMDb holds almost everything about almost every movie in the world, and it makes a subset of that data available for personal and non-commercial use.

Where is the data?

The data is at datasets.imdbws.com. In this case the location is an HTTP URL: a page that links to several dataset files, such as datasets.imdbws.com/name.basics.tsv.gz
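
As a rough sketch of how one of these files could be fetched from a script, the snippet below builds the URL for a dataset and downloads it. The https scheme, the `<name>.tsv.gz` naming pattern applied to files other than name.basics, and the helper names `dataset_url` and `download` are assumptions made for illustration; they are not part of the course material.

```python
import os
from urllib.request import urlretrieve

# Assumption: the site is served over https and every file follows
# the <name>.tsv.gz pattern seen in name.basics.tsv.gz.
BASE_URL = "https://datasets.imdbws.com"


def dataset_url(name):
    """Return the download URL for a dataset such as 'name.basics'."""
    return f"{BASE_URL}/{name}.tsv.gz"


def download(name, dest_dir="."):
    """Download one dataset file into dest_dir and return the local path.

    Hypothetical helper: it performs a network request, so call it
    only when you actually want the file.
    """
    target = os.path.join(dest_dir, f"{name}.tsv.gz")
    urlretrieve(dataset_url(name), target)
    return target
```

Calling `download("name.basics")` would save name.basics.tsv.gz into the current directory. The file is gzip-compressed, so it has to be decompressed before it can be read as text.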

What is the meaning of the dataset?

We know where the data is, and we have a high-level idea of what it is about. At this stage, however, we need a clear idea of what each element in the dataset means, and that calls for documentation. IMDb provides good documentation for the datasets above at www.imdb.com/interfaces/
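
Reading the documentation goes well with looking at the file itself. Here is a minimal sketch, assuming a dataset file has already been downloaded locally, that prints nothing but returns the column names and the first few rows so they can be compared against the documented fields. The helper name `peek` is hypothetical, and treating `\N` as a missing value reflects my understanding of the IMDb documentation, so check it against the documentation before relying on it.

```python
import csv
import gzip
from itertools import islice


def peek(path, n=5):
    """Return (columns, rows) for the first n data rows of a gzipped TSV file.

    Each row is a dict keyed by column name. Values equal to the two
    characters backslash-N are converted to None (assumed to be the
    missing-value marker).
    """
    with gzip.open(path, "rt", encoding="utf-8", newline="") as f:
        # QUOTE_NONE: treat quote characters as ordinary text, not as field quoting.
        reader = csv.reader(f, delimiter="\t", quoting=csv.QUOTE_NONE)
        columns = next(reader)  # the first line holds the column names
        rows = []
        for values in islice(reader, n):
            rows.append(
                {c: (None if v == r"\N" else v) for c, v in zip(columns, values)}
            )
    return columns, rows
```

For example, `columns, rows = peek("name.basics.tsv.gz", 3)` would give the header and three records to check against the field descriptions.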
