Analysing Netflix shows with pgAdmin and PostgreSQL
Can you learn how to use pgAdmin with PostgreSQL by… watching movies? Francesco Tisiot thinks so. Grab some popcorn and read on!
PostgreSQL is the perfect database for a huge variety of business cases, from IoT-type fast insertions to bulk analytics workloads. We even make it possible to explore exoplanet data!
Database interaction can be performed at various levels: command line tools are great for people glued to their keyboards… but some people prefer a slick point-and-click UI that shows all the information without having to write tons of commands. In this blog post, we’ll look at how to use Aiven for PostgreSQL with pgAdmin, one of the most popular tools for administration and development.
Here’s what this post will cover:
- Aiven PostgreSQL service creation, since you can only query a database if it exists
- How to install pgAdmin and connect to a PostgreSQL database
- How to load a Netflix dataset into PostgreSQL via pgAdmin
- How to check Samuel L. Jackson statistics via pgAdmin queries
1. Create a PostgreSQL database
If you don’t have a PostgreSQL database ready, why not create one with Aiven’s console? Here’s how:
- In the Aiven Console, click + Create a new service
- As Product, select PostgreSQL
- Select a cloud provider, a region and a plan. Once created, you’ll be able to find the information about your database by double-clicking on the instance name. The Overview tab shows information like Database Name, Host and Port, together with the pre-created
avnadminuser and password.
We just need to wait a couple of minutes until the Nodes lights become green, indicating our PostgreSQL service is ready to be used.
2. Install and Connect pgAdmin
I chose pgAdmin because it’s one of the most popular options among PostgreSQL client tools. It can run either as a web or desktop application, and for the purposes of this blog post we’ll choose the latter. You can download it from the pgAdmin website, and the installation just takes a couple of clicks.
Once installed and launched, the tool asks you to set a Master Password. This is a good way to secure all the credentials we’re going to store.
Now it’s time to connect to PostgreSQL. To do that, click Create New Server and fill in the required parameters. On the General tab, set the connection name; on the Connection tab, enter the hostname, port, and maintenance database as well as username and password used to authenticate. (All this information available in Aiven Console on the Overview tab). Finally, on the SSL tab, select
SSL Mode = required; this is Aiven's default security method.
We should now be able to access our PostgreSQL default screen. It contains a set of visualisations around the number of sessions, transactions per second, in and out tuples and server activity. Very useful information for monitoring our database.
pgAdmin offers a wide set of administration features, from local database backups management to Schema Diff, a valid tool for enabling a DDL comparison across databases.
3. Load Netflix data
pgAdmin is not only a monitoring and administration tool. Since it also offers development experience, we can use it to create all kinds of objects in the database via GUI or pgsql. For example, you can create a table, upload in it a csv file and start querying it.
We’ll use some data related to Neflix shows, taken from Kaggle. You only need a free account to download it. We can then unzip it and we’ll use the file
netflix_titles.csv for the rest of the blog.
We have the dataset, what about the table structure? We can generate the table DDL with the handy
ddlgenerator tool as explained in a previous post all we have to do is to run the following from a terminal window
pip install ddl
ddlgenerator postgres ~/Downloads/netflix_titles.csv
The above command should provide the following output, which we can now copy and paste in pgAdmin query editor available by selecting Tools -> Query Tools and run it.
CREATE TABLE netflix_titles (
show_id VARCHAR(5) NOT NULL,
type VARCHAR(7) NOT NULL,
title VARCHAR(104) NOT NULL,
date_added TIMESTAMP WITHOUT TIME ZONE,
release_year BIGINT NOT NULL,
duration VARCHAR(10) NOT NULL,
listed_in VARCHAR(79) NOT NULL,
description VARCHAR(248) NOT NULL
Browse the object tree in the pgAdmin left-side panel to Default-db -> Schemas -> Public -> Tables, and you should see a table named
To load the csv file into the
netflix_titles table, do the following:
- Right-click on the
netflix_titlestable and select
Importin the Import/Export slider.
- Select the
netflix_titles.csvfile from the local computer
- Enable the
- Click OK.
You can see your import thread being started and populating the
netflix_titles table. Once finished, pgAdmin shows the text
4. Have fun with the data!
Now that we’ve loaded the data, we can have some fun querying it. Let’s create a new query editor and check which were the top 3 countries based on movie production according to the dataset with the following query:
group by country
order by 2 desc
The result, unsurprisingly, puts United States on top, followed by India — and then Null, probably due to problems during the data collection.
Now it’s time to explore the filmography of one of my favourite actors:
Samuel L. Jackson. Which actor was in the most movies with him?
select trim(s.actor) actor,
from netflix_titles nt,
unnest(string_to_array(nt._cast, ',')) s(actor)
where _cast like '%Samuel L. Jackson%'
and trim(s.actor) <> 'Samuel L. Jackson'
group by trim(s.actor)
order by 2 desc
The following query allows us to explode, with the
unnest function, the
_cast field containing a comma separated list of actors for each movie. The result tells us that Tim Roth and Walton Goggins are the two lucky actors who shot three movies with Samuel L. Jackson.
Finally, what are the 10 most used words (with more than 3 characters) in movie titles?
select title_word, count(*) nr_movies
from netflix_titles nt,
unnest(string_to_array(upper(nt.title), ' ')) s(title_word)
where char_length(title_word) > 3
group by title_word
order by 2 desc
unnest function over the
title field enables us to verify that, as expected, LOVE is the most frequently used word. But run this query yourself to find out which surprising word takes third place!
pgAdmin is a perfect fit for PostgreSQL
pgAdmin offers a great UI for PostgreSQL: monitoring the database, comparing changes across instances, managing local backups, querying, importing and exporting data. The tool offers a series of functionalities which make developer’s and administrators’ lives easier, and daily tasks accessible.
If you want more information here are few links:
- pgAdmin website where you can find downloads, documentation and FAQs
- Aiven for PostgreSQL providing details about our offering
- Getting started with PostgreSQL containing useful information on how to create a service and interact with it in a variety of languages
- Aiven’s console where you can build your favourite open source data platform
And some further reading:
- How to stand up multiregion PostgreSQL with Pulumi to improve the performance of your PostgreSQL database
- Aiven for PostgreSQL for your Go application
Not using Aiven services yet? Sign up now for your free trial at https://console.aiven.io/signup!
In the meantime, make sure you follow our changelog and blog RSS feeds or our LinkedIn and Twitter accounts to stay up-to-date with product and feature-related news.
Originally published at https://aiven.io.