1 SQL and sqlite3#

1.1 Database#

Database Management System (DBMS): the software system that manages and interacts with databases, providing functionalities such as data storage, retrieval, manipulation, and security (The library).

Database: the actual collection of data organized in a structured format (The bookshelves).

Relational databases (or SQL databases) and nonrelational databases (or NoSQL databases) are two major types of database management systems (DBMS) that differ in their structure, storage models, and suitability for different types of applications.

Feature

Relational Databases

Non-Relational Databases

Schema

Fixed schema with predefined tables and columns

Flexible schema without fixed structure

Query Language

Structured Query Language (SQL)

Various (e.g., MongoDB Query Language)

Scalability

Vertical scaling (adding more power to a single server)

Horizontal scaling (adding more servers)

Data Model

Tabular, structured data

Various (document, key-value, graph)

Data Relationships

Enforced through foreign keys and joins

Managed within the application or inherent to the database type (e.g., graph)

Use Cases

Financial systems; Enterprise management applications; E-commerce

Large scale; Big Data and Real-Time applications; Content management

Popular database

MySQL, PostgreSQL, Oracle Database, Microsoft SQL Server, SQLite

MongoDB, Redis, Neo4j (Graph database)*

*Graph databases are not based on tables with rows and columns, so they are not SQL databases or traditional relational databases.

1.2 sqlite3#

SQLite is an embedded relational database management system (RDBMS). It’s a software library that implements a self-contained, serverless, zero-configuration, transactional SQL database engine.

sqlite3 refers to both the C library used to interface with SQLite databases and the command-line tool provided with SQLite for interacting with SQLite databases. In addition, sqlite3 is a Python library that provides an interface for interacting with SQLite databases.

import sqlite3
import warnings
warnings.filterwarnings('ignore')

1.2.1 Connect / create db#

# Connect/create to the database.
conn = sqlite3.connect('MCU.db')

1.2.2 Create a cursor#

A cursor is a database object used in various database management systems (DBMS) to retrieve, manipulate, and navigate through the rows of a result set obtained from a query.

# Creates a cursor object usedto execute SQL queries and interact with the database.
cur = conn.cursor()
# Print the memory location (hexadecimal format) of the object.
print(cur)
<sqlite3.Cursor object at 0x105b835c0>

1.2.3 Execute a query#

# Create table with multi-lines (use '''  ''').
cur.execute(
    '''
    CREATE TABLE MCU_Movies (
    movie_id INTEGER PRIMARY KEY,
    title TEXT NOT NULL,
    year INTEGER,
    date DATE,
    time TIME,
    score REAL)
    '''
    )
<sqlite3.Cursor at 0x105b835c0>
# Create data with a Python list.
movie_data = [
(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9),
(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7),
(1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0),
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0),
(1005, 'Captain America: The First Avenger', 2011, '2011-03-15', '06:33:00', 6.9),
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
(1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2),
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9),
(1009, 'Captain America: The Winter Soldier', 2014, '2014-11-12', '13:56:00', 7.7),
(1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
(1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3),
(1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3),
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8),
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5),
(1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6),
(1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4),
(1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9),
(1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3),
(1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
(1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1),
(1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9),
(1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)
]
# Insert data
# INSERT INTO: This is the standard SQL command to add new records (rows) into a specified table.
# OR IGNORE: This clause modifies the behaviour of the INSERT command. 
# If there is a conflict (such as a duplicate primary key or a unique constraint violation), 
# the OR IGNORE clause tells SQLite to ignore the insertion of that particular row instead of throwing an error. 
# This is useful for avoiding errors when you are not sure if the data might already exist in the table.
# The VALUES clause specifies the values to be inserted into the columns of the table.
# Each ? corresponds to a value for each column in the MCU_Movies table. 
# For example, if the table has four columns, the placeholders will be replaced with actual values when the query is executed.
cur.executemany("INSERT OR IGNORE INTO MCU_Movies VALUES(?,?,?,?,?,?)", movie_data)
<sqlite3.Cursor at 0x105b835c0>

1.2.4. Result fetchall and fetchone#

.fetchall() is used to retrieve each row sequentially from the result set.

.fetchone() is a method used in database programming to retrieve the next row of a query result set.

# Execute an SQL query to fetch all records from the MCU_Movies table (The table info has been saved in database MCU.db).
res = cur.execute('SELECT * FROM MCU_Movies')

# Retrieve and return all the rows from the executed query above.
res.fetchall()
[(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9),
 (1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7),
 (1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0),
 (1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0),
 (1005,
  'Captain America: The First Avenger',
  2011,
  '2011-03-15',
  '06:33:00',
  6.9),
 (1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
 (1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2),
 (1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9),
 (1009,
  'Captain America: The Winter Soldier',
  2014,
  '2014-11-12',
  '13:56:00',
  7.7),
 (1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
 (1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3),
 (1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3),
 (1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8),
 (1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5),
 (1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6),
 (1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4),
 (1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9),
 (1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1),
 (1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)]
# Executes an SQL query to retrieve the 'title' column from the 'MCU_Movies' table.
res.execute('SELECT title FROM MCU_Movies')

# Fetches the first row of the result set returned by the query.
res.fetchone()
('Iron Man',)
# Executes an SQL query to retrieve the 'title' column from the 'MCU_Movies' table.
res.execute('SELECT title FROM MCU_Movies')

# Fetches all rows returned by the query.
res.fetchall()
[('Iron Man',),
 ('The Incredible Hulk',),
 ('Iron Man 2',),
 ('Thor',),
 ('Captain America: The First Avenger',),
 ('The Avengers',),
 ('Iron Man 3',),
 ('Thor: The Dark World',),
 ('Captain America: The Winter Soldier',),
 ('Guardians of the Galaxy',),
 ('Avengers: Age of Ultron',),
 ('Ant-Man',),
 ('Captain America: Civil War',),
 ('Doctor Strange',),
 ('Guardians of the Galaxy Vol. 2',),
 ('Spider-Man: Homecoming',),
 ('Thor: Ragnarok',),
 ('Black Panther',),
 ('Avengers: Infinity War',),
 ('Ant-Man and The Wasp',),
 ('Captain Marvel',),
 ('Avengers: Endgame',)]

1.2.5. Commit#

con.commit() is used in the context of database operations, specifically when working with SQL databases through a connection object (named con). This function is crucial for managing transactions.

# Commit the connection object to the database (transaction from table MCU_movies to MCU.db now)
conn.commit()

1.2.6. Close the cursor#

cur.close() effectively releases the resources that the cursor was using, like memory or database connections.

# Close the cursor.
cur.close()

1.2.7. Close the connection#

conn.close() is used to close a database connection in Python.

# Close the connection.
conn.close()

1.3 SQL syntax (PART 1)#

In SQL, a clause is a component of a query that specifies a particular operation or condition. Each clause performs a specific function in the query and contributes to the overall task of retrieving, modifying, or managing data within a database.

In this section, we will practice primary clauses in SQL and compare them to Pandas in python.

SQL Order of Execution:

  • FROM – Determines the source tables.

  • JOIN – Combines data from multiple tables.

  • WHERE – Filters rows before grouping.

  • GROUP BY – Groups the data for aggregation.

  • HAVING – Filters grouped records.

  • SELECT – Chooses which columns to return.

  • DISTINCT – Removes duplicate rows.

  • ORDER BY – Sorts the final result.

  • LIMIT/OFFSET – Restricts the number of rows returned.

SQL code example:

SELECT company, COUNT(*) AS num_employees -- 5 
FROM employees  -- 1
WHERE salary > 5000  -- 2
GROUP BY company -- 3
HAVING COUNT(*) > 5 -- 4
ORDER BY num_employees DESC  -- 6
LIMIT 10; -- 7

1.3.1 SELECT FROM#

Identifies the columns and tables to be retrieved from the database.

# As the connection has been closed we need to reconnect to the database created in the last section.
conn = sqlite3.connect('MCU.db')
# Create cur.
cur = conn.cursor()
cur
<sqlite3.Cursor at 0x105ea88c0>

Select all column values from the table#

# Select all columns (with values) from the table.
query = "SELECT * FROM MCU_Movies" 
# Executation and fetch all info. 
res = cur.execute(query)
res.fetchall()
[(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9),
 (1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7),
 (1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0),
 (1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0),
 (1005,
  'Captain America: The First Avenger',
  2011,
  '2011-03-15',
  '06:33:00',
  6.9),
 (1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
 (1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2),
 (1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9),
 (1009,
  'Captain America: The Winter Soldier',
  2014,
  '2014-11-12',
  '13:56:00',
  7.7),
 (1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
 (1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3),
 (1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3),
 (1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8),
 (1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5),
 (1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6),
 (1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4),
 (1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9),
 (1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1),
 (1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)]
# Use pandas to open the table: retrieving the data from the query (`"SELECT * FROM MCU_Movies"`) and then save it to a DataFrame (`df`).
import pandas as pd
df = pd.read_sql_query(query, conn)
# To be notified, the DataFrame is saved in a different memory location with a different index compared to the cursor.
print(hex(id(df)))  
0x127b4ded0
# Df info.
df
movie_id title year date time score
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
6 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2
7 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9
8 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5
14 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1
20 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4

Select one column values from the table#

res = cur.execute("SELECT title FROM MCU_Movies")
res.fetchall()
[('Iron Man',),
 ('The Incredible Hulk',),
 ('Iron Man 2',),
 ('Thor',),
 ('Captain America: The First Avenger',),
 ('The Avengers',),
 ('Iron Man 3',),
 ('Thor: The Dark World',),
 ('Captain America: The Winter Soldier',),
 ('Guardians of the Galaxy',),
 ('Avengers: Age of Ultron',),
 ('Ant-Man',),
 ('Captain America: Civil War',),
 ('Doctor Strange',),
 ('Guardians of the Galaxy Vol. 2',),
 ('Spider-Man: Homecoming',),
 ('Thor: Ragnarok',),
 ('Black Panther',),
 ('Avengers: Infinity War',),
 ('Ant-Man and The Wasp',),
 ('Captain Marvel',),
 ('Avengers: Endgame',)]
df.title
0                                Iron Man
1                     The Incredible Hulk
2                              Iron Man 2
3                                    Thor
4      Captain America: The First Avenger
5                            The Avengers
6                              Iron Man 3
7                    Thor: The Dark World
8     Captain America: The Winter Soldier
9                 Guardians of the Galaxy
10                Avengers: Age of Ultron
11                                Ant-Man
12             Captain America: Civil War
13                         Doctor Strange
14         Guardians of the Galaxy Vol. 2
15                 Spider-Man: Homecoming
16                         Thor: Ragnarok
17                          Black Panther
18                 Avengers: Infinity War
19                   Ant-Man and The Wasp
20                         Captain Marvel
21                      Avengers: Endgame
Name: title, dtype: object
df['title']
0                                Iron Man
1                     The Incredible Hulk
2                              Iron Man 2
3                                    Thor
4      Captain America: The First Avenger
5                            The Avengers
6                              Iron Man 3
7                    Thor: The Dark World
8     Captain America: The Winter Soldier
9                 Guardians of the Galaxy
10                Avengers: Age of Ultron
11                                Ant-Man
12             Captain America: Civil War
13                         Doctor Strange
14         Guardians of the Galaxy Vol. 2
15                 Spider-Man: Homecoming
16                         Thor: Ragnarok
17                          Black Panther
18                 Avengers: Infinity War
19                   Ant-Man and The Wasp
20                         Captain Marvel
21                      Avengers: Endgame
Name: title, dtype: object

1.3.2 WHERE#

Filters the records based on specified conditions.

This section also introduces df.query(), which is a powerful data manipulation library in python. This method provides a more readable and concise way to filter rows based on column values, compared to using traditional indexing and boolean indexing.

score >= 8#

res = cur.execute("SELECT * FROM MCU_Movies WHERE score >= 8")
res.fetchall()
[(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
 (1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)]
df[df.score >= 8]
movie_id title year date time score
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4
df.query('score >= 8')
movie_id title year date time score
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4

year >= 2017#

res = cur.execute("SELECT * FROM MCU_Movies WHERE year >= 2017")
res.fetchall()
[(1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6),
 (1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4),
 (1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9),
 (1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1),
 (1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)]
df.year.dtype 
dtype('int64')
df[df.year >= 2017] # 2017 is integer 
movie_id title year date time score
14 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1
20 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4

date < 2018-01-01#

res = cur.execute("SELECT * FROM MCU_Movies WHERE date < '2018-01-01'")
res.fetchall()
[(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9),
 (1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7),
 (1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0),
 (1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0),
 (1005,
  'Captain America: The First Avenger',
  2011,
  '2011-03-15',
  '06:33:00',
  6.9),
 (1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
 (1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2),
 (1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9),
 (1009,
  'Captain America: The Winter Soldier',
  2014,
  '2014-11-12',
  '13:56:00',
  7.7),
 (1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
 (1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3),
 (1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3),
 (1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8),
 (1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5),
 (1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6),
 (1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4),
 (1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9)]

Note:

When you read date or datetime columns from a SQL table into a pandas DataFrame and they are of object type (strings), you can still use them for filtering conditions directly, but it requires understanding how string comparisons work.

In pandas, if a date column is stored as an object type (string), you can still perform filtering operations because string comparisons in Python follow lexicographical order, which can be useful for date strings in the format ‘YYYY-MM-DD’ or ‘YYYY-MM-DD HH:MM:SS’.

This means you cannot use some datetime attribute like ‘date.dayofweek’ or ‘date.dayofyear’ unless the object type changed to pandas datetime type.

print(df.date.dtype) # Here, date and time column have not been changed to datetime.
object
type(df.date[1])
str
df[df.date < '2018-01-01']
movie_id title year date time score
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
6 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2
7 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9
8 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5
14 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9
df.query("date < '2018-01-01'")
movie_id title year date time score
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
6 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2
7 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9
8 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5
14 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9

time < 12:20:01#

res = cur.execute("SELECT * FROM MCU_Movies WHERE date < time < '12:20:01'")
res.fetchall()
[(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9),
 (1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7),
 (1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0),
 (1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0),
 (1005,
  'Captain America: The First Avenger',
  2011,
  '2011-03-15',
  '06:33:00',
  6.9),
 (1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
 (1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2),
 (1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9),
 (1009,
  'Captain America: The Winter Soldier',
  2014,
  '2014-11-12',
  '13:56:00',
  7.7),
 (1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
 (1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3),
 (1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3),
 (1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8),
 (1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5),
 (1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6),
 (1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4),
 (1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9),
 (1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1),
 (1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)]
df[df.time < '12:20:01']
movie_id title year date time score
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4
df.query("time < '12:20:01'")
movie_id title year date time score
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4

title = Thor#

res = cur.execute("SELECT * FROM MCU_Movies WHERE title = 'Thor'")
res.fetchall()
[(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0)]
df[df.title == 'Thor']
movie_id title year date time score
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
df.query("title == 'Thor'")
movie_id title year date time score
3 1004 Thor 2011 2011-01-01 07:47:00 7.0

1.3.3 AND, OR, NOT used in WHERE#

Filters the records based on multi-conditions.

AND#

# Selects the 'title' of MCU movies released after 2018 with a score greater than 7.
res = cur.execute("SELECT title FROM MCU_Movies WHERE year > 2018 AND score > 7")
res.fetchall()
[('Avengers: Endgame',)]
df[(df.year > 2018) & (df.score > 7)].title
21    Avengers: Endgame
Name: title, dtype: object
df.query('year > 2018 & score > 7') .title
21    Avengers: Endgame
Name: title, dtype: object

OR#

# Selects movie titles from the MCU_Movies table where the release date is before January 1, 2012,  
# or the movie has a score greater than 8.
res = cur.execute("SELECT title FROM MCU_Movies WHERE date < '2012-01-01' OR score > 8")
res.fetchall()
[('Iron Man',),
 ('The Incredible Hulk',),
 ('Iron Man 2',),
 ('Thor',),
 ('Captain America: The First Avenger',),
 ('Avengers: Infinity War',),
 ('Avengers: Endgame',)]
df[(df.date < '2012-01-01') | (df.score > 8)].title
0                               Iron Man
1                    The Incredible Hulk
2                             Iron Man 2
3                                   Thor
4     Captain America: The First Avenger
18                Avengers: Infinity War
21                     Avengers: Endgame
Name: title, dtype: object
df.query('date < "2012-01-01" | score > 8').title # you can use '' or "" to quote the syntax. 
0                               Iron Man
1                    The Incredible Hulk
2                             Iron Man 2
3                                   Thor
4     Captain America: The First Avenger
18                Avengers: Infinity War
21                     Avengers: Endgame
Name: title, dtype: object

NOT#

# Execute query to fetch movie titles from MCU_Movies table where the release date is on or after '2016-01-01' 
# and the score is not equal to 6
res = cur.execute("SELECT title FROM MCU_Movies WHERE NOT date < '2016-01-01' AND score != 6")
res.fetchall()
[('Captain America: Civil War',),
 ('Doctor Strange',),
 ('Guardians of the Galaxy Vol. 2',),
 ('Spider-Man: Homecoming',),
 ('Thor: Ragnarok',),
 ('Black Panther',),
 ('Avengers: Infinity War',),
 ('Ant-Man and The Wasp',),
 ('Captain Marvel',),
 ('Avengers: Endgame',)]
df[~(df.date < '2016-01-01') & (df.score != 6)].title
12        Captain America: Civil War
13                    Doctor Strange
14    Guardians of the Galaxy Vol. 2
15            Spider-Man: Homecoming
16                    Thor: Ragnarok
17                     Black Panther
18            Avengers: Infinity War
19              Ant-Man and The Wasp
20                    Captain Marvel
21                 Avengers: Endgame
Name: title, dtype: object
df.query('~ (date < "2016-01-01" &  score != 6)').title
12        Captain America: Civil War
13                    Doctor Strange
14    Guardians of the Galaxy Vol. 2
15            Spider-Man: Homecoming
16                    Thor: Ragnarok
17                     Black Panther
18            Avengers: Infinity War
19              Ant-Man and The Wasp
20                    Captain Marvel
21                 Avengers: Endgame
Name: title, dtype: object

1.3.4 COUNT and DISTINCT#

The COUNT function in SQL returns the number of rows.

COUNT(DISTINCT column) returns the number of unique, non-null values in a specified column.

# Execute the SQL query to count the number of MCU movies with a score greater than 8
res = cur.execute("SELECT COUNT(*) FROM MCU_Movies WHERE score > 8")
res.fetchall()
[(2,)]
len(df[df.score > 8])
2
# SQL query to fetch unique years from the MCU_Movies table
res = cur.execute("SELECT DISTINCT year FROM MCU_Movies")
res.fetchall()
[(2008,),
 (2010,),
 (2011,),
 (2012,),
 (2013,),
 (2014,),
 (2015,),
 (2016,),
 (2017,),
 (2018,),
 (2019,)]
df.year.unique()
array([2008, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019])
# Executes a query to count the number of distinct/unique years in the MCU_Movies table
res = cur.execute("SELECT COUNT(DISTINCT year) FROM MCU_Movies")
res.fetchall()
[(11,)]
df.year.nunique()
11
len(df.year.unique())
11

1.3.5 MAX, MIN, AVG#

# Fetches the lowest score from the MCU_Movies table.
res = cur.execute("SELECT MIN(score) FROM MCU_Movies")
res.fetchall()
[(6.7,)]
df.score.min()
6.7
# Fetches the highest score from the MCU_Movies table.
res = cur.execute("SELECT MAX(score) FROM MCU_Movies")
res.fetchall()
[(8.4,)]
df.score.max()
8.4
# Executes a query to get the average score from the MCU_Movies table.
res = cur.execute("SELECT AVG(score) FROM MCU_Movies")
res.fetchall()
[(7.463636363636363,)]
 df.score.mean()
7.463636363636365

1.3.6 IN NOT IN#

IN operator checks if a value matches any value in a list or subquery.

NOT IN checks if a value does not match any value in the list or subquery.

IN#

# Fetch all movies from the 'MCU_Movies' table where the release year is either 2011, 2015, or 2016.
res = cur.execute("SELECT * FROM MCU_Movies WHERE year IN (2011, 2015, 2016)")
res.fetchall()
[(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0),
 (1005,
  'Captain America: The First Avenger',
  2011,
  '2011-03-15',
  '06:33:00',
  6.9),
 (1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3),
 (1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3),
 (1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8),
 (1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5)]
select_years = [2011, 2015, 2016]
df[df.year.isin(select_years)]
movie_id title year date time score
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5
df.query('year in [2011, 2015, 2016]')
movie_id title year date time score
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5
df.query('year in @select_years') # the @ symbol is used to denote a variable name when constructing a query string with parameters.
movie_id title year date time score
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5

NOT IN#

res = cur.execute("SELECT * FROM MCU_Movies WHERE year NOT IN (2008, 2011, 2012, 2013, 2015, 2016, 2017)")
res.fetchall()
[(1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0),
 (1009,
  'Captain America: The Winter Soldier',
  2014,
  '2014-11-12',
  '13:56:00',
  7.7),
 (1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
 (1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1),
 (1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)]
select_years = [2008, 2011, 2012, 2013, 2015, 2016, 2017]
df[~df.year.isin(select_years)]
movie_id title year date time score
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
8 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1
20 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4
df.query('year not in @select_years')
movie_id title year date time score
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
8 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1
20 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4

1.3.7 BETWEEN#

BETWEEN operator is used to filter the result set within a specified range, inclusive of the boundary values.

# Fetch all MCU movies with a score between 8 and 9.
res = cur.execute("SELECT * FROM MCU_Movies WHERE score BETWEEN 8 AND 9")
res.fetchall()
[(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
 (1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)]
df[df.score.between(8, 9)]
movie_id title year date time score
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4
df[(df.score >= 8) & (df.score <= 9)]
movie_id title year date time score
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4
df.query('score >= 8 and score <= 9')
movie_id title year date time score
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4

1.3.8 LIKE#

LIKE operator is used to search for a specified string pattern within a column.

# Selects all columns from the MCU_Movies table where the title starts with 'a'.
res = cur.execute("SELECT * FROM MCU_Movies WHERE title LIKE 'a%'") # upper and lower case is not sensitive in SQL
res.fetchall()
[(1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3),
 (1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)]
df[df.title.str.startswith('A')] # upper and lower case is sensitive in python 
movie_id title year date time score
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4
df.query("title.str.startswith('A')")
movie_id title year date time score
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4
# Selects all columns from the MCU_Movies table where the title ends with 'k'.
res = cur.execute("SELECT * FROM MCU_Movies WHERE title LIKE '%k'")
res.fetchall()
[(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7),
 (1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9)]
df[df.title.str.endswith('k')]
movie_id title year date time score
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9
# Selects all columns from the MCU_Movies table where the title starts with 'T' and ends with 'r'.
res = cur.execute("SELECT * FROM MCU_Movies WHERE title LIKE 'T__r'")
res.fetchall()
[(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0)]

Python symbols

^: Start with

T: Followed by a literal ‘T’

..: Followed by any two characters

r: Followed by a literal ‘r’

$: End with

df[df.title.str.contains(r'^T..r$')]
movie_id title year date time score
3 1004 Thor 2011 2011-01-01 07:47:00 7.0

1.3.9 ORDER BY#

ORDER BY in SQL is used to sort the result set of a query based on specified columns in ascending or descending order.

# Retrieves all columns from the 'MCU_Movies' table, sorting the results in descending order by the 'title' column.
res = cur.execute("SELECT * FROM MCU_Movies ORDER BY title DESC")
res.fetchall()
[(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9),
 (1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9),
 (1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0),
 (1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7),
 (1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
 (1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4),
 (1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2),
 (1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0),
 (1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9),
 (1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6),
 (1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0),
 (1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5),
 (1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9),
 (1009,
  'Captain America: The Winter Soldier',
  2014,
  '2014-11-12',
  '13:56:00',
  7.7),
 (1005,
  'Captain America: The First Avenger',
  2011,
  '2011-03-15',
  '06:33:00',
  6.9),
 (1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8),
 (1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3),
 (1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4),
 (1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4),
 (1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3),
 (1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1),
 (1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3)]
df.sort_values(by='title', ascending=False)
movie_id title year date time score
7 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4
6 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9
14 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5
20 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9
8 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3

1.3.10 LIMIT / OFFSET#

LIMIT / OFFSET controls the number of rows returned by a query and to specify where to start returning rows from, respectively.

# Selects all columns from the first 3 rows of the MCU_Movies table
res = cur.execute("SELECT * FROM MCU_Movies LIMIT 3")
res.fetchall()
[(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9),
 (1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7),
 (1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0)]
df.head(3)
movie_id title year date time score
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
df.iloc[0:3]
movie_id title year date time score
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
# Fetch 3 rows starting from the 6th row (offset 5) in the MCU_Movies table
res = cur.execute("SELECT * FROM MCU_Movies LIMIT 3 OFFSET 5")
res.fetchall()
[(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0),
 (1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2),
 (1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9)]
df.iloc[5:8]
movie_id title year date time score
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0
6 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2
7 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9

1.4 SQL syntax (PART 2)#

1.4.1 PRAGMA#

The PRAGMA statement in SQL is used to query or modify various internal parameters of the SQLite database engine.

# Connect to the created database.
conn = sqlite3.connect('MCU.db')
# Creates a cursor object usedto execute SQL queries and interact with the database as cursor colsed in the last section.
cur = conn.cursor()
# Execute the PRAGMA table_info command to retrieve metadata in the 'MCU_movies' table
res = cur.execute("PRAGMA table_info(MCU_movies)")
res.fetchall()
[(0, 'movie_id', 'INTEGER', 0, None, 1),
 (1, 'title', 'TEXT', 1, None, 0),
 (2, 'year', 'INTEGER', 0, None, 0),
 (3, 'date', 'DATE', 0, None, 0),
 (4, 'time', 'TIME', 0, None, 0),
 (5, 'score', 'REAL', 0, None, 0)]

1.4.2 Primary key and Foregin key#

EXAMPLE: ADIM.db
Students
StudentID (SI)
StudentName
HomePhoneNumber
HomeAddress
FeeStatus
Note: PK: SI
Modules
ModuleID (MI)
ModuleName
Lecturers
NumberofCredits
HELevel
Note: PK: MI
Enrolments
EnrolmentID (EI)
StudentID (SI)
ModuleID (MI)
EnrollmentDate
AdditionalInformation
Note: PK: EI, FK: SI, MI

A PRIMARY KEY (PK) constraint uniquely identifies each record in a table (It ensures that no two rows have the same value).

  • Primary keys must contain UNIQUE values and cannot contain NULL values;

  • A table can have only one primary key (ususally one column);


CREATE TABLE Students (
    StudentID INT PRIMARY KEY,  -- Student ID is the primary key
    StudentName VARCHAR(50)
);

ALTER TABLE Students 
ADD PRIMARY KEY (StudentID);

  • And a primary key can also consist of multiple columns (called a composite key) when a single column isn’t enough to uniquely identify a row.

CREATE TABLE Enrollments (
    StudentID INT,
    ModuleID INT,
    EnrollmentDate DATE,
    PRIMARY KEY (StudentID, ModuleID)  -- Composite primary key
);

A Foreign Key (FK) is a column (or a set of columns) in one table that establishes a relationship with the with the Primary Key of another table. Can be NULL, Multiple foreign keys allowed.


CREATE TABLE Enrollments (
    EnrollmentID INT PRIMARY KEY,
    StudentID INT,
    CourseID INT,
    FOREIGN KEY (StudentID) REFERENCES Students(StudentID), -- StudentID references the primary key in Students table
    FOREIGN KEY (ModuleID) REFERENCES Modules(ModuleID)     -- Foreign key for ModuleID
);

ALTER TABLE Enrollments 
ADD FOREIGN KEY (StudentID) REFERENCES Students(StudentID);

ALTER TABLE Enrollments 
ADD CONSTRAINT fk_student 
FOREIGN KEY (StudentID) REFERENCES Students(StudentID);
ALTER TABLE Enrollments
DROP CONSTRAINT fk_student;

** SQL Constraints

Check Primary key in a table#

# No primary key found in MCU_movies
res = cur.execute("PRAGMA primary_key_list(MCU_movies)")
res.fetchall()
[]

Check Foreign key in a table#

# No foreign key found in MCU_movies
res = cur.execute("PRAGMA foreign_key_list(MCU_movies)")
res.fetchall()
[]

Linking tables using keys#

# Create table Actors 
res = cur.execute('''
    CREATE TABLE Actors (
    actor_id INTEGER PRIMARY KEY,
    actor_name TEXT,
    movie_id INTEGER,
    gender TEXT,
    nationality TEXT,
    date_birth DATE,
    FOREIGN KEY (movie_id) REFERENCES MCU_Movies(movie_id) 
)''')
res.fetchall()
[]

Checking tables in a database#

The sqlite_master table is a special system table that stores information about all the schema objects in the database, such as tables, indexes, triggers, and views.

# Here we have table Actors without any values
res = cur.execute("SELECT name FROM sqlite_master WHERE type='table'") 
res.fetchall()
[('MCU_Movies',), ('Actors',)]
## actor table information
actor_data = [
(1, 'Robert Downey Jr.', 1001, 'Male', 'American', '1965-04-04'),
 (2, 'Chris Evans', 1005, 'Male', 'American', '1981-06-13'),
 (3, 'Mark Ruffalo', 1006, 'Male', 'American', '1967-11-22'),
 (4, 'Chris Hemsworth', 1004, 'Male', 'Australian', '1983-08-11'),
 (5, 'Scarlett Johansson', 1006, 'Female', 'American', '1984-11-22'),
 (6, 'Jeremy Renner', 1006, 'Male', 'American', '1971-01-07'),
 (7, 'Tom Hiddleston', 1004, 'Male', 'British', '1981-02-09'),
 (8, 'Clark Gregg', 1001, 'Male', 'American', '1962-04-02'),
 (9, 'Paul Rudd', 1012, 'Male', 'American', '1969-04-06'),
 (10, 'Benedict Cumberbatch', 1014, 'Male', 'British', '1976-07-19'),
 (11, 'Tom Holland', 1008, 'Male', 'British', '1996-06-01'),
 (12, 'Elizabeth Olsen', 1022, 'Female', 'American', '1989-02-16'),
 (13, 'Anthony Mackie', 1017, 'Male', 'American', '1978-09-23'),
 (14, 'Chadwick Boseman', 1008, 'Male', 'American', '1976-11-29'),
 (15, 'Chris Pratt', 1017, 'Male', 'American', '1979-06-21'),
 (16, 'Zoe Saldana', 1004, 'Female', 'American', '1978-06-19'),
 (17, 'Karen Gillan', 1008, 'Female', 'British', '1987-11-28'),
 (18, 'Dave Bautista', 1013, 'Male', 'American', '1969-01-18'),
 (19, 'Vin Diesel', 1013, 'Male', 'American', '1967-07-18'),
 (20, 'Bradley Cooper', 1006, 'Male', 'American', '1975-01-05'),
 (21, 'Brie Larson', 1008, 'Female', 'American', '1989-10-01'),
 (22, 'Michael B. Jordan', 1022, 'Male', 'American', '1987-02-09'),
 (23, 'Tessa Thompson', 1002, 'Female', 'American', '1983-10-03'),
 (24, 'Samuel L. Jackson', 1012, 'Male', 'American', '1948-12-21'),
 (25, 'Gwyneth Paltrow', 1013, 'Female', 'American', '1972-09-27'),
 (26, 'Don Cheadle', 1014, 'Male', 'American', '1964-11-29'),
 (27, 'Paul Bettany', 1021, 'Male', 'British', '1971-05-27'),
 (28, 'Sebastian Stan', 1007, 'Male', 'Romanian', '1982-08-13'),
 (29, 'Josh Brolin', 1019, 'Male', 'American', '1968-02-12'),
 (30, 'Karen Gillan', 1019, 'Female', 'British', '1987-11-28'),
 (31, 'Pom Klementieff', 1002, 'Female', 'French', '1986-05-03'),
 (32, 'Evangeline Lilly', 1020, 'Female', 'Canadian', '1979-08-03'),
 (33, 'Tilda Swinton', 1019, 'Female', 'British', '1960-11-05'),
 (34, 'Jeff Goldblum', 1021, 'Male', 'American', '1952-10-22'),
 (35, 'Jon Favreau', 1018, 'Male', 'American', '1966-10-19'),
 (36, 'Ben Kingsley', 1013, 'Male', 'British', '1943-12-31'),
 (37, 'Cobie Smulders', 1022, 'Female', 'Canadian', '1982-04-03'),
 (38, 'Hayley Atwell', 1003, 'Female', 'British', '1982-04-05'),
 (39, 'Natalie Portman', 1008, 'Female', 'Israeli', '1981-06-09'),
 (40, 'Rachel McAdams', 1004, 'Female', 'Canadian', '1978-11-17'),
 (41, 'Benedict Wong', 1011, 'Male', 'British', '1971-07-03'),
 (42, 'Letitia Wright', 1004, 'Female', 'British', '1993-10-31'),
 (43, 'Winston Duke', 1018, 'Male', 'Tobagonian', '1986-11-15'),
 (44, 'Danai Gurira', 1016, 'Female', 'American', '1978-02-14'),
 (45, 'Lee Pace', 1002, 'Male', 'American', '1979-03-25'),
 (46, 'Djimon Hounsou', 1014, 'Male', 'Beninese', '1964-04-24'),
 (47, 'Clark Gregg', 1006, 'Male', 'American', '1962-04-02'),
 (48, 'Kat Dennings', 1018, 'Female', 'American', '1986-06-13'),
 (49, 'Stellan Skarsgård', 1014, 'Male', 'Swedish', '1951-06-13'),
 (50, 'Idris Elba', 1015, 'Male', 'British', '1972-09-06')]
cur.executemany("INSERT OR IGNORE INTO Actors VALUES(?,?,?,?,?,?)", actor_data) # Define the SQL statement with placeholders for multiple rows

conn.commit() # Commit the changes to the database
# Actors table values
res = cur.execute("SELECT * FROM Actors")
res.fetchall()
[(1, 'Robert Downey Jr.', 1001, 'Male', 'American', '1965-04-04'),
 (2, 'Chris Evans', 1005, 'Male', 'American', '1981-06-13'),
 (3, 'Mark Ruffalo', 1006, 'Male', 'American', '1967-11-22'),
 (4, 'Chris Hemsworth', 1004, 'Male', 'Australian', '1983-08-11'),
 (5, 'Scarlett Johansson', 1006, 'Female', 'American', '1984-11-22'),
 (6, 'Jeremy Renner', 1006, 'Male', 'American', '1971-01-07'),
 (7, 'Tom Hiddleston', 1004, 'Male', 'British', '1981-02-09'),
 (8, 'Clark Gregg', 1001, 'Male', 'American', '1962-04-02'),
 (9, 'Paul Rudd', 1012, 'Male', 'American', '1969-04-06'),
 (10, 'Benedict Cumberbatch', 1014, 'Male', 'British', '1976-07-19'),
 (11, 'Tom Holland', 1008, 'Male', 'British', '1996-06-01'),
 (12, 'Elizabeth Olsen', 1022, 'Female', 'American', '1989-02-16'),
 (13, 'Anthony Mackie', 1017, 'Male', 'American', '1978-09-23'),
 (14, 'Chadwick Boseman', 1008, 'Male', 'American', '1976-11-29'),
 (15, 'Chris Pratt', 1017, 'Male', 'American', '1979-06-21'),
 (16, 'Zoe Saldana', 1004, 'Female', 'American', '1978-06-19'),
 (17, 'Karen Gillan', 1008, 'Female', 'British', '1987-11-28'),
 (18, 'Dave Bautista', 1013, 'Male', 'American', '1969-01-18'),
 (19, 'Vin Diesel', 1013, 'Male', 'American', '1967-07-18'),
 (20, 'Bradley Cooper', 1006, 'Male', 'American', '1975-01-05'),
 (21, 'Brie Larson', 1008, 'Female', 'American', '1989-10-01'),
 (22, 'Michael B. Jordan', 1022, 'Male', 'American', '1987-02-09'),
 (23, 'Tessa Thompson', 1002, 'Female', 'American', '1983-10-03'),
 (24, 'Samuel L. Jackson', 1012, 'Male', 'American', '1948-12-21'),
 (25, 'Gwyneth Paltrow', 1013, 'Female', 'American', '1972-09-27'),
 (26, 'Don Cheadle', 1014, 'Male', 'American', '1964-11-29'),
 (27, 'Paul Bettany', 1021, 'Male', 'British', '1971-05-27'),
 (28, 'Sebastian Stan', 1007, 'Male', 'Romanian', '1982-08-13'),
 (29, 'Josh Brolin', 1019, 'Male', 'American', '1968-02-12'),
 (30, 'Karen Gillan', 1019, 'Female', 'British', '1987-11-28'),
 (31, 'Pom Klementieff', 1002, 'Female', 'French', '1986-05-03'),
 (32, 'Evangeline Lilly', 1020, 'Female', 'Canadian', '1979-08-03'),
 (33, 'Tilda Swinton', 1019, 'Female', 'British', '1960-11-05'),
 (34, 'Jeff Goldblum', 1021, 'Male', 'American', '1952-10-22'),
 (35, 'Jon Favreau', 1018, 'Male', 'American', '1966-10-19'),
 (36, 'Ben Kingsley', 1013, 'Male', 'British', '1943-12-31'),
 (37, 'Cobie Smulders', 1022, 'Female', 'Canadian', '1982-04-03'),
 (38, 'Hayley Atwell', 1003, 'Female', 'British', '1982-04-05'),
 (39, 'Natalie Portman', 1008, 'Female', 'Israeli', '1981-06-09'),
 (40, 'Rachel McAdams', 1004, 'Female', 'Canadian', '1978-11-17'),
 (41, 'Benedict Wong', 1011, 'Male', 'British', '1971-07-03'),
 (42, 'Letitia Wright', 1004, 'Female', 'British', '1993-10-31'),
 (43, 'Winston Duke', 1018, 'Male', 'Tobagonian', '1986-11-15'),
 (44, 'Danai Gurira', 1016, 'Female', 'American', '1978-02-14'),
 (45, 'Lee Pace', 1002, 'Male', 'American', '1979-03-25'),
 (46, 'Djimon Hounsou', 1014, 'Male', 'Beninese', '1964-04-24'),
 (47, 'Clark Gregg', 1006, 'Male', 'American', '1962-04-02'),
 (48, 'Kat Dennings', 1018, 'Female', 'American', '1986-06-13'),
 (49, 'Stellan Skarsgård', 1014, 'Male', 'Swedish', '1951-06-13'),
 (50, 'Idris Elba', 1015, 'Male', 'British', '1972-09-06')]
# Check the foreign keys in Actors
res = cur.execute("PRAGMA foreign_key_list(Actors)")
foreign_key_info = res.fetchall()

for fk in foreign_key_info:
    print(f"Foreign Key ID: {fk[0]}")
    print(f"Foreign Key Sequence: {fk[1]}")
    print(f"Referenced Table: {fk[2]}")
    print(f"Foreign Key Column (in Actors): {fk[3]}")
    print(f"Referenced Column (in MCU_Movies): {fk[4]}")
    print(f"ON DELETE Action: {fk[5]} (NO ACTION means no change to Actors when MCU_Movies entry is deleted)")
    print(f"ON UPDATE Action: {fk[6]} (NO ACTION means no change to Actors when MCU_Movies entry is updated)")
    print(f"Constraint Type: {fk[7]} (NONE means no enforcement on the foreign key constraint)")
Foreign Key ID: 0
Foreign Key Sequence: 0
Referenced Table: MCU_Movies
Foreign Key Column (in Actors): movie_id
Referenced Column (in MCU_Movies): movie_id
ON DELETE Action: NO ACTION (NO ACTION means no change to Actors when MCU_Movies entry is deleted)
ON UPDATE Action: NO ACTION (NO ACTION means no change to Actors when MCU_Movies entry is updated)
Constraint Type: NONE (NONE means no enforcement on the foreign key constraint)

cur.description is a useful attribute for accessing metadata about query results, helping you to dynamically process and handle database query results based on column information.

# id: The ID of the foreign key constraint.
# seq: The sequence number of the column within the foreign key constraint.
# table: The name of the table that the foreign key references.
# from: The name of the column in the current table (in this case, Actors) that is part of the foreign key.
# to: The name of the column in the referenced table that the foreign key points to.
# on_update: The action that occurs if the referenced column is updated.
# on_delete: The action that occurs if the referenced column is deleted.
# match: The matching algorithm used for enforcing the foreign key constraint.
cur.description
(('id', None, None, None, None, None, None),
 ('seq', None, None, None, None, None, None),
 ('table', None, None, None, None, None, None),
 ('from', None, None, None, None, None, None),
 ('to', None, None, None, None, None, None),
 ('on_update', None, None, None, None, None, None),
 ('on_delete', None, None, None, None, None, None),
 ('match', None, None, None, None, None, None))
cur.close()
conn.close()

1.4.3 JOIN#

The JOIN in SQL is used to combine rows from two or more tables based on a related column between them.

# create connection with db
conn = sqlite3.connect('MCU.db')
# create cur
cur = conn.cursor()
# load movie table as df_m
import pandas as pd
df_m = pd.read_sql_query('SELECT * FROM MCU_movies', conn)
df_m.head()
movie_id title year date time score
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0
3 1004 Thor 2011 2011-01-01 07:47:00 7.0
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9
# load actor table as df_a
df_a = pd.read_sql_query('SELECT * FROM Actors', conn)
df_a.head()
actor_id actor_name movie_id gender nationality date_birth
0 1 Robert Downey Jr. 1001 Male American 1965-04-04
1 2 Chris Evans 1005 Male American 1981-06-13
2 3 Mark Ruffalo 1006 Male American 1967-11-22
3 4 Chris Hemsworth 1004 Male Australian 1983-08-11
4 5 Scarlett Johansson 1006 Female American 1984-11-22

INNER JOIN#

res = cur.execute('''
    SELECT  *
    FROM MCU_movies
    INNER JOIN Actors
    ON MCU_movies.movie_id = Actors.movie_id
    ORDER BY MCU_movies.movie_id
 ''')
for i in res.fetchall():
    print(i)
(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9, 1, 'Robert Downey Jr.', 1001, 'Male', 'American', '1965-04-04')
(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9, 8, 'Clark Gregg', 1001, 'Male', 'American', '1962-04-02')
(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7, 23, 'Tessa Thompson', 1002, 'Female', 'American', '1983-10-03')
(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7, 31, 'Pom Klementieff', 1002, 'Female', 'French', '1986-05-03')
(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7, 45, 'Lee Pace', 1002, 'Male', 'American', '1979-03-25')
(1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0, 38, 'Hayley Atwell', 1003, 'Female', 'British', '1982-04-05')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 4, 'Chris Hemsworth', 1004, 'Male', 'Australian', '1983-08-11')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 7, 'Tom Hiddleston', 1004, 'Male', 'British', '1981-02-09')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 16, 'Zoe Saldana', 1004, 'Female', 'American', '1978-06-19')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 40, 'Rachel McAdams', 1004, 'Female', 'Canadian', '1978-11-17')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 42, 'Letitia Wright', 1004, 'Female', 'British', '1993-10-31')
(1005, 'Captain America: The First Avenger', 2011, '2011-03-15', '06:33:00', 6.9, 2, 'Chris Evans', 1005, 'Male', 'American', '1981-06-13')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 3, 'Mark Ruffalo', 1006, 'Male', 'American', '1967-11-22')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 5, 'Scarlett Johansson', 1006, 'Female', 'American', '1984-11-22')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 6, 'Jeremy Renner', 1006, 'Male', 'American', '1971-01-07')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 20, 'Bradley Cooper', 1006, 'Male', 'American', '1975-01-05')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 47, 'Clark Gregg', 1006, 'Male', 'American', '1962-04-02')
(1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2, 28, 'Sebastian Stan', 1007, 'Male', 'Romanian', '1982-08-13')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 11, 'Tom Holland', 1008, 'Male', 'British', '1996-06-01')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 14, 'Chadwick Boseman', 1008, 'Male', 'American', '1976-11-29')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 17, 'Karen Gillan', 1008, 'Female', 'British', '1987-11-28')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 21, 'Brie Larson', 1008, 'Female', 'American', '1989-10-01')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 39, 'Natalie Portman', 1008, 'Female', 'Israeli', '1981-06-09')
(1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3, 41, 'Benedict Wong', 1011, 'Male', 'British', '1971-07-03')
(1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3, 9, 'Paul Rudd', 1012, 'Male', 'American', '1969-04-06')
(1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3, 24, 'Samuel L. Jackson', 1012, 'Male', 'American', '1948-12-21')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 18, 'Dave Bautista', 1013, 'Male', 'American', '1969-01-18')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 19, 'Vin Diesel', 1013, 'Male', 'American', '1967-07-18')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 25, 'Gwyneth Paltrow', 1013, 'Female', 'American', '1972-09-27')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 36, 'Ben Kingsley', 1013, 'Male', 'British', '1943-12-31')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 10, 'Benedict Cumberbatch', 1014, 'Male', 'British', '1976-07-19')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 26, 'Don Cheadle', 1014, 'Male', 'American', '1964-11-29')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 46, 'Djimon Hounsou', 1014, 'Male', 'Beninese', '1964-04-24')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 49, 'Stellan Skarsgård', 1014, 'Male', 'Swedish', '1951-06-13')
(1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6, 50, 'Idris Elba', 1015, 'Male', 'British', '1972-09-06')
(1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4, 44, 'Danai Gurira', 1016, 'Female', 'American', '1978-02-14')
(1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9, 13, 'Anthony Mackie', 1017, 'Male', 'American', '1978-09-23')
(1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9, 15, 'Chris Pratt', 1017, 'Male', 'American', '1979-06-21')
(1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3, 35, 'Jon Favreau', 1018, 'Male', 'American', '1966-10-19')
(1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3, 43, 'Winston Duke', 1018, 'Male', 'Tobagonian', '1986-11-15')
(1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3, 48, 'Kat Dennings', 1018, 'Female', 'American', '1986-06-13')
(1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4, 29, 'Josh Brolin', 1019, 'Male', 'American', '1968-02-12')
(1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4, 30, 'Karen Gillan', 1019, 'Female', 'British', '1987-11-28')
(1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4, 33, 'Tilda Swinton', 1019, 'Female', 'British', '1960-11-05')
(1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1, 32, 'Evangeline Lilly', 1020, 'Female', 'Canadian', '1979-08-03')
(1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9, 27, 'Paul Bettany', 1021, 'Male', 'British', '1971-05-27')
(1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9, 34, 'Jeff Goldblum', 1021, 'Male', 'American', '1952-10-22')
(1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4, 12, 'Elizabeth Olsen', 1022, 'Female', 'American', '1989-02-16')
(1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4, 22, 'Michael B. Jordan', 1022, 'Male', 'American', '1987-02-09')
(1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4, 37, 'Cobie Smulders', 1022, 'Female', 'Canadian', '1982-04-03')

.join() function

df_m.set_index('movie_id').join(df_a.set_index('movie_id'), on='movie_id', how='inner', validate='m:m')
title year date time score actor_id actor_name gender nationality date_birth
movie_id
1001 Iron Man 2008 2008-01-01 15:22:00 7.9 1 Robert Downey Jr. Male American 1965-04-04
1001 Iron Man 2008 2008-01-01 15:22:00 7.9 8 Clark Gregg Male American 1962-04-02
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 23 Tessa Thompson Female American 1983-10-03
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 31 Pom Klementieff Female French 1986-05-03
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 45 Lee Pace Male American 1979-03-25
1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 38 Hayley Atwell Female British 1982-04-05
1004 Thor 2011 2011-01-01 07:47:00 7.0 4 Chris Hemsworth Male Australian 1983-08-11
1004 Thor 2011 2011-01-01 07:47:00 7.0 7 Tom Hiddleston Male British 1981-02-09
1004 Thor 2011 2011-01-01 07:47:00 7.0 16 Zoe Saldana Female American 1978-06-19
1004 Thor 2011 2011-01-01 07:47:00 7.0 40 Rachel McAdams Female Canadian 1978-11-17
1004 Thor 2011 2011-01-01 07:47:00 7.0 42 Letitia Wright Female British 1993-10-31
1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9 2 Chris Evans Male American 1981-06-13
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 3 Mark Ruffalo Male American 1967-11-22
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 5 Scarlett Johansson Female American 1984-11-22
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 6 Jeremy Renner Male American 1971-01-07
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 20 Bradley Cooper Male American 1975-01-05
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 47 Clark Gregg Male American 1962-04-02
1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2 28 Sebastian Stan Male Romanian 1982-08-13
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 11 Tom Holland Male British 1996-06-01
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 14 Chadwick Boseman Male American 1976-11-29
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 17 Karen Gillan Female British 1987-11-28
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 21 Brie Larson Female American 1989-10-01
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 39 Natalie Portman Female Israeli 1981-06-09
1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3 41 Benedict Wong Male British 1971-07-03
1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 9 Paul Rudd Male American 1969-04-06
1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 24 Samuel L. Jackson Male American 1948-12-21
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 18 Dave Bautista Male American 1969-01-18
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 19 Vin Diesel Male American 1967-07-18
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 25 Gwyneth Paltrow Female American 1972-09-27
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 36 Ben Kingsley Male British 1943-12-31
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 10 Benedict Cumberbatch Male British 1976-07-19
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 26 Don Cheadle Male American 1964-11-29
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 46 Djimon Hounsou Male Beninese 1964-04-24
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 49 Stellan Skarsgård Male Swedish 1951-06-13
1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6 50 Idris Elba Male British 1972-09-06
1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4 44 Danai Gurira Female American 1978-02-14
1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 13 Anthony Mackie Male American 1978-09-23
1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 15 Chris Pratt Male American 1979-06-21
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 35 Jon Favreau Male American 1966-10-19
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 43 Winston Duke Male Tobagonian 1986-11-15
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 48 Kat Dennings Female American 1986-06-13
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 29 Josh Brolin Male American 1968-02-12
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 30 Karen Gillan Female British 1987-11-28
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 33 Tilda Swinton Female British 1960-11-05
1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1 32 Evangeline Lilly Female Canadian 1979-08-03
1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 27 Paul Bettany Male British 1971-05-27
1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 34 Jeff Goldblum Male American 1952-10-22
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 12 Elizabeth Olsen Female American 1989-02-16
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 22 Michael B. Jordan Male American 1987-02-09
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 37 Cobie Smulders Female Canadian 1982-04-03

pd.merge() function

pd.merge(df_m, df_a, left_on='movie_id', right_on='movie_id', how='inner')
movie_id title year date time score actor_id actor_name gender nationality date_birth
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 1 Robert Downey Jr. Male American 1965-04-04
1 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 8 Clark Gregg Male American 1962-04-02
2 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 23 Tessa Thompson Female American 1983-10-03
3 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 31 Pom Klementieff Female French 1986-05-03
4 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 45 Lee Pace Male American 1979-03-25
5 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 38 Hayley Atwell Female British 1982-04-05
6 1004 Thor 2011 2011-01-01 07:47:00 7.0 4 Chris Hemsworth Male Australian 1983-08-11
7 1004 Thor 2011 2011-01-01 07:47:00 7.0 7 Tom Hiddleston Male British 1981-02-09
8 1004 Thor 2011 2011-01-01 07:47:00 7.0 16 Zoe Saldana Female American 1978-06-19
9 1004 Thor 2011 2011-01-01 07:47:00 7.0 40 Rachel McAdams Female Canadian 1978-11-17
10 1004 Thor 2011 2011-01-01 07:47:00 7.0 42 Letitia Wright Female British 1993-10-31
11 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9 2 Chris Evans Male American 1981-06-13
12 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 3 Mark Ruffalo Male American 1967-11-22
13 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 5 Scarlett Johansson Female American 1984-11-22
14 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 6 Jeremy Renner Male American 1971-01-07
15 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 20 Bradley Cooper Male American 1975-01-05
16 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 47 Clark Gregg Male American 1962-04-02
17 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2 28 Sebastian Stan Male Romanian 1982-08-13
18 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 11 Tom Holland Male British 1996-06-01
19 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 14 Chadwick Boseman Male American 1976-11-29
20 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 17 Karen Gillan Female British 1987-11-28
21 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 21 Brie Larson Female American 1989-10-01
22 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 39 Natalie Portman Female Israeli 1981-06-09
23 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3 41 Benedict Wong Male British 1971-07-03
24 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 9 Paul Rudd Male American 1969-04-06
25 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 24 Samuel L. Jackson Male American 1948-12-21
26 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 18 Dave Bautista Male American 1969-01-18
27 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 19 Vin Diesel Male American 1967-07-18
28 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 25 Gwyneth Paltrow Female American 1972-09-27
29 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 36 Ben Kingsley Male British 1943-12-31
30 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 10 Benedict Cumberbatch Male British 1976-07-19
31 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 26 Don Cheadle Male American 1964-11-29
32 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 46 Djimon Hounsou Male Beninese 1964-04-24
33 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 49 Stellan Skarsgård Male Swedish 1951-06-13
34 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6 50 Idris Elba Male British 1972-09-06
35 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4 44 Danai Gurira Female American 1978-02-14
36 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 13 Anthony Mackie Male American 1978-09-23
37 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 15 Chris Pratt Male American 1979-06-21
38 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 35 Jon Favreau Male American 1966-10-19
39 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 43 Winston Duke Male Tobagonian 1986-11-15
40 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 48 Kat Dennings Female American 1986-06-13
41 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 29 Josh Brolin Male American 1968-02-12
42 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 30 Karen Gillan Female British 1987-11-28
43 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 33 Tilda Swinton Female British 1960-11-05
44 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1 32 Evangeline Lilly Female Canadian 1979-08-03
45 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 27 Paul Bettany Male British 1971-05-27
46 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 34 Jeff Goldblum Male American 1952-10-22
47 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 12 Elizabeth Olsen Female American 1989-02-16
48 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 22 Michael B. Jordan Male American 1987-02-09
49 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 37 Cobie Smulders Female Canadian 1982-04-03

LEFT JOIN#

res = cur.execute('''
    SELECT  *
    FROM MCU_movies
    LEFT JOIN Actors
    ON MCU_movies.movie_id = Actors.movie_id
    ORDER BY MCU_movies.movie_id
 ''')
for i in res.fetchall():
    print(i)
(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9, 8, 'Clark Gregg', 1001, 'Male', 'American', '1962-04-02')
(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9, 1, 'Robert Downey Jr.', 1001, 'Male', 'American', '1965-04-04')
(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7, 45, 'Lee Pace', 1002, 'Male', 'American', '1979-03-25')
(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7, 31, 'Pom Klementieff', 1002, 'Female', 'French', '1986-05-03')
(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7, 23, 'Tessa Thompson', 1002, 'Female', 'American', '1983-10-03')
(1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0, 38, 'Hayley Atwell', 1003, 'Female', 'British', '1982-04-05')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 4, 'Chris Hemsworth', 1004, 'Male', 'Australian', '1983-08-11')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 42, 'Letitia Wright', 1004, 'Female', 'British', '1993-10-31')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 40, 'Rachel McAdams', 1004, 'Female', 'Canadian', '1978-11-17')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 7, 'Tom Hiddleston', 1004, 'Male', 'British', '1981-02-09')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 16, 'Zoe Saldana', 1004, 'Female', 'American', '1978-06-19')
(1005, 'Captain America: The First Avenger', 2011, '2011-03-15', '06:33:00', 6.9, 2, 'Chris Evans', 1005, 'Male', 'American', '1981-06-13')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 20, 'Bradley Cooper', 1006, 'Male', 'American', '1975-01-05')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 47, 'Clark Gregg', 1006, 'Male', 'American', '1962-04-02')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 6, 'Jeremy Renner', 1006, 'Male', 'American', '1971-01-07')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 3, 'Mark Ruffalo', 1006, 'Male', 'American', '1967-11-22')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 5, 'Scarlett Johansson', 1006, 'Female', 'American', '1984-11-22')
(1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2, 28, 'Sebastian Stan', 1007, 'Male', 'Romanian', '1982-08-13')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 21, 'Brie Larson', 1008, 'Female', 'American', '1989-10-01')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 14, 'Chadwick Boseman', 1008, 'Male', 'American', '1976-11-29')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 17, 'Karen Gillan', 1008, 'Female', 'British', '1987-11-28')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 39, 'Natalie Portman', 1008, 'Female', 'Israeli', '1981-06-09')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 11, 'Tom Holland', 1008, 'Male', 'British', '1996-06-01')
(1009, 'Captain America: The Winter Soldier', 2014, '2014-11-12', '13:56:00', 7.7, None, None, None, None, None, None)
(1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0, None, None, None, None, None, None)
(1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3, 41, 'Benedict Wong', 1011, 'Male', 'British', '1971-07-03')
(1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3, 9, 'Paul Rudd', 1012, 'Male', 'American', '1969-04-06')
(1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3, 24, 'Samuel L. Jackson', 1012, 'Male', 'American', '1948-12-21')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 36, 'Ben Kingsley', 1013, 'Male', 'British', '1943-12-31')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 18, 'Dave Bautista', 1013, 'Male', 'American', '1969-01-18')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 25, 'Gwyneth Paltrow', 1013, 'Female', 'American', '1972-09-27')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 19, 'Vin Diesel', 1013, 'Male', 'American', '1967-07-18')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 10, 'Benedict Cumberbatch', 1014, 'Male', 'British', '1976-07-19')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 46, 'Djimon Hounsou', 1014, 'Male', 'Beninese', '1964-04-24')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 26, 'Don Cheadle', 1014, 'Male', 'American', '1964-11-29')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 49, 'Stellan Skarsgård', 1014, 'Male', 'Swedish', '1951-06-13')
(1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6, 50, 'Idris Elba', 1015, 'Male', 'British', '1972-09-06')
(1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4, 44, 'Danai Gurira', 1016, 'Female', 'American', '1978-02-14')
(1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9, 13, 'Anthony Mackie', 1017, 'Male', 'American', '1978-09-23')
(1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9, 15, 'Chris Pratt', 1017, 'Male', 'American', '1979-06-21')
(1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3, 35, 'Jon Favreau', 1018, 'Male', 'American', '1966-10-19')
(1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3, 48, 'Kat Dennings', 1018, 'Female', 'American', '1986-06-13')
(1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3, 43, 'Winston Duke', 1018, 'Male', 'Tobagonian', '1986-11-15')
(1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4, 29, 'Josh Brolin', 1019, 'Male', 'American', '1968-02-12')
(1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4, 30, 'Karen Gillan', 1019, 'Female', 'British', '1987-11-28')
(1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4, 33, 'Tilda Swinton', 1019, 'Female', 'British', '1960-11-05')
(1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1, 32, 'Evangeline Lilly', 1020, 'Female', 'Canadian', '1979-08-03')
(1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9, 34, 'Jeff Goldblum', 1021, 'Male', 'American', '1952-10-22')
(1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9, 27, 'Paul Bettany', 1021, 'Male', 'British', '1971-05-27')
(1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4, 37, 'Cobie Smulders', 1022, 'Female', 'Canadian', '1982-04-03')
(1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4, 12, 'Elizabeth Olsen', 1022, 'Female', 'American', '1989-02-16')
(1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4, 22, 'Michael B. Jordan', 1022, 'Male', 'American', '1987-02-09')
df_m.set_index('movie_id').join(df_a.set_index('movie_id'), on='movie_id', how='left', validate='m:m')
title year date time score actor_id actor_name gender nationality date_birth
movie_id
1001 Iron Man 2008 2008-01-01 15:22:00 7.9 1.0 Robert Downey Jr. Male American 1965-04-04
1001 Iron Man 2008 2008-01-01 15:22:00 7.9 8.0 Clark Gregg Male American 1962-04-02
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 23.0 Tessa Thompson Female American 1983-10-03
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 31.0 Pom Klementieff Female French 1986-05-03
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 45.0 Lee Pace Male American 1979-03-25
1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 38.0 Hayley Atwell Female British 1982-04-05
1004 Thor 2011 2011-01-01 07:47:00 7.0 4.0 Chris Hemsworth Male Australian 1983-08-11
1004 Thor 2011 2011-01-01 07:47:00 7.0 7.0 Tom Hiddleston Male British 1981-02-09
1004 Thor 2011 2011-01-01 07:47:00 7.0 16.0 Zoe Saldana Female American 1978-06-19
1004 Thor 2011 2011-01-01 07:47:00 7.0 40.0 Rachel McAdams Female Canadian 1978-11-17
1004 Thor 2011 2011-01-01 07:47:00 7.0 42.0 Letitia Wright Female British 1993-10-31
1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9 2.0 Chris Evans Male American 1981-06-13
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 3.0 Mark Ruffalo Male American 1967-11-22
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 5.0 Scarlett Johansson Female American 1984-11-22
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 6.0 Jeremy Renner Male American 1971-01-07
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 20.0 Bradley Cooper Male American 1975-01-05
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 47.0 Clark Gregg Male American 1962-04-02
1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2 28.0 Sebastian Stan Male Romanian 1982-08-13
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 11.0 Tom Holland Male British 1996-06-01
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 14.0 Chadwick Boseman Male American 1976-11-29
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 17.0 Karen Gillan Female British 1987-11-28
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 21.0 Brie Larson Female American 1989-10-01
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 39.0 Natalie Portman Female Israeli 1981-06-09
1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7 NaN NaN NaN NaN NaN
1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0 NaN NaN NaN NaN NaN
1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3 41.0 Benedict Wong Male British 1971-07-03
1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 9.0 Paul Rudd Male American 1969-04-06
1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 24.0 Samuel L. Jackson Male American 1948-12-21
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 18.0 Dave Bautista Male American 1969-01-18
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 19.0 Vin Diesel Male American 1967-07-18
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 25.0 Gwyneth Paltrow Female American 1972-09-27
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 36.0 Ben Kingsley Male British 1943-12-31
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 10.0 Benedict Cumberbatch Male British 1976-07-19
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 26.0 Don Cheadle Male American 1964-11-29
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 46.0 Djimon Hounsou Male Beninese 1964-04-24
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 49.0 Stellan Skarsgård Male Swedish 1951-06-13
1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6 50.0 Idris Elba Male British 1972-09-06
1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4 44.0 Danai Gurira Female American 1978-02-14
1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 13.0 Anthony Mackie Male American 1978-09-23
1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 15.0 Chris Pratt Male American 1979-06-21
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 35.0 Jon Favreau Male American 1966-10-19
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 43.0 Winston Duke Male Tobagonian 1986-11-15
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 48.0 Kat Dennings Female American 1986-06-13
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 29.0 Josh Brolin Male American 1968-02-12
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 30.0 Karen Gillan Female British 1987-11-28
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 33.0 Tilda Swinton Female British 1960-11-05
1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1 32.0 Evangeline Lilly Female Canadian 1979-08-03
1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 27.0 Paul Bettany Male British 1971-05-27
1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 34.0 Jeff Goldblum Male American 1952-10-22
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 12.0 Elizabeth Olsen Female American 1989-02-16
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 22.0 Michael B. Jordan Male American 1987-02-09
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 37.0 Cobie Smulders Female Canadian 1982-04-03
pd.merge(df_m, df_a, left_on='movie_id', right_on='movie_id', how='left')
movie_id title year date time score actor_id actor_name gender nationality date_birth
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 1.0 Robert Downey Jr. Male American 1965-04-04
1 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 8.0 Clark Gregg Male American 1962-04-02
2 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 23.0 Tessa Thompson Female American 1983-10-03
3 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 31.0 Pom Klementieff Female French 1986-05-03
4 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 45.0 Lee Pace Male American 1979-03-25
5 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 38.0 Hayley Atwell Female British 1982-04-05
6 1004 Thor 2011 2011-01-01 07:47:00 7.0 4.0 Chris Hemsworth Male Australian 1983-08-11
7 1004 Thor 2011 2011-01-01 07:47:00 7.0 7.0 Tom Hiddleston Male British 1981-02-09
8 1004 Thor 2011 2011-01-01 07:47:00 7.0 16.0 Zoe Saldana Female American 1978-06-19
9 1004 Thor 2011 2011-01-01 07:47:00 7.0 40.0 Rachel McAdams Female Canadian 1978-11-17
10 1004 Thor 2011 2011-01-01 07:47:00 7.0 42.0 Letitia Wright Female British 1993-10-31
11 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9 2.0 Chris Evans Male American 1981-06-13
12 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 3.0 Mark Ruffalo Male American 1967-11-22
13 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 5.0 Scarlett Johansson Female American 1984-11-22
14 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 6.0 Jeremy Renner Male American 1971-01-07
15 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 20.0 Bradley Cooper Male American 1975-01-05
16 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 47.0 Clark Gregg Male American 1962-04-02
17 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2 28.0 Sebastian Stan Male Romanian 1982-08-13
18 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 11.0 Tom Holland Male British 1996-06-01
19 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 14.0 Chadwick Boseman Male American 1976-11-29
20 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 17.0 Karen Gillan Female British 1987-11-28
21 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 21.0 Brie Larson Female American 1989-10-01
22 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 39.0 Natalie Portman Female Israeli 1981-06-09
23 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7 NaN NaN NaN NaN NaN
24 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0 NaN NaN NaN NaN NaN
25 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3 41.0 Benedict Wong Male British 1971-07-03
26 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 9.0 Paul Rudd Male American 1969-04-06
27 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 24.0 Samuel L. Jackson Male American 1948-12-21
28 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 18.0 Dave Bautista Male American 1969-01-18
29 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 19.0 Vin Diesel Male American 1967-07-18
30 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 25.0 Gwyneth Paltrow Female American 1972-09-27
31 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 36.0 Ben Kingsley Male British 1943-12-31
32 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 10.0 Benedict Cumberbatch Male British 1976-07-19
33 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 26.0 Don Cheadle Male American 1964-11-29
34 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 46.0 Djimon Hounsou Male Beninese 1964-04-24
35 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 49.0 Stellan Skarsgård Male Swedish 1951-06-13
36 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6 50.0 Idris Elba Male British 1972-09-06
37 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4 44.0 Danai Gurira Female American 1978-02-14
38 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 13.0 Anthony Mackie Male American 1978-09-23
39 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 15.0 Chris Pratt Male American 1979-06-21
40 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 35.0 Jon Favreau Male American 1966-10-19
41 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 43.0 Winston Duke Male Tobagonian 1986-11-15
42 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 48.0 Kat Dennings Female American 1986-06-13
43 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 29.0 Josh Brolin Male American 1968-02-12
44 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 30.0 Karen Gillan Female British 1987-11-28
45 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 33.0 Tilda Swinton Female British 1960-11-05
46 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1 32.0 Evangeline Lilly Female Canadian 1979-08-03
47 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 27.0 Paul Bettany Male British 1971-05-27
48 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 34.0 Jeff Goldblum Male American 1952-10-22
49 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 12.0 Elizabeth Olsen Female American 1989-02-16
50 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 22.0 Michael B. Jordan Male American 1987-02-09
51 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 37.0 Cobie Smulders Female Canadian 1982-04-03

RIGHT JOIN#

You may get an error message in sqlite3: RIGHT and FULL OUTER JOINs are not currently supported.

By swapping the tables in the LEFT JOIN, you get the same result as a RIGHT JOIN.

# OperationalError: RIGHT and FULL OUTER JOINs are not currently supported
# res = cur.execute('''
#     SELECT  *
#     FROM MCU_movies
#     RIGHT JOIN Actors
#     ON MCU_movies.movie_id = Actors.movie_id
#     ORDER BY MCU_movies.movie_id
#  ''')
# for i in res.fetchall():
#     print(i)
# swapping left and ringht table
res = cur.execute('''
    SELECT  *
    FROM Actors
    LEFT JOIN MCU_movies
    ON MCU_movies.movie_id = Actors.movie_id
    ORDER BY Actors.actor_id
 ''')
for i in res.fetchall():
    print(i)
(1, 'Robert Downey Jr.', 1001, 'Male', 'American', '1965-04-04', 1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9)
(2, 'Chris Evans', 1005, 'Male', 'American', '1981-06-13', 1005, 'Captain America: The First Avenger', 2011, '2011-03-15', '06:33:00', 6.9)
(3, 'Mark Ruffalo', 1006, 'Male', 'American', '1967-11-22', 1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0)
(4, 'Chris Hemsworth', 1004, 'Male', 'Australian', '1983-08-11', 1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0)
(5, 'Scarlett Johansson', 1006, 'Female', 'American', '1984-11-22', 1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0)
(6, 'Jeremy Renner', 1006, 'Male', 'American', '1971-01-07', 1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0)
(7, 'Tom Hiddleston', 1004, 'Male', 'British', '1981-02-09', 1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0)
(8, 'Clark Gregg', 1001, 'Male', 'American', '1962-04-02', 1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9)
(9, 'Paul Rudd', 1012, 'Male', 'American', '1969-04-06', 1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3)
(10, 'Benedict Cumberbatch', 1014, 'Male', 'British', '1976-07-19', 1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5)
(11, 'Tom Holland', 1008, 'Male', 'British', '1996-06-01', 1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9)
(12, 'Elizabeth Olsen', 1022, 'Female', 'American', '1989-02-16', 1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)
(13, 'Anthony Mackie', 1017, 'Male', 'American', '1978-09-23', 1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9)
(14, 'Chadwick Boseman', 1008, 'Male', 'American', '1976-11-29', 1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9)
(15, 'Chris Pratt', 1017, 'Male', 'American', '1979-06-21', 1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9)
(16, 'Zoe Saldana', 1004, 'Female', 'American', '1978-06-19', 1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0)
(17, 'Karen Gillan', 1008, 'Female', 'British', '1987-11-28', 1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9)
(18, 'Dave Bautista', 1013, 'Male', 'American', '1969-01-18', 1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8)
(19, 'Vin Diesel', 1013, 'Male', 'American', '1967-07-18', 1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8)
(20, 'Bradley Cooper', 1006, 'Male', 'American', '1975-01-05', 1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0)
(21, 'Brie Larson', 1008, 'Female', 'American', '1989-10-01', 1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9)
(22, 'Michael B. Jordan', 1022, 'Male', 'American', '1987-02-09', 1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)
(23, 'Tessa Thompson', 1002, 'Female', 'American', '1983-10-03', 1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7)
(24, 'Samuel L. Jackson', 1012, 'Male', 'American', '1948-12-21', 1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3)
(25, 'Gwyneth Paltrow', 1013, 'Female', 'American', '1972-09-27', 1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8)
(26, 'Don Cheadle', 1014, 'Male', 'American', '1964-11-29', 1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5)
(27, 'Paul Bettany', 1021, 'Male', 'British', '1971-05-27', 1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9)
(28, 'Sebastian Stan', 1007, 'Male', 'Romanian', '1982-08-13', 1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2)
(29, 'Josh Brolin', 1019, 'Male', 'American', '1968-02-12', 1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4)
(30, 'Karen Gillan', 1019, 'Female', 'British', '1987-11-28', 1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4)
(31, 'Pom Klementieff', 1002, 'Female', 'French', '1986-05-03', 1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7)
(32, 'Evangeline Lilly', 1020, 'Female', 'Canadian', '1979-08-03', 1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1)
(33, 'Tilda Swinton', 1019, 'Female', 'British', '1960-11-05', 1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4)
(34, 'Jeff Goldblum', 1021, 'Male', 'American', '1952-10-22', 1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9)
(35, 'Jon Favreau', 1018, 'Male', 'American', '1966-10-19', 1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3)
(36, 'Ben Kingsley', 1013, 'Male', 'British', '1943-12-31', 1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8)
(37, 'Cobie Smulders', 1022, 'Female', 'Canadian', '1982-04-03', 1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4)
(38, 'Hayley Atwell', 1003, 'Female', 'British', '1982-04-05', 1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0)
(39, 'Natalie Portman', 1008, 'Female', 'Israeli', '1981-06-09', 1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9)
(40, 'Rachel McAdams', 1004, 'Female', 'Canadian', '1978-11-17', 1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0)
(41, 'Benedict Wong', 1011, 'Male', 'British', '1971-07-03', 1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3)
(42, 'Letitia Wright', 1004, 'Female', 'British', '1993-10-31', 1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0)
(43, 'Winston Duke', 1018, 'Male', 'Tobagonian', '1986-11-15', 1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3)
(44, 'Danai Gurira', 1016, 'Female', 'American', '1978-02-14', 1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4)
(45, 'Lee Pace', 1002, 'Male', 'American', '1979-03-25', 1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7)
(46, 'Djimon Hounsou', 1014, 'Male', 'Beninese', '1964-04-24', 1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5)
(47, 'Clark Gregg', 1006, 'Male', 'American', '1962-04-02', 1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0)
(48, 'Kat Dennings', 1018, 'Female', 'American', '1986-06-13', 1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3)
(49, 'Stellan Skarsgård', 1014, 'Male', 'Swedish', '1951-06-13', 1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5)
(50, 'Idris Elba', 1015, 'Male', 'British', '1972-09-06', 1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6)
df_m.set_index('movie_id').join(df_a.set_index('movie_id'), on='movie_id', how='right', validate='m:m')
title year date time score actor_id actor_name gender nationality date_birth
movie_id
1001 Iron Man 2008 2008-01-01 15:22:00 7.9 1 Robert Downey Jr. Male American 1965-04-04
1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9 2 Chris Evans Male American 1981-06-13
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 3 Mark Ruffalo Male American 1967-11-22
1004 Thor 2011 2011-01-01 07:47:00 7.0 4 Chris Hemsworth Male Australian 1983-08-11
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 5 Scarlett Johansson Female American 1984-11-22
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 6 Jeremy Renner Male American 1971-01-07
1004 Thor 2011 2011-01-01 07:47:00 7.0 7 Tom Hiddleston Male British 1981-02-09
1001 Iron Man 2008 2008-01-01 15:22:00 7.9 8 Clark Gregg Male American 1962-04-02
1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 9 Paul Rudd Male American 1969-04-06
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 10 Benedict Cumberbatch Male British 1976-07-19
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 11 Tom Holland Male British 1996-06-01
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 12 Elizabeth Olsen Female American 1989-02-16
1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 13 Anthony Mackie Male American 1978-09-23
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 14 Chadwick Boseman Male American 1976-11-29
1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 15 Chris Pratt Male American 1979-06-21
1004 Thor 2011 2011-01-01 07:47:00 7.0 16 Zoe Saldana Female American 1978-06-19
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 17 Karen Gillan Female British 1987-11-28
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 18 Dave Bautista Male American 1969-01-18
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 19 Vin Diesel Male American 1967-07-18
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 20 Bradley Cooper Male American 1975-01-05
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 21 Brie Larson Female American 1989-10-01
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 22 Michael B. Jordan Male American 1987-02-09
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 23 Tessa Thompson Female American 1983-10-03
1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 24 Samuel L. Jackson Male American 1948-12-21
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 25 Gwyneth Paltrow Female American 1972-09-27
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 26 Don Cheadle Male American 1964-11-29
1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 27 Paul Bettany Male British 1971-05-27
1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2 28 Sebastian Stan Male Romanian 1982-08-13
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 29 Josh Brolin Male American 1968-02-12
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 30 Karen Gillan Female British 1987-11-28
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 31 Pom Klementieff Female French 1986-05-03
1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1 32 Evangeline Lilly Female Canadian 1979-08-03
1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 33 Tilda Swinton Female British 1960-11-05
1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 34 Jeff Goldblum Male American 1952-10-22
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 35 Jon Favreau Male American 1966-10-19
1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 36 Ben Kingsley Male British 1943-12-31
1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 37 Cobie Smulders Female Canadian 1982-04-03
1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 38 Hayley Atwell Female British 1982-04-05
1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 39 Natalie Portman Female Israeli 1981-06-09
1004 Thor 2011 2011-01-01 07:47:00 7.0 40 Rachel McAdams Female Canadian 1978-11-17
1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3 41 Benedict Wong Male British 1971-07-03
1004 Thor 2011 2011-01-01 07:47:00 7.0 42 Letitia Wright Female British 1993-10-31
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 43 Winston Duke Male Tobagonian 1986-11-15
1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4 44 Danai Gurira Female American 1978-02-14
1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 45 Lee Pace Male American 1979-03-25
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 46 Djimon Hounsou Male Beninese 1964-04-24
1006 The Avengers 2012 2012-05-10 08:30:00 8.0 47 Clark Gregg Male American 1962-04-02
1018 Black Panther 2018 2018-04-23 03:05:00 7.3 48 Kat Dennings Female American 1986-06-13
1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 49 Stellan Skarsgård Male Swedish 1951-06-13
1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6 50 Idris Elba Male British 1972-09-06
pd.merge(df_m, df_a, left_on='movie_id', right_on='movie_id', how='right')
movie_id title year date time score actor_id actor_name gender nationality date_birth
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 1 Robert Downey Jr. Male American 1965-04-04
1 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9 2 Chris Evans Male American 1981-06-13
2 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 3 Mark Ruffalo Male American 1967-11-22
3 1004 Thor 2011 2011-01-01 07:47:00 7.0 4 Chris Hemsworth Male Australian 1983-08-11
4 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 5 Scarlett Johansson Female American 1984-11-22
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 6 Jeremy Renner Male American 1971-01-07
6 1004 Thor 2011 2011-01-01 07:47:00 7.0 7 Tom Hiddleston Male British 1981-02-09
7 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 8 Clark Gregg Male American 1962-04-02
8 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 9 Paul Rudd Male American 1969-04-06
9 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 10 Benedict Cumberbatch Male British 1976-07-19
10 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 11 Tom Holland Male British 1996-06-01
11 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 12 Elizabeth Olsen Female American 1989-02-16
12 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 13 Anthony Mackie Male American 1978-09-23
13 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 14 Chadwick Boseman Male American 1976-11-29
14 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 15 Chris Pratt Male American 1979-06-21
15 1004 Thor 2011 2011-01-01 07:47:00 7.0 16 Zoe Saldana Female American 1978-06-19
16 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 17 Karen Gillan Female British 1987-11-28
17 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 18 Dave Bautista Male American 1969-01-18
18 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 19 Vin Diesel Male American 1967-07-18
19 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 20 Bradley Cooper Male American 1975-01-05
20 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 21 Brie Larson Female American 1989-10-01
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 22 Michael B. Jordan Male American 1987-02-09
22 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 23 Tessa Thompson Female American 1983-10-03
23 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 24 Samuel L. Jackson Male American 1948-12-21
24 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 25 Gwyneth Paltrow Female American 1972-09-27
25 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 26 Don Cheadle Male American 1964-11-29
26 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 27 Paul Bettany Male British 1971-05-27
27 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2 28 Sebastian Stan Male Romanian 1982-08-13
28 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 29 Josh Brolin Male American 1968-02-12
29 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 30 Karen Gillan Female British 1987-11-28
30 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 31 Pom Klementieff Female French 1986-05-03
31 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1 32 Evangeline Lilly Female Canadian 1979-08-03
32 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 33 Tilda Swinton Female British 1960-11-05
33 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 34 Jeff Goldblum Male American 1952-10-22
34 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 35 Jon Favreau Male American 1966-10-19
35 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 36 Ben Kingsley Male British 1943-12-31
36 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 37 Cobie Smulders Female Canadian 1982-04-03
37 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 38 Hayley Atwell Female British 1982-04-05
38 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 39 Natalie Portman Female Israeli 1981-06-09
39 1004 Thor 2011 2011-01-01 07:47:00 7.0 40 Rachel McAdams Female Canadian 1978-11-17
40 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3 41 Benedict Wong Male British 1971-07-03
41 1004 Thor 2011 2011-01-01 07:47:00 7.0 42 Letitia Wright Female British 1993-10-31
42 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 43 Winston Duke Male Tobagonian 1986-11-15
43 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4 44 Danai Gurira Female American 1978-02-14
44 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 45 Lee Pace Male American 1979-03-25
45 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 46 Djimon Hounsou Male Beninese 1964-04-24
46 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 47 Clark Gregg Male American 1962-04-02
47 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 48 Kat Dennings Female American 1986-06-13
48 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 49 Stellan Skarsgård Male Swedish 1951-06-13
49 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6 50 Idris Elba Male British 1972-09-06

OUTER JOIN#

we use LEFT JOIN and UNION to create the OUTER JOIN in sqlite3

# OperationalError: RIGHT and FULL OUTER JOINs are not currently supported
res = cur.execute('''
    SELECT M.title, A.actor_name
    FROM (SELECT * FROM MCU_movies LIMIT 5) AS M 
    LEFT JOIN (SELECT * FROM Actors LIMIT 5) AS A 
    ON M.movie_id = A.movie_id
    UNION
    SELECT M.title, A.actor_name
    FROM (SELECT * FROM Actors LIMIT 5) AS A
    LEFT JOIN (SELECT * FROM MCU_movies LIMIT 5) AS M
    ON M.movie_id = A.movie_id
    ORDER BY A.actor_name ASC 
 ''')
for i in res.fetchall():
    print(i)
('Iron Man 2', None)
('The Incredible Hulk', None)
('Captain America: The First Avenger', 'Chris Evans')
('Thor', 'Chris Hemsworth')
(None, 'Mark Ruffalo')
('Iron Man', 'Robert Downey Jr.')
(None, 'Scarlett Johansson')
df_m.set_index('movie_id').iloc[0:5].join(df_a.set_index('movie_id').iloc[0:5], how='outer', validate='m:m').sort_values(['actor_name'], ascending=True)[['title','actor_name']]
title actor_name
movie_id
1005 Captain America: The First Avenger Chris Evans
1004 Thor Chris Hemsworth
1006 NaN Mark Ruffalo
1001 Iron Man Robert Downey Jr.
1006 NaN Scarlett Johansson
1002 The Incredible Hulk NaN
1003 Iron Man 2 NaN
pd.merge(df_m.iloc[0:5], df_a.iloc[0:5], left_on='movie_id', right_on='movie_id', how='outer').sort_values(['actor_name'], ascending=True)[['title','actor_name']]
title actor_name
4 Captain America: The First Avenger Chris Evans
3 Thor Chris Hemsworth
5 NaN Mark Ruffalo
0 Iron Man Robert Downey Jr.
6 NaN Scarlett Johansson
1 The Incredible Hulk NaN
2 Iron Man 2 NaN

CROSS#

res = cur.execute('''
    SELECT M.title, A.actor_name
    FROM (SELECT * FROM MCU_movies LIMIT 3) AS M 
    CROSS JOIN (SELECT * FROM Actors LIMIT 3) AS A 
 ''')

for i in res.fetchall():
    print(i)
('Iron Man', 'Robert Downey Jr.')
('Iron Man', 'Chris Evans')
('Iron Man', 'Mark Ruffalo')
('The Incredible Hulk', 'Robert Downey Jr.')
('The Incredible Hulk', 'Chris Evans')
('The Incredible Hulk', 'Mark Ruffalo')
('Iron Man 2', 'Robert Downey Jr.')
('Iron Man 2', 'Chris Evans')
('Iron Man 2', 'Mark Ruffalo')
df_m.iloc[0:3].join(df_a.iloc[0:3], validate='m:m', how='cross', lsuffix='_l', rsuffix='_r')
movie_id_l title year date time score actor_id actor_name movie_id_r gender nationality date_birth
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 1 Robert Downey Jr. 1001 Male American 1965-04-04
1 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 2 Chris Evans 1005 Male American 1981-06-13
2 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 3 Mark Ruffalo 1006 Male American 1967-11-22
3 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 1 Robert Downey Jr. 1001 Male American 1965-04-04
4 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 2 Chris Evans 1005 Male American 1981-06-13
5 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 3 Mark Ruffalo 1006 Male American 1967-11-22
6 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 1 Robert Downey Jr. 1001 Male American 1965-04-04
7 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 2 Chris Evans 1005 Male American 1981-06-13
8 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 3 Mark Ruffalo 1006 Male American 1967-11-22
pd.merge(df_m[0:3], df_a[0:3], how='cross')
movie_id_x title year date time score actor_id actor_name movie_id_y gender nationality date_birth
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 1 Robert Downey Jr. 1001 Male American 1965-04-04
1 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 2 Chris Evans 1005 Male American 1981-06-13
2 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 3 Mark Ruffalo 1006 Male American 1967-11-22
3 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 1 Robert Downey Jr. 1001 Male American 1965-04-04
4 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 2 Chris Evans 1005 Male American 1981-06-13
5 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 3 Mark Ruffalo 1006 Male American 1967-11-22
6 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 1 Robert Downey Jr. 1001 Male American 1965-04-04
7 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 2 Chris Evans 1005 Male American 1981-06-13
8 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 3 Mark Ruffalo 1006 Male American 1967-11-22

1.4.4 UNION#

The UNION in SQL combines the results of two or more SELECT statements into a single result set.

res = cur.execute('''
    SELECT movie_id
    FROM (SELECT movie_id FROM MCU_movies LIMIT 5)
    UNION   
    SELECT movie_id
    FROM (SELECT movie_id FROM Actors LIMIT 5);
    ''')

for i in res.fetchall():
    print(i)
(1001,)
(1002,)
(1003,)
(1004,)
(1005,)
(1006,)
pd.concat([df_m.iloc[0:5][['movie_id']], df_a.iloc[0:5][['movie_id']]]).movie_id.unique()
array([1001, 1002, 1003, 1004, 1005, 1006])

1.4.5 CASE#

The CASE statement is used to perform conditional logic within queries, allowing for conditional evaluation and result selection based on specified conditions.

res = cur.execute('''
SELECT *,
  CASE
  WHEN score >= 8 THEN 'High'
  WHEN score >= 7 THEN 'Medium'
  ELSE 'Low'
  END AS quality
  FROM MCU_movies
  ''')
for i in res.fetchall():
    print(i)
(1001, 'Iron Man', 2008, '2008-01-01', '15:22:00', 7.9, 'Medium')
(1002, 'The Incredible Hulk', 2008, '2008-03-04', '21:16:00', 6.7, 'Low')
(1003, 'Iron Man 2', 2010, '2010-01-01', '05:38:00', 7.0, 'Medium')
(1004, 'Thor', 2011, '2011-01-01', '07:47:00', 7.0, 'Medium')
(1005, 'Captain America: The First Avenger', 2011, '2011-03-15', '06:33:00', 6.9, 'Low')
(1006, 'The Avengers', 2012, '2012-05-10', '08:30:00', 8.0, 'High')
(1007, 'Iron Man 3', 2013, '2013-07-20', '19:12:00', 7.2, 'Medium')
(1008, 'Thor: The Dark World', 2013, '2013-09-25', '18:27:00', 6.9, 'Low')
(1009, 'Captain America: The Winter Soldier', 2014, '2014-11-12', '13:56:00', 7.7, 'Medium')
(1010, 'Guardians of the Galaxy', 2014, '2014-12-03', '22:25:00', 8.0, 'High')
(1011, 'Avengers: Age of Ultron', 2015, '2015-02-21', '20:50:00', 7.3, 'Medium')
(1012, 'Ant-Man', 2015, '2015-04-30', '17:01:00', 7.3, 'Medium')
(1013, 'Captain America: Civil War', 2016, '2016-06-06', '10:04:00', 7.8, 'Medium')
(1014, 'Doctor Strange', 2016, '2016-08-16', '23:11:00', 7.5, 'Medium')
(1015, 'Guardians of the Galaxy Vol. 2', 2017, '2017-10-20', '12:41:00', 7.6, 'Medium')
(1016, 'Spider-Man: Homecoming', 2017, '2017-12-14', '02:39:00', 7.4, 'Medium')
(1017, 'Thor: Ragnarok', 2017, '2017-11-03', '11:54:00', 7.9, 'Medium')
(1018, 'Black Panther', 2018, '2018-04-23', '03:05:00', 7.3, 'Medium')
(1019, 'Avengers: Infinity War', 2018, '2018-04-27', '14:09:00', 8.4, 'High')
(1020, 'Ant-Man and The Wasp', 2018, '2018-07-06', '16:29:00', 7.1, 'Medium')
(1021, 'Captain Marvel', 2019, '2019-03-08', '14:19:00', 6.9, 'Low')
(1022, 'Avengers: Endgame', 2019, '2019-01-01', '01:30:00', 8.4, 'High')
# use lambda in pandas
df_m['quality'] = df_m.score.apply(lambda x: 'High' if x >= 8 else ('Medium' if x >= 7 else 'Low'))
df_m
movie_id title year date time score quality
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 Medium
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 Low
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 Medium
3 1004 Thor 2011 2011-01-01 07:47:00 7.0 Medium
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9 Low
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 High
6 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2 Medium
7 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 Low
8 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7 Medium
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0 High
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3 Medium
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 Medium
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 Medium
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 Medium
14 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6 Medium
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4 Medium
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 Medium
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 Medium
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 High
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1 Medium
20 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 Low
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 High
# use a definition in python
def quality_level(x):
    if x >= 8:
        y ='High'
    elif x >= 7:
        y = 'Medium'
    else: 
        y = 'Low'
    return y
df_m['quality_1'] = df_m.score.apply(lambda x: quality_level(x))
df_m
movie_id title year date time score quality quality_1
0 1001 Iron Man 2008 2008-01-01 15:22:00 7.9 Medium Medium
1 1002 The Incredible Hulk 2008 2008-03-04 21:16:00 6.7 Low Low
2 1003 Iron Man 2 2010 2010-01-01 05:38:00 7.0 Medium Medium
3 1004 Thor 2011 2011-01-01 07:47:00 7.0 Medium Medium
4 1005 Captain America: The First Avenger 2011 2011-03-15 06:33:00 6.9 Low Low
5 1006 The Avengers 2012 2012-05-10 08:30:00 8.0 High High
6 1007 Iron Man 3 2013 2013-07-20 19:12:00 7.2 Medium Medium
7 1008 Thor: The Dark World 2013 2013-09-25 18:27:00 6.9 Low Low
8 1009 Captain America: The Winter Soldier 2014 2014-11-12 13:56:00 7.7 Medium Medium
9 1010 Guardians of the Galaxy 2014 2014-12-03 22:25:00 8.0 High High
10 1011 Avengers: Age of Ultron 2015 2015-02-21 20:50:00 7.3 Medium Medium
11 1012 Ant-Man 2015 2015-04-30 17:01:00 7.3 Medium Medium
12 1013 Captain America: Civil War 2016 2016-06-06 10:04:00 7.8 Medium Medium
13 1014 Doctor Strange 2016 2016-08-16 23:11:00 7.5 Medium Medium
14 1015 Guardians of the Galaxy Vol. 2 2017 2017-10-20 12:41:00 7.6 Medium Medium
15 1016 Spider-Man: Homecoming 2017 2017-12-14 02:39:00 7.4 Medium Medium
16 1017 Thor: Ragnarok 2017 2017-11-03 11:54:00 7.9 Medium Medium
17 1018 Black Panther 2018 2018-04-23 03:05:00 7.3 Medium Medium
18 1019 Avengers: Infinity War 2018 2018-04-27 14:09:00 8.4 High High
19 1020 Ant-Man and The Wasp 2018 2018-07-06 16:29:00 7.1 Medium Medium
20 1021 Captain Marvel 2019 2019-03-08 14:19:00 6.9 Low Low
21 1022 Avengers: Endgame 2019 2019-01-01 01:30:00 8.4 High High

1.4.6 GROUP BY and HAVING#

  • The GROUP BY statement groups rows that have the same values into summary rows, which is often used with aggregate functions (COUNT(), MAX(), MIN(), SUM(), AVG()) to group the result-set by one or more columns.

  • The HAVING clause is used in SQL because the WHERE keyword cannot be used with aggregate functions.

# Execute an SQL query to retrieve the year and score of MCU movies and filtering only the years where the average movie score is greater than 7.5.

res = cur.execute('''
SELECT year, AVG(score) AS avg_score 
FROM MCU_movies
GROUP BY year
HAVING AVG(score) > 7.5
''')
res.fetchall()
[(2012, 8.0),
 (2014, 7.85),
 (2016, 7.65),
 (2017, 7.633333333333333),
 (2018, 7.6000000000000005),
 (2019, 7.65)]
df_m.groupby('year')[['score']].mean().query('score > 7.5')
score
year
2012 8.000000
2014 7.850000
2016 7.650000
2017 7.633333
2018 7.600000
2019 7.650000