We use the CAST() function to show the timestamp as a human-readable time and date.
SELECT
rating,
CAST(timeRecorded as timestamp)
FROM
movieRatings;
We use the CAST() function to show the timestamp as a human-readable time and date.
SELECT
rating,
CAST(timeRecorded as timestamp)
FROM
movieRatings;
DROP TABLE IF EXISTS movieRatings;
CREATE TABLE movieRatings (
userId INT,
movieId INT,
rating FLOAT,
timeRecorded INT
) USING csv OPTIONS (
PATH “/mnt/training/movies/20m/ratings.csv”,
header “true”
);
SELECT firstName
FROM PeopleDistinctNames
JOIN SSADistinctNames ON firstName = ssaFirstName
SELECT count(DISTINCT firstName)
FROM SSANames;
CREATE OR REPLACE TEMPORARY VIEW PeopleSavings AS
SELECT
firstName,
lastName,
year(birthDate) as birthYear,
salary,
salary * 0.2 AS savings
FROM
People10M;
park SQL is a component of Apache Spark that enables querying structured data using SQL syntax, either through SQL queries or DataFrame APIs. Here’s a brief overview of some basic queries you can perform using Spark SQL:
SELECT col1, col2 FROM table_name;SELECT * FROM table_name WHERE condition;SELECT COUNT(*), AVG(salary) FROM employee_table;SELECT department, AVG(salary) FROM employee_table GROUP BY department;SELECT * FROM table1 JOIN table2 ON table1.key = table2.key;SELECT * FROM table_name ORDER BY column_name ASC/DESC;SELECT * FROM table1 WHERE col1 IN (SELECT col2 FROM table2);ELECT department, employee_id, salary, AVG(salary) OVER (PARTITION BY department) AS avg_salary_department FROM employee_table;WITH cte AS ( SELECT department, AVG(salary) AS avg_salary FROM employee_table GROUP BY department ) SELECT * FROM cte WHERE avg_salary > 50000;SELECT col1 FROM table1 UNION SELECT col2 FROM table2;These are some of the basic SQL queries you can perform using Spark SQL. Keep in mind that Spark SQL supports a wide range of SQL functionalities, and you can use it to handle complex data manipulation and analysis tasks.
DROP TABLE IF EXISTS People10M;
CREATE TABLE People10M
USING parquet
OPTIONS (
path “/mnt/training/dataframes/people-10m.parquet”,
header “true”);