MariaDB: Enable remote connections

To enable remote connections to a MariaDB server, you typically need to follow these steps:

  1. Configure MariaDB to Listen on All Interfaces: By default, MariaDB may be configured to listen only on localhost (127.0.0.1), which means it does not accept connections from remote machines. To change this, edit the MariaDB configuration file.
    • Locate the MariaDB configuration file, which is usually named my.cnf or my.ini depending on your operating system and MariaDB version.
    • Add or modify the bind-address parameter in the [mysqld] section of the configuration file so the server listens on all interfaces:
      [mysqld]
      bind-address = 0.0.0.0
  2. Grant Remote Access Privileges: After configuring MariaDB to listen on all interfaces, grant remote access privileges to the user account you want to use for remote connections. By default, remote access is not granted for security reasons.
    • Connect to your MariaDB server using a client such as mysql or phpMyAdmin:
      mysql -u username -p
      Replace username with your MariaDB username.
    • Run the following SQL command to grant remote access to the user. Replace remote_user with the actual username, remote_host with the IP address or hostname of the remote machine, and 'password' with the password for the user account:
      GRANT ALL PRIVILEGES ON *.* TO 'remote_user'@'remote_host' IDENTIFIED BY 'password' WITH GRANT OPTION;
    • Note: ALL PRIVILEGES on *.* together with WITH GRANT OPTION is very permissive. Prefer to limit the privileges to the specific databases or tables the user needs access to.
  3. Firewall Configuration: Ensure that your firewall allows incoming connections on the MariaDB port (usually 3306). You might need to open this port if it’s blocked.
  4. Restart MariaDB: After making changes to the configuration file, restart the MariaDB service to apply the changes:
      sudo systemctl restart mariadb
    Use the appropriate command for your operating system if you’re not using systemd.

After following these steps, your MariaDB server should be configured to accept remote connections from the specified user account. Make sure to consider security implications and follow best practices when enabling remote access.
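
The configuration and firewall steps above can be sanity-checked with a small script. The following is a minimal Python sketch, not part of MariaDB's tooling: the function names find_bind_address and port_is_open are made up for illustration, the parser only looks at the [mysqld] section (MariaDB also reads other option groups, which this sketch ignores), and port 3306 is assumed as the default.

```python
import socket


def find_bind_address(cnf_text):
    """Return the last bind-address set in the [mysqld] section, or None."""
    section = None
    value = None
    for raw in cnf_text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";", "!")):
            continue  # skip blank lines, comments, and !include directives
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().lower()
            continue
        if section == "mysqld" and "=" in line:
            key, _, val = line.partition("=")
            # Option files accept both bind-address and bind_address.
            if key.strip().replace("_", "-") == "bind-address":
                value = val.strip()
    return value


def port_is_open(host, port=3306, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (hypothetical helper)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Run find_bind_address on the contents of your configuration file on the server, and port_is_open from the remote machine. A successful TCP connection only shows that the port is reachable; the account still needs the privileges granted in step 2.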

AWS: Databases on AWS

Databases are purpose-built on AWS, which means that each AWS database service is built for a specific use case or set of use cases. Using a database that is a best fit for the use case can save a lot of time in development hours. In the past, it was common to use relational databases for everything because they were the most commonly operated database on premises. With AWS, you can run different types of databases more easily without managing the infrastructure yourself. This can lead to making decisions that are more aligned with the use case and aren’t limited to in-house skill for database administration.

For this week’s customer, Morgan chose Amazon DynamoDB as the database for three reasons: the customer is using it as a simple lookup table, there is no need for complex SQL queries or joins across tables, and the serverless nature of the table makes it easy to operate over time.

For a high-level overview of the AWS database services, see AWS Cloud Databases.

Amazon Aurora

Amazon Aurora is a fully managed relational database engine that’s compatible with MySQL and PostgreSQL. You can use the code, tools, and applications for your existing MySQL and PostgreSQL databases with Aurora.

Aurora wasn’t chosen for this architecture because the customer doesn’t need the complex, enterprise-database features that Aurora offers.

As an enterprise-level database, Aurora can, with some workloads, deliver up to five times the throughput of MySQL and up to three times the throughput of PostgreSQL without requiring changes to most of your existing applications.

Aurora includes a high-performance storage subsystem. Its MySQL-compatible and PostgreSQL-compatible database engines are customized to take advantage of that fast, distributed storage. The underlying storage grows automatically as needed. An Aurora cluster volume can grow to a maximum size of 128 tebibytes (TiB). Aurora also automates and standardizes database clustering and replication, which are typically among the most challenging aspects of database configuration and administration.

Aurora is part of the managed database service Amazon Relational Database Service (Amazon RDS). Amazon RDS is a web service that makes it easier to set up, operate, and scale a relational database in the cloud. Aurora Serverless v2 is an on-demand, automatic scaling configuration for Aurora.

Aurora Serverless v2 helps automate the processes of monitoring the workload and adjusting the capacity for your databases. Capacity is adjusted automatically based on application demand. You’re charged only for the resources that your database clusters consume. Thus, Aurora Serverless v2 can help you to stay within budget and reduce the need to pay for compute resources that you don’t use.

This type of automation is especially valuable for multitenant databases, distributed databases, development and test systems, and other environments with highly variable and unpredictable workloads.

Amazon RDS Proxy

By using Amazon RDS Proxy, your applications can pool and share database connections to improve their ability to scale. RDS Proxy makes applications more resilient to database failures by automatically connecting to a standby DB instance, while preserving application connections. By using RDS Proxy, you can also enforce AWS Identity and Access Management (IAM) authentication for databases, and securely store credentials in AWS Secrets Manager.

With RDS Proxy, you can handle unpredictable surges in database traffic that otherwise might cause issues because of oversubscribing connections or creating new connections at a fast rate. RDS Proxy establishes a database connection pool and reuses connections in this pool without the memory and CPU overhead of opening a new database connection each time. To protect the database against oversubscription, you can control the number of database connections that are created.

RDS Proxy queues or throttles application connections that can’t be served immediately from the pool of connections. Although latencies might increase, your application can continue to scale without abruptly failing or overwhelming the database.
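
The pooling and queuing idea can be illustrated with a toy example. The Python sketch below is a conceptual model only, not how RDS Proxy is implemented or configured: TinyPool and its connect callback are hypothetical names. It opens a fixed number of connections once, reuses them, and makes extra callers wait instead of opening more.

```python
import queue


class TinyPool:
    """Toy fixed-size pool: reuses connections and makes extra callers wait."""

    def __init__(self, connect, max_connections):
        self._idle = queue.Queue()
        for _ in range(max_connections):
            self._idle.put(connect())  # open every connection once, up front

    def acquire(self, timeout=None):
        # Blocks while all connections are in use; raises queue.Empty on timeout.
        return self._idle.get(timeout=timeout)

    def release(self, conn):
        self._idle.put(conn)  # hand the connection back for reuse
```

Capping max_connections is what protects the database from oversubscription: a burst of callers waits in acquire (higher latency) rather than opening new connections.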

Amazon DynamoDB

Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. By using DynamoDB, you can offload the administrative burdens of operating and scaling a distributed database so that you can reduce your need to handle hardware provisioning, setup and configuration, replication, software patching, or cluster scaling. DynamoDB also offers encryption at rest, which reduces your operational burden and the complexity involved in protecting sensitive data.

With DynamoDB, you can create database tables that can store and retrieve virtually any amount of data and serve virtually any level of request traffic. You can scale up or scale down your tables’ throughput capacity with minimal downtime or performance degradation.

If you are an application developer, you might have some experience using a relational database management system (RDBMS) and Structured Query Language (SQL). As you begin working with Amazon DynamoDB, you will encounter many similarities, but also many things that are different.

NoSQL is a term used to describe nonrelational database systems that are highly available, scalable, and optimized for high performance. Instead of the relational model, NoSQL databases (such as DynamoDB) use alternate models for data management, such as key-value pairs or document storage.

In DynamoDB, tables, items, and attributes are the core components that you work with. A table is a collection of items, and each item is a collection of attributes. DynamoDB uses primary keys to uniquely identify each item in a table, and secondary indexes to provide more querying flexibility. You can use DynamoDB Streams to capture data modification events in DynamoDB tables.
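
To make the table, item, and attribute vocabulary concrete, here is a small in-memory model in Python. It is a teaching sketch under stated assumptions, not the DynamoDB API: ToyTable, put_item, and get_item are illustrative names that only mimic the idea that each item is a set of attributes identified by its primary key, and that writing an item with an existing key replaces it.

```python
class ToyTable:
    """In-memory stand-in for a table: items are attribute dicts keyed by primary key."""

    def __init__(self, partition_key, sort_key=None):
        self.partition_key = partition_key
        self.sort_key = sort_key
        self._items = {}

    def _key_of(self, attrs):
        # The primary key is the partition key, optionally combined with a sort key.
        if self.sort_key is None:
            return (attrs[self.partition_key],)
        return (attrs[self.partition_key], attrs[self.sort_key])

    def put_item(self, item):
        # An item with the same primary key replaces the previous item entirely.
        self._items[self._key_of(item)] = dict(item)

    def get_item(self, key):
        item = self._items.get(self._key_of(key))
        return dict(item) if item is not None else None
```

Apart from the key attributes, items in the same table do not need to share the same set of attributes, which is one of the main differences from a relational table.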

AWS: Choose the Right AWS Database Service

AWS Database Services

AWS has a variety of different database options for different use cases. Use the table below to get a quick look at the AWS database portfolio.

Database Type | Use Cases | AWS Service
Relational | Traditional applications, ERP, CRM, e-commerce | Amazon RDS, Amazon Aurora, Amazon Redshift
Key-value | High-traffic web apps, e-commerce systems, gaming applications | Amazon DynamoDB
In-memory | Caching, session management, gaming leaderboards, geospatial applications | Amazon ElastiCache for Memcached, Amazon ElastiCache for Redis
Document | Content management, catalogs, user profiles | Amazon DocumentDB (with MongoDB compatibility)
Wide column | High-scale industrial apps for equipment maintenance, fleet management, and route optimization | Amazon Keyspaces (for Apache Cassandra)
Graph | Fraud detection, social networking, recommendation engines | Amazon Neptune
Time series | IoT applications, DevOps, industrial telemetry | Amazon Timestream
Ledger | Systems of record, supply chain, registrations, banking transactions | Amazon QLDB

Breaking Up Applications and Databases

As the industry changes, applications and databases change too. Today, a large application is rarely supported by just one database. Instead, these applications are being broken into smaller services, each with its own purpose-built database.

This shift removes the idea of a one-size-fits-all database and replaces it with a complementary database strategy. You can give each database the appropriate functionality, performance, and scale that the workload requires.

Linux: How to replace a bad disk on a Linux RAID configuration

Replacing a failed disk in a Linux RAID configuration involves several steps to ensure that the array remains operational and data integrity is maintained. Below is a step-by-step guide on how to replace a bad disk in a Linux RAID configuration using the mdadm utility:

  1. Identify the Failed Disk:
    • Use the mdadm --detail /dev/mdX command to display detailed information about the RAID array.
    • Look for the state of each device in the array to identify the failed disk.
    • Note the device name (e.g., /dev/sdX) of the failed disk.
  2. Prepare the New Disk:
    • Insert the new disk into the system and ensure it is recognized by the operating system.
    • Partition the new disk using a partitioning tool like fdisk or parted, matching the partition layout of the other array members. Create a Linux RAID partition on the new disk (type fd on MBR partition tables).
  3. Add the New Disk to the RAID Array:
    • Use the mdadm --manage /dev/mdX --add /dev/sdX1 command to add the new disk to the RAID array.
    • Replace /dev/mdX with the name of your RAID array and /dev/sdX1 with the partition name of the new disk.
    • This command starts the process of rebuilding the RAID array onto the new disk.
  4. Monitor the Rebuild Process:
    • Monitor the rebuild process using the mdadm --detail /dev/mdX command, or watch the progress line in /proc/mdstat.
    • Check the progress and status of the rebuild operation to ensure it completes successfully.
    • The rebuild process may take some time depending on the size of the RAID array and the performance of the disks.
  5. Verify RAID Array Status:
    • After the rebuild process completes, verify the status of the RAID array using the mdadm --detail /dev/mdX command.
    • Ensure that all devices in the array are in the “active sync” state and that there are no errors or warnings.
  6. Update Configuration Files:
    • Update configuration files such as /etc/mdadm/mdadm.conf to ensure that the new disk is recognized and configured correctly in the RAID array.
  7. Perform Testing and Verification:
    • Perform thorough testing to ensure that the RAID array is functioning correctly and that data integrity is maintained.
    • Test read and write operations on the array to verify its performance and reliability.
  8. Remove the Failed Disk from the Array:
    • If the failed member is still listed in the array, remove it using the mdadm --manage /dev/mdX --remove /dev/sdX1 command. mdadm removes only members that are marked faulty or are spares, so if the disk has not already been marked faulty, run mdadm --manage /dev/mdX --fail /dev/sdX1 first.
    • In practice, this is usually done before the bad disk is physically pulled and the new disk is inserted (step 2), so that the array no longer references the old device.

By following these steps, you can safely replace a bad disk in a Linux RAID configuration using the mdadm utility while maintaining data integrity and ensuring the continued operation of the RAID array.
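
Identifying the failed member (step 1) can also be scripted from /proc/mdstat. The following Python sketch is a minimal illustration, assuming the usual mdstat layout in which a failed member is tagged (F) and the status line shows [total/active] device counts. The function name degraded_arrays is made up for this example.

```python
import re


def degraded_arrays(mdstat_text):
    """Map each degraded md array to the member devices flagged (F)."""
    result = {}
    current, failed = None, []
    for line in mdstat_text.splitlines():
        header = re.match(r"^(md\w+)\s*:\s*(.*)$", line)
        if header:
            current = header.group(1)
            # Failed members look like "sda1[0](F)".
            failed = re.findall(r"(\S+?)\[\d+\]\(F\)", header.group(2))
            continue
        # The status line carries "[total/active]", for example "[2/1] [_U]".
        counts = re.search(r"\[(\d+)/(\d+)\]", line)
        if current and counts:
            if int(counts.group(2)) < int(counts.group(1)):
                result[current] = failed
            current = None
    return result
```

On a live system you would pass it the contents of /proc/mdstat, for example degraded_arrays(open("/proc/mdstat").read()). An empty result means no array reports fewer active devices than expected.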

What is RAID and how do you configure it in Linux?

RAID (Redundant Array of Independent Disks) is a technology used to combine multiple physical disk drives into a single logical unit for data storage, with the goal of improving performance, reliability, or both. RAID arrays distribute data across multiple disks, providing redundancy and/or improved performance compared to a single disk.

There are several RAID levels, each with its own characteristics and benefits. Some common RAID levels include RAID 0, RAID 1, RAID 5, RAID 6, and RAID 10. Each RAID level uses a different method to distribute and protect data across the disks in the array.

Here’s a brief overview of some common RAID levels:

  1. RAID 0 (Striping):
    • RAID 0 offers improved performance by striping data across multiple disks without any redundancy.
    • It requires a minimum of two disks.
    • Data is distributed evenly across all disks in the array, which can improve read and write speeds.
    • However, there is no redundancy, so a single disk failure can result in data loss for the entire array.
  2. RAID 1 (Mirroring):
    • RAID 1 provides redundancy by mirroring data across multiple disks.
    • It requires a minimum of two disks.
    • Data written to one disk is simultaneously written to another disk, providing redundancy in case of disk failure.
    • RAID 1 offers excellent data protection. Reads can be served from either mirror, but every write goes to all disks, so it does not provide the write-speed gains of RAID 0.
  3. RAID 5 (Striping with Parity):
    • RAID 5 combines striping with parity data to provide both improved performance and redundancy.
    • It requires a minimum of three disks.
    • Data is striped across multiple disks, and parity information is distributed across all disks.
    • If one disk fails, data can be reconstructed using parity information stored on the remaining disks.
  4. RAID 6 (Striping with Dual Parity):
    • RAID 6 is similar to RAID 5 but includes an additional level of redundancy.
    • It requires a minimum of four disks.
    • RAID 6 can tolerate the failure of up to two disks simultaneously without data loss.
    • It provides higher fault tolerance than RAID 5 but may have slightly lower performance due to the additional parity calculations.
  5. RAID 10 (Striping and Mirroring):
    • RAID 10 combines striping and mirroring to provide both improved performance and redundancy.
    • It requires a minimum of four disks.
    • Data is striped across mirrored sets of disks, offering both performance and redundancy benefits of RAID 0 and RAID 1.
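
The trade-offs between these levels come down to simple arithmetic. The Python sketch below computes usable capacity and the number of disk failures each level is guaranteed to survive, assuming equal-size disks and, for RAID 10, the classic layout of two-way mirror pairs. The function name raid_summary is illustrative.

```python
def raid_summary(level, disks, disk_size):
    """Return (usable_capacity, guaranteed_failures_tolerated) for equal-size disks."""
    minimum = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}
    if level not in minimum:
        raise ValueError(f"unsupported RAID level: {level}")
    if disks < minimum[level]:
        raise ValueError(f"RAID {level} needs at least {minimum[level]} disks")
    if level == 0:
        return disks * disk_size, 0        # striping only, no redundancy
    if level == 1:
        return disk_size, disks - 1        # every disk holds a full copy
    if level == 5:
        return (disks - 1) * disk_size, 1  # one disk's worth of parity
    if level == 6:
        return (disks - 2) * disk_size, 2  # two disks' worth of parity
    # RAID 10, assumed here to be striping over two-way mirror pairs.
    if disks % 2:
        raise ValueError("this sketch assumes an even number of disks for RAID 10")
    return (disks // 2) * disk_size, 1     # a second failure in the same pair is fatal
```

For example, four 2 TB disks give 6 TB usable in RAID 5, and 4 TB usable in either RAID 6 or RAID 10. RAID 10 can survive more than one failure if the failed disks are in different mirror pairs, but only one is guaranteed.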

To configure RAID in Linux, you typically use software-based RAID management tools provided by the operating system. The most commonly used tool for configuring RAID in Linux is mdadm (Multiple Device Administration), which is a command-line utility for managing software RAID devices.

Here’s a basic outline of the steps to configure RAID using mdadm in Linux:

  1. Install mdadm (if not already installed):
      sudo apt-get install mdadm   # For Debian/Ubuntu
      sudo yum install mdadm       # For CentOS/RHEL
  2. Prepare the disks:
    • Ensure that the disks you plan to use for RAID are connected and recognized by the system.
    • Partition the disks using a partitioning tool like fdisk or parted. Create a Linux RAID partition on each disk (type fd on MBR partition tables).
  3. Create RAID arrays:
    • Use the mdadm command to create RAID arrays based on the desired RAID level.
    • For example, to create a RAID 1 array from one partition on each of two disks (/dev/sda1 and /dev/sdb1):
      sudo mdadm --create /dev/md0 --level=1 --raid-devices=2 /dev/sda1 /dev/sdb1
  4. Format and mount the RAID array:
    • Once the RAID array is created, format it with a filesystem of your choice (e.g., ext4) using the mkfs command.
    • Mount the RAID array to a mount point in the filesystem.
  5. Update configuration files:
    • Update configuration files such as /etc/mdadm/mdadm.conf to ensure that the RAID array configuration is persistent across reboots.
  6. Monitor and manage RAID arrays:
    • Use mdadm commands to monitor and manage RAID arrays, such as adding or removing disks, checking array status, and replacing failed disks.

These are general steps for configuring software RAID using mdadm in Linux. The exact commands and procedures may vary depending on the specific RAID level and configuration requirements. It’s essential to refer to the documentation and guides specific to your Linux distribution and RAID configuration.
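
The outline above can be turned into a reviewable command list. The Python sketch below only builds the command strings and never runs them. The function name raid_setup_commands, the ext4 choice, and the mount point are assumptions made for illustration, and the commands would need to be run as root.

```python
import shlex

MIN_DEVICES = {0: 2, 1: 2, 5: 3, 6: 4, 10: 4}


def raid_setup_commands(array, level, members, mount_point):
    """Return the shell commands for creating, formatting, and mounting an array."""
    if level not in MIN_DEVICES:
        raise ValueError(f"unsupported RAID level: {level}")
    if len(members) < MIN_DEVICES[level]:
        raise ValueError(f"RAID {level} needs at least {MIN_DEVICES[level]} devices")
    steps = [
        ["mdadm", "--create", array, f"--level={level}",
         f"--raid-devices={len(members)}", *members],
        ["mkfs.ext4", array],            # format the new array (ext4 assumed)
        ["mkdir", "-p", mount_point],
        ["mount", array, mount_point],
        ["mdadm", "--detail", "--scan"], # prints ARRAY lines for mdadm.conf
    ]
    return [shlex.join(step) for step in steps]
```

Calling raid_setup_commands("/dev/md0", 1, ["/dev/sda1", "/dev/sdb1"], "/mnt/raid") yields the same mdadm --create command shown in step 3, followed by the format, mount, and scan commands. Review the list before running anything, because mdadm --create and mkfs destroy existing data on the listed devices.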

Linux: systemd target units examples

Here is a list of some systemd target units along with examples of how to use them:

  1. multi-user.target:
    • This target is used for a multi-user system without a graphical interface. It includes services required for a text-based or command-line environment.
    • Example: To switch to the multi-user target, you can use the following command: sudo systemctl isolate multi-user.target
  2. graphical.target:
    • Represents a multi-user system with a graphical interface (GUI). It includes services required for a graphical desktop environment.
    • Example: To switch to the graphical target, you can use the following command: sudo systemctl isolate graphical.target
  3. rescue.target:
    • Similar to runlevel 1 or single-user mode in traditional SysVinit systems. It provides a minimal environment with a root shell for system recovery and maintenance tasks.
    • Example: To switch to the rescue target, you can use the following command: sudo systemctl isolate rescue.target
  4. emergency.target:
    • Provides the most minimal environment possible, intended for emergencies where the system is in an unusable state. It drops the system into a single-user shell without starting any services.
    • Example: To switch to the emergency target, you can use the following command: sudo systemctl emergency
  5. shutdown.target:
    • A special target that systemd pulls in when the system shuts down so that running services are stopped cleanly. It is not started directly; it is reached through targets such as poweroff.target, reboot.target, and halt.target.
    • Example: systemctl has no shutdown subcommand. To power off or reboot the system, which passes through this target, use: sudo systemctl poweroff or sudo systemctl reboot
  6. network.target:
    • Indicates that the network management stack has been started. It does not guarantee that interfaces are configured or that the network is reachable; services that need a working connection typically order themselves after network-online.target instead.
    • Example: To view the status of the network target, you can use the following command: systemctl status network.target
  7. sockets.target:
    • Represents the availability of system sockets. Services that provide network services via sockets may be started after this target is reached.
    • Example: To view the status of the sockets target, you can use the following command: systemctl status sockets.target

These are some of the systemd target units along with examples of how to use them. Depending on your specific distribution and configuration, there may be additional targets or custom targets defined. You can explore more targets and their usage by referring to the systemd documentation or using the systemctl list-units --type=target command.
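
To inspect targets from a script, the listing command can be parsed. The Python sketch below is a minimal example: the function names parse_targets and list_targets are made up, and it assumes the column order unit, load state, active state, sub state that systemctl list-units prints with --plain and --no-legend.

```python
import subprocess


def parse_targets(listing):
    """Map each target unit name to its ACTIVE state from list-units output."""
    targets = {}
    for line in listing.splitlines():
        fields = line.split(None, 4)  # UNIT LOAD ACTIVE SUB DESCRIPTION
        if len(fields) >= 4 and fields[0].endswith(".target"):
            unit, _load, active, _sub = fields[:4]
            targets[unit] = active
    return targets


def list_targets():
    """Run systemctl and return {target: active_state}; requires a systemd host."""
    out = subprocess.run(
        ["systemctl", "list-units", "--type=target", "--all",
         "--plain", "--no-legend"],
        check=True, capture_output=True, text=True,
    ).stdout
    return parse_targets(out)
```

On a systemd host, list_targets().get("graphical.target") returns "active" when the graphical target is currently reached, and "inactive" or None otherwise.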

Databricks: Create a table in Databricks using an external PostgreSQL data source

To create a table in Databricks using an external PostgreSQL data source, you can use the CREATE TABLE SQL statement with the USING clause to specify the data source. Here’s a basic example:

CREATE TABLE your_table_name
USING jdbc
OPTIONS (
url 'jdbc:postgresql://your_postgresql_host:port/your_database',
dbtable 'your_table_in_postgresql',
user 'your_username',
password 'your_password'
);

In this SQL statement:

  • your_table_name is the name you want to assign to your table in Databricks.
  • jdbc specifies that you’re using the JDBC data source.
  • url is the JDBC connection URL for your PostgreSQL database.
  • dbtable is the name of the table in your PostgreSQL database that you want to create a Databricks table from.
  • user is the username to connect to your PostgreSQL database.
  • password is the password associated with the username.

Replace the placeholders (your_...) with your actual values.

Make sure you have the appropriate JDBC driver installed in your Databricks cluster. You can upload the JDBC driver JAR file to your cluster’s storage or use Maven coordinates if the driver is available on Maven repositories.

Here’s the same statement with the driver class named explicitly. Note that the driver option takes a Java class name, not Maven coordinates:

CREATE TABLE your_table_name
USING jdbc
OPTIONS (
url 'jdbc:postgresql://your_postgresql_host:port/your_database',
dbtable 'your_table_in_postgresql',
user 'your_username',
password 'your_password',
driver 'org.postgresql.Driver'
);

org.postgresql.Driver is the class name of the PostgreSQL JDBC driver. Change it only if you connect through a different JDBC driver.

After running this SQL statement in a Databricks notebook or SQL cell, the table your_table_name is created in Databricks as a reference to the specified table in your PostgreSQL database. Its schema is read from PostgreSQL, and queries against it fetch the data over JDBC at query time rather than from a synchronized copy.
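
If you create several such tables, the statement can be generated from parameters. The Python sketch below is a plain string builder and not a Databricks API: the function name jdbc_table_statement is hypothetical, port 5432 is assumed as the PostgreSQL default, and values containing a quote or backslash are rejected rather than escaped.

```python
def jdbc_table_statement(table, host, database, source_table,
                         user, password, port=5432):
    """Render a CREATE TABLE ... USING jdbc statement for a PostgreSQL source."""
    options = {
        "url": f"jdbc:postgresql://{host}:{port}/{database}",
        "dbtable": source_table,
        "user": user,
        "password": password,
        "driver": "org.postgresql.Driver",
    }
    for name, value in options.items():
        # Refuse values that would break out of the quoted SQL literal.
        if "'" in value or "\\" in value:
            raise ValueError(f"option {name!r} contains a quote or backslash")
    body = ",\n".join(f"  {name} '{value}'" for name, value in options.items())
    return f"CREATE TABLE {table}\nUSING jdbc\nOPTIONS (\n{body}\n);"
```

The returned text matches the shape of the statements above and can be pasted into a SQL cell. Because the password ends up in the statement text, treat the output as sensitive.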