PostgreSQL

PostgreSQL, also known as Postgres, is a powerful and feature-rich open-source relational database management system (RDBMS).

It is widely regarded for its reliability, scalability, and extensive capabilities. Designed to handle large volumes of structured and semi-structured data, PostgreSQL follows the relational database model and ensures data integrity through ACID compliance.

With its extensive set of built-in data types, indexing options, and advanced query capabilities, PostgreSQL empowers developers to efficiently store, retrieve, and manipulate data. It offers flexibility and extensibility, allowing users to define custom data types, functions, and operators using multiple programming languages.

PostgreSQL supports various replication methods for high availability and fault tolerance, and its advanced features include full-text search, JSON and NoSQL support, and robust security measures. Used by organizations of all sizes, PostgreSQL is a preferred choice for applications that demand reliability, scalability, and advanced data management capabilities.

A few of our integrations are built on PostgreSQL databases:

Features

Feature nameSupported
Column HashingTrueColumn level
BlockingTrueColumn level
IncrementalTrue
Custom dataTrue
HistoryFalse
ReSyncTrueTable level
TemplatesFalse

πŸ”§ Setup Guide

Step 1 - Connector Configuration

From your PostgreSQL Database, access and paste the required fields:

  • Name (The name you want to refer to regarding this connection in the Weld app)
  • Host (IP Address of your server)
  • Database (Name of the database you want to connect to)

The host can be found using PSQL commands such as:

SELECT *
FROM pg_settings
WHERE name = 'port';

DB inside private network or dynamic host/IP

Our PostgreSQL connector operates with industry standard access architecture in mind. Weld > SSH > DB

The preferred way of connecting to a PostgreSQL database source is for the user to provide a public hostname or a static IP towards an SSH server Weld can connect to.

The SSH server would be a part of the user's internal network, and/or be able to resolve the database DNS alias or IP internally. This approach is not invasive and leaves the user with full control over their own infrastructure.

Step 2 - Data To Sync

  1. Select the tables you wish to include in the sync.
    You can view the schema, remove columns or hash sensitive information.
  2. Update Sync mode, which can be set to:
    • Incremental Sync
    • Full Sync at Midnight

Incremental Sync

Once the initial sync is complete, Weld performs incremental updates of any new or modified data from your source database.

The hidden PostgreSQL system column xmin can be used to select only the new or changed rows since the last update.

The XMIN method has the following limitations:

  • XMIN can't track row deletions because it relies on a updated_at column. There is no way to tell which rows in the destination are no longer present in the source database. It Requires a full table scan to detect and remove deleted rows.
  • The xmin column is updated on a row basis. Therefore, regardless of whether you selected that column in the connector schema, if any value of a column in a row changes, the xmin value also changes which would cause the connector to sync that row.

Full Sync at Midnight

A limitation of incremental syncs are that they do not capture deleted rows. If you expect rows to be deleted often from your tables then you can set a table to run a full sync at midnight which will remove those deleted rows.

Step 3 - Configure Sync

  1. Select how often you would like the data to sync.
  2. Provide a unique destination table name.

Weld will take over from here and commence syncing data from your PostgreSQL database.


πŸ”§ Troubleshooting

1. Heroku managed PostgreSQL Databases

From Heroku Docs:

Heroku Postgres databases are designed to be used with a Heroku app. All Heroku Postgres databases have a corresponding Heroku application. The database URL is managed by Heroku and will change under some circumstances such as:

  • User-initiated database credential rotations using heroku pg:credentials:rotate.
  • Catastrophic hardware failures that require Heroku Postgres staff to recover your database on new hardware.
  • Security issues or threats that require Heroku Postgres staff to rotate database credentials.
  • Automated failover events on HA-enabled plans.

However, they can be used with external services like Weld, if the user is able to dynamically resolve the DNS within their infrastructure. The Heroku Docs contain directions on how best to achieve that.

2. Permissions or ACL errors

In some cases you might see an error message similar to this:

(aclcheck_error) permission denied for table X

In simplest terms, that means that we don't have access to query the table or view. Please revisit the permissions given to the user to ensure it has the read access.

If your database might use column-based permissions, the resolution is similar - find out what columns the user is not allowed to read, and then simply exclude them from the sync through the Weld UI.

3. Replica delays and timeouts

If the connector is connected to a replica, you may see an error message similar to these:

Canceling statement due to conflict with recovery.

OR

(40001) User query might have needed to see row versions that must be removed.

This exclusively happens when communicating with a replica, and in most cases when the replica or the table is under pressure. In broad terms, this means that a replica is unable to serve a response for our query, because the underlying data on the primary server might have changed in the meantime.

When the replica server needs to sync-up with the master, the max_standby_streaming_delay parameter determines how long the replica should wait before canceling any queries that may have needed to see row versions that are about to be removed. Read more on the parameter and how to configure it in the official Postgres documentation.

The main strategy used to overcome this issue is to increase the value of the max_standby_streaming_delay parameter.

Was this page helpful?