Michael

Posted on Jun 14 • Originally published at gbase.cn

GBase 8a Application Connectivity Guide: JDBC and Python Best Practices

#gbase #database #数据库 #python

This guide is for developers integrating GBase 8a into their applications. It covers JDBC driver setup, connection pool tuning, batch write optimization, Python connectivity, and common troubleshooting tips.

1. JDBC Setup

Driver Dependency

The JDBC driver is located in the installation package at $GCLUSTER_BASE/server/connector/jdbc/gbase-connector-java-*.jar.

Maven:

<dependency>
    <groupId>com.gbase</groupId>
    <artifactId>gbase-connector-java</artifactId>
    <version>9.5.0</version>
</dependency>

JDBC URL Parameters

String url = "jdbc:gbase://10.168.10.26:5258/sales_db"
    + "?characterEncoding=utf8"
    + "&useUnicode=true"
    + "&connectTimeout=10000"           // TCP connection timeout 10s
    + "&socketTimeout=3600000"          // SQL execution timeout 1h
    + "&rewriteBatchedStatements=true"  // Merge batch inserts
    + "&serverTimezone=Asia/Shanghai";
Connection conn = DriverManager.getConnection(url, "app_user", "password");

socketTimeout must be long enough for analytical queries; otherwise the client disconnects while the query still runs on the gbase database.
rewriteBatchedStatements=true merges multiple INSERT rows into a single network packet, boosting throughput 5–20×. Combine with gbase_buffer_insert=1000 on the server side.

2. Connection Pool Configuration (Druid Example)

spring:
  datasource:
    druid:
      initial-size: 5
      min-idle: 5
      max-active: 20
      max-wait: 60000
      test-while-idle: true
      test-on-borrow: false
      validation-query: SELECT 1
      time-between-eviction-runs-millis: 60000
      min-evictable-idle-time-millis: 300000

Key rules:

max-active should not exceed max_connections divided by the number of gcluster nodes.
Enable test-while-idle and set the eviction interval shorter than gcluster's Wait_timeout (default 3600s) to avoid dead connections.

3. Batch Insert Best Practices

conn.setAutoCommit(false);
PreparedStatement pstmt = conn.prepareStatement(
    "INSERT INTO orders VALUES (?, ?, ?, ?)");
int batchSize = 1000;
int count = 0;
for (Order order : orderList) {
    pstmt.setLong(1, order.getOrderId());
    // ... set other parameters
    pstmt.addBatch();
    if (++count % batchSize == 0) {
        pstmt.executeBatch();
        conn.commit();
        pstmt.clearBatch();
    }
}
if (count % batchSize != 0) {
    pstmt.executeBatch();
    conn.commit();
}

For over 1 million rows, prefer LOAD DATA INFILE — it offers significantly higher throughput than batch INSERT.

4. Python Connectivity

PyMySQL

import pymysql
conn = pymysql.connect(
    host='10.168.10.26', port=5258,
    user='app_user', password='your_password',
    database='sales_db', charset='utf8',
    connect_timeout=10, read_timeout=3600,
    autocommit=True
)
with conn.cursor() as cursor:
    cursor.execute("SELECT dept_id, SUM(sale_amount) FROM sales ...")
    for row in cursor.fetchall():
        print(row)
conn.close()

Batch Insert with executemany

data = [(10001, 1, 1500.00, '2024-06-01'), ...]
with conn.cursor() as cursor:
    cursor.executemany(
        "INSERT INTO orders VALUES (%s, %s, %s, %s)", data
    )
conn.commit()

SQLAlchemy Integration

from sqlalchemy import create_engine
engine = create_engine(
    "mysql+pymysql://user:password@10.168.10.26:5258/sales_db?charset=utf8",
    pool_size=5, max_overflow=10, pool_recycle=1800,
    connect_args={"read_timeout": 3600, "write_timeout": 60}
)
df = pd.read_sql("SELECT * FROM orders WHERE order_date >= '2024-06-01'", engine)

5. Common Connection Issues

Issue	Cause	Solution
Communications link failure	Connection pool holds a dead connection	Enable test-while-idle, set pool-recycle
Packet too large	Single INSERT batch exceeds limit	Increase `max_allowed_packet` or reduce batch size
Python garbled characters	Missing charset declaration	`charset='utf8'`, `use_unicode=True`
SocketTimeoutException	`socketTimeout` too short	Use different timeouts for OLTP and analytical queries

6. Quick Reference

Scenario	Recommendation
Max pool size	≤ max_connections / gcluster nodes
socketTimeout	Analytical: 3600000ms; OLTP: 30000ms
Batch INSERT	rewriteBatchedStatements=true, 1000 rows/batch
Prevent connection eviction	testWhileIdle=true, poolRecycle=1800
Python batch writes	executemany() or LOAD DATA

Whether you are building a Java service with connection pooling or a Python analytics pipeline, following these configuration patterns will help you keep your gbase database connections stable and performant.

DEV Community