Overview
之前项目的人遗留的数据散落在多种数据库中,既有MySQL
,MongoDB
,也有Cassandra
和PostgreSQL
。在Python3
版本的jupyter
中连接PostgreSQL
需要安装psycopg2
,而psycopg2
在Python2
中则是已经集成好的。
1. 安装python3-psycopg2
和libpq-dev
先在Linux上安装好这两个包,
sudo apt-get install python3-psycopg2
sudo apt-get install libpq-dev
再安装psycopg2
,
pip3 install psycopg2
否则就会报错如下:
ERROR: Command errored out with exit status 1:
command: /home/yangbingjiao/anaconda3/bin/python -c 'import sys, setuptools, tokenize; sys.argv[0] = '"'"'/tmp/pip-install-wxj7s4lx/psycopg2/setup.py'"'"'; __file__='"'"'/tmp/pip-install-wxj7s4lx/psycopg2/setup.py'"'"';f=getattr(tokenize, '"'"'open'"'"', open)(__file__);code=f.read().replace('"'"'\r\n'"'"', '"'"'\n'"'"');f.close();exec(compile(code, __file__, '"'"'exec'"'"'))' egg_info --egg-base pip-egg-info
cwd: /tmp/pip-install-wxj7s4lx/psycopg2/
Complete output (23 lines):
running egg_info
creating pip-egg-info/psycopg2.egg-info
writing pip-egg-info/psycopg2.egg-info/PKG-INFO
writing dependency_links to pip-egg-info/psycopg2.egg-info/dependency_links.txt
writing top-level names to pip-egg-info/psycopg2.egg-info/top_level.txt
writing manifest file 'pip-egg-info/psycopg2.egg-info/SOURCES.txt'
Error: pg_config executable not found.
pg_config is required to build psycopg2 from source. Please add the directory
containing pg_config to the $PATH or specify the full executable path with the
option:
python setup.py build_ext --pg-config /path/to/pg_config build ...
or with the pg_config option in 'setup.cfg'.
If you prefer to avoid building psycopg2 from source, please install the PyPI
'psycopg2-binary' package instead.
For further information please check the 'doc/src/install.rst' file (also at
<http://initd.org/psycopg/docs/install.html>).
----------------------------------------
ERROR: Command errored out with exit status 1: python setup.py egg_info Check the logs for full command output.
2.连接PostgreSQL
数据库
import psycopg2
import pandas as pd
conn = psycopg2.connect(database="database", user="user", password="password", host="192.168.1.230", port="5432")
cur = conn.cursor()
pql = """
SELECT *
FROM database.table
WHERE create_time >= '2019-09-23' AND create_time < '2019-09-30'
order by create_time
limit 10
"""
cur.execute(pql)
rows = cur.fetchall()
df = pd.DataFrame(rows)
conn.close()
本文主要参考了以下文章:
Python3连接PostgreSQL(10.5)数据库