Last Updated: 12 Jan 2023

How do I access WRDS via Python?


Most of the time, databases in WRDS are accessed via the standard web interface. If you have an individual WRDS account (not a class account), you may prefer to access the databases directly via Python scripts. This allows you to write large, efficient queries that combine data sets and then download only the resulting data. It is particularly useful for BoardEx, which has many linked tables of data (for example, the gender of a person is kept in a different table to the employment history, linked by person ID). See our attached file for an example script joining two BoardEx tables.
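To illustrate the idea of joining linked tables on a shared person ID, here is a minimal sketch that runs offline. It uses an in-memory SQLite database as a stand-in for WRDS, and the table names (profiles, employment) and column names are invented for illustration; they are not the real BoardEx schema.

```python
import sqlite3

# In-memory SQLite database standing in for WRDS. The table and column
# names below are invented for illustration, not the real BoardEx schema.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE profiles (person_id INTEGER PRIMARY KEY, gender TEXT);
    CREATE TABLE employment (person_id INTEGER, company TEXT, role TEXT);
    INSERT INTO profiles VALUES (1, 'F'), (2, 'M');
    INSERT INTO employment VALUES
        (1, 'Alpha plc', 'CEO'),
        (1, 'Beta Ltd', 'Director'),
        (2, 'Alpha plc', 'CFO');
    """
)

# One query joins the two tables on the shared person ID, so only the
# combined rows come back to Python.
join_sql = """
    SELECT e.person_id, e.company, e.role, p.gender
    FROM employment AS e
    JOIN profiles AS p ON p.person_id = e.person_id
    ORDER BY e.person_id, e.company
"""
rows = conn.execute(join_sql).fetchall()
for row in rows:
    print(row)
conn.close()
```

A query of the same shape, written against the real library and table names, is the kind of request you would send to WRDS once you have a connection.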

If you organise your Python code in a Jupyter Notebook, you can keep track of your steps and repeat them at a later date. It is very important that your research is reproducible, and Jupyter notebooks are an effective way to achieve this.

You must first choose your access method: either the WRDS Cloud or Python on your own computer.

 

Accessing with JupyterHub on WRDS Cloud

WRDS Cloud provides its own web-based Jupyter environment, called JupyterHub. This is the quickest and easiest way to access WRDS databases with Python scripts. You can download any files you create in WRDS Cloud.

 

Accessing with Python/Jupyter on your own computer

You will need to install the wrds library using pip at the command line (the Conda prompt, if you use Anaconda):
pip install wrds

The data will be returned as pandas DataFrames, so you will also need the pandas library installed.

Create a session within Python:

import wrds
db = wrds.Connection(wrds_username = '<your_wrds_username>')

You will be asked to type your WRDS password (and your username, if you did not supply it in the code).

 

Structure

WRDS datasets are arranged into libraries (such as crsp or boardex), and each library contains many tables. You will need to know which library to use; the WRDS web pages for each database give the library names. You will not have access to all the data, so please log into the WRDS web pages to see details of our subscription.

From here, use the database connection to explore a library, for example by listing its tables in alphabetical order:

sorted(db.list_tables(library = 'crsp'))

You can then choose a specific table and retrieve data from it. If you are interested in a particular list of companies, you can load a text file of company IDs into Python and use them in your request.
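As a rough sketch of that workflow, the example below reads company IDs from a text file (one per line) and prepares a parameterised query. The file name, the IDs, the some_library.some_table name and the company_id column are all placeholders, and load_ids is a hypothetical helper. The final call assumes the raw_sql method accepts named parameters in the %(name)s style, which may depend on your version of the wrds library.

```python
import tempfile
from pathlib import Path


def load_ids(path):
    """Read one company ID per line, skipping blank lines and duplicates."""
    seen = []
    for line in Path(path).read_text().splitlines():
        value = line.strip()
        if value and value not in seen:
            seen.append(value)
    return seen


# Write a small sample file so the sketch runs anywhere; the IDs are made up.
with tempfile.TemporaryDirectory() as tmp:
    id_file = Path(tmp) / "company_ids.txt"
    id_file.write_text("001234\n005678\n\n001234\n")
    ids = load_ids(id_file)

# Named placeholder: the IDs are passed separately rather than pasted
# into the SQL text. Library, table and column names are placeholders.
query = "SELECT * FROM some_library.some_table WHERE company_id IN %(ids)s"
params = {"ids": tuple(ids)}
print(ids)

# With a live connection (not run here, and assuming raw_sql accepts params):
# data = db.raw_sql(query, params=params)
```

Keeping the IDs as strings preserves any leading zeros, which matters for identifiers that are stored as text.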

 

Tutorials and help

The following blog posts and webpages provide more support with accessing WRDS datasets via Python:

Special thanks to Prof Joao Quariguasi Frota Net for helping with this question.