Beginner seeks advice on PUMS returns: Are R and tidycensus the best route?

I am a new user. My task is nothing new: I have seen the statistics I am after, broken out by quintiles or quartiles, in a few reports, but I would like them from the latest ACS. I searched the topics on this forum and did not find a specific answer about quintiles.

Before I even start my first use of R, tidycensus, and the like, I concluded that I should ask the community first, as advised by Census staff. Since your time is valuable, I will greatly appreciate any advice you might offer to get me on the correct path on this project. For background, the project is research on changing U.S. and other national housing inventories (so housing is not so expensive for average earners). I am a former appointed official of a regional government (Cook County in Chicago, Illinois) and have worked on political campaigns, with a few mentions on Wikipedia as references to my previous work.

Other than significantly altering the housing inventory in the next few decades, I am not trying to do much, at least as far as a census query goes, I hope. For now, I only need a few ACS PUMS variables, such as VALP, broken out by numeric ranges, such as the income figures in Table B19080. It would be great to have the low-to-high range and the median for each U.S. quintile. If necessary, I will add up states, regions, etc. to get U.S. totals, or use a U.S. total for metro regions and explain it as such, depending on what PUMS data can return.

I planned to use R and tidycensus to run queries for VALP, for example five queries, one for each quintile income range from Table B19080.
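
To make the plan concrete, here is a minimal sketch of the grouping logic, under stated assumptions. It is written in Python with made-up records purely for illustration and is not tidycensus code. The income limits are placeholders rather than real Table B19080 estimates, and the helper names (quintile_of, weighted_median, summarize) are hypothetical. HINCP and WGTP are the PUMS household income and housing weight variables I assume would be pulled alongside VALP. Standard errors, which need replicate weights, are not covered.

```python
from bisect import bisect_left

# Hypothetical upper income limits for quintiles 1-4 (placeholders, NOT real
# B19080 estimates); quintile 5 is everything above the last limit.
UPPER_LIMITS = [30_000, 60_000, 100_000, 160_000]

# Made-up household records: (HINCP income, VALP property value, WGTP weight).
RECORDS = [
    (12_000, 90_000, 10),
    (25_000, 110_000, 20),
    (28_000, 150_000, 10),
    (45_000, 180_000, 15),
    (58_000, 200_000, 25),
    (75_000, 260_000, 30),
    (95_000, 240_000, 10),
    (120_000, 350_000, 20),
    (150_000, 400_000, 20),
    (210_000, 650_000, 10),
    (400_000, 900_000, 5),
]


def quintile_of(income):
    """Return 1-5: the first quintile whose upper limit is >= income."""
    return bisect_left(UPPER_LIMITS, income) + 1


def weighted_median(pairs):
    """Smallest value at which the cumulative weight reaches half the total."""
    pairs = sorted(pairs)
    half = sum(weight for _, weight in pairs) / 2
    running = 0
    for value, weight in pairs:
        running += weight
        if running >= half:
            return value


def summarize(records):
    """Map each quintile to (low, weighted median, high) of property value."""
    groups = {q: [] for q in range(1, 6)}
    for income, value, weight in records:
        groups[quintile_of(income)].append((value, weight))
    summary = {}
    for q, pairs in groups.items():
        if pairs:  # skip quintiles with no records
            values = [value for value, _ in pairs]
            summary[q] = (min(values), weighted_median(pairs), max(values))
    return summary


SUMMARY = summarize(RECORDS)
for q, (low, median, high) in SUMMARY.items():
    print(f"Quintile {q}: low={low}, median={median}, high={high}")
```

The point of the sketch is that one pass over a single download can assign every household to a quintile, so five separate queries may not be needed. The median is weighted because each PUMS record stands for many households.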

Would you recommend R?

And tidycensus?

Both were recommended by Corey Sparks of the Univ. of Texas at San Antonio, who touted them in a census.gov webinar in April 2022.

How much time would a paid service like Stata save me? I haven't yet asked for a SAS price quote. Stata's cost is $840 for their beginner service. Would that Stata package do the job? Or would I still have to expend a fair amount of effort on Stata for the data returns?

I would just as soon spend some extra time on R, if you advise that it won't take me forever, because I hope that knowing more about R, tidycensus, and other such options from GitHub and the like would be beneficial in the long run and save me the 800 bucks, so I could support my own time and support others like Kyle Walker with a few bucks for his ongoing tidycensus work.

As a bumbling beginner, I welcome any advice. Thank you.

Troy Deckert

  • Hi Troy!  Thanks for considering tidycensus.  One advantage of going the R / tidycensus route is that Matt and I have built tooling to help with some of the more challenging aspects of working with PUMS data, such as getting replicate weights / calculating standard errors.  

    I don't believe there is a time savings to using Stata unless you already know how to use Stata.  Since the introduction of the tidyverse framework, I find the learning curve for Stata and R to be comparable.  

    For getting started with PUMS data in R as an absolute beginner, I'd recommend the following sequence:

    You might also look at IPUMS USA which has a really smooth interface for data downloads and comprehensive documentation.  You'd still need Stata or R to analyze the data, but IPUMS has a number of tutorials to help with that too.  Good luck getting started!

  • Hi Troy - this response may be too late to be helpful to you, but I always suggest folks try swirl to start learning to use R (https://swirlstats.com/students.html). It walks you through using R and there are many courses available now (http://swirlstats.com/scn/title.html). 'Getting and Cleaning Data' is really good! 
