Alexander, Rohan, and Paul Hodgetts. 2021.
AustralianPoliticians: Provides Datasets About Australian Politicians.
https://CRAN.R-project.org/package=AustralianPoliticians.
Annas, George. 2003.
“HIPAA Regulations: A New Era of Medical-Record Privacy?” New England Journal of Medicine 348 (15): 1486–90.
https://doi.org/10.1056/NEJMlim035027.
Arel-Bundock, Vincent, Ryan Briggs, Hristos Doucouliagos, Marco Mendoza Aviña, and T. D. Stanley. 2022.
“Quantitative Political Science Research Is Greatly Underpowered.” https://osf.io/bzj9y/.
Asquith, Brian, Brad Hershbein, Tracy Kugler, Shane Reed, Steven Ruggles, Jonathan Schroeder, Steve Yesiltepe, and David Van Riper. 2022.
“Assessing the Impact of Differential Privacy on Measures of Population and Racial Residential Segregation.” Harvard Data Science Review.
https://doi.org/10.1162/99608f92.5cd8024e.
Athey, Susan, Guido Imbens, Jonas Metzger, and Evan Munro. 2021.
“Using Wasserstein Generative Adversarial Networks for the Design of Monte Carlo Simulations.” Journal of Econometrics.
https://doi.org/10.1016/j.jeconom.2020.09.013.
Bandy, Jack, and Nicholas Vincent. 2021.
“Addressing ‘Documentation Debt’ in Machine Learning Research: A Retrospective Datasheet for BookCorpus.” arXiv.
https://doi.org/10.48550/arXiv.2105.05241.
Bender, Emily, Timnit Gebru, Angelina McMillan-Major, and Shmargaret Shmitchell. 2021.
“On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?” In
Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency.
ACM.
https://doi.org/10.1145/3442188.3445922.
Biderman, Stella, Kieran Bicheno, and Leo Gao. 2022.
“Datasheet for the Pile.” https://arxiv.org/abs/2201.07311.
Bowen, Claire McKay. 2022.
Protecting Your Privacy in a Data-Driven World. 1st ed. Chapman and Hall/CRC.
https://doi.org/10.1201/9781003122043.
Buneman, Peter, Sanjeev Khanna, and Tan Wang-Chiew. 2001.
“Why and Where: A Characterization of Data Provenance.” In
Database Theory ICDT 2001, 316–30. Springer Berlin Heidelberg.
https://doi.org/10.1007/3-540-44503-x_20.
Buolamwini, Joy, and Timnit Gebru. 2018. “Gender Shades: Intersectional Accuracy Disparities in Commercial Gender Classification.” In Conference on Fairness, Accountability and Transparency, 77–91.
Byrd, James Brian, Anna Greene, Deepashree Venkatesh Prasad, Xiaoqian Jiang, and Casey Greene. 2020.
“Responsible, Practical Genomic Data Sharing That Accelerates Research.” Nature Reviews Genetics 21 (10): 615–29.
https://doi.org/10.1038/s41576-020-0257-5.
Carleton, Chris. 2021.
wccarleton/conflict-europe: Acce (version v1.0.0). Zenodo.
https://doi.org/10.5281/zenodo.4550688.
Carleton, Chris, Dave Campbell, and Mark Collard. 2021.
“A Reassessment of the Impact of Temperature Change on European Conflict During the Second Millennium CE Using a Bespoke Bayesian Time-Series Model.” Climatic Change 165 (1): 1–16.
https://doi.org/10.1007/s10584-021-03022-2.
Christensen, Garret, Allan Dafoe, Edward Miguel, Don Moore, and Andrew Rose. 2019.
“A Study of the Impact of Data Sharing on Article Citations Using Journal Policies as a Natural Experiment.” PLoS ONE 14 (12): e0225883.
https://doi.org/10.1371/journal.pone.0225883.
Christensen, Garret, Jeremy Freese, and Edward Miguel. 2019. Transparent and Reproducible Social Science Research. California: University of California Press.
Cohen, Glenn, and Michelle Mello. 2018.
“HIPAA and Protecting Health Information in the 21st Century.” JAMA 320 (3): 231.
https://doi.org/10.1001/jama.2018.5630.
Council of the European Union. 2016.
“General Data Protection Regulation 2016/679.” https://eur-lex.europa.eu/eli/reg/2016/679/oj.
Dwork, Cynthia, Frank McSherry, Kobbi Nissim, and Adam Smith. 2006.
“Calibrating Noise to Sensitivity in Private Data Analysis.” In
Theory of Cryptography Conference, 265–84. Springer.
https://doi.org/10.1007/11681878_14.
Dwork, Cynthia, and Aaron Roth. 2013.
“The Algorithmic Foundations of Differential Privacy.” Foundations and Trends in Theoretical Computer Science 9 (3-4): 211–407.
https://doi.org/10.1561/0400000042.
Flynn, Michael. 2022.
troopdata: Tools for Analyzing Cross-National Military Deployment and Basing Data.
https://CRAN.R-project.org/package=troopdata.
Gebru, Timnit, Jamie Morgenstern, Briana Vecchione, Jennifer Wortman Vaughan, Hanna Wallach, Hal Daumé III, and Kate Crawford. 2021.
“Datasheets for Datasets.” Communications of the ACM 64 (12): 86–92.
https://doi.org/10.1145/3458723.
Geuenich, Michael, Jinyu Hou, Sunyun Lee, Shanza Ayub, Hartland Jackson, and Kieran Campbell. 2021a.
“Automated Assignment of Cell Identity from Single-Cell Multiplexed Imaging and Proteomic Data.” Cell Systems 12 (12): 1173–86.
https://doi.org/10.1016/j.cels.2021.08.012.
———. 2021b.
“Replication Materials: ‘Automated Assignment of Cell Identity from Single-Cell Multiplexed Imaging and Proteomic Data’.” https://doi.org/10.5281/ZENODO.5156049.
Greenberg, Bernard, Abdel-Latif Abul-Ela, Walt Simmons, and Daniel Horvitz. 1969.
“The Unrelated Question Randomized Response Model: Theoretical Framework.” Journal of the American Statistical Association 64 (326): 520–39.
https://doi.org/10.1080/01621459.1969.10500991.
Hart, Edmund, Pauline Barmby, David LeBauer, François Michonneau, Sarah Mount, Patrick Mulrooney, Timothée Poisot, Kara Woo, Naupaka Zimmerman, and Jeffrey Hollister. 2016.
“Ten Simple Rules for Digital Data Storage.” PLOS Computational Biology 12 (10): e1005097.
https://doi.org/10.1371/journal.pcbi.1005097.
Hawes, Michael. 2020.
“Implementing Differential Privacy: Seven Lessons From the 2020 United States Census.” Harvard Data Science Review 2 (2).
https://doi.org/10.1162/99608f92.353c6f99.
Heil, Benjamin, Michael Hoffman, Florian Markowetz, Su-In Lee, Casey Greene, and Stephanie Hicks. 2021.
“Reproducibility Standards for Machine Learning in the Life Sciences.” Nature Methods 18 (10): 1132–35.
https://doi.org/10.1038/s41592-021-01256-7.
Hester, Jim, Hadley Wickham, and Gábor Csárdi. 2021.
fs: Cross-Platform File System Operations Based on “libuv”.
https://CRAN.R-project.org/package=fs.
Hotz, Joseph, Christopher Bollinger, Tatiana Komarova, Charles Manski, Robert Moffitt, Denis Nekipelov, Aaron Sojourner, and Bruce Spencer. 2022.
“Balancing Data Privacy and Usability in the Federal Statistical System.” Proceedings of the National Academy of Sciences 119 (31): 1–10.
https://doi.org/10.1073/pnas.2104906119.
Izrailev, Sergei. 2022.
tictoc: Functions for Timing R Scripts, as Well as Implementations of “Stack” and “List” Structures.
https://CRAN.R-project.org/package=tictoc.
Kenny, Christopher T., Shiro Kuriwaki, Cory McCartan, Evan T. R. Rosenman, Tyler Simko, and Kosuke Imai. 2021.
“The Use of Differential Privacy for Census Data and Its Impact on Redistricting: The Case of the 2020 U.S. Census.” Science Advances 7 (41).
https://doi.org/10.1126/sciadv.abk3283.
———. 2022.
“Comment: The Essential Role of Policy Evaluation for the 2020 Census Disclosure Avoidance System.” Harvard Data Science Review.
https://doi.org/10.48550/arXiv.2210.08383.
Knuth, Donald. 1998. The Art of Computer Programming, Volume 2: Seminumerical Algorithms. 2nd ed.
Koenecke, Allison, and Hal Varian. 2020.
“Synthetic Data Generation for Economists.” https://arxiv.org/abs/2011.01374.
Kuriwaki, Shiro, Will Beasley, and Thomas Leeper. 2022. dataverse: R Client for Dataverse 4+ Repositories.
Lewis, Crystal. 2023.
Data Management in Large-Scale Education Research.
https://datamgmtinedresearch.com/index.html.
Lima, Renato de, Oliver Phillips, Alvaro Duque, Sebastian Tello, Stuart Davies, Alexandre Adalardo de Oliveira, Sandra Muller, et al. 2022.
“Making Forest Data Fair and Open.” Nature Ecology & Evolution 6 (April): 656–58.
https://doi.org/10.1038/s41559-022-01738-7.
Lin, Herbert. 2014. “A Proposal to Reduce Government Overclassification of Information Related to National Security.” Journal of National Security Law and Policy 7: 443–63.
Mammoliti, Anthony, Petr Smirnov, Minoru Nakano, Zhaleh Safikhani, Christopher Eeles, Heewon Seo, Sisira Kadambat Nair, et al. 2021.
“Orchestrating and Sharing Large Multimodal Data for Transparent and Reproducible Research.” Nature Communications 12 (1).
https://doi.org/10.1038/s41467-021-25974-w.
Miceli, Milagros, Julian Posada, and Tianling Yang. 2022.
“Studying up Machine Learning Data.” Proceedings of the ACM on Human-Computer Interaction 6 (January): 1–14.
https://doi.org/10.1145/3492853.
Michener, William. 2015.
“Ten Simple Rules for Creating a Good Data Management Plan.” PLoS Computational Biology 11 (10): e1004525.
https://doi.org/10.1371/journal.pcbi.1004525.
Oberski, Daniel, and Frauke Kreuter. 2020.
“Differential Privacy and Social Science: An Urgent Puzzle.” Harvard Data Science Review 2 (1).
https://doi.org/10.1162/99608f92.63a22079.
Ooms, Jeroen. 2022.
openssl: Toolkit for Encryption, Signatures and Certificates Based on OpenSSL.
https://CRAN.R-project.org/package=openssl.
Patki, Neha, Roy Wedge, and Kalyan Veeramachaneni. 2016.
“The Synthetic Data Vault.” In
2016 IEEE International Conference on Data Science and Advanced Analytics (DSAA), 399–410.
https://doi.org/10.1109/DSAA.2016.49.
Paullada, Amandalynne, Inioluwa Deborah Raji, Emily Bender, Emily Denton, and Alex Hanna. 2021.
“Data and Its (Dis)contents: A Survey of Dataset Development and Use in Machine Learning Research.” Patterns 2 (11): 100336.
https://doi.org/10.1016/j.patter.2021.100336.
Piller, Charles. 2022.
“Blots on a Field?” Science 377 (6604): 358–63.
https://doi.org/10.1126/science.ade0209.
R Special Interest Group on Databases (R-SIG-DB), Hadley Wickham, and Kirill Müller. 2022.
DBI: R Database Interface.
https://CRAN.R-project.org/package=DBI.
Richardson, Neal, Ian Cook, Nic Crane, Dewey Dunnington, Romain François, Jonathan Keane, Dragoș Moldovan-Grünfeld, Jeroen Ooms, and Apache Arrow. 2022.
arrow: Integration to Apache Arrow.
https://CRAN.R-project.org/package=arrow.
Robinson, Emily, and Jacqueline Nolis. 2020.
Build a Career in Data Science. Shelter Island: Manning Publications.
https://livebook.manning.com/book/build-a-career-in-data-science.
Ross, Casey. 2022.
“How a Decades-Old Database Became a Hugely Profitable Dossier on the Health of 270 Million Americans.” Stat, February.
https://www.statnews.com/2022/02/01/ibm-watson-health-marketscan-data/.
Ruggles, Steven, Catherine Fitch, Diana Magnuson, and Jonathan Schroeder. 2019.
“Differential Privacy and Census Data: Implications for Social and Economic Research.” AEA Papers and Proceedings 109 (May): 403–8.
https://doi.org/10.1257/pandp.20191107.
Simonsohn, Uri. 2013.
“Just Post It: The Lesson from Two Cases of Fabricated Data Detected by Statistics Alone.” Psychological Science 24 (10): 1875–88.
https://doi.org/10.1177/0956797613480366.
Suriyakumar, Vinith, Nicolas Papernot, Anna Goldenberg, and Marzyeh Ghassemi. 2021.
“Chasing Your Long Tails.” In
Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency.
ACM.
https://doi.org/10.1145/3442188.3445934.
Tang, Jun, Aleksandra Korolova, Xiaolong Bai, Xueqiang Wang, and Xiaofeng Wang. 2017.
“Privacy Loss in Apple’s Implementation of Differential Privacy on MacOS 10.12.” arXiv.
https://doi.org/10.48550/arXiv.1709.02753.
Tierney, Nicholas, and Karthik Ram. 2020.
“A Realistic Guide to Making Data Available Alongside Code to Improve Reproducibility.” https://arxiv.org/abs/2002.11626.
———. 2021.
“Common-Sense Approaches to Sharing Tabular Data Alongside Publication.” Patterns 2 (12): 100368.
https://doi.org/10.1016/j.patter.2021.100368.
Wicherts, Jelte, Marjan Bakker, and Dylan Molenaar. 2011.
“Willingness to Share Research Data Is Related to the Strength of the Evidence and the Quality of Reporting of Statistical Results.” PLoS ONE 6 (11): e26828.
https://doi.org/10.1371/journal.pone.0026828.
Wickham, Hadley. 2019.
babynames: US Baby Names 1880-2017.
https://CRAN.R-project.org/package=babynames.
———. 2022.
R Packages. 2nd ed. O’Reilly Media.
https://r-pkgs.org.
Wickham, Hadley, Maximilian Girlich, and Edgar Ruiz. 2022.
dbplyr: A “dplyr” Back End for Databases.
https://CRAN.R-project.org/package=dbplyr.
Wickham, Hadley, and Lionel Henry. 2022.
purrr: Functional Programming Tools.
https://CRAN.R-project.org/package=purrr.
Wilkinson, Mark, Michel Dumontier, IJsbrand Jan Aalbersberg, Gabrielle Appleton, Myles Axton, Arie Baak, Niklas Blomberg, et al. 2016.
“The FAIR Guiding Principles for Scientific Data Management and Stewardship.” Scientific Data 3 (1): 1–9.
https://doi.org/10.1038/sdata.2016.18.
Zhang, Susan, Stephen Roller, Naman Goyal, Mikel Artetxe, Moya Chen, Shuohui Chen, Christopher Dewan, et al. 2022.
“OPT: Open Pre-Trained Transformer Language Models.” arXiv.
https://doi.org/10.48550/arXiv.2205.01068.
Zook, Matthew, Solon Barocas, danah boyd, Kate Crawford, Emily Keller, Seeta Peña Gangadharan, Alyssa Goodman, et al. 2017.
“Ten Simple Rules for Responsible Big Data Research.” PLoS Computational Biology 13 (3): e1005399.
https://doi.org/10.1371/journal.pcbi.1005399.