Top 1157 papers published in the topic of Data collection in 2016

Showing papers on "Data collection" published in 2016

Journal Article•10.1177/1049732315617444•

Sample Size in Qualitative Interview Studies: Guided by Information Power

Kirsti Malterud¹, Volkert Siersma¹, Ann Dorrit Guassora¹•Institutions (1)

10 Jul 2016-Qualitative Health Research

TL;DR: It is suggested that the size of a sample with sufficient information power depends on (a) the aim of the study, (b) sample specificity, (c) use of established theory, (d) quality of dialogue, and (e) analysis strategy.

Abstract: Sample sizes must be ascertained in qualitative studies like in quantitative studies but not by the same means. The prevailing concept for sample size in qualitative studies is "saturation." Saturation is closely tied to a specific methodology, and the term is inconsistently applied. We propose the concept "information power" to guide adequate sample size for qualitative studies. Information power indicates that the more information the sample holds, relevant for the actual study, the lower amount of participants is needed. We suggest that the size of a sample with sufficient information power depends on (a) the aim of the study, (b) sample specificity, (c) use of established theory, (d) quality of dialogue, and (e) analysis strategy. We present a model where these elements of information and their relevant dimensions are related to information power. Application of this model in the planning and during data collection of a qualitative study is discussed.

7,400 citations

Journal Article•10.1111/2041-210X.12577•

A protocol for conducting and presenting results of regression-type analyses

Alain F. Zuur, Elena N. Ieno

01 Jun 2016-Methods in Ecology and Evolution

TL;DR: A 10‐step protocol to streamline analysis of data that will enhance understanding of the data, the statistical models and the results, and optimize communication with the reader with respect to both the procedure and the outcomes is offered.

Abstract: Scientific investigation is of value only insofar as relevant results are obtained and communicated, a task that requires organizing, evaluating, analysing and unambiguously communicating the significance of data. In this context, working with ecological data, reflecting the complexities and interactions of the natural world, can be a challenge. Recent innovations for statistical analysis of multifaceted interrelated data make obtaining more accurate and meaningful results possible, but key decisions of the analyses to use, and which components to present in a scientific paper or report, may be overwhelming. We offer a 10-step protocol to streamline analysis of data that will enhance understanding of the data, the statistical models and the results, and optimize communication with the reader with respect to both the procedure and the outcomes. The protocol takes the investigator from study design and organization of data (formulating relevant questions, visualizing data collection, data exploration, identifying dependency), through conducting analysis (presenting, fitting and validating the model) and presenting output (numerically and visually), to extending the model via simulation. Each step includes procedures to clarify aspects of the data that affect statistical analysis, as well as guidelines for written presentation. Steps are illustrated with examples using data from the literature. Following this protocol will reduce the organization, analysis and presentation of what may be an overwhelming information avalanche into sequential and, more to the point, manageable, steps. It provides guidelines for selecting optimal statistical tools to assess data relevance and significance, for choosing aspects of the analysis to include in a published report and for clearly communicating information.

726 citations

Journal Article•10.1177/1098214016630406•

Evaluating Bang for the Buck: A Cost-Effectiveness Comparison Between Individual Interviews and Focus Groups Based on Thematic Saturation Levels

Emily Namey¹, Greg Guest¹, Kevin McKenna¹,², Mario Chen¹•Institutions (2)

Durham University¹, Duke University²

28 Apr 2016-American Journal of Evaluation

TL;DR: In this paper, the authors performed an inductive thematic analysis of data from 40 IDIs and 40 FGs on the health-seeking behaviors of African American men (N = 350) in Durham, North Carolina.

Abstract: Evaluators often use qualitative research methods, yet there is little evidence on the comparative cost-effectiveness of the two most commonly employed qualitative methods—in-depth interviews (IDIs) and focus groups (FGs). We performed an inductive thematic analysis of data from 40 IDIs and 40 FGs on the health-seeking behaviors of African American men (N = 350) in Durham, North Carolina. We used a bootstrap simulation to generate 10,000 random samples from each data set and calculated the number of data collection events necessary to reach different levels of thematic saturation. The median number of data collection events required to reach 80% and 90% saturation was 8 and 16, respectively, for IDIs and 3 and 5 for FGs. Interviews took longer but were more cost-effective at both levels. At the median, IDIs cost 20–36% less to reach thematic saturation. Evaluators can consider these empirically based cost-effectiveness data when selecting a qualitative data collection method.
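
The bootstrap step described in this abstract can be sketched roughly as follows. This is an illustration only, not the authors' code or data: the function names are hypothetical, each data collection event is represented as a set of theme labels, and saturation is assumed to mean the share of themes present in the resampled data set that has been seen so far.

```python
import random
from statistics import median


def events_to_saturation(sample, target):
    """Count how many events, taken in order, cover `target` share of the sample's themes."""
    total = len(set().union(*sample))  # assumption: denominator is themes in this sample
    seen = set()
    for count, themes in enumerate(sample, start=1):
        seen |= themes
        if len(seen) >= target * total:
            return count
    return len(sample)


def bootstrap_median(events, target, n_samples=10_000, seed=0):
    """Median number of events to saturation over bootstrap resamples (with replacement)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_samples):
        resample = rng.choices(events, k=len(events))  # draw events with replacement
        counts.append(events_to_saturation(resample, target))
    return median(counts)


if __name__ == "__main__":
    # Hypothetical coded interviews: each set holds the themes found in one event.
    interviews = [{"cost", "trust"}, {"trust", "access"}, {"cost"}, {"stigma", "access"}]
    print(bootstrap_median(interviews, target=0.8, n_samples=1000))
```

Running the same routine on interview and focus group data separately, and multiplying the resulting medians by a per-event cost, would give the kind of cost comparison the abstract reports.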

325 citations

Journal Article•10.1007/S12185-015-1894-X•

Introduction of Transplant Registry Unified Management Program 2 (TRUMP2): scripts for TRUMP data analyses, part I (variables other than HLA-related data).

Yoshiko Atsuta¹•Institutions (1)

Nagoya University¹

01 Jan 2016-International Journal of Hematology

TL;DR: The introduction of the Second-Generation Transplant Registry Unified Management Program (TRUMP2) is intended to improve data quality and more efficient data management, and expand possible uses of data, as it is capable of building a more complex relational database.

Abstract: Collection and analysis of information on diseases and post-transplant courses of allogeneic hematopoietic stem cell transplant recipients have played important roles in improving therapeutic outcomes in hematopoietic stem cell transplantation. Efficient, high-quality data collection systems are essential. The introduction of the Second-Generation Transplant Registry Unified Management Program (TRUMP2) is intended to improve data quality and more efficient data management. The TRUMP2 system will also expand possible uses of data, as it is capable of building a more complex relational database. The construction of an accessible data utilization system for adequate data utilization by researchers would promote greater research activity. Study approval and management processes and authorship guidelines also need to be organized within this context. Quality control of processes for data manipulation and analysis will also affect study outcomes. Shared scripts have been introduced to define variables according to standard definitions for quality control and improving efficiency of registry studies using TRUMP data.

308 citations

Journal Article•10.1109/TITS.2015.2480157•

Big Data for Social Transportation

Xinhu Zheng¹, Wei Chen², Pu Wang³, Dayong Shen⁴, Songhang Chen, Xiao Wang, Qingpeng Zhang⁵, Liuqing Yang⁶•Institutions (6)

University of Minnesota¹, Zhejiang University², Central South University³, National University of Defense Technology⁴, City University of Hong Kong⁵, Colorado State University⁶

01 Mar 2016-IEEE Transactions on Intelligent Transportation Systems

TL;DR: This paper overviews data sources, analytical approaches, and application systems for social transportation, and suggests a few future research directions for this new social transportation field.

Abstract: Big data for social transportation brings us unprecedented opportunities for resolving transportation problems for which traditional approaches are not competent and for building the next-generation intelligent transportation systems. Although social data have been applied for transportation analysis, there are still many challenges. First, social data evolve with time and contain abundant information, posing a crucial need for data collection and cleaning. Meanwhile, each type of data has specific advantages and limitations for social transportation, and one data type alone is not capable of describing the overall state of a transportation system. Systematic data fusing approaches or frameworks for combining social signal data with different features, structures, resolutions, and precision are needed. Second, data processing and mining techniques, such as natural language processing and analysis of streaming data, require further revolutions in effective utilization of real-time traffic information. Third, social data are connected to cyber and physical spaces. To address practical problems in social transportation, a suite of schemes are demanded for realizing big data in social transportation systems, such as crowdsourcing, visual analysis, and task-based services. In this paper, we overview data sources, analytical approaches, and application systems for social transportation, and we also suggest a few future research directions for this new social transportation field.

261 citations

Journal Article•10.1136/EB-2016-102306•

What is grounded theory?

Helen Noble, Gary Mitchell

01 Apr 2016-Evidence-Based Nursing

TL;DR: Grounded theory (GT) is a research method concerned with the generation of theory which is ‘grounded’ in data that has been systematically collected and analysed and used to uncover such things as social relationships and behaviours of groups, known as social processes.

Abstract: Grounded theory (GT) is a research method concerned with the generation of theory [1], which is ‘grounded’ in data that has been systematically collected and analysed [2]. It is used to uncover such things as social relationships and behaviours of groups, known as social processes [3]. It was developed in California, USA by Glaser and Strauss during their study, ‘Awareness of Dying’ [1]. It is a general methodology for developing theory that is grounded in data which is systematically gathered and analysed. First the area of interest is identified. Theoretical preconceptions should be avoided, although it is accepted this is difficult in practice. Analytical procedures and sampling strategies are then used and the study is finished when theoretical sampling reached [5] all discussed below. Data collected may be qualitative or quantitative or a combination of both. Data collection methods often include in-depth interviews using open-ended questions. Questions can be adjusted as theory emerges. Observational methods and focus groups may also be used. Glaser and Strauss (1967) first mentioned theoretical sampling and described a process of generating theory from data which includes collecting the data, then coding …

239 citations

Journal Article•10.1016/J.ISPRSJPRS.2015.11.006•

Rethinking big data: A review on the data quality and usage issues

Jianzheng Liu¹, Jie Li¹, Weifeng Li¹, Jiansheng Wu²•Institutions (2)

University of Hong Kong¹, Peking University²

01 May 2016-ISPRS Journal of Photogrammetry and Remote Sensing

TL;DR: It is proposed that big data research should closely follow good scientific practice to provide reliable and scientific “stories”, as well as explore and develop techniques and methods to mitigate or rectify those ‘big-errors’ brought by big data.

Abstract: The recent explosive publications of big data studies have well documented the rise of big data and its ongoing prevalence. Different types of “big data” have emerged and have greatly enriched spatial information sciences and related fields in terms of breadth and granularity. Studies that were difficult to conduct in the past time due to data availability can now be carried out. However, big data brings lots of “big errors” in data quality and data usage, which cannot be used as a substitute for sound research design and solid theories. We indicated and summarized the problems faced by current big data studies with regard to data collection, processing and analysis: inauthentic data collection, information incompleteness and noise of big data, unrepresentativeness, consistency and reliability, and ethical issues. Cases of empirical studies are provided as evidences for each problem. We propose that big data research should closely follow good scientific practice to provide reliable and scientific “stories”, as well as explore and develop techniques and methods to mitigate or rectify those ‘big-errors’ brought by big data.

236 citations

Monograph•10.4135/9781483391700•

The Practice of Survey Research: Theory and Applications

Erin Ruel, William E. Wagner, Brian Joseph Gillespie

1 Jan 2016

TL;DR: Data Archiving Seeking External Funding for Your Survey Project Archiving Original and Final Data Data Format Archiving and Making Data Publicly Available Archives

Abstract:
Part I: Before the Survey
Chapter 1: Introduction to Survey Research What is survey research? How do you know if a survey is good? Surveys about Teachers: Are They Good Teachers? Applications of Survey Research Technology and Survey Research The Ethics of Survey Research Key Decisions about Survey Research: What's Ahead
Part II: Questionnaire Design
Chapter 2: Types of Surveys Omnibus Surveys Administration Importance of Proper Training of Interviewers Types of Surveys
Chapter 3: The Survey Instrument The Cover Letter The Survey Instrument Appearance, Formatting, and Design
Chapter 4: Survey Question Construction Concept Measurement: Traits, Assessments, and Sentiments Complex Concepts Question Development Responses to Questions Measurement Error
Chapter 5: Validity and Reliability Reliability Correlation Types of Reliability and Estimation Validity
Chapter 6: Pre-testing and Pilot Testing Pre-testing Pilot Testing Pre-test and Pilot Test Limitations
Part III: Implementing a Survey
Chapter 7: Selecting a Sample: Probability Sampling Sampling Terminology Sampling Theory Probability Sampling Techniques
Chapter 8: Non-Probability Sampling and Sampling Hard to Find Populations Convenience Sampling Quota Sampling Purposive Sampling Sampling Hard to Find Populations
Chapter 9: Improving Response Rates and Retention Response Rates and Non-response Errors Non-response Bias Attrition in Panel Studies Methods to Increase Participant Contact and Participant Cooperation
Chapter 10: Technologies to Develop and Implement Surveys Mail Survey Administration Computer-assisted Interviewing Post-survey Data Entry and Data Cleaning Technology Contracting with a Research Center to Conduct the Survey
Chapter 11: Data Collection The Self-administered Cross-sectional Mailed Survey The Self-administered Longitudinal Panel Mailed Survey Interviewer-administered Surveys Self-administered Web Surveys Comparing Survey Types Storage Needs
Part IV: Post-Survey Data Management and Analysis
Chapter 12: Data Entry Data Entry Documentation
Chapter 13: Data Cleaning Simple Cross-sectional Data Cleaning Cosmetic Cleaning Skip Patterns Multiple-response Questions, Other Specify and Open-ended Questions Cleaning for Diagnostics Interviewer Effects/Mode Effects Cleaning Longitudinal Data The Codebook
Chapter 14: Data Analysis for a Policy Report Policy Reports Descriptive Statistics Analysis for a Policy Report Summary The Report Write-up
Chapter 15: More Advanced Data Analysis Explanatory Research Questions Journal Article Format Regression Analysis Addressing Missing Data OLS Regressions of the Positive Relocation Scale Creating Tables for the Journal Article and Writing-up the Results
Chapter 16: Data Archiving Seeking External Funding for Your Survey Project Archiving Original and Final Data Data Format Archiving and Making Data Publicly Available Archives
Epilogue Survey Administration checklist Survey Design and Organization Checklist Writing Good Questions Checklist Piloting or Pre-Testing the Survey Checklist Choosing a Sample Checklist Improving Response Rates Checklist

231 citations

Journal Article•10.1016/J.JCLINEPI.2016.08.003•

Registry-based randomized controlled trials- what are the advantages, challenges, and areas for future research?

Guowei Li¹,², Tolulope T. Sajobi³, Bijoy K Menon³, Lawrence Korngut³, Mark Lowerison³, Matthew T. James³, Stephen B. Wilton⁴, Tyler Williamson³, Stephanie J. Gill³, Lauren L. Drogos⁴, Eric E. Smith³, Sunita Vohra⁵, Michael D. Hill³, Lehana Thabane¹,²•Institutions (5)

McMaster University¹, St. Joseph's Healthcare Hamilton², University of Calgary³, Libin Cardiovascular Institute of Alberta⁴, University of Alberta⁵

01 Dec 2016-Journal of Clinical Epidemiology

TL;DR: The advantages, challenges, and areas for future research related to registry-based randomized controlled trials are summarized.

217 citations

Journal Article•

Using Microsoft Excel to code and thematically analyse qualitative data: a simple, cost-effective approach.

Ronan T. Bree¹, Gerry Gallagher¹•Institutions (1)

Dundalk Institute of Technology¹

30 Jun 2016-AISHE-J: The All Ireland Journal of Teaching and Learning in Higher Education

TL;DR: This report describes in detail the qualitative data analysis process designed and implemented by the two authors, and hopes it will assist educators enrolling on learning and teaching courses, or those performing research projects in the area who are considering employing qualitative evaluation methods.

Abstract: As the number of learning and teaching continuing professional development (CPD) courses increases in Higher Education Institutions (HEIs), so too does the accompanying number of learning innovations being implemented and evaluated. The evaluation process requires valid and reliable data collection and analysis procedures to be established. In many cases, qualitative methods such as interviews, focus groups and free-text responses are employed for this purpose. These methods generate large volumes of data, which must be coded and analysed in a thorough and professional manner. While commercial software packages can assist in this analysis, in a difficult economic climate, the cost of campus-wide licenses for such can be quite prohibitive. In a recent publication aimed at enhancing the learning environment in practical sessions, Bree et al. (2014) implemented a simple, cost-effective technology-based analysis of captured focus group data with a widely used software suite. This report describes in detail the qualitative data analysis process designed and implemented by the two authors (a link to a screencast outlining the method is also provided). Ensuring data analysis processes are performed correctly will generate valid data, leading to an increase in the number of peer-reviewed publications describing learning and teaching innovations; with each of these ultimately enhancing the learning environment for students and developing higher-quality graduates. It is hoped this report will assist educators enrolling on learning and teaching courses, or those performing research projects in the area who are considering employing qualitative evaluation methods.

215 citations

Journal Article•10.1080/01431161.2015.1117684•

Using Twitter for tasking remote-sensing data collection and damage assessment: 2013 Boulder flood case study

Guido Cervone¹, Elena Sava¹, Qunying Huang², Emily Schnebele¹, Jeff Harrison, Nigel Waters¹•Institutions (2)

Pennsylvania State University¹, University of Wisconsin-Madison²

01 Jan 2016-International Journal of Remote Sensing

TL;DR: In this article, a new methodology is introduced that leverages data harvested from social media for tasking the collection of remote-sensing imagery during disasters or emergencies, which is valuable in situations where environmental hazards such as hurricanes or severe weather affect very large areas.

Abstract: A new methodology is introduced that leverages data harvested from social media for tasking the collection of remote-sensing imagery during disasters or emergencies. The images are then fused with multiple sources of contributed data for the damage assessment of transportation infrastructure. The capability is valuable in situations where environmental hazards such as hurricanes or severe weather affect very large areas. During these types of disasters it is paramount to ‘cue’ the collection of remote-sensing images to assess the impact of fast-moving and potentially life-threatening events. The methodology consists of two steps. First, real-time data from Twitter are monitored to prioritize the collection of remote-sensing images for evolving disasters. Commercial satellites are then tasked to collect high-resolution images of these areas. Second, a damage assessment of transportation infrastructure is carried out by fusing the tasked images with contributed data harvested from social media such as Flickr and Twitter, and any additional available data. To demonstrate its feasibility, the proposed methodology is applied and tested on the 2013 Colorado floods with a special emphasis in Boulder County and the cities of Boulder and Longmont.

Journal Article•10.1016/J.APENERGY.2016.08.079•

Estimation of the building energy use intensity in the urban scale by integrating GIS and big data technology

Jun Ma¹, Jack Chin Pang Cheng¹•Institutions (1)

Hong Kong University of Science and Technology¹

01 Dec 2016-Applied Energy

TL;DR: A geographic information system integrated data mining methodology framework for estimating the building EUI in the urban scale, including preprocessing, feature selection, and algorithm optimization is proposed.

Journal Article•10.1007/S11528-015-0014-3•

Examining Current Beliefs, Practices and Barriers about Technology Integration: A Case Study.

Pi-Sui Hsu¹•Institutions (1)

Northern Illinois University¹

16 Jan 2016-TechTrends

TL;DR: The authors examined the current beliefs, practices and barriers concerning technology integration of Kindergarten through Grade Six teachers in the midwestern United States, and found that a majority of the teachers held constructivist pedagogical beliefs about technology integration.

Abstract: The purpose of this mixed-methods study was to examine the current beliefs, practices and barriers concerning technology integration of Kindergarten through Grade Six teachers in the midwestern United States. The three data collection methods were online surveys with 152 teachers as well as interviews and observations with 8 teachers. The findings indicated that a majority of the teachers held constructivist pedagogical beliefs about technology integration. This study found that the teachers who held constructivist pedagogical beliefs about technology use had high self-efficacy beliefs about technology use, placed positive value on the use of technology, and had two or more practices of high-level learning in their lessons. Language arts was the subject that gained the most attention for technology integration. Four barriers were students’ lack of computer skills, teachers’ lack of training in technology, teachers’ lack of time to implement technology-integrated lessons, and teachers’ lack of technical support.

Sampling data and data collection in qualitative research

Dean Whitehead¹, Lisa Whitehead•Institutions (1)

Australian National University¹

1 Jan 2016

Journal Article•10.2196/JMIR.4738•

Garbage in, Garbage Out: Data Collection, Quality Assessment and Reporting Standards for Social Media Data Use in Health Research, Infodemiology and Digital Disease Detection

Yoonsang Kim¹, Jidong Huang¹, Sherry Emery•Institutions (1)

University of Illinois at Chicago¹

26 Feb 2016-Journal of Medical Internet Research

TL;DR: A conceptual framework for the filtering and quality evaluation of social data that addresses several common challenges and moves toward establishing a standard of reporting social data is set forth.

Abstract: Background: Social media have transformed the communications landscape. People increasingly obtain news and health information online and via social media. Social media platforms also serve as novel sources of rich observational data for health research (including infodemiology, infoveillance, and digital disease detection). While the number of studies using social data is growing rapidly, very few of these studies transparently outline their methods for collecting, filtering, and reporting those data. Keywords and search filters applied to social data form the lens through which researchers may observe what and how people communicate about a given topic. Without a properly focused lens, research conclusions may be biased or misleading. Standards of reporting data sources and quality are needed so that data scientists and consumers of social media research can evaluate and compare methods and findings across studies. Objective: We aimed to develop and apply a framework of social media data collection and quality assessment and to propose a reporting standard, which researchers and reviewers may use to evaluate and compare the quality of social data across studies. Methods: We propose a conceptual framework consisting of three major steps in collecting social media data: develop, apply, and validate search filters. This framework is based on two criteria: retrieval precision (how much of retrieved data is relevant) and retrieval recall (how much of the relevant data is retrieved). We then discuss two conditions that estimation of retrieval precision and recall rely on (accurate human coding and full data collection) and how to calculate these statistics in cases that deviate from the two ideal conditions. We then apply the framework on a real-world example using approximately 4 million tobacco-related tweets collected from the Twitter firehose.
Results: We developed and applied a search filter to retrieve e-cigarette–related tweets from the archive based on three keyword categories: devices, brands, and behavior. The search filter retrieved 82,205 e-cigarette–related tweets from the archive and was validated. Retrieval precision was calculated above 95% in all cases. Retrieval recall was 86% assuming ideal conditions (no human coding errors and full data collection), 75% when unretrieved messages could not be archived, 86% assuming no false negative errors by coders, and 93% allowing both false negative and false positive errors by human coders. Conclusions: This paper sets forth a conceptual framework for the filtering and quality evaluation of social data that addresses several common challenges and moves toward establishing a standard of reporting social data. Researchers should clearly delineate data sources, how data were accessed and collected, and the search filter building process and how retrieval precision and recall were calculated. The proposed framework can be adapted to other public social media platforms. [J Med Internet Res 2016;18(2):e41]
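
The two criteria named in this abstract, retrieval precision and retrieval recall, can be computed as in the minimal sketch below. It assumes the ideal conditions the abstract mentions (error-free human coding and a fully archived data set); the function name and the message IDs are hypothetical, and the adjustments the paper makes for non-ideal conditions are not reproduced here.

```python
def retrieval_precision_recall(retrieved_ids, relevant_ids):
    """Return (precision, recall) for a search filter.

    retrieved_ids: IDs of messages the filter returned.
    relevant_ids:  IDs of messages that human coders judged relevant
                   (assumed complete and error-free for this sketch).
    """
    retrieved = set(retrieved_ids)
    relevant = set(relevant_ids)
    hits = retrieved & relevant  # relevant messages that were actually retrieved
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


if __name__ == "__main__":
    # Hypothetical example: the filter returns four messages, coders mark five as relevant.
    precision, recall = retrieval_precision_recall([1, 2, 3, 4], [2, 3, 4, 5, 6])
    print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision answers "how much of what the filter retrieved is relevant", while recall answers "how much of the relevant material the filter retrieved", matching the parenthetical definitions in the Methods section.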

Journal Article•10.1371/JOURNAL.PONE.0150652•

Feature Selection via Chaotic Antlion Optimization.

Hossam M. Zawbaa¹,², Eid Emary³,⁴, Crina Grosan²,⁵•Institutions (5)

Beni-Suef University¹, Babeș-Bolyai University², Arab Open University³, Cairo University⁴, Brunel University London⁵

10 Mar 2016-PLOS ONE

TL;DR: An optimization approach for the feature selection problem that considers a “chaotic” version of the antlion optimizer method, a nature-inspired algorithm that mimics the hunting mechanism of antlions in nature to improve the tradeoff between exploration and exploitation.

Abstract: This work was partially supported by the IPROCOM Marie Curie initial training network, funded through the People Programme (Marie Curie Actions) of the European Union’s Seventh Framework Programme FP7/2007-2013/ under REA grants agreement No. 316555, and by the Romanian National Authority for Scientific Research, CNDIUEFISCDI, project number PN-II-PT-PCCA-2011-3.2- 0917. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

Proceedings Article•

Using Randomized Response for Differential Privacy Preserving Data Collection

Yue Wang¹, Xintao Wu², Donghui Hu³•Institutions (3)

University of North Carolina at Charlotte¹, University of Arkansas², Hefei University of Technology³

1 Jan 2016

TL;DR: This paper studies how to enforce differential privacy by using the randomized response in the data collection scenario, theoretically derives the explicit formula of the mean squared error of various derived statistics based on randomized response theory, and proves that the randomized response outperforms the Laplace mechanism.

Abstract: This paper studies how to enforce differential privacy by using the randomized response in the data collection scenario. Given a client’s value, the randomized algorithm executed by the client reports to the untrusted server a perturbed value. The use of randomized response in surveys enables easy estimations of accurate population statistics while preserving the privacy of the individual respondents. We compare the randomized response with the standard Laplace mechanism which is based on query-output independent adding of Laplace noise. Our research starts from the simple case with one single binary attribute and extends to the general case with multiple polychotomous attributes. We measure utility preservation in terms of the mean squared error of the estimate for various calculations including individual value estimate, proportion estimate, and various derived statistics. We theoretically derive the explicit formula of the mean squared error of various derived statistics based on the randomized response theory and prove the randomized response outperforms the Laplace mechanism. We evaluate our algorithms on YesiWell database including sensitive biomarker data and social network relationships of patients. Empirical evaluation results show effectiveness of our proposed techniques. Especially the use of the randomized response for collecting data incurs fewer utility loss than the output perturbation when the sensitivity of functions is high.
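
For the single binary attribute case mentioned in this abstract, a classic Warner-style randomized response can be sketched as below. This is a textbook construction given for illustration, not necessarily the exact scheme or estimator used in the paper; the function names are hypothetical. Each client reports its true bit with probability p = e^ε / (1 + e^ε) and the flipped bit otherwise, which satisfies ε-differential privacy, and the server corrects the observed proportion for the known flipping rate.

```python
import math
import random


def keep_probability(epsilon):
    """Probability of reporting the true bit so that the ratio p / (1 - p) equals e^epsilon."""
    return math.exp(epsilon) / (1.0 + math.exp(epsilon))


def randomize(bit, p, rng):
    """Client side: report the true bit with probability p, the flipped bit otherwise."""
    return bit if rng.random() < p else 1 - bit


def estimate_proportion(reports, p):
    """Server side: unbiased estimate of the true share of 1s (requires p != 0.5)."""
    observed = sum(reports) / len(reports)
    # E[observed] = p * pi + (1 - p) * (1 - pi); solve for pi.
    return (observed - (1.0 - p)) / (2.0 * p - 1.0)


if __name__ == "__main__":
    rng = random.Random(0)
    p = keep_probability(1.0)  # epsilon = 1 is an arbitrary choice for the demo
    true_bits = [1] * 3000 + [0] * 7000  # hypothetical population with 30% ones
    reports = [randomize(b, p, rng) for b in true_bits]
    print(round(estimate_proportion(reports, p), 3))
```

A smaller ε pushes p toward 0.5, which strengthens privacy but inflates the variance of the estimate through the 1 / (2p - 1) factor.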

Journal Article•10.1109/ACCESS.2016.2647619•

Analyzing Healthcare Big Data With Prediction for Future Health Condition

Prasan Kumar Sahoo¹, Suvendu Kumar Mohapatra¹, Shih-Lin Wu¹•Institutions (1)

Chang Gung University¹

01 Jan 2016-IEEE Access

TL;DR: A probabilistic data collection mechanism is designed and the correlation analysis of those collected data is performed and a stochastic prediction model is designed to foresee the future health condition of the most correlated patients based on their current health status.

Abstract: In healthcare management, a large volume of multi-structured patient data is generated from the clinical reports, doctor's notes, and wearable body sensors. The analysis of healthcare parameters and the prediction of the subsequent future health conditions are still in the informative stage. A cloud-enabled big data analytic platform is the best way to analyze the structured and unstructured data generated from healthcare management systems. In this paper, a probabilistic data collection mechanism is designed and the correlation analysis of those collected data is performed. Finally, a stochastic prediction model is designed to foresee the future health condition of the most correlated patients based on their current health status. Performance evaluation of the proposed protocols is realized through extensive simulations in the cloud environment, which gives about 98% accuracy of prediction, and maintains 90% of CPU and bandwidth utilization to reduce the analysis time.

Journal Article•10.1080/1743727X.2014.979146•

Handling missing data: analysis of a challenging data set using multiple imputation

Maria Pampaka¹, Graeme Hutcheson¹, Julian Williams¹•Institutions (1)

University of Manchester¹

02 Jan 2016-International Journal of Research & Method in Education

TL;DR: This paper illustrates the issues with an educational, longitudinal survey in which missing data was significant, but for which the authors were able to collect much of the missing data through subsequent data collection, and compares step-wise regression and multiple imputation models with the model from the actual enhanced sample.

Abstract: Missing data is endemic in much educational research. However, practices such as step-wise regression common in the educational research literature have been shown to be dangerous when significant data are missing, and multiple imputation (MI) is generally recommended by statisticians. In this paper, we provide a review of these advances and their implications for educational research. We illustrate the issues with an educational, longitudinal survey in which missing data was significant, but for which we were able to collect much of these missing data through subsequent data collection. We thus compare methods, that is, step-wise regression (basically ignoring the missing data) and MI models, with the model from the actual enhanced sample. The value of MI is discussed and the risks involved in ignoring missing data are considered. Implications for research practice are discussed.

Journal Article•10.1016/J.CHB.2016.01.016•

Online versus in-person interviews with adolescents

Jennifer D. Shapka¹, José F. Domene², Shereen Khan¹, Leigh Yang¹•Institutions (2)

University of British Columbia¹, University of New Brunswick²

01 May 2016-Computers in Human Behavior

TL;DR: Results indicated that interviews conducted online produced fewer words, took longer to complete, and involved more rapport-building; however, there were no mean differences in the level of self-disclosure or the formality of the interviews.

Journal Article•10.4300/JGME-D-16-00098.1•

Design: Selection of Data Collection Methods

Elise Paradis, Bridget C. O’Brien, Laura Nimmon, Glen Bandiera, Maria Athina Martimianakis

02 May 2016-Journal of Graduate Medical Education

TL;DR: Imagine that residents in your program have been less than complimentary about interprofessional rounds (IPRs) and the program director asks you to determine what residents are learning about in collaboration with other health professionals during IPRs.

Abstract: Qualitative research is often employed when there is a problem and no clear solutions exist, as in the case above that elicits the following questions: Why are residents complaining about rounds? How could we make rounds better? In this context, collecting “good” information or words (qualitative data) is intended to produce information that helps you to answer your research questions, capture the phenomenon of interest, and account for context and the rich texture of the human experience. You may also aim to challenge previous thinking and invite further inquiry. Coherence or alignment between all aspects of the research project is essential. In this Rip Out we focus on data collection, but in qualitative research, the entire project must be considered.1,2 Careful design of the data collection phase requires the following: deciding who will do what, where, when, and how at the different stages of the research process; acknowledging the role of the researcher as an instrument of data collection; and carefully considering the context studied and the participants and informants involved in the research.

Journal Article•10.1080/13645579.2014.957069•

Participant recruitment and data collection through Facebook: the role of personality factors

Sean C. Rife¹, Kelly L. Cate², Michal Kosinski³, David Stillwell³•Institutions (3)

Murray State University¹, University of North Georgia², University of Cambridge³

02 Jan 2016-International Journal of Social Research Methodology

TL;DR: In this article, the authors compared demographic and personality data collected over Facebook with data collected through a standalone website, and data collected from college undergraduates at two universities, and found statistically significant differences exist between Facebook data and the comparison data-sets, but since 80%...

Abstract: As participant recruitment and data collection over the Internet have become more common, numerous observers have expressed concern regarding the validity of research conducted in this fashion. One growing method of conducting research over the Internet involves recruiting participants and administering questionnaires over Facebook, the world’s largest social networking service. If Facebook is to be considered a viable platform for social research, it is necessary to demonstrate that Facebook users are sufficiently heterogeneous and that research conducted through Facebook is likely to produce results that can be generalized to a larger population. The present study examines these questions by comparing demographic and personality data collected over Facebook with data collected through a standalone website, and data collected from college undergraduates at two universities. Results indicate that statistically significant differences exist between Facebook data and the comparison data-sets, but since 80% ...

Journal Article•10.1109/TVCG.2015.2467671•

Beyond Weber's Law: A Second Look at Ranking Visualizations of Correlation

Matthew Kay¹, Jeffrey Heer¹•Institutions (1)

University of Washington¹

31 Jan 2016-IEEE Transactions on Visualization and Computer Graphics

TL;DR: This model deviates from Weber's Law but provides improved predictive accuracy and generalization, and it finds that, compared to other visualizations, scatterplots are unique in combining low variance between individuals and high precision on both positively- and negatively-correlated data.

Abstract: Models of human perception – including perceptual “laws” – can be valuable tools for deriving visualization design recommendations. However, it is important to assess the explanatory power of such models when using them to inform design. We present a secondary analysis of data previously used to rank the effectiveness of bivariate visualizations for assessing correlation (measured with Pearson's r) according to the well-known Weber-Fechner Law. Beginning with the model of Harrison et al. [1], we present a sequence of refinements including incorporation of individual differences, log transformation, censored regression, and adoption of Bayesian statistics. Our model incorporates all observations dropped from the original analysis, including data near ceilings caused by the data collection process and entire visualizations dropped due to large numbers of observations worse than chance. This model deviates from Weber's Law, but provides improved predictive accuracy and generalization. Using Bayesian credibility intervals, we derive a partial ranking that groups visualizations with similar performance, and we give precise estimates of the difference in performance between these groups. We find that compared to other visualizations, scatterplots are unique in combining low variance between individuals and high precision on both positively- and negatively-correlated data. We conclude with a discussion of the value of data sharing and replication, and share implications for modeling similar experimental data.

Journal Article•10.1109/CC.2016.7405730•

A secure-efficient data collection algorithm based on self-adaptive sensing model in mobile Internet of vehicles

Liang Wei¹, Ruan Zhiqiang², Tang Mingdong³, Li Peng³•Institutions (3)

Xiamen University of Technology¹, Minjiang University², Hunan University of Science and Technology³

12 Feb 2016-China Communications

TL;DR: The simulation and experiments show that the vehicular node can realize secure and real-time data collection, and that the proposed algorithm is superior to other algorithms in vehicular network life cycle, power consumption, and reliability of data collection.

Abstract: Existing research on data collection using wireless mobile vehicle network emphasizes the reliable delivery of information. However, other performance requirements such as life cycle of nodes, stability and security are not set as primary design objectives. This makes data collection ability of vehicular nodes in real application environment inferior. By considering the features of nodes in wireless IoV, such as large scales of deployment, volatility and low time delay, an efficient data collection algorithm is proposed for mobile vehicle network environment. An adaptive sensing model is designed to establish vehicular data collection protocol. The protocol adopts group management in model communication. The vehicular sensing node in group can adjust network sensing chain according to sensing distance threshold with surrounding nodes. It will dynamically choose a combination of network sensing chains on basis of remaining energy and location characteristics of surrounding nodes. In addition, secure data collection between sensing nodes is undertaken as well. The simulation and experiments show that the vehicular node can realize secure and real-time data collection. Moreover, the proposed algorithm is superior in vehicular network life cycle, power consumption and reliability of data collection by comparing to other algorithms.

Journal Article•10.1177/1740774516653238•

Improving the value of clinical research through the use of Common Data Elements

Jerry Sheehan¹, Steven Hirschfeld¹, Erin D. Foster¹, Udi E. Ghitza², Kerry Goetz¹, Joanna Karpinski¹, Lisa Lang¹, Richard P. Moser¹, Joanne Odenkirchen¹, Dianne Reeves¹, Yaffa R. Rubinstein¹, Ellen M. Werner¹, Michael F. Huerta¹•Institutions (2)

National Institutes of Health¹, National Institute on Drug Abuse²

15 Jun 2016-Clinical Trials

TL;DR: The use of Common Data Elements can facilitate cross-study comparisons, data aggregation, and meta-analyses; simplify training and operations; improve overall efficiency; promote interoperability between different systems; and improve the quality of data collection.

Abstract: The use of Common Data Elements can facilitate cross-study comparisons, data aggregation, and meta-analyses; simplify training and operations; improve overall efficiency; promote interoperability between different systems; and improve the quality of data collection. A Common Data Element is a combination of a precisely defined question (variable) paired with a specified set of responses to the question that is common to multiple datasets or used across different studies. Common Data Elements, especially when they conform to accepted standards, are identified by research communities from variable sets currently in use or are newly developed to address a designated data need. There are no formal international specifications governing the construction or use of Common Data Elements. Consequently, Common Data Elements tend to be made available by research communities on an empiric basis. Some limitations of Common Data Elements are that there may still be differences across studies in the interpretation and implementation of the Common Data Elements, variable validity in different populations, and inhibition by some existing research practices and the use of legacy data systems. Current National Institutes of Health efforts to support Common Data Element use are linked to the strengthening of National Institutes of Health Data Sharing policies and the investments in data repositories. Initiatives include cross-domain and domain-specific resources, construction of a Common Data Element Portal, and establishment of trans-National Institutes of Health working groups to address technical and implementation topics. The National Institutes of Health is seeking to lower the barriers to Common Data Element use through greater awareness and encourage the culture change necessary for their uptake and use. 
As National Institutes of Health, other agencies, professional societies, patient registries, and advocacy groups continue efforts to develop and promote the responsible use of Common Data Elements, particularly if linked to accepted data standards and terminologies, continued engagement with and feedback from the research community will remain important.
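
The definition in the abstract above (a precisely defined question paired with a specified set of responses) maps naturally onto a small data structure. The sketch below is only an illustration of that idea: the class name `CommonDataElement`, its fields, and the smoking-status example are hypothetical and do not come from any NIH specification or portal.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class CommonDataElement:
    """A question (variable) paired with its permitted responses (hypothetical model)."""
    name: str                            # short variable identifier
    question: str                        # precisely worded question text
    permissible_values: Tuple[str, ...]  # the specified set of responses

    def is_valid(self, response: str) -> bool:
        """Return True when the response is one of the permitted values."""
        return response in self.permissible_values


# Hypothetical element that two different studies could both reuse.
smoking_status = CommonDataElement(
    name="smoking_status",
    question="What is the participant's current smoking status?",
    permissible_values=("never", "former", "current"),
)

if __name__ == "__main__":
    print(smoking_status.is_valid("former"))     # response from study A
    print(smoking_status.is_valid("sometimes"))  # free text that would need recoding
```

Because both studies validate against the same permitted values, their records can be pooled without recoding, which is the cross-study comparability the abstract describes.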

Journal Article•10.1186/S13643-016-0368-4•

Data extraction for complex meta-analysis (DECiMAL) guide

Hugo Pedder¹, Grammati Sarri, Edna Keeney², Vanessa Delgado Nunes¹, Sofia Dias²•Institutions (2)

Royal College of Obstetricians and Gynaecologists¹, University of Bristol²

13 Dec 2016-Systematic Reviews

TL;DR: This guide (data extraction for complex meta-analysis (DECiMAL)) suggests a number of points to consider when collecting data, primarily aimed at systematic reviewers preparing data for meta-analysis.

Abstract: As more complex meta-analytical techniques such as network and multivariate meta-analyses become increasingly common, further pressures are placed on reviewers to extract data in a systematic and consistent manner. Failing to do this appropriately wastes time, resources and jeopardises accuracy. This guide (data extraction for complex meta-analysis (DECiMAL)) suggests a number of points to consider when collecting data, primarily aimed at systematic reviewers preparing data for meta-analysis. Network meta-analysis (NMA), multiple outcomes analysis and analysis combining different types of data are considered in a manner that can be useful across a range of data collection programmes. The guide has been shown to be both easy to learn and useful in a small pilot study.

Journal Article•10.1177/1094428116633502•

Understanding Relative and Absolute Change in Discontinuous Growth Models Coding Alternatives and Implications for Hypothesis Testing

Paul D. Bliese¹, Jonas W. B. Lang²•Institutions (2)

University of South Carolina¹, Ghent University²

02 Mar 2016-Organizational Research Methods

TL;DR: In this article, the authors build off the basic discontinuous growth model and illustrate how alternative specifications of time-related variables allow one to examine relative versus absolute change in transition and post-transition slopes.

Abstract: Organizational researchers routinely have access to repeated measures from numerous time periods punctuated by one or more discontinuities. Discontinuities may be planned, such as when a researcher introduces an unexpected change in the context of a skill acquisition task. Alternatively, discontinuities may be unplanned, such as when a natural disaster or economic event occurs during an ongoing data collection. In this article, we build off the basic discontinuous growth model and illustrate how alternative specifications of time-related variables allow one to examine relative versus absolute change in transition and post-transition slopes. Our examples focus on interpreting time-varying covariates in a variety of situations (multiple discontinuities, linear and quadratic models, and models where discontinuities occur at different times). We show that the ability to test relative and absolute differences provides a high degree of precision in terms of specifying and testing hypotheses.

Journal Article•10.1002/MAR.20876•

Relationship Quality in Business to Business Relationships—Reviewing the Current Literatures and Proposing a New Measurement Model

Zhizhong Jiang¹, Eric Shiu¹, Stephan C. Henneberg², Peter Naudé³•Institutions (3)

University of Birmingham¹, Queen Mary University of London², University of Manchester³

01 Apr 2016-Psychology & Marketing

TL;DR: In this article, the authors present a comprehensive review on the measures of relationship quality, and propose the CLOSES scale as a new monitoring tool, which reflects the intensity of communication, long-term orientation, and social and economic satisfaction of a focal actor in a business relationship.

Abstract: Relationship quality is a central theme in business to business relationships, and it is becoming increasingly important from a theoretical as well as practical perspective to understand and monitor relationship quality. Despite its pivotal role, measurement issues of relationship quality have not been systematically investigated, confounded by a lack of consensus on the dimensions and contents of this construct. This paper presents a comprehensive review on the measures of relationship quality, and proposes the CLOSES scale as a new monitoring tool. This higher order, multidimensional scale reflects the intensity of communication (C), long-term orientation (LO), and social and economic satisfaction (SES) of a focal actor in a business relationship. Tested with data collected from 404 construction companies and cross-checked with a second round of data collection from 201 companies in other various industries, using partially multiple respondents, this new scale shows good reliability, convergent, discriminant, and nomological validity, as well as cross-industry transferability. Thus, future academic research as well as practical management of business relationships is enriched by providing a valid and reliable tool that is not tied to a specific industry setting, to capture the important construct of relationship quality.

Journal Article•10.3758/S13428-015-0632-X•

Online versus offline: The Web as a medium for response time data collection

Andrey Chetverikov¹, Philipp Upravitelev•Institutions (1)

Saint Petersburg State University¹

01 Sep 2016-Behavior Research Methods

TL;DR: The findings show that Web-based experiments are an acceptable source of RT data, comparable to a common keyboard-based setup in the laboratory; the high diversity of browsers, operating systems, and CPU performance may have a detrimental effect, but it can partly be compensated for by increased sample sizes and trial numbers.

Abstract: The Internet provides a convenient environment for data collection in psychology. Modern Web programming languages, such as JavaScript or Flash (ActionScript), facilitate complex experiments without the necessity of experimenter presence. Yet there is always a question of how much noise is added due to the differences between the setups used by participants and whether it is compensated for by increased ecological validity and larger sample sizes. This is especially a problem for experiments that measure response times (RTs), because they are more sensitive (and hence more susceptible to noise) than, for example, choices per se. We used a simple visual search task with different set sizes to compare laboratory performance with Web performance. The results suggest that although the locations (means) of RT distributions are different, other distribution parameters are not. Furthermore, the effect of experiment setting does not depend on set size, suggesting that task difficulty is not important in the choice of a data collection method. We also collected an additional online sample to investigate the effects of hardware and software diversity on the accuracy of RT data. We found that the high diversity of browsers, operating systems, and CPU performance may have a detrimental effect, though it can partly be compensated for by increased sample sizes and trial numbers. In sum, the findings show that Web-based experiments are an acceptable source of RT data, comparable to a common keyboard-based setup in the laboratory.

Journal Article•10.1177/1049732315591150•

Deliberative Discussion Focus Groups

Erin Rothwell¹, Rebecca Anderson¹, Jeffrey R. Botkin¹•Institutions (1)

University of Utah¹

01 May 2016-Qualitative Health Research

TL;DR: A new approach for the conduct of focus groups in health research is discussed, identifying ways to educate and inform participants about the topic of interest prior to the focus group discussion to promote more quality data from informed opinions.

Abstract: This article discusses a new approach for the conduct of focus groups in health research. Identifying ways to educate and inform participants about the topic of interest prior to the focus group discussion can promote more quality data from informed opinions. Data on this deliberative discussion approach are provided from research within three federally funded studies. As healthcare continues to improve from scientific and technological advancements, educating the research participants prior to data collection about these complexities is essential to gather quality data.
