# INFERRING AND REVISING THEORIES WITH CONFIDENCE: ANALYZING BILINGUALISM IN THE 1901 CANADIAN CENSUS

@article{Drummond2006INFERRINGAR, title={INFERRING AND REVISING THEORIES WITH CONFIDENCE: ANALYZING BILINGUALISM IN THE 1901 CANADIAN CENSUS}, author={Chris Drummond and Stan Matwin and Chad Gaffield}, journal={Applied Artificial Intelligence}, year={2006}, volume={20}, pages={1 - 33} }

This paper shows how machine learning can help in analyzing and understanding historical change. Using data from the Canadian census of 1901, we discover the influences on bilingualism in Canada at the beginning of the last century. The discovered theories partly agree with, and partly complement, the existing views of historians on this question. Our approach, based around a decision tree, not only infers theories directly from data, but also evaluates existing theories and revises them to…

#### 7 Citations

IMPACT OF HIGH-LEVEL KNOWLEDGE ON ECONOMIC WELFARE THROUGH INTERACTIVE DATA MINING

- Computer Science
- Appl. Artif. Intell.
- 2011

A novel algorithm for finding the most important relations with the use of data mining based on interactive data mining, specialized for the analysis of macroeconomic data that often contains incomplete and noisy attributes and, initially, complex relations.

What (not) to expect when classifying rare events

- Computer Science, Medicine
- Briefings Bioinform.
- 2018

It is proved that in the balanced case, where there is an equal proportion of events and non-events, any classifier that satisfies one of these constraints will always satisfy all.

Inner Ensembles: Using Ensemble Methods in Learning Step

- Computer Science
- 2014

The results show that the overall performance of Inner Ensembles is significantly better than the original methods, and Inner Ensembles provide similar performance improvements as regular ensembles.

Statistical Inference, Learning and Models in Big Data

- Computer Science, Mathematics
- ArXiv
- 2015

An overview of the topics covered is given, describing challenges and strategies that seem common to many different areas of application and including some examples of applications to make these challenges and strategies more concrete.

'A Flag that Knows No Colour Line': Aboriginal Veteranship in Canada, 1914-1939

- History
- 2017

Historians have rightly considered the period from 1914 to 1939 as the time when Canadian Indigenous soldiers and veterans of the First World War faced unique challenges because of their legal status…

The Class Imbalance Problem

- Computer Science
- 2017

It is shown that there exists a wide range of real-world applications involving extremely skewed (imbalanced) data sets and that the class imbalance problem stems from the fact that the class of interest occupies only a negligible volume within the complete pattern space.

Real Time Robot Policy Adaptation Based on Intelligent Algorithms

- Computer Science
- EANN/AIAI
- 2011

Results show that evolution can generate an optimal relation between the robot performance and exploration-exploitation of reinforcement learning, enabling the robot to adapt its strategy online as the environment conditions change.

#### References

Inferring and Revising Theories with Confidence: Analyzing the 1901 Canadian Census

- Computer Science
- 2000

Using data from the Canadian census of 1901, the influences on bilingualism in Canada at the beginning of the last century are discovered and a semantic measure of similarity between trees is proposed to limit the changes made.

Pruning Decision Trees and Lists

- Computer Science
- 2000

This thesis presents pruning algorithms for decision trees and lists that are based on significance tests, explains why pruning is often necessary to obtain small and accurate models, and shows that the performance of standard pruning algorithms can be improved by taking the statistical significance of observations into account.

Linearity, Nonlinearity, and the Competing Constructions of Social Hierarchy in Early Twentieth-Century Canada: The Question of Language in 1901

- Sociology
- 2000

Since the 1960s, scholars have increasingly emphasized the ways in which routinely generated sources such as the census should be understood as creations of a “quantitative mentality” or “statistical…

Induction over the unexplained: Using overly-general domain theories to aid concept learning

- Machine Learning
- 2004

This paper describes and evaluates an approach to combining empirical and explanation-based learning called Induction Over the Unexplained (IOU). IOU is intended for learning concepts that can be…

Knowledge Discovery Through Induction with Randomization Testing

- Computer Science
- 1991

IRT embodies a view of induction as a four-phase process (shown in Figure 1). The process alters a current model by generating a group of new competitor models, fitting those competitor models…

What if there were no significance tests

- Psychology
- 1997

Contents: Preface. Part I: Overview. L.L. Harlow, Significance Testing Introduction and Overview. Part II: The Debate: Against and For Significance Testing. J. Cohen, The Earth Is Round. F.L. Schmidt,…

Tree Induction Vs Logistic Regression: A Learning Curve Analysis

- Mathematics, Computer Science
- J. Mach. Learn. Res.
- 2003

A large-scale experimental comparison of logistic regression and tree induction is presented, assessing classification accuracy and the quality of rankings based on class-membership probabilities, and a learning-curve analysis is used to examine the relationship of these measures to the size of the training set.

The Effects of Training Set Size on Decision Tree Complexity

- Computer Science
- ICML
- 1997

This paper presents experiments with 19 datasets and 5 decision tree pruning algorithms that show that increasing training set size often results in a linear increase in tree size, even when that…

Tree Induction for Probability-Based Ranking

- Mathematics, Computer Science
- Machine Learning
- 2004

It is concluded that PETs, with these simple modifications, should be considered when rankings based on class-membership probability are required, and it is shown that using a simple, common smoothing method (the Laplace correction) uniformly improves probability-based rankings.

C4.5: Programs for Machine Learning

- Computer Science
- 1992

A complete guide to the C4.5 system as implemented in C for the UNIX environment, which starts from simple core learning methods and shows how they can be elaborated and extended to deal with typical problems such as missing data and overfitting.