Classical linguistic research methods
The earliest classical linguistic techniques for studying languages were versions of the comparative method. The comparative method attempts to establish the relatedness of languages by assembling word lists for the languages being compared and identifying which words are cognates (words of common descent). From the changes observed between cognates, a proto-language is reconstructed, and from that, a family tree of the language family's evolution and relationships.
This comparative method was first used successfully to construct a useful proto-language in 1819 by Jacob Grimm and his colleague Rasmus Rask, who attempted to reconstruct Proto-Indo-European (PIE) from the differences between Germanic and non-Germanic languages. This research led to the formulation of Grimm's Law, a series of three statements describing how voiced stops in PIE became voiceless in Proto-Germanic (PG), how voiceless stops in PIE became fricatives in PG, and how aspirated stops in PIE became non-aspirated stops (or fricatives) \cite{GRIMM}. While these laws are not without exception, they were a major turning point in the field of linguistics, and one of the first examples of a sound law describing how phonemes evolve.
The comparative method is intrinsically genealogical, and many modern linguists consider the implied tree model overly simplistic and argue that the biological comparison is misleading \cite{Fran_ois}. A tree model implies a series of distinct nodes, which in turn implies sudden and irrevocable change in a population that prevents further contact, i.e. many different proto-languages coexisting without further interaction. This idea is clearly misleading in areas of continuous landmass \cite{Argunova_1994}. Distinct nodes also imply linguistic uniformity within a proto-language, despite the existence of dialects within even small language communities, although the real-life implications of this simplification are doubted \cite{campbell2004}. Furthermore, critics of the tree model claim that it ignores situations where many dialects within a language evolve into distinct languages over a long period during which innovations are shared (known as areal diffusion). Populations bordering one another borrow their neighbours' linguistic features and tend to share similar cultures \cite{P_iankova_1996}, influencing linguistic change in a shared direction; this is known as linkage. However, in island populations separated by vast distances and with only limited contact with each other, these issues are largely mitigated.
While the comparative method can imply relatedness, early comparative work was flawed by an emphasis on the quantity of words collected from each language. The demand for more words led to poor datasets, incorrect cognate pairings and, owing to cultural differences, sometimes a lack of appropriate cognates \cite{Ray1911,Elbert1953}. Morris Swadesh tackled this problem in 1952 with his eponymous Swadesh list. The list comprised 215 basic words that he believed would be common to every language, regardless of culture. Swadesh's belief that these words were non-cultural led him to the idea that they would evolve at the same rate regardless of language or population, owing to their universal importance in communication. Originally, the universality of the words was based purely on his own opinion \cite{Makihara_2006}; later revisions narrowed the list to just 100 words of four main types: basic nouns (including familial terms, objects found in nature and anatomical parts), basic verbs (including basic actions and reactions), basic adjectives (colours and relative temperatures) and pronouns (you, I, they, etc.) \cite{Heggarty_2012}. Over time, the adoption of the Swadesh list marked a change in linguistics, emphasising the importance of a smaller selection of high-quality data, which in turn was more likely to be etymologically homologous.
While the comparative method and the Swadesh list can imply relatedness, on their own they do not offer a way to determine absolute branch lengths or the age of a language family. However, Swadesh's idea of a constant rate of word evolution for his "basic vocabulary items" led to the development of glottochronology, a field of linguistics that attempts to date the most recent common ancestor of a pair of languages. The first attempt to date a language split accurately was made by Swadesh himself in the paper Salish Internal Relationships, which quantified the time elapsed since a language split with the following equation:
\(i = \frac{\log\left(c\right)}{2\log\left(r\right)}\)
where c is the proportion of shared cognates, i is the indicated time depth in periods (an arbitrary unit) and r is the proportion of basic vocabulary retained after one period. The value of r was estimated in \cite{Swadesh1950} as 0.85 by comparing contemporary English with Old English (a time difference of roughly 1000 years). This constant was formalised and re-estimated in 1953 by Robert Lees in The Basis of Glottochronology \cite{Lees_1953}. Now termed the glottochronological constant, it was re-estimated by Lees from 1000-year comparisons in 13 languages, giving a mean of 0.80484 ± 0.0176 (at 90 per cent confidence). The variable i was then replaced by t, measured in units of 1000 years rather than in an arbitrary unit:
\(t = \frac{\log\left(c\right)}{2\log\left(r\right)}\)
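As a rough illustration of the arithmetic, suppose two languages share 70 per cent of their basic vocabulary, so that c = 0.70. This value of c is hypothetical and chosen only for the example; r is rounded to 0.805 from Lees's estimate, and natural logarithms are assumed, although the ratio is the same in any base:
\(t = \frac{\log\left(0.70\right)}{2\log\left(0.805\right)} \approx \frac{-0.3567}{2 \times \left(-0.2169\right)} \approx 0.82\)
This corresponds to a split roughly 820 years ago. The factor of 2 in the denominator arises because both languages are assumed to lose vocabulary independently at the same rate: each retains a proportion \(r^{t}\) of the original list after t millennia, so the expected shared proportion is \(c = r^{2t}\), and solving for t gives the formula above.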
Glottochronology has been contentious since its inception, and many anthropologists doubt its effectiveness for absolute dating; however, it has been shown to be effective at establishing the chronology of migratory events, especially in isolated populations \cite{Lees_1956,Hirsch_1954}.
History of the study of Polynesian languages
Polynesia is a region of Oceania consisting of over 1,000 islands, forming a triangle stretching from New Zealand to Hawaii to Easter Island, and known to have been inhabited by humans for hundreds of years. Since the islands became known to the Western world, the question of how their inhabitants arrived there has been posed. Early hypotheses suggested a migration from South America, and while some modern genetic evidence suggests limited peopling of Easter Island by Native Americans \cite{Thorsby_2016}, it is clear today that Polynesian migration started from the west. Historically, it was the naturalist Joseph Banks who began to discover the breadth of the language family to which the Polynesian languages belong. He recorded vocabulary from Polynesia, Micronesia and South East Asia \cite{banks1790}, and was able to draw a linguistic relationship by directly comparing Polynesian languages, which showed no clear differences among themselves, then applying the same comparative approach to Polynesian and Micronesian languages, and following the relationship back to South East Asia. This relationship was later found to extend as far as Madagascar, forming what we now know as the Austronesian language family. Given the discovery of the most widely spread language family of all time, linguists tried to extrapolate the migration path of these peoples from the known data.
One of the earliest attempts to plot this migration was by William Churchill, who compared Melanesian and Polynesian languages to test the two prevailing theories of Polynesian origin at the time and to plot migratory routes through East Asia \cite{Churchill1911} (see Figure 1). The sieve theory posited that the islands of Melanesia and west Polynesia acted like the meshes of a sieve, catching seafarers originating from central Polynesia who were blown westwards by the strong prevailing winds (known today as the "roaring forties"). This theory also posited that central Polynesia was initially peopled by seafarers from East Malaysia, who travelled with the current north of New Guinea, through the Marshall Islands, to Samoa and Fiji. The migration theory, on the other hand, suggested that the migration began in India and proceeded eastwards through the Malay Archipelago; travelling slowly, with many generations between migrations, the migrants eventually reached Fiji and settled there for generations before voyaging again. Churchill tested these theories by comparing cognate lists from 18 languages, examining both word differences and language differences. He found that the languages of the Melanesian islands share many deep commonalities with the Polynesian languages, while Marshallese (which, according to the sieve theory, would have been one of the oldest Polynesian languages) lacks many of these features and has many more unique ones. This was the first major linguistic work showing that the migration to Polynesia occurred through Melanesia, not Micronesia. He also suggested that the migration occurred in two "swarms": the first travelled north of Papua New Guinea, between the Bismarck Archipelago and the Solomon Islands, until settling in Samoa, while the second travelled south of New Guinea, through the Torres Straits, and settled in Fiji (used interchangeably with Viti, as Viti is the largest island of what we now call Fiji).
His work, however, was very flawed even by the standards of the time. As a review of his book in Nature pointed out, Churchill compared languages using word lists of vastly different lengths and varying quality, leading to false conclusions \cite{Ray1911}. Later linguists took issue with Churchill's work because it made no attempt at proto-language reconstruction; Churchill was also notoriously poor at identifying cognates, was often fooled by analogous terminology, and established no phoneme-grapheme correspondences (letter-sound correspondences) \cite{Elbert_1953}. All that being said, Churchill's work was an enormous step forward for the linguistics-based study of migration into Polynesia.