About: Tilde is a research topic. Over its lifetime, 7 publications have been published within this topic, receiving 76 citations. The topic is also known as: ~ & ~.
TL;DR: A Multilingual Open Data corpus for European languages, built within the scope of the MODEL project, is described, along with a summary of the challenges encountered and the solutions chosen.
Abstract: This paper describes a Multilingual Open Data corpus for European languages that was built within the scope of the MODEL project. We describe the approach chosen to select data sources, which data sources were used, how the source data was handled, what tools were used, and what data was obtained as a result of the project. The quality of the obtained data is presented, and a summary of the challenges and chosen solutions is given as well. This paper may serve as a guide and reference for anyone attempting something similar, as well as a guide to the new open data obtained.
TL;DR: This study examines some lexical residues of British colonialism in W$\tilde{e}$, a dialect within the indigenous language Ghɔmàlà, which is spoken in four villages in the Western Province of Cameroon, West Africa: Bahouan, Baham, Bayangam and Bandjoun.
Abstract: This study examines some lexical residues of British colonialism in a language spoken in the Western Province of Cameroon, West Africa. The language is W$\tilde{e}$, a dialect within the indigenous language Ghɔmàlà, which is spoken in four villages in the Western Province of Cameroon: Bahouan, Baham, Bayangam and Bandjoun. The paper also seeks to answer the question: why are there so many English lexical items in this dialect, changed almost beyond recognition and reminiscent of the presence of the British themselves in Cameroon, although this part of the country was formerly under French rule?
TL;DR: This work presents the use of two Inductive Logic Programming (ILP) techniques to construct rules for extracting instances of various named entity classes thereby reducing the efforts of a linguist/developer.
Abstract: Developing linguistically sound and data-compliant rules for named entity annotation is usually an intensive and time-consuming process for any developer or linguist. In this work, we present the use of two Inductive Logic Programming (ILP) techniques to construct rules for extracting instances of various named entity classes, thereby reducing the efforts of a linguist/developer. Using ILP for rule development not only reduces the amount of effort required but also provides an interactive framework wherein a linguist can incorporate his intuition about named entities, for example in the form of mode declarations for refinements (suitably exposed for ease of use by the linguist) and background knowledge (in the form of linguistic resources). We have a small amount of tagged data: approximately 3884 sentences for Marathi and 22748 sentences for Hindi. The paucity of tagged data for Indian languages makes manual development of rules more challenging. However, the ability to fold in background knowledge and domain expertise in ILP techniques comes to our rescue, and we have been able to develop rules that are mostly linguistically sound and that yield results comparable to rules handcrafted by linguists. The ILP approach has two advantages over the approach of hand-crafting all rules: (i) the development time reduces by a factor of 240 when ILP is used instead of involving a linguist for the entire rule development and (ii) the ILP technique has the computational edge that it has a complete and consistent view of all significant patterns in the data at the level of abstraction specified through the mode declarations. Point (ii) enables the discovery of rules that could be missed by the linguist and also makes it possible to scale the rule development to a larger training dataset.
The rules thus developed could be optionally edited by linguistic experts and consolidated either (a) through default ordering (as in TILDE [1]) or (b) with an ordering induced using [2] or (c) by using the rules as features in a statistical graphical model such as a conditional random field (CRF) [3]. We report results using WARMR [4] and TILDE to learn rules for named entities of Indian languages, namely Hindi and Marathi.
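TL;DR: Two new algorithms are developed for minimizing the total processing time of tardy jobs on a single machine, each improving on Lawler and Moore's \(O(P \cdot n)\) algorithm in a different scenario; the first runs in \({\tilde{O}}(P^{7/4})\) time, where P is the total processing time of all n jobs in the input.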
Abstract: This paper is concerned with the \(1|| \sum p_j U_j\) problem, the problem of minimizing the total processing time of tardy jobs on a single machine. This is not only a fundamental scheduling problem, but also an important problem from a theoretical point of view as it generalizes the Subset Sum problem and is closely related to the 0/1-Knapsack problem. The problem is well known to be NP-hard, but only in a weak sense, meaning it admits pseudo-polynomial time algorithms. The best known running time follows from the famous Lawler and Moore algorithm that solves a more general weighted version in \(O(P \cdot n)\) time, where P is the total processing time of all n jobs in the input. This algorithm was developed in the late 1960s and has yet to be improved. In this paper we develop two new algorithms for this problem, each improving on Lawler and Moore’s algorithm in a different scenario. Our first algorithm runs in \({\tilde{O}}(P^{7/4})\) time, and outperforms Lawler and Moore’s algorithm in instances where \(n={\tilde{\omega }}(P^{3/4})\). Our second algorithm runs in \({\tilde{O}}(\min \{P \cdot D_{\#}, P + D\})\) time, where \(D_{\#}\) is the number of different due dates in the instance, and D is the sum of all different due dates. This algorithm improves on Lawler and Moore’s algorithm when \(n={\tilde{\omega }}(D_{\#})\) or \(n={\tilde{\omega }}(D/P)\). Further, it extends the known \({\tilde{O}}(P)\) algorithm for the single due date special case of \(1||\sum p_jU_j\) in a natural way. Both algorithms rely on basic primitive operations between sets of integers and vectors of integers for the speedup in their running times. The second algorithm relies on fast polynomial multiplication as its main engine, and can be easily extended to the case of a fixed number of machines. For the first algorithm we define a new “skewed” version of \((\max ,\min )\)-Convolution which is interesting in its own right.
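For context, the pseudo-polynomial baseline that the abstract attributes to Lawler and Moore can be sketched as a subset-sum style dynamic program. The version below is a minimal illustration for the unweighted objective only, assuming non-negative integer processing times and due dates. The function name `min_tardy_processing_time` and the `(processing_time, due_date)` input format are hypothetical, and the sketch does not implement either of the paper's new algorithms.

```python
from typing import List, Tuple


def min_tardy_processing_time(jobs: List[Tuple[int, int]]) -> int:
    """Return the minimum total processing time of tardy jobs.

    jobs holds (processing_time, due_date) pairs of non-negative integers.
    Runs in O(P * n) time, where P is the sum of all processing times.
    """
    total = sum(p for p, _ in jobs)

    # Consider jobs in earliest-due-date order: some optimal schedule runs
    # its on-time jobs in this order and places the tardy jobs at the end.
    ordered = sorted(jobs, key=lambda job: job[1])

    # reachable[x] is True if some subset of the jobs seen so far can all
    # finish on time with total processing time exactly x.
    reachable = [False] * (total + 1)
    reachable[0] = True

    for p, d in ordered:
        # Appending this job to an on-time set of load x makes it finish
        # at time x + p, so it stays on time only if x + p <= d.
        upper = min(d, total) - p
        # Iterate downward so each job is used at most once.
        for x in range(upper, -1, -1):
            if reachable[x]:
                reachable[x + p] = True

    best_on_time = max(x for x in range(total + 1) if reachable[x])
    return total - best_on_time
```

When all jobs share one due date, the loop reduces to a plain Subset Sum computation, which is the special case the abstract mentions.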
TL;DR: Two learning algorithms, CO-UCB and CO-AAE, are proposed for a heterogeneous multi-agent setting in which agents cooperate to solve the same instance of a K-armed stochastic bandit problem.
Abstract: This paper tackles a multi-agent bandit setting where M agents cooperate to solve the same instance of a K-armed stochastic bandit problem. The agents are heterogeneous: each agent has limited access to a local subset of arms, and the agents are asynchronous, with different gaps between decision-making rounds. The goal for each agent is to find its optimal local arm, and agents can cooperate by sharing their observations with others. While cooperation between agents improves the performance of learning, it comes with the additional complexity of communication between agents. For this heterogeneous multi-agent setting, we propose two learning algorithms, CO-UCB and CO-AAE. We prove that both algorithms achieve order-optimal regret, which is $O\left(\sum_{i:\tilde{\Delta}_i > 0} \log T / \tilde{\Delta}_i\right)$, where $\tilde{\Delta}_i$ is the minimum suboptimality gap between the reward mean of arm i and any local optimal arm. In addition, through a careful selection of the valuable information for cooperation, CO-AAE achieves a low communication complexity of O(log T). Finally, numerical experiments verify the efficiency of both algorithms.
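To make the upper-confidence-bound idea behind a name like CO-UCB concrete, here is a minimal sketch of the classical UCB1 index restricted to one agent's local subset of arms. It is not the paper's CO-UCB or CO-AAE: the abstract does not give their index or communication rules, so the exploration bonus $\sqrt{2 \ln t / n_i}$, the function names, and the assumption that `means` and `counts` already pool observations shared by other agents are all illustrative choices.

```python
import math
from typing import Dict, Iterable


def ucb1_index(mean: float, pulls: int, t: int) -> float:
    """Classical UCB1 index: empirical mean plus an exploration bonus.

    An arm with no observations gets an infinite index so it is tried first.
    Assumes t >= 1.
    """
    if pulls == 0:
        return math.inf
    return mean + math.sqrt(2.0 * math.log(t) / pulls)


def select_local_arm(
    local_arms: Iterable[int],
    means: Dict[int, float],
    counts: Dict[int, int],
    t: int,
) -> int:
    """Pick the arm with the largest UCB1 index among the agent's local arms.

    means and counts may include arms the agent cannot pull (assumed here to
    come from shared observations); only arms in local_arms are candidates.
    """
    return max(local_arms, key=lambda arm: ucb1_index(means[arm], counts[arm], t))
```

With equal observation counts the arm with the higher empirical mean wins, while a rarely observed arm can still be selected because its bonus term is larger.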