
Guide to Writing an Effective Computer Architecture Abstract

Abstract Writing Guideline

Abstract

Because on-line search databases typically contain only abstracts, it is vital to write a complete but concise description of your work to entice potential readers into obtaining a copy of the full paper. This article describes how to write a good computer architecture abstract for both conference and journal papers. Writers should follow a checklist consisting of: motivation, problem statement, approach, results, and conclusions. Following this checklist should increase the chance of people taking the time to obtain and read your complete paper.

Introduction

Now that the use of on-line publication databases is prevalent, writing a really good abstract has become even more important than it was a decade ago. Abstracts have always served the function of "selling" your work. But now, instead of merely convincing the reader to keep reading the rest of the attached paper, an abstract must convince the reader to leave the comfort of an office and go hunt down a copy of the article from a library (or worse, obtain one after a long wait through inter-library loan). In a business context, an "executive summary" is often the only piece of a report read by the people who matter; and it should be similar in content if not tone to a journal paper abstract.

Checklist: Parts of an Abstract

Despite the fact that an abstract is quite brief, it must do almost as much work as the multi-page paper that follows it. In a computer architecture paper, this means that it should in most cases include the following sections. Each section is typically a single sentence, although there is room for creativity. In particular, the parts may be merged or spread among a set of sentences. Use the following as a checklist for your next abstract:

Motivation: Why do we care about the problem and the results?
If the problem isn't obviously "interesting" it might be better to put motivation first; but if your work is incremental progress on a problem that is widely recognized as important, then it is probably better to put the problem statement first to indicate which piece of the larger problem you are breaking off to work on. This section should include the importance of your work, the difficulty of the area, and the impact it might have if successful.

Problem statement: What problem are you trying to solve? What is the scope of your work (a generalized approach, or for a specific situation)? Be careful not to use too much jargon. In some cases it is appropriate to put the problem statement before the motivation, but usually this only works if most readers already understand why the problem is important.

Approach: How did you go about solving or making progress on the problem? Did you use simulation, analytic models, prototype construction, or analysis of field data for an actual product? What was the extent of your work (did you look at one application program or a hundred programs in twenty different programming languages)? What important variables did you control, ignore, or measure?

Results: What's the answer? Specifically, most good computer architecture papers conclude that something is so many percent faster, cheaper, smaller, or otherwise better than something else. Put the result there, in numbers. Avoid vague, hand-waving results such as "very", "small", or "significant". If you must be vague, you are only given license to do so when you can talk about orders-of-magnitude improvement. There is a tension here in that you should not provide numbers that can be easily misinterpreted, but on the other hand you don't have room for all the caveats.

Conclusions: What are the implications of your answer?
Is it going to change the world (unlikely), be a significant "win", be a nice hack, or simply serve as a road sign indicating that this path is a waste of time (all of the previous results are useful)? Are your results general, potentially generalizable, or specific to a particular case?

Other Considerations

An abstract must be a fully self-contained, capsule description of the paper. It can't assume (or attempt to provoke) the reader into flipping through looking for an explanation of what is meant by some vague statement. It must make sense all by itself. Some points to consider include:

Meet the word count limitation. If your abstract runs too long, either it will be rejected or someone will take a chainsaw to it to get it down to size. Your purposes will be better served by doing the difficult task of cutting yourself, rather than leaving it to someone else who might be more interested in meeting size restrictions than in representing your efforts in the best possible manner. An abstract word limit of 150 to 200 words is common.

Any major restrictions or limitations on the results should be stated, if only by using "weasel words" such as "might", "could", "may", and "seem".

Think of a half-dozen search phrases and keywords that people looking for your work might use. Be sure that those exact phrases appear in your abstract, so that they will turn up at the top of a search result listing.

Usually the context of a paper is set by the publication it appears in (for example, IEEE Computer magazine's articles are generally about computer technology). But, if your paper appears in a somewhat un-traditional venue, be sure to include in the problem statement the domain or topic area that it is really applicable to.

Some publications request "keywords". These have two purposes. They are used to facilitate keyword index searches, which are greatly reduced in importance now that on-line abstract text searching is commonly used.
However, they are also used to assign papers to review committees or editors, which can be extremely important to your fate. So make sure that the keywords you pick make assigning your paper to a review category obvious (for example, if there is a list of conference topics, use your chosen topic area as one of the keyword tuples).

Conclusion

Writing an efficient abstract is hard work, but will repay you with increased impact on the world by enticing people to read your publications. Make sure that all the components of a good abstract are included in the next one you write.

---

Abstract Examples

Case 1:

Deep Neural Networks (DNNs) are powerful models that have achieved excellent performance on difficult learning tasks. Although DNNs work well whenever large labeled training sets are available, they cannot be used to map sequences to sequences. In this paper, we present a general end-to-end approach to sequence learning that makes minimal assumptions on the sequence structure. Our method uses a multilayered Long Short-Term Memory (LSTM) to map the input sequence to a vector of a fixed dimensionality, and then another deep LSTM to decode the target sequence from the vector. Our main result is that on an English to French translation task from the WMT'14 dataset, the translations produced by the LSTM achieve a BLEU score of 34.8 on the entire test set, where the LSTM's BLEU score was penalized on out-of-vocabulary words. Additionally, the LSTM did not have difficulty on long sentences. For comparison, a phrase-based SMT system achieves a BLEU score of 33.3 on the same dataset. When we used the LSTM to rerank the 1000 hypotheses produced by the aforementioned SMT system, its BLEU score increases to 36.5, which is close to the previous best result on this task. The LSTM also learned sensible phrase and sentence representations that are sensitive to word order and are relatively invariant to the active and the passive voice.
Finally, we found that reversing the order of the words in all source sentences (but not target sentences) improved the LSTM's performance markedly, because doing so introduced many short term dependencies between the source and the target sentence which made the optimization problem easier.

Case 2:

Recent work has demonstrated substantial gains on many NLP tasks and benchmarks by pre-training on a large corpus of text followed by fine-tuning on a specific task. While typically task-agnostic in architecture, this method still requires task-specific fine-tuning datasets of thousands or tens of thousands of examples. By contrast, humans can generally perform a new language task from only a few examples or from simple instructions - something which current NLP systems still largely struggle to do. Here we show that scaling up language models greatly improves task-agnostic, few-shot performance, sometimes even reaching competitiveness with prior state-of-the-art fine-tuning approaches. Specifically, we train GPT-3, an autoregressive language model with 175 billion parameters, 10x more than any previous non-sparse language model, and test its performance in the few-shot setting. For all tasks, GPT-3 is applied without any gradient updates or fine-tuning, with tasks and few-shot demonstrations specified purely via text interaction with the model. GPT-3 achieves strong performance on many NLP datasets, including translation, question-answering, and cloze tasks, as well as several tasks that require on-the-fly reasoning or domain adaptation, such as unscrambling words, using a novel word in a sentence, or performing 3-digit arithmetic. At the same time, we also identify some datasets where GPT-3's few-shot learning still struggles, as well as some datasets where GPT-3 faces methodological issues related to training on large web corpora.
Finally, we find that GPT-3 can generate samples of news articles which human evaluators have difficulty distinguishing from articles written by humans. We discuss broader societal impacts of this finding and of GPT-3 in general.

---

Paper Introduction

Foundation models have emerged as a powerful tool in machine learning, demonstrating unprecedented performance across a wide variety of data distributions [72, 40, 41]. By pre-training on large and diverse datasets, these models learn representations that can serve as rich common-sense priors that complement task-specific data. We thus expect fine-tuning to enhance the generalization capabilities of foundation models.

However, fine-tuning often degrades the performance of foundation models on out-of-distribution (OOD) data. This indicates that conventional fine-tuning strategies can fail to utilize the prior knowledge embedded in the foundation model. This issue of conventional fine-tuning distorting beneficial foundation model priors has driven recent research on developing robust fine-tuning methods. Such methods aim to produce an adapted model that achieves good performance under distribution shifts by preserving the prior knowledge embedded in the foundation model. Prior works have proposed various regularization techniques for this purpose, such as ensembling models before and after adaptation [92] or initially fitting only the last layer [50]. However, as these methods are primarily based on human intuition, they may not fully account for the complex interplay between the foundation model priors and the adaptation process.

We introduce AutoFT, a novel method for robust fine-tuning that aims to find the right tradeoff between the prior and the fine-tuning data through hyperparameter optimization. Our key insight is that we can learn what characteristics of the foundation model to preserve during fine-tuning using a data-driven approach.
Like existing robust fine-tuning methods, we fine-tune a foundation model on task-specific data, and then evaluate the resulting model on a set of OOD distributions. However, we additionally leverage a small OOD validation set with up to 1000 labeled examples from one unseen distribution; we optimize fine-tuning hyperparameters for post-adaptation performance on this OOD validation set. Importantly, the OOD validation set is only used for hyperparameter optimization, not fine-tuning, and does not follow the same distribution as the OOD test sets. We illustrate the intuition behind our approach in Figure 1 and our data assumptions in Figure 2.

We make two key alterations to standard hyperparameter optimization, which we find to be critical for automatic robust fine-tuning. First, as mentioned above, we optimize hyperparameters with respect to an OOD validation set rather than an ID validation set. Second, we use a broader definition of "hyperparameter": beyond the usual hyperparameters such as learning rate, we parameterize the fine-tuning objective itself through weight coefficients for several different loss functions and regularizers. This larger hyperparameter search space gives AutoFT more granular control over adaptation.

We rigorously evaluate AutoFT on a wide array of real-world datasets and consider various types of distribution shifts, including subpopulation and domain shift. Our experiments show that our approach results in better generalization to unseen OOD data. With at most 1000 datapoints from an OOD distribution, AutoFT outperforms existing robust fine-tuning methods across all benchmarks. These gains in robustness are achieved with minimal additional compute, requiring at most 5% more total compute compared to standard fine-tuning. Among other results, AutoFT achieves new state-of-the-art performance on the challenging iWildCam and FMoW benchmarks [9, 47, 16], outperforming the prior best methods by 6.0% and 1.5%, respectively.
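The hyperparameter-optimization loop described above can be sketched in miniature. This is only an illustrative toy, not the authors' implementation: a one-parameter linear model stands in for the foundation model, an L2 pull toward the pretrained weight stands in for AutoFT's learned loss/regularizer coefficients, and plain random search stands in for whatever search procedure the paper actually uses. The key structural point it shows is that hyperparameters (including the regularizer weight itself) are selected by post-fine-tuning loss on a held-out OOD validation set, which is never used for the fine-tuning updates:

```python
import random

def fine_tune(data, lr, reg_weight, init_w=1.0, steps=200):
    """Toy 'fine-tuning': fit y = w*x by gradient descent, starting from a
    'pretrained' weight init_w. The reg_weight term pulls w back toward
    init_w, standing in for preserving foundation-model priors."""
    w = init_w
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in data) / len(data)
        grad += 2 * reg_weight * (w - init_w)  # prior-preservation penalty
        w -= lr * grad
    return w

def loss(w, data):
    """Mean squared error of y = w*x on a dataset of (x, y) pairs."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def autoft_search(train, ood_val, n_trials=30, seed=0):
    """AutoFT-style outer loop (simplified): sample hyperparameters,
    fine-tune on the training data, and keep the configuration whose
    resulting model has the lowest loss on the OOD validation set."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        lr = 10 ** rng.uniform(-3, -1)   # usual hyperparameter
        reg = 10 ** rng.uniform(-3, 1)   # objective itself is searched too
        w = fine_tune(train, lr, reg)
        val = loss(w, ood_val)           # OOD val guides selection only
        if best is None or val < best[0]:
            best = (val, {"lr": lr, "reg_weight": reg}, w)
    return best

# In-distribution data follows y = 2x; the OOD validation split is shifted.
train = [(x, 2.0 * x) for x in range(1, 6)]
ood_val = [(x, 2.0 * x + 0.5) for x in range(6, 11)]
val_loss, hparams, w = autoft_search(train, ood_val)
```

In this toy setting the search settles on a small prior-preservation weight because the shifted validation data rewards fitting the task; on real distribution shifts, the same loop can instead discover that preserving more of the prior generalizes better, which is the behavior the paper aims for.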
Please write a professional abstract based on the above paper's introduction. Please follow the above guideline, and keep the abstract to a single paragraph, similar to the above examples.
Foundation models have achieved strong performance across diverse data distributions, yet conventional fine-tuning strategies often compromise their generalization on out-of-distribution (OOD) data. This paper introduces AutoFT, a robust fine-tuning method that uses hyperparameter optimization to preserve the foundation model's prior knowledge during adaptation. By optimizing hyperparameters with respect to a small OOD validation set, and by expanding the search space to include weight coefficients for various loss functions and regularizers, AutoFT achieves superior generalization to unseen OOD data. Rigorous evaluation on real-world datasets spanning subpopulation and domain shifts shows that AutoFT outperforms existing robust fine-tuning methods across all benchmarks, at a cost of at most 5% additional compute over standard fine-tuning. Notably, AutoFT sets a new state of the art on the challenging iWildCam and FMoW benchmarks, surpassing the previous best methods by 6.0% and 1.5%, respectively.